Project Overview

Smart-Itinerary-Generator is a modular platform that builds personalized itineraries across Andalusian towns by combining hybrid retrieval (dense + sparse + reranking), geospatial filtering, and route optimization with Valhalla.

The system integrates scraping pipelines, semantic services, a FastAPI backend, and a React frontend to deliver explainable recommendations and practical route planning in one workflow.

Frontend Results Overview

Technologies and tools

  • React, Leaflet, TailwindCSS, Vite.
  • FastAPI, SQLModel/SQLAlchemy, PostgreSQL.
  • Valhalla for routing and isochrones.
  • SentenceTransformers + CrossEncoder for semantic retrieval and reranking.
  • scikit-learn TF-IDF + cosine for sparse lexical matching.
  • Prefect, Selenium, BeautifulSoup, httpx for ETL orchestration and ingestion.
  • Docker Compose and MinIO for deployment and artifact storage.

State of the Art: Semantic Understanding

The retrieval stack combines dense and sparse ranking. Dense similarity over embeddings captures intent and context, while sparse TF-IDF scoring rewards exact lexical matches such as proper nouns and heritage-specific terms.
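To make the sparse side concrete, here is a minimal sketch of TF-IDF scoring with cosine similarity. It is hand-rolled in plain Python for illustration only: the project uses scikit-learn for this step, and the tokenizer, the smoothed IDF formula, the function names, and the sample texts are all assumptions.

```python
import math
from collections import Counter

def fit_idf(corpus):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(corpus)
    doc_freq = Counter(term for text in corpus for term in set(text.lower().split()))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

def vectorize(text, idf):
    """L2-normalized TF-IDF vector as a {term: weight} dict; unknown terms are dropped."""
    counts = Counter(t for t in text.lower().split() if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity
    return sum(w * b.get(t, 0.0) for t, w in a.items())

# Hypothetical town descriptions, not taken from the project's data
corpus = [
    "whitewashed village with a moorish castle",
    "mudejar church and roman bridge",
    "beach town with seafood restaurants",
]
idf = fit_idf(corpus)
doc_vectors = [vectorize(text, idf) for text in corpus]
query = vectorize("mudejar church", idf)
scores = [cosine(query, vec) for vec in doc_vectors]
best = max(range(len(corpus)), key=scores.__getitem__)
```

Only the second description shares terms with the query, so it is the only one with a non-zero score. This is the exact-match behaviour that complements dense embeddings.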

Both rankings are fused with Reciprocal Rank Fusion (RRF) and refined with a CrossEncoder reranker. This hybrid strategy covers the typical failure modes of each approach used alone: embeddings can miss rare exact terms, and keyword matching misses paraphrases.
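The fusion step can be sketched as follows. This is a minimal illustration of RRF, not the project's helper: the function name, the town identifiers, and the constant k = 60 (a commonly used default) are assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for the items it contains
            scores[item_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id for deterministic output
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))

# Hypothetical rankings from the dense and sparse retrievers
dense = ["ronda", "frigiliana", "mojacar"]
sparse = ["frigiliana", "zahara", "ronda"]
fused = reciprocal_rank_fusion([dense, sparse])
```

Because RRF works on ranks rather than raw scores, the dense and sparse scores do not need to be on a comparable scale. Items that appear near the top of both lists rise above items that only one retriever found.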

State of the art summary by subsystem

  • The route optimization review compares open-source and commercial engines and selects Valhalla for its native isochrones and time-aware routing.
  • Interactive mapping is built with React + Leaflet for practical UX and fast iteration.
  • Data acquisition mixes Selenium for dynamic pages and BeautifulSoup/httpx for static and async sources.
  • Backend patterns combine async FastAPI, typed data models, hybrid retrieval, and containerized deployment.

Contextual Embeddings

System Design

The architecture is organized around four services: scraper, semantic-embeddings, backend, and frontend, plus PostgreSQL, MinIO, and Valhalla.

The scraper ingests and normalizes data. The semantic-embeddings service centralizes the inference endpoints (/embed, /rerank, /search-text/*). The backend applies hybrid retrieval and route logic, and the frontend renders maps, recommendations, and PDF export.

Global System Architecture

Backend Entities

The central towns model stores municipality metadata (INE code, geography, province, descriptive fields, and embeddings) and relates to media and heritage assets.

This structure supports explainable recommendations by combining geospatial context with textual and semantic signals.

Entity Relationship Diagram

Scraper and ETL with Prefect

The ETL pipeline pulls data from tourism and cultural sources, normalizes records into staging models, requests canonical search texts and embeddings from the semantic service, and persists the results to PostgreSQL through idempotent upserts.

I/O-heavy tasks run concurrently, while inference and persistence run in controlled batches for stability and reproducibility.
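The split between concurrent I/O and batched processing might look like the sketch below. It uses only asyncio from the standard library. The helper names, the concurrency limit, the batch size, and the stand-in fetch function are assumptions, not the project's actual task code.

```python
import asyncio

async def fetch_all(urls, fetch, max_concurrency=5):
    """Run fetch(url) for every url, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            return await fetch(url)

    # gather returns results in the same order as the input urls
    return await asyncio.gather(*(bounded(u) for u in urls))

def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

async def fake_fetch(url):
    # Stand-in for a real HTTP call
    await asyncio.sleep(0)
    return {"url": url}

urls = [f"https://example.org/{i}" for i in range(7)]
records = asyncio.run(fetch_all(urls, fake_fetch))
batches = list(batched(records, 3))  # e.g. one embedding request per batch
```

The semaphore caps the number of simultaneous requests, and fixed-size batches keep each inference or database call bounded regardless of how many records a run collects.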

ETL Pipeline Overview

Pipeline mechanics and reliability strategy

  • The Prefect flow configuration uses retries with a delay between attempts for resilience.
  • Selenium crawls dynamic content with explicit waits, cookie handling, and respect for each site's crawl delay.
  • Batch crawling with thread pools improves throughput and mitigates long-session browser instability.
  • Wikipedia and IAPH acquisition uses lightweight parsing plus bounded async concurrency.
  • Persistence uses ON CONFLICT upserts for safe re-runs without duplicates.
  • Each run exports metadata reports to MinIO with timestamped keys.
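The idempotent persistence step listed above can be illustrated with an ON CONFLICT upsert. The sketch runs against an in-memory SQLite database so it works without a server; SQLite accepts the same clause as PostgreSQL for this simple case. The table layout, column names, and sample rows are placeholders, not the project's schema.

```python
import sqlite3

# Insert a row, or update the existing one when the key is already present
UPSERT = """
INSERT INTO towns (ine_code, name, province)
VALUES (?, ?, ?)
ON CONFLICT (ine_code) DO UPDATE SET
    name = excluded.name,
    province = excluded.province
"""

def upsert_towns(conn, rows):
    """Write all rows in one transaction; safe to call repeatedly."""
    with conn:
        conn.executemany(UPSERT, rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE towns (ine_code TEXT PRIMARY KEY, name TEXT, province TEXT)"
)

# Placeholder records, not real INE codes
rows = [("00001", "Example Town A", "Province X"),
        ("00002", "Example Town B", "Province Y")]
upsert_towns(conn, rows)
upsert_towns(conn, [("00001", "Example Town A (renamed)", "Province X")])

count = conn.execute("SELECT COUNT(*) FROM towns").fetchone()[0]
name = conn.execute(
    "SELECT name FROM towns WHERE ine_code = ?", ("00001",)
).fetchone()[0]
```

After the second call the table still holds two rows, and the first one carries the updated name, which is what makes re-running a flow safe.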

Backend (FastAPI)

The API exposes a health endpoint and an itinerary generation endpoint. The itinerary flow combines preference filters, optional isochrone constraints, hybrid ranking (dense + sparse + RRF), optional reranking, and route ordering via Valhalla.

This keeps recommendation quality and travel feasibility in the same decision pipeline.
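As an illustration of the optional isochrone constraint, the sketch below keeps only the candidate towns that fall inside a polygon ring. It assumes the isochrone arrives as a GeoJSON-style ring of [lon, lat] pairs and uses a planar ray-casting test that ignores holes. The function names, the dict keys, and the coordinates are hypothetical, and the project may well do this differently.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the closed ring."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the point's latitude can be crossed
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside

def filter_by_isochrone(towns, ring):
    """Keep the towns whose coordinates fall inside the isochrone ring."""
    return [t for t in towns if point_in_ring(t["lon"], t["lat"], ring)]

# Hypothetical square isochrone and candidate towns
ring = [(-5.0, 36.0), (-4.0, 36.0), (-4.0, 37.0), (-5.0, 37.0), (-5.0, 36.0)]
towns = [
    {"name": "inside_town", "lon": -4.5, "lat": 36.5},
    {"name": "outside_town", "lon": -3.5, "lat": 36.5},
]
reachable = filter_by_isochrone(towns, ring)
```

Applying this filter before ranking means the retrieval stages only score towns that are reachable within the travel budget.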

Backend component responsibilities

| Component | Responsibility | Key Detail |
| --- | --- | --- |
| Models and schemas | Define typed persistence and response contracts. | SQLModel entities and output DTOs. |
| Health endpoint | Expose service readiness state. | Checks API, database, and Valhalla availability. |
| Itinerary controller | Orchestrate recommendation and route generation. | Filters, hybrid retrieval, rerank, Top-K selection, route optimization. |
| Helper layer | Provide reusable ranking and geospatial logic. | RRF fusion, TF-IDF sparse scoring, Valhalla polygon and route helpers. |

Sequence Diagram: Itinerary Generation

Sequence Diagram: Scraping Pipeline