Project Overview
Smart-Itinerary-Generator is a modular platform that builds personalized itineraries across Andalusian towns by combining hybrid retrieval (dense + sparse + reranking), geospatial filtering, and route optimization with Valhalla.
The system integrates scraping pipelines, semantic services, a FastAPI backend, and a React frontend to deliver explainable recommendations and practical route planning in one workflow.

Technologies and tools
- React, Leaflet, TailwindCSS, Vite.
- FastAPI, SQLModel/SQLAlchemy, PostgreSQL.
- Valhalla for routing and isochrones.
- SentenceTransformers + CrossEncoder for semantic retrieval and reranking.
- scikit-learn TF-IDF + cosine similarity for sparse lexical matching.
- Prefect, Selenium, BeautifulSoup, httpx for ETL orchestration and ingestion.
- Docker Compose and MinIO for deployment and artifact storage.

State of the Art: Semantic Understanding
The retrieval stack combines dense and sparse ranking. Dense similarity captures intent and context through embeddings, while sparse TF-IDF improves exact lexical hits (proper nouns, heritage-specific terms).
Both rankings are fused with Reciprocal Rank Fusion (RRF) and refined with a CrossEncoder reranker. This hybrid strategy reduces common failure modes of using only semantic vectors or only keywords.
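As a rough illustration of the fusion step, the sketch below applies standard Reciprocal Rank Fusion to two ranked lists of town identifiers. The function name `rrf_fuse`, the sample identifiers, and the constant `k = 60` (a commonly used default) are assumptions for illustration and are not taken from the project code.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by fused score (descending); ties are broken by ID for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical outputs of the dense and sparse rankers (best first).
dense_ranking = ["ronda", "frigiliana", "zahara"]
sparse_ranking = ["zahara", "ronda", "setenil"]

fused = rrf_fuse([dense_ranking, sparse_ranking])
print(fused)
```

Items that rank well in both lists rise to the top, while an item seen by only one ranker still survives with a lower score. In the flow described above, a fused list like this would be the candidate set handed to the CrossEncoder reranker.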

State of the art summary by subsystem
- Route optimization: open and commercial engines are compared, and Valhalla is selected for its native isochrones and time-aware routing.
- Interactive mapping is built with React + Leaflet for practical UX and fast iteration.
- Data acquisition mixes Selenium for dynamic pages and BeautifulSoup/httpx for static and async sources.
- Backend patterns combine async FastAPI, typed data models, hybrid retrieval, and containerized deployment.

System Design
The architecture is organized around four services: scraper, semantic-embeddings, backend, and frontend, plus PostgreSQL, MinIO, and Valhalla.
The scraper ingests and normalizes data; semantic-embeddings centralizes the inference endpoints (/embed, /rerank, /search-text/*); the backend applies hybrid retrieval and route logic; and the frontend renders maps, recommendations, and PDF export.

Backend Entities
The central towns model stores municipality metadata (INE code, geography, province, descriptive fields, and embeddings) and relates to media and heritage assets.
This structure supports explainable recommendations by combining geospatial context with textual and semantic signals.

Scraper and ETL with Prefect
The ETL pipeline pulls data from tourism and cultural sources, normalizes records into staging models, requests canonical search texts and embeddings from the semantic service, and persists the results to PostgreSQL with idempotent upserts.
I/O-heavy tasks run concurrently, while inference and persistence run in controlled batches for stability and reproducibility.
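The split between concurrent I/O and batched downstream work can be sketched with the standard library alone. Everything named here (`fetch_source`, `batched`, `run_pipeline`, the concurrency limit, and the batch size) is hypothetical, and the fetch is simulated with a short sleep rather than a real request.

```python
import asyncio


async def fetch_source(name, limiter):
    """Stand-in for an I/O-bound fetch (hypothetical; no network access)."""
    async with limiter:  # at most max_concurrency fetches in flight at once
        await asyncio.sleep(0.01)
        return {"source": name}


def batched(items, size):
    """Yield consecutive fixed-size slices of a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def run_pipeline(sources, max_concurrency=3, batch_size=2):
    limiter = asyncio.Semaphore(max_concurrency)
    # I/O-heavy stage: all fetches are scheduled together, bounded by the semaphore.
    records = await asyncio.gather(*(fetch_source(s, limiter) for s in sources))
    # Inference/persistence stage: one batch at a time, in a fixed order.
    return [len(batch) for batch in batched(records, batch_size)]


batch_sizes = asyncio.run(run_pipeline(["a", "b", "c", "d", "e"]))
print(batch_sizes)
```

Processing batches sequentially and in a stable order is one simple way to get the stability and reproducibility mentioned above, since each run issues the same sequence of inference and write calls.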

Pipeline mechanics and reliability strategy
- Prefect flow config uses retries and delayed retry windows for resilience.
- Selenium crawls dynamic content with explicit waits, cookie handling, and respect for crawl-delay directives.
- Batch crawling with thread pools improves throughput and mitigates long-session browser instability.
- Wikipedia and IAPH acquisition uses lightweight parsing plus bounded async concurrency.
- Persistence uses ON CONFLICT upserts for safe re-runs without duplicates.
- Each run exports metadata reports to MinIO with timestamped keys.
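The last two points can be illustrated with a small, self-contained sketch. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, since both accept the `ON CONFLICT ... DO UPDATE` form used here. The table layout, the placeholder INE codes, and the `report_key` naming scheme are assumptions, not the project's actual schema or MinIO key format.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory SQLite stands in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE towns (ine_code TEXT PRIMARY KEY, name TEXT, province TEXT)"
)

UPSERT = """
INSERT INTO towns (ine_code, name, province)
VALUES (:ine_code, :name, :province)
ON CONFLICT (ine_code) DO UPDATE SET
    name = excluded.name,
    province = excluded.province
"""

# Placeholder records; the codes are not real INE identifiers.
rows = [
    {"ine_code": "00001", "name": "Town A", "province": "Province X"},
    {"ine_code": "00002", "name": "Town B", "province": "Province Y"},
]

for _ in range(2):  # the second pass simulates re-running the pipeline
    conn.executemany(UPSERT, rows)
conn.commit()

town_count = conn.execute("SELECT COUNT(*) FROM towns").fetchone()[0]


def report_key(run_time, prefix="reports"):
    """Build a timestamped object key (hypothetical naming scheme)."""
    return f"{prefix}/{run_time.strftime('%Y%m%dT%H%M%SZ')}/metadata.json"


key = report_key(datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
print(town_count, key)
```

Because the conflict target is the primary key, running the load twice leaves the table with the same two rows. Keys built from a UTC timestamp sort chronologically, so reports from different runs do not overwrite each other.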

Backend (FastAPI)
The API exposes a health endpoint and an itinerary generation endpoint. The itinerary flow combines preference filters, optional isochrone constraints, hybrid ranking (dense + sparse + RRF), optional reranking, and route ordering via Valhalla.
This keeps recommendation quality and travel feasibility in the same decision pipeline.

Backend component responsibilities
| Component | Responsibility | Key Detail |
|---|---|---|
| Models and schemas | Define typed persistence and response contracts. | SQLModel entities and output DTOs. |
| Health endpoint | Expose service readiness state. | Checks API, database, and Valhalla availability. |
| Itinerary controller | Orchestrate recommendation and route generation. | Filters, hybrid retrieval, rerank, Top-K selection, route optimization. |
| Helper layer | Provide reusable ranking and geospatial logic. | RRF fusion, TF-IDF sparse scoring, Valhalla polygon and route helpers. |
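
One way a polygon helper of this kind could apply an isochrone constraint is a plain ray-casting point-in-polygon test over the isochrone's outer ring. The sketch below assumes the ring arrives as (lon, lat) pairs in GeoJSON coordinate order. The function name, the square "isochrone", and the candidate towns are all hypothetical, and a production helper might rely on a geometry library or a spatial database query instead.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the ring of (lon, lat) vertices?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Count the edges crossed by a horizontal ray cast from the point.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


# Hypothetical isochrone ring in (lon, lat) order; a simple square.
isochrone = [(-5.0, 36.0), (-4.0, 36.0), (-4.0, 37.0), (-5.0, 37.0)]

# Hypothetical candidates: name -> (lon, lat).
candidates = {"town_a": (-4.5, 36.5), "town_b": (-3.0, 36.5)}

reachable = [
    name
    for name, (lon, lat) in candidates.items()
    if point_in_polygon(lon, lat, isochrone)
]
print(reachable)
```

Filtering candidates this way before ranking keeps unreachable towns out of the hybrid retrieval step altogether, so they are never scored only to be discarded later.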

Frontend experience highlights
- Form sections capture town preferences, cultural interests, and desired experiences.
- Leaflet-based location picker sets geographic context for isochrone filtering.
- Results view presents ranked towns plus route overview for actionable planning.
- Town profile surfaces cultural and historical context to support explainability.
- PDF export generates portable itinerary documents for offline use.




