Geospatial / Maps — Designed in Stages
You don’t need to design for scale on day one.
Define what you need—map tiles, geocode and reverse geocode, search nearby (radius or bounds), point-to-point routing, and optionally heat maps or aggregation by region—then build the simplest thing that works and evolve as coverage, POI count, and traffic grow.
Here we use a maps or geospatial platform (Google Maps–style, or heat maps for drivers) as the running example: locations/points, regions/tiles, routes, and points of interest (POIs). The same staged thinking applies to any system that serves map imagery, location search, or routing: low latency for tiles and search, spatial indexing, and scale (global tiles, many POIs) are central.
Requirements and Constraints (no architecture yet)
Functional Requirements
- Map tile fetch — client requests map imagery by zoom level and tile coordinates (e.g. z/x/y); return raster or vector tile; support multiple layers (base map, labels, traffic) if needed.
- Geocode — convert address or place name to coordinates (lat, lon); return one or more candidates with confidence.
- Reverse geocode — convert coordinates to address or place name; return human-readable location.
- Search nearby — find points (e.g. POIs) within a radius or within a bounding box; filter by type or keyword; sort by distance or relevance.
- Route (point-to-point) — compute path and duration between origin and destination; driving, walking, or transit; return polyline and ETA.
- Heat map or aggregation by region — optional; aggregate data (e.g. events, density) by geographic region or grid cell; serve as overlay or visualization.
Quality Requirements
- Low latency for tiles and search — tile and search responses should be fast (e.g. p95 < 200–500 ms); tiles are read-heavy and cacheable; search depends on index and query shape.
- Accuracy — geocode and reverse geocode should match user intent; routing should be plausible and up to date with road network.
- Scale — global or large-area tile coverage; many POIs (millions); high tile and search QPS; regional replication for latency.
- Expected scale — estimate tile QPS, POI count, search QPS, and geographic coverage (city vs country vs global) up front; these numbers decide which stage you are in.
Key Entities
- Location / Point — a position (lat, lon); optional altitude; used for POIs, search results, route waypoints.
- Region / Tile — a geographic partition; tile = (z, x, y) in standard web map tiling (e.g. XYZ); region = admin boundary or custom grid cell for aggregation.
- Route — path between two or more points; polyline (list of points), duration, distance; optional segments (turn-by-turn).
- POI (point of interest) — named place with location and metadata (name, type, address, id); searchable and displayable on map.
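The entities above can be sketched as plain data types. This is only an illustrative shape, not a prescribed schema: the class and field names (`Point`, `Tile`, `POI`, `Route`, `altitude_m`, `duration_s`, `distance_m`) are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class Point:
    """A position; altitude is optional."""
    lat: float
    lon: float
    altitude_m: Optional[float] = None

    def __post_init__(self) -> None:
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError(f"lat out of range: {self.lat}")
        if not -180.0 <= self.lon <= 180.0:
            raise ValueError(f"lon out of range: {self.lon}")


@dataclass(frozen=True)
class Tile:
    """A tile address in the XYZ scheme: zoom z has 2**z tiles per axis."""
    z: int
    x: int
    y: int

    def __post_init__(self) -> None:
        if self.z < 0:
            raise ValueError(f"negative zoom: {self.z}")
        n = 1 << self.z
        if not (0 <= self.x < n and 0 <= self.y < n):
            raise ValueError(f"tile out of range at zoom {self.z}")


@dataclass(frozen=True)
class POI:
    """A named place with a location and free-form metadata."""
    id: str
    name: str
    type: str
    location: Point
    address: Optional[str] = None
    metadata: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Route:
    """A path as an ordered polyline plus summary figures."""
    polyline: Tuple[Point, ...]
    duration_s: float
    distance_m: float
```

Validating ranges at construction time keeps bad coordinates out of the index and the tile store, which is cheaper than debugging them later.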
Primary Use Cases and Access Patterns
- Tile fetch — read path; key = (z, x, y) or (layer, z, x, y); highly cacheable; same tile requested many times.
- Geocode — read path; input = text; output = list of (lat, lon, label); may use search index or external API.
- Reverse geocode — read path; input = (lat, lon); output = address or place; may use spatial index or external API.
- Search nearby — read path; input = (lat, lon), radius or bounds, optional filters; query spatial index; return ranked POIs.
- Route — read path; input = origin, destination, mode; call routing engine; return route; cache by (origin, dest, mode) for hot pairs.
- Heat map / aggregation — read path; aggregate events or counts by region or grid; precompute or query on demand.
Given this, start with the simplest MVP: one API, one DB for points (lat/lon + metadata), a tile server or static tiles, simple nearby search (DB with spatial index or bounding box), and geocode via external API or DB—then add a dedicated spatial index, tile CDN, route service, and caching as load and coverage grow.
Stage 1 — MVP (simple, correct, not over-engineered)
Goal
Ship working maps: clients can load map tiles, search for places nearby (radius or bounds), and resolve addresses to/from coordinates. One API, one DB for points; tile server or static tiles; geocode via external API or simple DB lookup; single region or limited coverage.
Components
- API — REST or similar; get tile (z, x, y) or tile URL; search nearby (lat, lon, radius or bbox, optional type); geocode (address → lat, lon); reverse geocode (lat, lon → address). Auth optional (e.g. API key for third-party geocode).
- DB — store points/POIs (id, lat, lon, name, type, metadata); index by location. Use a spatial index in the DB (e.g. PostGIS) or a bounding-box query plus an application-side filter for “points in bounds” or “points within radius”; a simple query is enough for a moderate POI count.
- Tile server or static tiles — serve pre-rendered tiles (e.g. PNG) from disk or object storage keyed by (z, x, y); or minimal tile server that generates on demand from base data; single layer at MVP.
- Geocode / reverse geocode — use third-party API (e.g. Google, Mapbox) or store (address, lat, lon) in DB and match by text search; reverse geocode = nearest known address or external API. Keep cost and latency acceptable (cache results if allowed by provider).
- Simple nearby search — query DB: points where (lat, lon) in bounding box or within radius (e.g. Haversine or DB spatial function); limit results; sort by distance. Index on (lat, lon) or use spatial index type.
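The simple nearby search described above can be sketched in application code as a bounding-box prefilter followed by an exact Haversine check. This is a minimal sketch under stated assumptions: POIs are plain dicts with `lat` and `lon` keys, the function names are hypothetical, the Earth is treated as a sphere, and search areas that cross the antimeridian or sit near the poles are not handled. In a real MVP the bounding-box step would be the indexed DB query.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bounding_box(lat, lon, radius_m):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing the search circle."""
    delta = radius_m / EARTH_RADIUS_M  # angular radius in radians
    dlat = math.degrees(delta)
    # Longitude span widens with latitude; fall back to all longitudes near a pole.
    ratio = math.sin(delta) / max(math.cos(math.radians(lat)), 1e-12)
    dlon = 180.0 if ratio >= 1.0 else math.degrees(math.asin(ratio))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def search_nearby(pois, lat, lon, radius_m, limit=20):
    """Return up to `limit` (distance_m, poi) pairs within radius, nearest first."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_m)
    hits = []
    for poi in pois:
        # Cheap prefilter: this is the part a (lat, lon) index can serve.
        if not (min_lat <= poi["lat"] <= max_lat and min_lon <= poi["lon"] <= max_lon):
            continue
        # Exact check: the box corners lie outside the circle.
        d = haversine_m(lat, lon, poi["lat"], poi["lon"])
        if d <= radius_m:
            hits.append((d, poi))
    hits.sort(key=lambda h: h[0])
    return hits[:limit]
```

The two-step shape matters more than the details: the box is what the database can answer from an index, and the Haversine pass removes the false positives in the corners.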
Minimal Diagram
        Client
          |
          v
+-----------------+
|       API       |
+-----------------+
   |         |            |
   v         v            v
DB (POIs,  Tile store   Geocode (external API
spatial    (z/x/y)      or DB lookup)
index)
   |
   v
Nearby search (bounds / radius)

Patterns and Concerns (don’t overbuild)
- Tile coordinate system: use standard web mercator tile scheme (z, x, y) so clients and caches interoperate; document zoom and extent.
- Spatial query: even a simple bounding-box filter backed by a (lat, lon) index avoids a full table scan; add a radius check or a proper spatial index when POI count grows.
- Basic monitoring: tile and search latency, geocode success rate, error rate.
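To make the tile coordinate scheme concrete, here is a small sketch of the usual Web Mercator XYZ conversion between (lat, lon) and (z, x, y), with y counted from the north edge. The function names and the `z/x/y.png` path layout are assumptions for illustration; TMS-style schemes that count y from the south would need the row flipped.

```python
import math

MAX_LAT = 85.05112878  # latitude limit of the square Web Mercator projection


def latlon_to_tile(lat, lon, z):
    """Return (z, x, y) of the XYZ tile containing the point at zoom z."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))  # clamp to the projected extent
    n = 1 << z                              # tiles per axis at this zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 or lat at the limit still maps to a valid tile.
    return z, min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_bounds(z, x, y):
    """Return (south, west, north, east) in degrees for tile (z, x, y)."""
    n = 1 << z
    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    north = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    south = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * (y + 1) / n))))
    return south, west, north, east


def tile_path(z, x, y, ext="png"):
    """Storage or URL key for a pre-rendered tile (hypothetical layout)."""
    return f"{z}/{x}/{y}.{ext}"
```

Because every client derives the same key for the same area, the key doubles as the cache key for static storage and, later, for a CDN.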
Why This Is a Correct MVP
- One API, one DB for points, tile server or static tiles, nearby search via DB, geocode via API or DB → enough to ship a map with search and place resolution; easy to reason about.
- Vertical scaling and limited coverage buy you time before you need tile CDN, dedicated spatial index, and route service.
Stage 2 — Growth Phase (spatial index, tile CDN, routing, cache)
What Triggers the Growth Phase?
- POI count or search QPS grows; DB spatial queries are slow or don’t scale; need dedicated spatial index (e.g. R-tree, geohash grid).
- Tile traffic grows; need CDN to serve tiles with low latency and reduce origin load.
- Product needs routing (point-to-point, ETA); integrate routing service (internal or third-party).
- Geocode and search results are repeated; caching (hot tiles, hot geocode, hot routes) reduces cost and latency.
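As a rough illustration of the caching trigger, the sketch below shows an in-memory TTL cache with hit and miss counters, plus key builders that round coordinates so nearby requests share an entry. Everything here is an assumption for the example: the class and function names are hypothetical, four decimal places (about 11 m of latitude) is an arbitrary choice, and there is no size bound or eviction, which a production cache would need.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-process cache: entries expire ttl_s seconds after loading."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so expiry can be tested
        self._items: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: call the backend and store with a fresh expiry.
        self.misses += 1
        value = loader()
        self._items[key] = (now + self._ttl_s, value)
        return value


def reverse_geocode_key(lat: float, lon: float, decimals: int = 4):
    """Round coordinates so points a few metres apart share one cache entry."""
    return (round(lat, decimals), round(lon, decimals))


def route_key(origin, dest, mode: str, decimals: int = 4):
    """Cache key for a route: rounded origin, rounded destination, travel mode."""
    return (reverse_geocode_key(*origin, decimals), reverse_geocode_key(*dest, decimals), mode)
```

The `hits` and `misses` counters give the cache hit rate directly, which is the number that tells you whether the cache is paying for itself.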
Components to Add (incrementally)
- Dedicated spatial index — index POIs by location for fast “in bounds” and “within radius” queries; options: R-tree (e.g. in DB or library), geohash grid (store geohash prefix, query by prefix), or managed service (e.g. Elasticsearch geo, Redis Geo). Sync or replicate from primary DB; search API queries index instead of DB for hot path.
- Tile CDN — put tiles behind CDN; cache by (z, x, y) or (layer, z, x, y); long TTL for static base tiles; shorter for traffic or dynamic layers; reduce load on tile origin.
- Route service — integrate routing engine (e.g. OSRM, GraphHopper, or third-party); API: origin, destination, mode → polyline, duration, distance; cache frequent (origin, dest) pairs; rate limit or quota if using paid provider.
- Caching for hot tiles and geocode — cache tile responses at edge or in-memory; cache geocode and reverse geocode results by input (e.g. address or rounded lat/lon); TTL per use case (tiles long, geocode medium, routes short).
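The geohash grid option from the list above can be sketched as follows: encode each POI to a geohash string, bucket POIs by that string, and answer a query by reading the query point's cell plus its eight neighbours. This is a simplified in-memory sketch, not a managed service or library API; `GeohashIndex`, `covering_cells`, and the default precision of 6 (cells of roughly 1.2 km by 0.6 km at the equator) are assumptions. Candidates still need an exact distance filter, and the nine-cell lookup only covers radii up to about one cell.

```python
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=6):
    """Standard geohash: interleave longitude and latitude bisection bits."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def cell_size_deg(precision):
    """Return (height_deg, width_deg) of a geohash cell at this precision."""
    total_bits = 5 * precision
    lon_bits, lat_bits = (total_bits + 1) // 2, total_bits // 2
    return 180.0 / (1 << lat_bits), 360.0 / (1 << lon_bits)


def covering_cells(lat, lon, precision):
    """The cell containing the point plus its neighbours (up to 9 cells)."""
    h, w = cell_size_deg(precision)
    cells = set()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nlat = min(max(lat + dy * h, -90.0), 90.0)   # clamp at the poles
            nlon = (lon + dx * w + 180.0) % 360.0 - 180.0  # wrap at the antimeridian
            cells.add(geohash_encode(nlat, nlon, precision))
    return cells


class GeohashIndex:
    """In-memory grid index: geohash cell -> list of (poi_id, lat, lon)."""

    def __init__(self, precision=6):
        self.precision = precision
        self._cells = defaultdict(list)

    def add(self, poi_id, lat, lon):
        self._cells[geohash_encode(lat, lon, self.precision)].append((poi_id, lat, lon))

    def candidates(self, lat, lon):
        out = []
        for cell in covering_cells(lat, lon, self.precision):
            out.extend(self._cells.get(cell, []))
        return out
```

Reading the neighbours is what avoids the classic geohash pitfall, where two points a few metres apart fall on opposite sides of a cell edge and share no prefix.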
Growth Diagram
Client
  |
  v
CDN (tiles)
  |
  v
+-----------------+
|       API       |
+-----------------+
   |         |              |
   v         v              v
Spatial   Tile origin    Geocode (cached)
index     (on miss)      Route service (cached)
(POIs)
   |         |              |
   v         v              v
Search    DB / storage   External or internal
nearby

Patterns and Concerns to Introduce (practical scaling)
- Tile layers: separate base map (static, long cache) from overlays (POIs, traffic); different update frequency and cache policy.
- Search ranking: combine distance with relevance (name match, type); optional ranking model or simple score (distance + type boost).
- Monitoring: tile cache hit ratio, search latency (p50, p95), geocode and route cache hit rate, routing provider latency and errors.
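The "simple score (distance + type boost)" idea for search ranking might look like the sketch below. The weights, the match tiers in `name_match`, and the candidate dict shape (`name`, `type`, `distance_m`) are all illustrative assumptions, not tuned values; a real system would calibrate them against click or conversion data.

```python
def name_match(query, name):
    """Crude text relevance in [0, 1]: exact > prefix > substring > none."""
    q, n = query.strip().lower(), name.lower()
    if not q:
        return 0.0
    if n == q:
        return 1.0
    if n.startswith(q):
        return 0.8
    if q in n:
        return 0.5
    return 0.0


def rank_score(distance_m, radius_m, match, type_boost=0.0, w_dist=0.6, w_name=0.4):
    """Blend proximity and name relevance, then add a per-type boost."""
    proximity = max(0.0, 1.0 - distance_m / radius_m)  # 1 at the centre, 0 at the edge
    return w_dist * proximity + w_name * match + type_boost


def rank(candidates, query, radius_m, boosts=None):
    """Sort candidate POIs by score, highest first."""
    boosts = boosts or {}
    scored = [
        (
            rank_score(
                c["distance_m"],
                radius_m,
                name_match(query, c["name"]),
                boosts.get(c["type"], 0.0),
            ),
            c,
        )
        for c in candidates
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [c for _, c in scored]
```

Keeping the score a plain weighted sum makes it easy to explain why one result outranked another, which is useful before investing in a learned ranking model.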
Still Avoid (common over-engineering here)
- Multi-layer vector tiles and real-time traffic until product clearly needs them.
- Global multi-region replication until traffic and latency justify it.
- Custom routing graph and full in-house routing until third-party or OSS (e.g. OSRM) is insufficient.
Stage 3 — Advanced Scale (multi-layer, real-time, global, optimization)
What Triggers Advanced Scale?
- Multiple tile layers (base, labels, traffic, custom overlays); vector tiles or raster; different update cadences.
- Real-time or near-real-time data: traffic, incidents, heat map (e.g. driver density); ingest and serve with low latency.
- Global scale: tiles and search across many regions; regional replication and routing for latency.
- Route optimization: traffic-aware routing, multiple waypoints, or fleet optimization; heavier compute and data.
Components (common advanced additions)
- Multi-layer tiles — separate pipelines per layer (base, labels, traffic, POIs); vector tiles for smaller payload and client-side styling; or raster per layer; composite at edge or client.
- Real-time traffic / heat data — ingest traffic or event stream (e.g. from sensors, drivers); aggregate by segment or grid; update tile overlay or heat layer on schedule (e.g. every 5 min); or serve via separate real-time API and overlay on client.
- Global scale and regional replication — replicate tile and index data by region; route user to nearest edge or region; search and tiles served from local replica; eventual consistency for POI updates across regions.
- Route optimization — traffic-aware routing (live or historical); multi-waypoint (e.g. delivery route); or batch optimization (fleet); may require custom graph and engine or premium provider; cache and rate limit.
Advanced Diagram (conceptual)
Clients (global)
  |
  v
CDN / edge (tiles, regional routing)
  |
  v
+-----------------+
| API (per region)|
+-----------------+
   |           |                |
   v           v                v
Spatial     Tile pipeline    Route service
index       (multi-layer,    (traffic-aware,
(regional   vector/raster)   optimization)
replica)
   |           |                |
   v           v                v
Search      Real-time        Geocode (cached,
            traffic/heat     multi-region)
            overlay

Patterns and Concerns at This Stage
- Consistency across regions: POI and tile updates propagate to replicas; define freshness SLO; search may be eventually consistent with primary.
- Real-time overlay: balance update frequency with storage and compute cost; pre-aggregate by time window (e.g. 5-min buckets) for heat or traffic.
- SLO-driven ops: tile latency and availability, search latency, geocode and route success rate; error budgets and on-call.
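The pre-aggregation by time window mentioned above can be sketched as counting events per (time bucket, grid cell). This is a batch-style sketch under assumptions: events are `(timestamp_s, lat, lon)` tuples, the grid is a plain lat/lon grid of 0.01 degrees rather than a tile or geohash grid, and the function names are hypothetical. A streaming pipeline would apply the same keys incrementally.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.01):
    """Integer (row, col) index of the lat/lon grid cell containing the point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def time_bucket(ts_s, window_s=300):
    """Start of the fixed window (default 5 minutes) containing the timestamp."""
    return int(ts_s // window_s) * window_s


def aggregate(events, cell_deg=0.01, window_s=300):
    """Count events per (window_start_s, (row, col)) key."""
    counts = Counter()
    for ts_s, lat, lon in events:
        counts[(time_bucket(ts_s, window_s), grid_cell(lat, lon, cell_deg))] += 1
    return counts
```

Serving the heat layer then becomes a lookup of the latest completed bucket, so the cost of a request no longer depends on how many raw events arrived.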
Summarizing the Evolution
MVP delivers maps with one API, one DB for points, a tile server or static tiles, simple nearby search (DB with spatial index or bounding box), and geocode via external API or DB. That’s enough to show a map and search places.
As you grow, you add a dedicated spatial index for fast search, a tile CDN for scale and latency, a route service (internal or third-party), and caching for tiles, geocode, and routes. You keep tile and search latency low and accuracy acceptable.
At advanced scale, you add multi-layer tiles (including vector), real-time traffic or heat overlays, global scale with regional replication, and route optimization. You scale coverage and features without over-building on day one.
This approach gives you:
- Start Simple — API + DB, tiles, nearby search, geocode via API or DB; ship and learn.
- Scale Intentionally — add spatial index and tile CDN when POI count and tile traffic demand it; add routing when product expects it.
- Add Complexity Only When Required — avoid multi-layer and real-time overlays until product and data justify them; keep tiles and search fast and correct first.