Game Backend — Designed in Stages
You don’t need to design for scale on day one.
Define what you need—creating or joining a game session, submitting moves, getting state, and optionally matchmaking—then build the simplest thing that works and evolve as latency and concurrency grow.
Here we use a game backend (turn-based or real-time: Chess, Poker, Snake, Boggle-style) as the running example: game sessions, players, game state, and moves/actions. The same staged thinking applies to any multiplayer game: low latency, consistency of game state, server authority (anti-cheat), and fairness are central.
Requirements and Constraints (no architecture yet)
Functional Requirements
- Create / join session — player creates a new game (e.g. mode, rules) or joins an existing one (by code or matchmaking); session has unique id and holds players and state.
- Submit move — player sends an action (e.g. play card, move piece, type word); server validates, applies to state, and notifies other players.
- Get state — client fetches current game state (board, scores, whose turn); for turn-based, poll or push after each move; for real-time, stream state updates.
- Matchmaking (optional) — pair or group players into a session; queue by skill or mode; create session when enough players; notify when match found.
Quality Requirements
- Low latency — move submission to state update to opponent visibility should be fast; real-time games (e.g. 60 tick/s) need sub-frame latency where possible; turn-based can tolerate hundreds of ms.
- Consistency of game state — all clients see the same authoritative state; server is source of truth; no divergent or corrupted state from out-of-order or invalid moves.
- Anti-cheat (server authority) — server validates every move; client sends intent, server applies only valid actions; no trust of client state for scoring or win/loss.
- Fairness — no player gains advantage from latency or ordering; deterministic or well-defined ordering of concurrent moves; optional anti-cheat and replay for disputes.
- Expected scale — pin down concurrent sessions, players per session, moves per second, and turn-based vs real-time before choosing an architecture.
Key Entities
- Game session — a single game instance; session_id, game_type, status (waiting, in_progress, finished); list of player ids; created_at.
- Player — identity in the system; player_id, optional profile or rating; linked to user/auth.
- Game state — authoritative state of the game (board, hands, scores, current_turn, etc.); updated only by server after validating moves; serializable for persistence and sync.
- Move / Action — a single player action (e.g. “play card 7”, “e4”); has player_id, payload, sequence or timestamp; validated then applied to state.
Primary Use Cases and Access Patterns
- Create session — write path; create session row, add creator as player; return session_id; optional invite link or code.
- Join session — write path; add player to session; check capacity and status; notify existing players (poll or push).
- Submit move — write path; validate move against current state and rules; update state in DB or in-memory; persist; notify other players (push or they poll).
- Get state — read path; return current state for session; used on load and after each move (or streamed in real-time).
- Matchmaking — read + write; add player to queue; when N players in queue (or criteria met), create session and notify; remove from queue.
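The matchmaking pattern above—queue players, create a session once enough are waiting—can be sketched as a minimal FIFO queue (the `MatchmakingQueue` name is illustrative; skill- or rating-based criteria are omitted):

```python
from collections import deque


class MatchmakingQueue:
    """Minimal FIFO matchmaking: once enough players are waiting,
    pop a session's worth of them. Skill matching is omitted."""

    def __init__(self, players_per_session: int = 2):
        self.n = players_per_session
        self.waiting: deque = deque()

    def enqueue(self, player_id: str):
        """Add a player; return the matched group if one formed, else None."""
        self.waiting.append(player_id)
        if len(self.waiting) >= self.n:
            # Enough players: form a match and remove them from the queue.
            return [self.waiting.popleft() for _ in range(self.n)]
        return None
```

The caller creates a session for the returned group and notifies its players (poll or push); returning `None` means the player stays queued.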
Given this, start with the simplest MVP: one API, one DB, create game (e.g. turn-based), store state in DB, submit move → validate → update state → notify opponent (poll or push)—then add a real-time layer (WebSocket), game server or dedicated process per session, state in memory with persistence, and matchmaking as product demands.
Stage 1 — MVP (simple, correct, not over-engineered)
Goal
Ship working multiplayer: players create or join a game, submit moves, and see updated state. One API, one DB; state in DB; move validated on server and applied; opponent notified via poll or simple push; single server.
Components
- API — REST or similar; auth (player); create game (game type, optional options) → session_id; join game (session_id or code); get game state (session_id); submit move (session_id, move payload). Single server so ordering is straightforward.
- DB — sessions (id, game_type, status, created_at); session_players (session_id, player_id, role or seat); game_state (session_id, state blob or structured, updated_at); optional moves table (session_id, sequence, player_id, move, applied_at) for audit or replay. Index by session_id; state can be one row per session updated in place.
- Submit move flow — client POSTs move; server loads current state, validates move (rules engine or game-specific logic), applies move to state, writes state (and optionally move row) in transaction; return new state or ack. Use optimistic lock (version) or row lock to prevent concurrent move conflicts.
- Notify opponent — after applying move: (a) client polls get state periodically, or (b) server sends push (e.g. webhook, Firebase, or simple long-poll). No WebSocket at MVP if turn-based and poll is acceptable.
- Matchmaking — optional; table or queue of “waiting” players; cron or on-join: if enough in queue, create session and assign; notify via poll or email. Or skip and use invite-by-link only.
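The submit-move flow with an optimistic lock can be sketched end to end; `sqlite3` stands in for whatever DB you use, and `apply_move` is a placeholder for the game-specific rules engine:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game_state (session_id TEXT PRIMARY KEY, state TEXT, version INTEGER)"
)
conn.execute("INSERT INTO game_state VALUES ('s1', ?, 0)", (json.dumps({"turn": "A"}),))


def submit_move(session_id, apply_move):
    """Load state, validate + apply, then write back only if the version
    is unchanged (optimistic lock). Returns True on success."""
    state_json, version = conn.execute(
        "SELECT state, version FROM game_state WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    new_state = apply_move(json.loads(state_json))  # rules engine; raises if invalid
    cur = conn.execute(
        "UPDATE game_state SET state = ?, version = ? "
        "WHERE session_id = ? AND version = ?",
        (json.dumps(new_state), version + 1, session_id, version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows => a concurrent move won; retry or reject
```

The `WHERE ... AND version = ?` clause is the whole trick: if another move landed between the read and the write, the update matches no rows and the caller retries against fresh state.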
Minimal Diagram
```
Player A        Player B
    |               |
    v               v
+-----------------------+
|          API          |
+-----------------------+
    |               |
    v               v
DB (sessions, game_state, moves)
  - validate move
  - update state
  - notify (poll or simple push)
```
Patterns and Concerns (don’t overbuild)
- Server authority: never trust client state; every move is validated server-side; state is written only after valid apply.
- Ordering: assign sequence number or timestamp to each move; apply in order; turn-based naturally serializes; for concurrent input, define rule (e.g. first-write-wins, or server orders).
- Basic monitoring: move latency, validation failures, state consistency errors, session creation rate.
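The ordering rule can be sketched as a server-assigned sequence number; a client submitting against a stale sequence is rejected and must refetch state before retrying (the `MoveLog` name is illustrative):

```python
class MoveLog:
    """Server assigns sequence numbers and applies moves strictly in
    order. A client echoing an old expected_seq had stale state."""

    def __init__(self):
        self.seq = 0
        self.moves = []

    def submit(self, player_id, payload, expected_seq):
        """Apply the move if the client saw the latest state; else reject."""
        if expected_seq != self.seq:
            return None  # stale client: refetch state, then retry
        self.seq += 1
        self.moves.append((self.seq, player_id, payload))
        return self.seq
```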
Why This Is a Correct MVP
- One API, one DB, state in DB, validate-and-apply moves, notify via poll or simple push → enough to ship turn-based multiplayer (Chess, cards, word games); easy to reason about.
- Vertical scaling and single server buy you time before you need WebSocket, in-memory state, and dedicated game servers.
Stage 2 — Growth Phase (real-time layer, in-memory state, matchmaking)
What Triggers the Growth Phase?
- Need lower latency; polling is too slow or product expects real-time (e.g. Snake, shooters); add WebSocket or long-lived connection for state sync.
- Many concurrent sessions; state in DB for every move doesn’t scale (write load, latency); keep hot state in memory, persist periodically or on key events.
- Matchmaking is first-class; players expect to find a game without invite link; queue and match logic; notify when matched.
Components to Add (incrementally)
- Real-time layer (WebSocket) — clients maintain connection; server pushes state updates (or move events) to all players in session; reduces latency and poll load; scale connections (e.g. connection manager, sticky session or pub/sub per session).
- Game server or dedicated process per session — optional: one process (or goroutine) per session holds state in memory; receives moves via API or message; validates, applies, pushes to clients; persists to DB on checkpoint or end of game. Or: API servers hold in-memory cache of state and push via WebSocket; DB still source of truth for persistence.
- State in memory with persistence — hot state in memory (per session); apply moves in memory for speed; persist to DB on move (async) or on timer/snapshot; on load, load from DB into memory; recovery from DB if process dies.
- Matchmaking (queue or pool) — queue by game type and optional skill/rating; when N players waiting, create session and notify (push or WebSocket); remove from queue; optional ranking and match quality.
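The connection-to-session mapping and per-session push can be sketched as follows; real connections would be WebSocket handles, modeled here as plain callables:

```python
class ConnectionManager:
    """Maps live connections to (session, player) and pushes state
    updates to everyone else in the same session. 'Connections' are
    plain callables here; a real server holds WebSocket handles."""

    def __init__(self):
        self.by_session = {}  # session_id -> {player_id: send_fn}

    def connect(self, session_id, player_id, send):
        self.by_session.setdefault(session_id, {})[player_id] = send

    def disconnect(self, session_id, player_id):
        # Product decides what follows: pause, forfeit, or await reconnect.
        self.by_session.get(session_id, {}).pop(player_id, None)

    def broadcast(self, session_id, state, exclude=None):
        """Push state to all players in the session, optionally
        skipping the mover (who already got it in the move response)."""
        for pid, send in self.by_session.get(session_id, {}).items():
            if pid != exclude:
                send(state)
```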
Growth Diagram
```
Player A        Player B
    |               |
    v               v
WebSocket / connection manager
    |               |
    v               v
+-----------------------+
|   API / game logic    |
+-----------------------+
    |               |
    v               v
In-memory state   Matchmaking queue
(per session)
    |
    v
Persist to DB (on move or snapshot)
    |
    v
Push state to clients
```
Patterns and Concerns to Introduce (practical scaling)
- Connection to session: map WebSocket connection to player and session; on move, push to other connections in same session; handle disconnect (pause, forfeit, or reconnect with state).
- Persistence strategy: every move vs batched vs snapshot-only; trade durability for write load; at least persist on game end and optionally every N moves.
- Monitoring: WebSocket connection count, move-to-push latency, matchmaking wait time, state persistence lag.
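The persistence trade-off (every move vs batched vs snapshot-only) can be sketched as a checkpoint-every-N-moves policy; `persist_fn` stands in for the real DB write, and the rules logic is reduced to a dict update:

```python
class SessionState:
    """Hot state in memory; persist only every N moves (and at game
    end via an explicit checkpoint), trading durability for write
    load. persist_fn stands in for the real DB write."""

    def __init__(self, persist_fn, every_n=5):
        self.state = {}
        self.moves_since_persist = 0
        self.persist_fn = persist_fn
        self.every_n = every_n

    def apply(self, move):
        self.state.update(move)  # stand-in for validate + apply rules
        self.moves_since_persist += 1
        if self.moves_since_persist >= self.every_n:
            self.checkpoint()

    def checkpoint(self):
        """Flush a snapshot; a crash loses at most every_n - 1 moves."""
        self.persist_fn(dict(self.state))
        self.moves_since_persist = 0
```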
Still Avoid (common over-engineering here)
- Dedicated game servers per session and 60-tick real-time until product requires it; turn-based or low-tick can stay on API + in-memory.
- Replay and cheat detection pipeline until trust or dispute issues justify it.
- Multi-region game servers until you have geographic latency requirements.
Stage 3 — Advanced Scale (dedicated game servers, replay, cheat detection)
What Triggers Advanced Scale?
- Real-time games (e.g. 60 tick/s); need dedicated game server per session or shard; fixed tick rate, input sampling, and low jitter.
- Replay and dispute: store move history; allow replay for viewing or cheat analysis; deterministic replay from moves.
- Cheat detection: detect impossible moves, speed hacks, or collusion; server-side validation plus optional post-game analysis; ban or flag accounts.
- Scale: many concurrent sessions across regions; game servers scale horizontally; regional routing for latency.
Components (common advanced additions)
- Dedicated game servers — one process (or container) per session or per shard of sessions; runs game loop at fixed tick (e.g. 60/s); receives inputs from API or message bus; updates state, sends state or delta to clients; persists at interval or on end; scale by spinning up servers on demand.
- Scale per region — deploy game servers per region; match players in same region when possible; low latency for real-time; session affinity.
- Replay — store every move (session_id, sequence, player_id, move, timestamp); replay = re-apply moves in order to reconstruct state; support watch-replay and deterministic verification; optional replay service for on-demand viewing.
- Cheat detection — server already validates moves; add: impossible move detection, rate limits, anomaly (e.g. reaction time, accuracy); post-game analysis job; flag or ban; optional client integrity checks (e.g. anti-tamper).
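The fixed-tick game loop can be sketched as: sample buffered inputs, advance the simulation one tick, then sleep out the remainder of the tick to hold a steady rate (`get_inputs` and `on_tick` are placeholders for input buffering and game logic):

```python
import time


def run_game_loop(ticks, tick_rate=60, get_inputs=None, on_tick=None):
    """Fixed-tick loop: each iteration samples inputs, advances the
    simulation, then sleeps the rest of the tick budget so the rate
    stays near tick_rate regardless of how fast the work ran."""
    dt = 1.0 / tick_rate
    state = {"tick": 0}
    for _ in range(ticks):
        start = time.monotonic()
        inputs = get_inputs() if get_inputs else []
        state["tick"] += 1
        if on_tick:
            on_tick(state, inputs)  # update state, push deltas to clients
        elapsed = time.monotonic() - start
        if elapsed < dt:
            time.sleep(dt - elapsed)  # hold the tick rate; overruns = jitter
    return state
```

A real game server runs this loop per session (or per shard), feeds `get_inputs` from a per-session input buffer, and tracks tick jitter as an SLO.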
Advanced Diagram (conceptual)
```
Clients (per region)
        |
        v
Connection / API layer (auth, matchmaking)
    |               |
    v               v
Matchmaking     Game server pool (per session or shard)
    |             - tick loop, state in memory
    v             - persist moves + snapshots
Session created   - push to clients
    |               |
    v               v
Move history      Cheat detection
(replay)          (post-game or real-time)
```
Patterns and Concerns at This Stage
- Real-time vs turn-based: real-time = fixed tick, input buffer, interpolation; turn-based = event-driven, no tick; design for one or the other per game type.
- Determinism: for replay and fairness, the same moves must produce the same state; eliminate non-determinism by routing all randomness through a fixed seed per game rather than unseeded global randomness.
- SLO-driven ops: move latency (submit to visible), tick jitter, matchmaking time, game server availability; error budgets and on-call.
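Deterministic replay can be sketched as: all randomness flows through a per-game seeded RNG, so re-applying the same moves reconstructs the same state (the dice roll is an illustrative stand-in for in-game randomness):

```python
import random


def play(moves, seed):
    """Apply moves with all randomness drawn from a per-game RNG.
    Replaying the same moves with the same seed must reconstruct
    the identical state; that's the replay and verification path."""
    rng = random.Random(seed)  # per-game fixed seed, never global random
    state = {"score": 0, "rolls": []}
    for move in moves:
        state["rolls"].append(rng.randint(1, 6))  # e.g. a dice roll per move
        state["score"] += move
    return state
```

A replay service just re-runs `play` over the stored move history; a verification job can diff the reconstructed state against what the live server persisted to catch divergence or tampering.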
Summarizing the Evolution
MVP delivers a game backend with one API, one DB, state in DB, validate-and-apply moves, and notify opponents via poll or simple push. Server is authoritative; no client trust. That’s enough for turn-based multiplayer.
As you grow, you add a real-time layer (WebSocket) for state sync, in-memory state with persistence for lower latency, and matchmaking (queue or pool). You keep server authority and state consistency.
At advanced scale, you add dedicated game servers for real-time (e.g. 60 tick), scale per region, replay from move history, and cheat detection. You scale sessions and latency without over-building on day one.
This approach gives you:
- Start Simple — API + DB, state in DB, validate move and notify; ship and learn.
- Scale Intentionally — add WebSocket and in-memory state when latency demands it; add matchmaking when product expects it.
- Add Complexity Only When Required — avoid dedicated game servers and replay until real-time and trust justify them; keep server authority and consistency first.