Game Backend — Designed in Stages
You don’t need to design for scale on day one.
Define what you need—creating or joining a game session, submitting moves, getting state, and optionally matchmaking—then build the simplest thing that works and evolve as latency and concurrency grow.
Here we use a game backend (turn-based or real-time: Chess, Poker, Snake, Boggle-style) as the running example: game sessions, players, game state, and moves/actions. The same staged thinking applies to any multiplayer game: low latency, consistency of game state, server authority (anti-cheat), and fairness are central.
Requirements and Constraints (no architecture yet)
Functional Requirements
- Create / join session — player creates a new game (e.g. mode, rules) or joins an existing one (by code or matchmaking); session has unique id and holds players and state.
- Submit move — player sends an action (e.g. play card, move piece, type word); server validates, applies to state, and notifies other players.
- Get state — client fetches current game state (board, scores, whose turn); for turn-based, poll or push after each move; for real-time, stream state updates.
- Matchmaking (optional) — pair or group players into a session; queue by skill or mode; create session when enough players; notify when match found.
Quality Requirements
- Low latency — move submission to state update to opponent visibility should be fast; real-time games (e.g. 60 tick/s) need sub-frame latency where possible; turn-based can tolerate hundreds of ms.
- Consistency of game state — all clients see the same authoritative state; server is source of truth; no divergent or corrupted state from out-of-order or invalid moves.
- Anti-cheat (server authority) — server validates every move; client sends intent, server applies only valid actions; no trust of client state for scoring or win/loss.
- Fairness — no player gains advantage from latency or ordering; deterministic or well-defined ordering of concurrent moves; optional anti-cheat and replay for disputes.
- Expected scale — pin down concurrent sessions, players per session, moves per second, and turn-based vs real-time before choosing an architecture.
Key Entities
- Game session — a single game instance; session_id, game_type, status (waiting, in_progress, finished); list of player ids; created_at.
- Player — identity in the system; player_id, optional profile or rating; linked to user/auth.
- Game state — authoritative state of the game (board, hands, scores, current_turn, etc.); updated only by server after validating moves; serializable for persistence and sync.
- Move / Action — a single player action (e.g. “play card 7”, “e4”); has player_id, payload, sequence or timestamp; validated then applied to state.
Primary Use Cases and Access Patterns
- Create session — write path; create session row, add creator as player; return session_id; optional invite link or code.
- Join session — write path; add player to session; check capacity and status; notify existing players (poll or push).
- Submit move — write path; validate move against current state and rules; update state in DB or in-memory; persist; notify other players (push or they poll).
- Get state — read path; return current state for session; used on load and after each move (or streamed in real-time).
- Matchmaking — read + write; add player to queue; when N players in queue (or criteria met), create session and notify; remove from queue.
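The matchmaking pattern above—queue players, create a session once enough are waiting—can be sketched as a minimal FIFO queue (the `MatchmakingQueue` name is illustrative; skill- or rating-based criteria are omitted):

```python
from collections import deque


class MatchmakingQueue:
    """Minimal FIFO matchmaking: once enough players are waiting,
    pop a session's worth of them. Skill matching is omitted."""

    def __init__(self, players_per_session: int = 2):
        self.n = players_per_session
        self.waiting: deque = deque()

    def enqueue(self, player_id: str):
        """Add a player; return the matched group if one formed, else None."""
        self.waiting.append(player_id)
        if len(self.waiting) >= self.n:
            # Enough players: form a match and remove them from the queue.
            return [self.waiting.popleft() for _ in range(self.n)]
        return None
```

The caller creates a session for the returned group and notifies its players (poll or push); returning `None` means the player stays queued.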
Given this, start with the simplest MVP: one API, one DB, create game (e.g. turn-based), store state in DB, submit move → validate → update state → notify opponent (poll or push)—then add a real-time layer (WebSocket), game server or dedicated process per session, state in memory with persistence, and matchmaking as product demands.
Stage 1 — MVP (simple, correct, not over-engineered)
Goal
Ship working multiplayer: players create or join a game, submit moves, and see updated state. One API, one DB; state in DB; move validated on server and applied; opponent notified via poll or simple push; single server.
Components
- API — REST or similar; auth (player); create game (game type, optional options) → session_id; join game (session_id or code); get game state (session_id); submit move (session_id, move payload). Single server so ordering is straightforward.
- DB — sessions (id, game_type, status, created_at); session_players (session_id, player_id, role or seat); game_state (session_id, state blob or structured, updated_at); optional moves table (session_id, sequence, player_id, move, applied_at) for audit or replay. Index by session_id; state can be one row per session updated in place.
- Submit move flow — client POSTs move; server loads current state, validates move (rules engine or game-specific logic), applies move to state, writes state (and optionally move row) in transaction; return new state or ack. Use optimistic lock (version) or row lock to prevent concurrent move conflicts.
- Notify opponent — after applying move: (a) client polls get state periodically, or (b) server sends push (e.g. webhook, Firebase, or simple long-poll). No WebSocket at MVP if turn-based and poll is acceptable.
- Matchmaking — optional; table or queue of “waiting” players; cron or on-join: if enough in queue, create session and assign; notify via poll or email. Or skip and use invite-by-link only.
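The submit-move flow with an optimistic lock can be sketched end to end; `sqlite3` stands in for whatever DB you use, and `apply_move` is a placeholder for the game-specific rules engine:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game_state (session_id TEXT PRIMARY KEY, state TEXT, version INTEGER)"
)
conn.execute("INSERT INTO game_state VALUES ('s1', ?, 0)", (json.dumps({"turn": "A"}),))


def submit_move(session_id, apply_move):
    """Load state, validate + apply, then write back only if the version
    is unchanged (optimistic lock). Returns True on success."""
    state_json, version = conn.execute(
        "SELECT state, version FROM game_state WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    new_state = apply_move(json.loads(state_json))  # rules engine; raises if invalid
    cur = conn.execute(
        "UPDATE game_state SET state = ?, version = ? "
        "WHERE session_id = ? AND version = ?",
        (json.dumps(new_state), version + 1, session_id, version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows => a concurrent move won; retry or reject
```

The `WHERE ... AND version = ?` clause is the whole trick: if another move landed between the read and the write, the update matches no rows and the caller retries against fresh state.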
Minimal Diagram
```
Player A        Player B
    |               |
    v               v
+-----------------------+
|          API          |
+-----------------------+
    |               |
    v               v
DB (sessions, game_state, moves)
  - validate move
  - update state
  - notify (poll or simple push)
```
Patterns and Concerns (don’t overbuild)
- Server authority: never trust client state; every move is validated server-side; state is written only after valid apply.
- Ordering: assign sequence number or timestamp to each move; apply in order; turn-based naturally serializes; for concurrent input, define rule (e.g. first-write-wins, or server orders).
- Basic monitoring: move latency, validation failures, state consistency errors, session creation rate.
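The ordering rule can be sketched as a server-assigned sequence number; a client submitting against a stale sequence is rejected and must refetch state before retrying (the `MoveLog` name is illustrative):

```python
class MoveLog:
    """Server assigns sequence numbers and applies moves strictly in
    order. A client echoing an old expected_seq had stale state."""

    def __init__(self):
        self.seq = 0
        self.moves = []

    def submit(self, player_id, payload, expected_seq):
        """Apply the move if the client saw the latest state; else reject."""
        if expected_seq != self.seq:
            return None  # stale client: refetch state, then retry
        self.seq += 1
        self.moves.append((self.seq, player_id, payload))
        return self.seq
```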
Why This Is a Correct MVP
- One API, one DB, state in DB, validate-and-apply moves, notify via poll or simple push → enough to ship turn-based multiplayer (Chess, cards, word games); easy to reason about.
- Vertical scaling and single server buy you time before you need WebSocket, in-memory state, and dedicated game servers.
Stage 2 — Growth Phase (real-time layer, in-memory state, matchmaking)
What Triggers the Growth Phase?
- Need lower latency; polling is too slow or product expects real-time (e.g. Snake, shooters); add WebSocket or long-lived connection for state sync.
- Many concurrent sessions; state in DB for every move doesn’t scale (write load, latency); keep hot state in memory, persist periodically or on key events.
- Matchmaking is first-class; players expect to find a game without invite link; queue and match logic; notify when matched.
Components to Add (incrementally)
- Real-time layer (WebSocket) — clients maintain connection; server pushes state updates (or move events) to all players in session; reduces latency and poll load; scale connections (e.g. connection manager, sticky session or pub/sub per session).
- Game server or dedicated process per session — optional: one process (or goroutine) per session holds state in memory; receives moves via API or message; validates, applies, pushes to clients; persists to DB on checkpoint or end of game. Or: API servers hold in-memory cache of state and push via WebSocket; DB still source of truth for persistence.
- State in memory with persistence — hot state in memory (per session); apply moves in memory for speed; persist to DB on move (async) or on timer/snapshot; on load, load from DB into memory; recovery from DB if process dies.
- Matchmaking (queue or pool) — queue by game type and optional skill/rating; when N players waiting, create session and notify (push or WebSocket); remove from queue; optional ranking and match quality.
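The connection-to-session mapping and per-session push can be sketched as follows; real connections would be WebSocket handles, modeled here as plain callables:

```python
class ConnectionManager:
    """Maps live connections to (session, player) and pushes state
    updates to everyone else in the same session. 'Connections' are
    plain callables here; a real server holds WebSocket handles."""

    def __init__(self):
        self.by_session = {}  # session_id -> {player_id: send_fn}

    def connect(self, session_id, player_id, send):
        self.by_session.setdefault(session_id, {})[player_id] = send

    def disconnect(self, session_id, player_id):
        # Product decides what follows: pause, forfeit, or await reconnect.
        self.by_session.get(session_id, {}).pop(player_id, None)

    def broadcast(self, session_id, state, exclude=None):
        """Push state to all players in the session, optionally
        skipping the mover (who already got it in the move response)."""
        for pid, send in self.by_session.get(session_id, {}).items():
            if pid != exclude:
                send(state)
```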
Growth Diagram
```
Player A        Player B
    |               |
    v               v
WebSocket / connection manager
    |               |
    v               v
+-----------------------+
|   API / game logic    |
+-----------------------+
    |               |
    v               v
In-memory state   Matchmaking queue
(per session)
    |
    v
Persist to DB (on move or snapshot)
    |
    v
Push state to clients
```
Patterns and Concerns to Introduce (practical scaling)
- Connection to session: map WebSocket connection to player and session; on move, push to other connections in same session; handle disconnect (pause, forfeit, or reconnect with state).
- Persistence strategy: every move vs batched vs snapshot-only; trade durability for write load; at least persist on game end and optionally every N moves.
- Monitoring: WebSocket connection count, move-to-push latency, matchmaking wait time, state persistence lag.
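The persistence trade-off (every move vs batched vs snapshot-only) can be sketched as a checkpoint-every-N-moves policy; `persist_fn` stands in for the real DB write, and the rules logic is reduced to a dict update:

```python
class SessionState:
    """Hot state in memory; persist only every N moves (and at game
    end via an explicit checkpoint), trading durability for write
    load. persist_fn stands in for the real DB write."""

    def __init__(self, persist_fn, every_n=5):
        self.state = {}
        self.moves_since_persist = 0
        self.persist_fn = persist_fn
        self.every_n = every_n

    def apply(self, move):
        self.state.update(move)  # stand-in for validate + apply rules
        self.moves_since_persist += 1
        if self.moves_since_persist >= self.every_n:
            self.checkpoint()

    def checkpoint(self):
        """Flush a snapshot; a crash loses at most every_n - 1 moves."""
        self.persist_fn(dict(self.state))
        self.moves_since_persist = 0
```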
Still Avoid (common over-engineering here)
- Dedicated game servers per session and 60-tick real-time until product requires it; turn-based or low-tick can stay on API + in-memory.
- Replay and cheat detection pipeline until trust or dispute issues justify it.
- Multi-region game servers until you have geographic latency requirements.
Stage 3 — Advanced Scale (dedicated game servers, replay, cheat detection)
What Triggers Advanced Scale?
- Real-time games (e.g. 60 tick/s); need dedicated game server per session or shard; fixed tick rate, input sampling, and low jitter.
- Replay and dispute: store move history; allow replay for viewing or cheat analysis; deterministic replay from moves.
- Cheat detection: detect impossible moves, speed hacks, or collusion; server-side validation plus optional post-game analysis; ban or flag accounts.
- Scale: many concurrent sessions across regions; game servers scale horizontally; regional routing for latency.
Components (common advanced additions)
- Dedicated game servers — one process (or container) per session or per shard of sessions; runs game loop at fixed tick (e.g. 60/s); receives inputs from API or message bus; updates state, sends state or delta to clients; persists at interval or on end; scale by spinning up servers on demand.
- Scale per region — deploy game servers per region; match players in same region when possible; low latency for real-time; session affinity.
- Replay — store every move (session_id, sequence, player_id, move, timestamp); replay = re-apply moves in order to reconstruct state; support watch-replay and deterministic verification; optional replay service for on-demand viewing.
- Cheat detection — server already validates moves; add: impossible move detection, rate limits, anomaly (e.g. reaction time, accuracy); post-game analysis job; flag or ban; optional client integrity checks (e.g. anti-tamper).
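The fixed-tick game loop can be sketched as: sample buffered inputs, advance the simulation one tick, then sleep out the remainder of the tick to hold a steady rate (`get_inputs` and `on_tick` are placeholders for input buffering and game logic):

```python
import time


def run_game_loop(ticks, tick_rate=60, get_inputs=None, on_tick=None):
    """Fixed-tick loop: each iteration samples inputs, advances the
    simulation, then sleeps the rest of the tick budget so the rate
    stays near tick_rate regardless of how fast the work ran."""
    dt = 1.0 / tick_rate
    state = {"tick": 0}
    for _ in range(ticks):
        start = time.monotonic()
        inputs = get_inputs() if get_inputs else []
        state["tick"] += 1
        if on_tick:
            on_tick(state, inputs)  # update state, push deltas to clients
        elapsed = time.monotonic() - start
        if elapsed < dt:
            time.sleep(dt - elapsed)  # hold the tick rate; overruns = jitter
    return state
```

A real game server runs this loop per session (or per shard), feeds `get_inputs` from a per-session input buffer, and tracks tick jitter as an SLO.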
Advanced Diagram (conceptual)
```
Clients (per region)
        |
        v
Connection / API layer (auth, matchmaking)
    |               |
    v               v
Matchmaking     Game server pool (per session or shard)
    |             - tick loop, state in memory
    v             - persist moves + snapshots
Session created   - push to clients
    |               |
    v               v
Move history      Cheat detection
(replay)          (post-game or real-time)
```
Patterns and Concerns at This Stage
- Real-time vs turn-based: real-time = fixed tick, input buffer, interpolation; turn-based = event-driven, no tick; design for one or the other per game type.
- Determinism: for replay and fairness, the same moves must produce the same state; eliminate non-determinism by routing all randomness through a fixed seed per game rather than unseeded global randomness.
- SLO-driven ops: move latency (submit to visible), tick jitter, matchmaking time, game server availability; error budgets and on-call.
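Deterministic replay can be sketched as: all randomness flows through a per-game seeded RNG, so re-applying the same moves reconstructs the same state (the dice roll is an illustrative stand-in for in-game randomness):

```python
import random


def play(moves, seed):
    """Apply moves with all randomness drawn from a per-game RNG.
    Replaying the same moves with the same seed must reconstruct
    the identical state; that's the replay and verification path."""
    rng = random.Random(seed)  # per-game fixed seed, never global random
    state = {"score": 0, "rolls": []}
    for move in moves:
        state["rolls"].append(rng.randint(1, 6))  # e.g. a dice roll per move
        state["score"] += move
    return state
```

A replay service just re-runs `play` over the stored move history; a verification job can diff the reconstructed state against what the live server persisted to catch divergence or tampering.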
Summarizing the Evolution
MVP delivers a game backend with one API, one DB, state in DB, validate-and-apply moves, and notify opponents via poll or simple push. Server is authoritative; no client trust. That’s enough for turn-based multiplayer.
As you grow, you add a real-time layer (WebSocket) for state sync, in-memory state with persistence for lower latency, and matchmaking (queue or pool). You keep server authority and state consistency.
At advanced scale, you add dedicated game servers for real-time (e.g. 60 tick), scale per region, replay from move history, and cheat detection. You scale sessions and latency without over-building on day one.
This approach gives you:
- Start Simple — API + DB, state in DB, validate move and notify; ship and learn.
- Scale Intentionally — add WebSocket and in-memory state when latency demands it; add matchmaking when product expects it.
- Add Complexity Only When Required — avoid dedicated game servers and replay until real-time and trust justify them; keep server authority and consistency first.