
Optimization Quick Reference

By Atif Alam

This page helps you choose which optimization to add when you hit a specific pain. Find your pain, get the answer.

How to use this page: Match your pain to a fix in the quick chooser below, then go to that category's details (Signals, Goal / Benefit, Tools, Risks).

| Pain | Fix |
| --- | --- |
| Repeated reads are slow | Caching |
| Can’t explain failures quickly | Observability |
| DB joins too slow | Replication |
| Service goes down = everything fails | Asynchronous Processing |
| Spiky writes / slow jobs on request path | Asynchronous Processing |
| Many consumers + replay needed | Streaming / Event Log |
| Can’t trust downstream service | Rate Control |
| Queries too slow or need full-text / fuzzy search | Search & Indexing |
| Single DB/shard too hot or write ceiling | Data Partitioning |
| Ops toil / manual recovery | Coordination & Orchestration |
| Noisy neighbors or mixed priority traffic | Traffic Shaping |
| Analytics queries hurting prod DB | Storage Specialization |

Caching

Signals: DB/API latency high; read QPS heavy; repeated same queries; spike traffic
Goal / Benefit: Reduce latency + backend load
Tools: Redis/Memcached, CDN, local caches, TTL
Risks: Stale data, invalidation complexity, hot keys, cache stampede
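
As a rough illustration of the cache-aside pattern with a TTL, here is a minimal in-process sketch. The `TTLCache` class, its `get_or_load` method, and the injectable `clock` are hypothetical names chosen for this example; a real deployment would more likely sit in front of Redis or Memcached and would need its own answer to stampedes and hot keys.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Hypothetical in-process cache-aside helper with per-entry expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: the backend is not touched
        value = loader(key)  # miss or expired: read from the backend
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after a write so the next read reloads it."""
        self._store.pop(key, None)
```

The TTL bounds how stale an entry can get, and `invalidate` is the explicit escape hatch for writes; both correspond to the stale-data and invalidation risks listed above.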

Observability

Signals: Hard to debug; unknown bottlenecks; frequent incidents; blind deploys
Goal / Benefit: Reduce MTTR; make changes safer
Tools: Observability (metrics/logs/traces, dashboards, alerts, SLOs)
Risks: Noise/alert fatigue, cost, missing instrumentation

Replication

Signals: Need higher availability; read scaling; DR requirements
Goal / Benefit: Resilience + read scale
Tools: Read replicas, denormalization, multi-AZ DB, quorum systems
Risks: Replication lag, failover complexity, split-brain risk

Asynchronous Processing

Signals: Slow tasks in request path; timeouts; retries explode; dependencies flaky
Goal / Benefit: Decouple + move work off request path
Tools: Queues (SQS/RabbitMQ), decoupling, workers, retries, DLQ
Risks: Eventual consistency, duplication, ordering, failure handling
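
The worker, retry, and DLQ pieces can be sketched with an in-memory queue. This is an illustration only: the `drain` function and its `max_attempts` parameter are made up for this example, and a real broker such as SQS or RabbitMQ handles redelivery and dead-lettering through its own configuration.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def drain(queue: Deque[Tuple[str, int]],
          handler: Callable[[str], None],
          max_attempts: int = 3) -> Tuple[List[str], List[str]]:
    """Process (message, attempts) entries until the queue is empty.

    Returns (done, dead_letters). A message that fails is re-queued at the
    back until it has failed max_attempts times, then it is dead-lettered.
    """
    done: List[str] = []
    dead_letters: List[str] = []
    while queue:
        message, attempts = queue.popleft()
        try:
            handler(message)
            done.append(message)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                dead_letters.append(message)  # give up; park for inspection
            else:
                queue.append((message, attempts))  # retry later
    return done, dead_letters
```

Because a failed message goes to the back of the queue, later messages can overtake it, and a handler may see the same message more than once. Those are the ordering and duplication risks listed above, so handlers generally need to be idempotent.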

Streaming / Event Log

Signals: Many consumers; replay or backfill needed; audit trail; fan-out
Goal / Benefit: Fan-out, replay, durability
Tools: Stream / event log (Kafka, Pulsar, Kinesis, Redpanda)
Risks: Ordering guarantees, retention vs cost, operational complexity
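
To make fan-out and replay concrete, here is a toy append-only log with per-consumer offsets. The `EventLog` class and its `poll` and `seek` methods are hypothetical and only loosely modeled on how log-based systems track consumer positions; they are not the API of Kafka, Pulsar, Kinesis, or Redpanda.

```python
from typing import Any, Dict, List


class EventLog:
    """Hypothetical single-partition log; events are never removed on read."""

    def __init__(self) -> None:
        self._events: List[Any] = []
        self._offsets: Dict[str, int] = {}  # consumer name -> next offset

    def append(self, event: Any) -> int:
        """Append an event and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, consumer: str, max_events: int = 100) -> List[Any]:
        """Return the next batch for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._events[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch

    def seek(self, consumer: str, offset: int) -> None:
        """Rewind (or skip ahead) so a consumer can replay or backfill."""
        if not 0 <= offset <= len(self._events):
            raise ValueError("offset out of range")
        self._offsets[consumer] = offset
```

Each consumer reads the full log independently (fan-out), and `seek` lets one of them re-read history without affecting the others. Retention is not modeled here; in practice the log is trimmed, which is the retention vs cost trade-off listed above.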

When to use a stream vs a queue: see Redis vs Kafka: when to use which.

Rate Control

Signals: Traffic spikes; abuse; downstream overload; SLOs degrade under load
Goal / Benefit: Protect system + fairness
Tools: Rate limits, quotas, circuit breakers, retries, bulkheads
Risks: False positives, client friction, tuning thresholds
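
One common way to implement a rate limit is a token bucket. The sketch below is a minimal single-process version; the `TokenBucket` class, its parameters, and the injectable `clock` are assumptions made for this example, not something this page prescribes.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token bucket: refills at rate_per_sec, bursts up to capacity."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock  # injectable so tests do not need to sleep
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the request fits, else False."""
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

`capacity` controls how large a burst is tolerated and `rate_per_sec` the sustained rate; picking those two numbers is the threshold-tuning risk listed above.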

Search & Indexing

Signals: DB queries too slow/complex; full-text needed; filtering+ranking
Goal / Benefit: Fast query experience
Tools: Search index (Elasticsearch/OpenSearch), secondary indexes
Risks: Dual writes, index lag, reindexing cost, relevance tuning
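
The core data structure behind a search index is an inverted index that maps terms to documents. The sketch below is deliberately tiny: `InvertedIndex` and `tokenize` are hypothetical names, matching is exact-term AND only, and there is no ranking, stemming, or fuzzy matching of the kind Elasticsearch or OpenSearch provide.

```python
import re
from collections import defaultdict
from typing import DefaultDict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical term -> document-id index with AND queries."""

    def __init__(self) -> None:
        self._postings: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> Set[str]:
        """Return ids of documents containing every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

The index is a second copy of the data that must be updated whenever a document changes, which is where the dual-write and index-lag risks listed above come from.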

Data Partitioning

Signals: Single DB/table shard too hot; write throughput ceiling; large datasets
Goal / Benefit: Horizontal scaling (throughput)
Tools: Sharding by key, partitions, consistent hashing
Risks: Cross-shard queries, rebalancing, skew/hot partitions
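
Consistent hashing can be sketched as a sorted ring of virtual nodes. The `HashRing` class, the `vnodes` count, and the shard names in the usage comment are assumptions for illustration; production systems typically add replication and weighting on top of this.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(value: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() varies between runs)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (hash, node) points
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's hash."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


# Usage sketch with made-up shard names:
# ring = HashRing(["db-a", "db-b", "db-c"])
# shard = ring.node_for("user-42")
```

When a node is added, only the keys that now fall on the new node's ring points move; every other key keeps its shard. That limits rebalancing cost, although a skewed key distribution can still create hot partitions.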

Coordination & Orchestration

Signals: Many services; manual deployments; failovers/scale require humans
Goal / Benefit: Automate lifecycle + coordination
Tools: Orchestration (Kubernetes/Nomad), service discovery, leader election
Risks: Operational complexity, misconfig outages, learning curve

Traffic Shaping

Signals: Multiple request classes; noisy neighbors; need graceful degradation
Goal / Benefit: Prioritize critical traffic
Tools: Traffic shaping (priority queues, load shedding, admission control). Common tech: Envoy, NGINX, Kong, Redis (rate limits / priority queues), Kubernetes ResourceQuota, AWS API Gateway / Cloudflare (edge throttling).
Risks: Starving lower tiers, policy complexity
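
A simple form of admission control with load shedding rejects lower-priority request classes first as utilization rises. In this sketch the class names, the percentage thresholds in `SHED_AT_PERCENT`, and the `AdmissionController` API are all invented for illustration; proxies such as Envoy or NGINX express similar policies through their own configuration.

```python
from typing import Dict, Optional

# Hypothetical policy: shed a class once in-flight load reaches this % of capacity.
SHED_AT_PERCENT: Dict[str, int] = {"critical": 100, "normal": 80, "batch": 50}


class AdmissionController:
    """Hypothetical priority-aware admission control for one service instance."""

    def __init__(self, capacity: int,
                 shed_at_percent: Optional[Dict[str, int]] = None):
        self.capacity = capacity
        self.shed_at_percent = dict(shed_at_percent or SHED_AT_PERCENT)
        self.in_flight = 0

    def try_admit(self, request_class: str) -> bool:
        """Admit the request unless its class is being shed at current load."""
        threshold = self.shed_at_percent[request_class]
        # Integer math avoids floating-point edge cases at the thresholds.
        if self.in_flight * 100 >= self.capacity * threshold:
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        """Call when an admitted request finishes."""
        self.in_flight = max(0, self.in_flight - 1)
```

With these example thresholds, batch traffic is shed at 50% utilization and normal traffic at 80%, leaving headroom for critical requests. Under sustained load the lower classes may never be admitted, which is the starvation risk listed above.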

Storage Specialization

Signals: One DB can’t meet mixed needs (latency vs analytics vs blobs)
Goal / Benefit: Use the right store per workload
Tools: OLTP DB, separate OLAP store/warehouse, object store, time-series DB
Risks: Data duplication, consistency, ETL complexity

For a stage-by-stage view (MVP → Growth → Advanced) and links to staged examples, see Staged Design Examples.

Next: Redis vs Kafka: when to use which, a worked example of choosing between two common building blocks.