Infrastructure Building Blocks
Infrastructure building blocks are the categories of technology you use to run, store, deliver, and operate a system.
Each technology you reach for—a VM, a database, Redis, Kafka, a load balancer—belongs to a type that serves a specific purpose.
This page maps types to technologies, then explains why each type exists: its purpose, main benefits, and when you’d add it.
For “when do I introduce what?” signals and decision matrices, see the Optimization Quick Reference.
Type → Technology
| # | Type | Technology |
|---|---|---|
| 1 | Compute / application hosting | EC2, GCE, Azure VMs, Lambda, Cloud Functions, Cloud Run, Fargate, App Engine, Elastic Beanstalk, Azure App Service, Render, Railway |
| 2 | Relational database (RDBMS) | PostgreSQL, MySQL, SQL Server, Oracle, MariaDB |
| 3 | Traffic distribution layer | Load Balancer, Nginx, Envoy, HAProxy, ALB/NLB, Traefik |
| 4 | Observability | Prometheus, Grafana, ELK, Datadog, OpenTelemetry, Jaeger, Loki |
| 5 | Container orchestration | Kubernetes, EKS, AKS, GKE, Nomad, Docker Swarm, ECS |
| 6 | Object storage | S3, GCS, Azure Blob, MinIO, Cloudflare R2 |
| 7 | In-memory cache / data store | Redis, Memcached, KeyDB, Dragonfly, Couchbase, Hazelcast |
| 8 | Edge cache / content delivery | Cloudflare, Akamai, Fastly, CloudFront, Bunny |
| 9 | Message queue | RabbitMQ, SQS, Azure Service Bus, ActiveMQ, Redis Streams |
| 10 | Distributed log / event streaming | Kafka, Pulsar, Kinesis, Redpanda |
| 11 | NoSQL Database (you operate) | MongoDB, Cassandra, ScyllaDB, HBase, CockroachDB |
| 12 | NoSQL Database (provider-managed) | MongoDB Atlas, DynamoDB, Cosmos DB, Firestore |
| 13 | Search & indexing engine | Elasticsearch, OpenSearch, Solr, Meilisearch, Typesense |
| 14 | Rate limiting & API control | AWS API Gateway / WAF, Apigee, Azure API Management, Kong, Tyk, Cloudflare Rate Limiting, Nginx limit_req |
| 15 | Coordination (leader election, service discovery) | ZooKeeper, etcd, Consul |
Why Each Type Exists
The list starts with compute and the typical baseline (a relational database), then covers the other building blocks you add when you hit scale or reliability limits.
1. Compute / Application Hosting
Purpose: Run your application code — the most fundamental building block.
Examples: EC2, GCE, Azure VMs (VMs/instances); Lambda, Cloud Functions, Azure Functions, Cloudflare Workers (serverless); Cloud Run, Fargate (managed containers); App Engine, Elastic Beanstalk, Azure App Service, Render, Railway (managed platforms)
Benefit:
- Runs your application logic
- Multiple models: full control (VMs) vs managed (PaaS) vs event-driven (serverless)
- Scale from single instance to global fleet
Used when: Always — you need compute before anything else. Choose VMs for full control, managed platforms for simplicity, serverless for event-driven or bursty workloads, and managed containers when you want container packaging without managing orchestration.
2. Relational Databases
Purpose: ACID transactions, joins, and a well-understood relational model—excellent for structured data with complex relationships and strong consistency.
Examples: PostgreSQL, MySQL, SQL Server
Benefit:
- ACID transactions
- SQL and well-understood data model
- Mature operational and tooling ecosystem
Used when: Default choice for structured, transactional data; strong consistency and relational modeling are the main needs.
3. Load Balancing & Traffic Management
Purpose: Route traffic across instances with health checks and failover—enables fault tolerance, horizontal scaling, and zero-downtime deploys.
Examples: ALB, NLB (AWS), Cloud Load Balancing (GCP), Azure Load Balancer / Application Gateway, Nginx, Envoy, HAProxy, Traefik
Benefit:
- Fault tolerance
- Horizontal scaling
- Zero-downtime deploys
Used when: You have more than one instance.
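The core behavior (rotate across backends, skip the ones failing health checks) can be sketched in a few lines. The backend addresses are hypothetical; real balancers like Nginx or an ALB add timeouts, weights, and active probing:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Minimal round-robin balancer that skips unhealthy backends."""

    def __init__(self, backends):
        self.backends = backends
        self.healthy = set(backends)   # would be updated by health checks
        self._ring = cycle(backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        self.healthy.add(backend)

    def pick(self):
        # Advance the ring until a healthy backend is found.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")

lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
lb.mark_down("10.0.0.2")  # this instance failed its health check
picks = [lb.pick() for _ in range(4)]
print(picks)  # ['10.0.0.1', '10.0.0.3', '10.0.0.1', '10.0.0.3']
```

Failover and zero-downtime deploys fall out of the same mechanism: drain an instance by marking it down, deploy, mark it up again.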
4. Observability
Purpose: See what the system is doing: metrics, logs, and traces.
Examples: Prometheus, Grafana, ELK, Datadog, OpenTelemetry, Jaeger, Loki
Benefit:
- Faster debugging
- Safer scaling
- Informed decisions
Used when: As soon as you have a production path (compute + database + traffic); before adding cache, queues, or other optimizations.
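The shape of a metric can be shown with a toy labeled counter, written in the spirit of (but not using) a Prometheus client library; the metric and label names are made up:

```python
from collections import defaultdict

class LabeledCounter:
    """Toy request counter with labels, Prometheus-style (illustrative)."""
    def __init__(self, name):
        self.name = name
        self.values = defaultdict(int)

    def inc(self, **labels):
        # Key by the sorted label set so {"a": 1, "b": 2} == {"b": 2, "a": 1}.
        self.values[tuple(sorted(labels.items()))] += 1

requests = LabeledCounter("http_requests_total")
requests.inc(method="GET", status="200")
requests.inc(method="GET", status="200")
requests.inc(method="POST", status="500")

# A scrape would expose lines like: http_requests_total{method="GET",status="200"} 2
for labels, count in sorted(requests.values.items()):
    rendered = ",".join(f'{k}="{v}"' for k, v in labels)
    print(f"{requests.name}{{{rendered}}} {count}")
```

Once counters like this exist, dashboards (Grafana) and alerts are queries over them, which is why observability should land before the optimizations it will later justify.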
5. Orchestration & Scheduling
Purpose: Automate the compute lifecycle: scheduling, restarts, scaling, and rollouts.
Examples: Kubernetes, EKS (Amazon), AKS (Azure), GKE (Google), Nomad, Docker Swarm, ECS
Benefit:
- Auto-recovery
- Scaling
- Safer deployments
Used when: You standardize how workloads are packaged and deployed; many teams adopt containers and an orchestrator (Kubernetes, EKS, AKS, GKE) early for exactly this reason.
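The auto-recovery and scaling benefits come from a reconcile loop: compare desired state to actual state and converge. This is a heavily simplified, Kubernetes-inspired sketch; real schedulers also handle placement, rolling updates, and much more:

```python
def reconcile(desired: int, actual: list) -> list:
    """One pass of a reconcile loop: drop crashed replicas,
    then add or remove replicas until the count matches."""
    actual = [r for r in actual if r["healthy"]]            # evict crashed pods
    next_id = max((r["id"] for r in actual), default=-1) + 1
    while len(actual) < desired:
        actual.append({"id": next_id, "healthy": True})     # schedule a new one
        next_id += 1
    while len(actual) > desired:
        actual.pop()                                        # scale down
    return actual

state = [{"id": 0, "healthy": True}, {"id": 1, "healthy": False}]
state = reconcile(desired=3, actual=state)
print(len(state))  # 3 — the crashed replica was replaced and one more was added
```

Running this loop continuously is what turns "a pod died at 3 a.m." into a non-event.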
6. Object & Blob Storage
Purpose: Cheap, durable storage for files and unstructured data.
Examples: S3, GCS, Azure Blob, MinIO, Cloudflare R2
Benefit:
- Virtually infinite scale
- High durability
- Offloads large data from DBs
Used when: Almost every app needs it early—uploads, static assets, backups, logs, media.
7. In-Memory Cache / Data Store
Purpose: Reduce latency and load for application data (sessions, hot keys, computed results).
Examples: Redis, Memcached, KeyDB, Dragonfly, Couchbase, Hazelcast
Benefit:
- Faster reads
- Fewer DB hits
- Absorbs traffic spikes
Used when: Reads dominate or latency matters; you need sub-ms or low-ms access to hot data.
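The usual access pattern is cache-aside: check the cache, fall back to the database on a miss, and store the result with a TTL. A minimal in-process sketch (the `load_user` function stands in for a real database query; with Redis or Memcached the dictionary becomes a network call):

```python
import time

class TTLCache:
    """Minimal cache-aside helper with per-key expiry (illustrative)."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        entry = self.store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]               # cache hit: no DB work
        value = loader(key)               # cache miss: hit the database
        self.store[key] = (value, time.monotonic() + self.ttl)
        return value

db_hits = 0
def load_user(key):
    global db_hits
    db_hits += 1                          # stand-in for an expensive DB query
    return {"id": key, "name": "Ada"}

cache = TTLCache(ttl_seconds=60)
cache.get_or_load("user:1", load_user)
cache.get_or_load("user:1", load_user)    # served from cache
print(db_hits)  # 1 — the second read never touched the "database"
```

The TTL is the trade-off knob: longer means fewer DB hits but staler data.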
8. Edge Cache / Content Delivery
Purpose: Serve static or cacheable content close to users and reduce origin load.
Examples: Cloudflare, Akamai, Fastly, CloudFront, Bunny
Benefit:
- Lower latency for global users
- Offload traffic from origin
- DDoS and edge security
Used when: Static assets, media, or cacheable API responses need to be fast worldwide.
9. Message Queue
Purpose: Decouple producers and consumers with at-most-once or at-least-once delivery—async work queues that move slow or flaky tasks off the request path.
When simplicity and operational overhead matter more than replay and fan-out, queues like RabbitMQ and SQS are easier to reason about and run than a streaming log: they offload slow work from the request path and decouple service A from service B.
Examples: RabbitMQ, SQS, Azure Service Bus, ActiveMQ, Redis Streams
Benefit:
- Async processing
- Backpressure handling
- Decoupling and reliability
Used when: Tasks are slow or flaky and should not block the request; you need work queues, not replay.
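At-least-once delivery means a failed task is redelivered and may therefore run more than once, which is why consumers should be idempotent. An in-process sketch of the semantics using the standard-library queue (the email task and the injected SMTP failure are invented for the example):

```python
import queue

# In-process stand-in for RabbitMQ/SQS semantics: a task that fails
# is re-enqueued and retried.
tasks = queue.Queue()
tasks.put({"job": "send_email", "to": "a@example.com", "attempts": 0})

processed = []
failures_to_inject = 1

while not tasks.empty():
    task = tasks.get()
    try:
        if failures_to_inject:             # simulate a transient failure
            failures_to_inject -= 1
            raise ConnectionError("SMTP unavailable")
        processed.append(task["job"])      # the slow work, off the request path
    except ConnectionError:
        task["attempts"] += 1
        tasks.put(task)                    # redeliver: at-least-once

print(processed, task["attempts"])  # ['send_email'] 1 — succeeded on retry
```

Real brokers add what this sketch omits: durability, visibility timeouts, and dead-letter queues for tasks that keep failing.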
10. Distributed Log / Event Streaming
Purpose: Ordered, durable event log with replay and fan-out—multiple consumers can read the same stream independently, ideal for event sourcing and audit trails.
Examples: Kafka, Pulsar, Kinesis, Redpanda
Benefit:
- Replay & backfill
- Multiple consumer groups
- Event sourcing and audit trail
Used when: You need ordering per key, replay, or many consumers reading the same stream.
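The key structural difference from a queue can be boiled down to an append-only list plus a per-consumer offset: events are never removed, so every consumer reads independently and can rewind. A Kafka-inspired sketch, minus partitions, durability, and everything else that makes Kafka hard:

```python
# Append-only log with independent per-consumer offsets (illustrative).
log = []
offsets = {"billing": 0, "analytics": 0}

def append(event):
    log.append(event)  # events are retained, never consumed away

def poll(consumer):
    """Return this consumer's unread events and advance its offset."""
    start = offsets[consumer]
    offsets[consumer] = len(log)
    return log[start:]

append({"type": "order_placed", "id": 1})
append({"type": "order_paid", "id": 1})

print(len(poll("billing")))    # 2: billing sees both events
print(len(poll("analytics")))  # 2: analytics reads the same stream independently

offsets["analytics"] = 0       # rewind the offset for a backfill
print(len(poll("analytics")))  # 2: full replay, billing is unaffected
```

Rewinding an offset is exactly how you backfill a new consumer or rebuild state after a bug, which a queue cannot do once messages are acknowledged.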
11. NoSQL Database (You Operate)
Purpose: Flexible-schema, partition-tolerant storage optimized for high write throughput, document/key-value/wide-column access patterns, and predictable performance at scale—you operate or host the cluster.
Examples: MongoDB, Cassandra, ScyllaDB, HBase, CockroachDB
Benefit:
- High write throughput
- Partition tolerance
- Predictable performance at scale
Used when: Your RDBMS hits scale limits; you need horizontal scaling, different access patterns, and are willing to operate the cluster.
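The horizontal scaling these systems offer rests on partitioning keys across nodes, commonly via consistent hashing so that adding a node moves only a fraction of the data. A greatly simplified ring in the style of Cassandra/Scylla placement (no replication, no virtual-node tuning; node names are invented):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable hash so placement is deterministic across processes.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Consistent-hash ring mapping keys to nodes (illustrative)."""
    def __init__(self, nodes, vnodes=64):
        # Each node gets vnodes points on the ring to smooth the distribution.
        self.ring = sorted((_hash(f"{n}-{i}"), n) for n in nodes for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect(self.keys, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
placements = {k: ring.node_for(k) for k in ("user:1", "user:2", "user:3")}
# Unlike hash(key) % N, adding "node-d" remaps only the keys whose ring
# segment the new node takes over.
```

This placement logic is what the database handles for you, but understanding it explains the operational cost: rebalancing, repair, and capacity planning are yours when you run the cluster.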
12. NoSQL Database (Provider-Managed)
Purpose: Managed document or key-value store with flexible schemas, serverless or pay-per-use scaling, and global replication—provider runs and operates it so you avoid cluster management.
Examples: MongoDB Atlas, DynamoDB, Cosmos DB, Firestore
Benefit:
- Serverless or managed scaling
- No cluster operations
- Global replication and SLAs from the provider
Used when: You need scale and flexible schemas but want to avoid running your own distributed database.
13. Search & Indexing Engines
Purpose: Fast full-text search, filtering, and aggregations over large datasets.
Examples: Elasticsearch, OpenSearch, Solr
Benefit:
- Full-text search
- Aggregations
- Low-latency reads
Used when: Queries become complex or user-facing.
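The data structure underneath Elasticsearch and Solr is the inverted index: a map from each term to the documents containing it, so a query becomes a set intersection instead of a scan. A toy version with trivial tokenization (the documents are invented):

```python
from collections import defaultdict

docs = {
    1: "cheap red running shoes",
    2: "red wine from spain",
    3: "trail running shoes",
}

# Build the inverted index: term -> set of document ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(query: str) -> set:
    """AND-match: documents containing every query term."""
    results = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*results) if results else set()

print(search("running shoes"))  # {1, 3}
print(search("red"))            # {1, 2}
```

Real engines layer stemming, relevance scoring, and distributed shards on top, but the intersection-of-postings core is why `LIKE '%...%'` on the database cannot compete at scale.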
14. Rate Limiting & API Control
Purpose: Enforce rate limits, quotas, and access policies at the API boundary—prevents overload, ensures fair usage, and controls abuse.
Examples: AWS API Gateway / WAF, GCP Apigee / Cloud Endpoints, Azure API Management, Cloudflare Rate Limiting, Kong, Tyk, Redis-based limiters, Nginx limit_req
Benefit:
- Prevent overload
- Fair usage
- Abuse control
Used when: Traffic becomes unpredictable or external.
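Most of these tools implement some variant of the token bucket: tokens refill at a steady rate, requests spend them, and short bursts up to the bucket capacity are allowed. A self-contained sketch (the rate and capacity are arbitrary; Nginx `limit_req`, for comparison, uses the closely related leaky-bucket model):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter (illustrative)."""
    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_sec=1, capacity=5)
results = [bucket.allow() for _ in range(8)]  # a burst of 8 requests
# The burst capacity admits the first 5; the rest are rejected until refill.
print(results.count(True))
```

In production the bucket state lives somewhere shared (often Redis) so that all gateway instances enforce the same limit per client.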
15. Coordination (leader election, service discovery)
Purpose: Leader election, consensus, service discovery, and distributed configuration—lets many components agree on who leads and where things live.
Examples: ZooKeeper, etcd, Consul
Benefit:
- Leader election
- Configuration consistency
- Service discovery
Used when: Many distributed components must agree.
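The primitive these systems expose for leader election is essentially a compare-and-set lease with a TTL: whoever acquires it leads, and if the leader dies, the lease expires and another node takes over. A toy single-process model (real etcd/ZooKeeper back this with replicated consensus, which is the hard part this sketch omits):

```python
class LeaseStore:
    """Toy compare-and-set lease for leader election (illustrative)."""
    def __init__(self):
        self.leader = None
        self.expires = 0.0

    def try_acquire(self, candidate: str, ttl: float, now: float) -> bool:
        # The lease is free if unset or expired; otherwise only the
        # current leader may renew it before expiry.
        if self.leader is None or now >= self.expires or self.leader == candidate:
            self.leader, self.expires = candidate, now + ttl
            return True
        return False

store = LeaseStore()
print(store.try_acquire("node-a", ttl=5, now=0))   # True: node-a becomes leader
print(store.try_acquire("node-b", ttl=5, now=1))   # False: lease still held
print(store.try_acquire("node-b", ttl=5, now=6))   # True: lease expired, failover
```

The TTL bounds failover time: a leader that stops renewing loses the lease within one TTL, so the cluster is never headless for longer than that.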
Next: the Optimization Quick Reference for when to add what, or Redis vs Kafka: when to use which for a concrete example.