File-Sharing System — Designed in Stages
You don’t need to design for scale on day one.
Define what you need—upload, download, list, share (link or with users), and permissions—then build the simplest thing that works and evolve as storage, traffic, and multi-device needs grow.
Here we use a file-sharing system (in the style of Dropbox or Google Drive) as the running example: users, files, optional folders, shares/permissions, and optional versioning. The same staged thinking applies to any system that stores blobs and metadata, where durability, availability, access control, and metadata consistency are central.
Requirements and Constraints (no architecture yet)
Functional Requirements
- Upload — user uploads a file; store blob durably and record metadata (name, size, owner, path or folder, checksum).
- Download — user or recipient downloads a file by id or path; enforce permissions; serve blob.
- List — list files (and optional folders) for a user; by folder or root; pagination and sorting.
- Share — share a file (or folder) via link (e.g. public link with token) or with specific users; set permission (view, edit, etc.).
- Permissions — who can view or edit; check on every read/write; consistent with share settings.
Quality Requirements
- Durability — uploaded data must not be lost; object storage with replication or erasure coding; backups if required.
- Availability — files should be readable when needed; redundancy and health checks; consider multi-region as scale grows.
- Access control — enforce permissions on download, list, and update; tokens or auth for links; no unauthorized access.
- Metadata consistency — file and folder metadata (name, parent, permissions) should be consistent with user view; avoid listing deleted or inaccessible items.
- Expected scale — number of users, total storage, files per user, download bandwidth, concurrent uploads/downloads.
Key Entities
- User — identity; owns files and folders; has quota (optional at MVP).
- File — blob reference (object key or path), metadata (name, size, content_type, owner_id, parent_id or path, created_at, updated_at); optional version_id.
- Folder (optional) — hierarchy; folder has name, parent_id, owner_id; files have parent_id pointing to folder or root.
- Share / Permission — link share (token, expiry, permission level) or user share (user_id, resource_id, role); check on access.
- Version (optional) — same file, multiple revisions; version_id or timestamp; restore previous version.
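As a rough illustration, the entities above could be modeled as plain records. This is a minimal sketch, not a schema recommendation: the field names follow the list above, while the `role` values and the rule that a share carries exactly one of `token` or `user_id` are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class User:
    id: str
    quota_bytes: Optional[int] = None  # quota is optional at MVP


@dataclass
class Folder:
    id: str
    name: str
    owner_id: str
    parent_id: Optional[str] = None  # None means the root folder


@dataclass
class File:
    id: str
    name: str
    size: int
    content_type: str
    owner_id: str
    object_key: str  # reference to the blob in object storage
    created_at: datetime
    updated_at: datetime
    parent_id: Optional[str] = None
    version_id: Optional[str] = None  # only used once versioning exists


@dataclass
class Share:
    resource_id: str
    role: str  # assumed values: "view" or "edit"
    token: Optional[str] = None  # set for link shares
    user_id: Optional[str] = None  # set for user shares
    expires_at: Optional[datetime] = None

    def __post_init__(self) -> None:
        # Assumption for this sketch: a share is either a link share
        # or a user share, never both and never neither.
        if (self.token is None) == (self.user_id is None):
            raise ValueError("share needs exactly one of token or user_id")
```

Keeping `Share` as one record with two mutually exclusive keys mirrors the single shares table used in Stage 1; two separate tables would work just as well.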
Primary Use Cases and Access Patterns
- Upload — write path; accept file stream or multipart; store in object storage; write metadata to DB; idempotent by key (e.g. path + version) if needed.
- Download — read path; resolve file by id or path; check permission; stream blob from object storage; support range requests for large files.
- List — read path; query metadata by user and parent (folder); filter by permission; paginate.
- Share — write path; create share record (link token or user permission); optional expiry; invalidate or refresh tokens.
- Permissions check — on every read/write; owner, shared-with, or link token with correct scope.
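The permission check in the last bullet can be sketched as a single function that every read and write path calls. The in-memory dictionaries stand in for the shares table, and the two-level role ranking (`view` below `edit`) is an assumption of this example; expiry handling is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed two-level role model: "edit" implies "view".
ROLE_RANK = {"view": 1, "edit": 2}


@dataclass(frozen=True)
class FileMeta:
    id: str
    owner_id: str


# Hypothetical stand-ins for rows in the shares table.
USER_SHARES = {("bob", "f1"): "view"}  # (user_id, file_id) -> role
LINK_SHARES = {"tok123": ("f1", "view")}  # token -> (file_id, role)


def can_access(file: FileMeta, needed: str,
               user_id: Optional[str] = None,
               token: Optional[str] = None) -> bool:
    """Return True if the caller may perform `needed` on `file`."""
    # The owner always has access.
    if user_id is not None and user_id == file.owner_id:
        return True

    ranks = [0]  # 0 means no access
    if user_id is not None:
        role = USER_SHARES.get((user_id, file.id))
        if role is not None:
            ranks.append(ROLE_RANK[role])
    if token is not None:
        link = LINK_SHARES.get(token)
        # A token only grants access to the file it was issued for.
        if link is not None and link[0] == file.id:
            ranks.append(ROLE_RANK[link[1]])

    return max(ranks) >= ROLE_RANK[needed]
```

For example, `can_access(FileMeta("f1", "alice"), "view", user_id="bob")` is allowed by the user share, while the same call with `"edit"` is refused.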
Given this, start with the simplest MVP: one API, one DB for metadata, object storage for blobs, upload/download, and simple sharing (link with token or permission record)—then add CDN/cache, sync or webhooks, quota/billing, and versioning as usage grows.
Stage 1 — MVP (simple, correct, not over-engineered)
Goal
Ship working file sharing: users upload and download files, list their files (and optional folders), and share via link or with other users. One API, one DB for metadata, object storage for blobs; minimal moving parts.
Components
- API — REST or similar; auth; upload file (multipart or stream); download file (by id or path); list files (by user, optional folder); create share (link or user); resolve share link (token → file, check expiry). Validate size limits and content type.
- DB (metadata) — store users, files (id, name, size, owner_id, parent_id or path, object_key, created_at, updated_at), optional folders, and shares (token or user_id, resource_id, permission, expiry). Index by owner, parent, and share token.
- Object storage (blobs) — store file contents; key by file_id or stable path; durable (replication or erasure coding per provider). Do not store blobs in DB.
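- Upload / download: on upload, receive the file, write the blob to object storage, and write metadata to the DB only after the blob write succeeds. A failure between the two steps then leaves at most an orphaned blob (which can be cleaned up later) rather than a metadata row that points at nothing. The two stores do not share a transaction, so either accept this brief inconsistency or reconcile periodically. On download, read metadata, check permission, then stream from object storage.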
- Simple sharing — link: generate token, store in shares table with expiry and permission; download/list via token. Or share with user: store (user_id, file_id, role); check on access. No complex ACL engine at MVP.
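Link sharing from the last bullet can be as small as the sketch below: generate an unguessable token, store it with an expiry and permission, and resolve it on access. The `SHARES` dictionary stands in for the shares table, and the 7-day default lifetime is an arbitrary assumption.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical in-memory stand-in for the shares table: token -> share row.
SHARES: dict = {}


def create_link_share(file_id: str, permission: str = "view",
                      ttl: timedelta = timedelta(days=7),
                      now: Optional[datetime] = None) -> str:
    """Create a link share and return its token."""
    now = now or datetime.now(timezone.utc)
    # token_urlsafe draws from the OS CSPRNG, so tokens are not guessable.
    token = secrets.token_urlsafe(32)
    SHARES[token] = {
        "file_id": file_id,
        "permission": permission,
        "expires_at": now + ttl,
    }
    return token


def resolve_link(token: str, now: Optional[datetime] = None) -> Optional[dict]:
    """Return the share row for a valid token, or None if unknown or expired."""
    now = now or datetime.now(timezone.utc)
    share = SHARES.get(token)
    if share is None or share["expires_at"] <= now:
        return None
    return share
```

The `now` parameter exists only so that expiry can be exercised in tests without waiting; production code would normally rely on the default.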
Minimal Diagram
      Client
         |
         v
+-----------------+
|       API       |
+-----------------+
   |            |
   v            v
DB (metadata)   Object storage (blobs)
 - files         - blob by key
 - folders
 - shares

Patterns and Concerns (don’t overbuild)
- Metadata vs blob: always separate metadata (DB) from blob (object storage); reference blob by object_key or path; avoid storing content in DB.
- Permission check: on every download and list; owner always has access; else check shares table (link token or user_id + resource_id).
- Idempotency for uploads: optional; use content hash or upload_id so retries don’t create duplicates; or accept overwrite by path.
- Basic monitoring: upload/download success and latency, storage growth, share link usage, error rate.
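To make the metadata/blob split and upload idempotency concrete, here is a minimal sketch that keys blobs by content hash and writes metadata only after the blob is stored. `BLOBS` and `FILES` are in-memory stand-ins for object storage and the metadata DB, and the `blobs/<sha256>` key layout is an assumption of this example, not a requirement.

```python
import hashlib

BLOBS: dict = {}  # object_key -> bytes (stand-in for object storage)
FILES: dict = {}  # (owner_id, path) -> metadata row (stand-in for the DB)


def upload(owner_id: str, path: str, data: bytes) -> dict:
    """Store a file; retrying the same upload creates no duplicates."""
    checksum = hashlib.sha256(data).hexdigest()
    object_key = f"blobs/{checksum}"

    # Step 1: write the blob. The key is derived from the content, so a
    # retry simply rewrites identical bytes under the same key.
    BLOBS[object_key] = data

    # Step 2: write metadata only after the blob write has succeeded.
    existing = FILES.get((owner_id, path))
    if existing is not None and existing["checksum"] == checksum:
        return existing  # same content at the same path: nothing to do

    row = {
        "owner_id": owner_id,
        "path": path,
        "size": len(data),
        "checksum": checksum,
        "object_key": object_key,
    }
    FILES[(owner_id, path)] = row  # new content overwrites by path
    return row
```

Note that overwriting a path leaves the previous blob in storage. At MVP that can be handled by a periodic cleanup job; once versioning arrives, those older blobs become the previous versions.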
Why This Is a Correct MVP
- One API, one DB for metadata, object storage for blobs, upload/download, simple sharing → enough to ship a usable file-sharing product; easy to reason about.
- Vertical scaling and single-region object storage buy you time before you need CDN, sync layer, and multi-region.
Stage 2 — Growth Phase (CDN, sync, quota, versioning)
What Triggers the Growth Phase?
- Download traffic grows; need CDN or cache for hot files to reduce latency and object-storage egress.
- Users expect multi-device sync or notifications (file changed elsewhere); need sync protocol or webhooks.
- Need quota and billing (storage per user, bandwidth); enforce limits and track usage.
- Versioning or conflict handling (e.g. overwrite vs new version); product expects “previous versions” or conflict resolution.
Components to Add (incrementally)
- CDN or cache for hot files — put popular or recently accessed blobs behind CDN; cache by object key or signed URL; reduce load on object storage and improve download latency.
- Sync or webhooks for multi-device: for sync, the client polls or long-polls for changes (delta by timestamp or version); alternatively, push a webhook or notification when a file is updated so clients refresh. A simple “list changes since X” API can be enough before a full sync protocol.
- Quota / billing — track storage per user (sum of file sizes); enforce quota on upload; optional bandwidth or request limits; integrate with billing if monetizing.
- Versioning or conflict handling — store multiple versions per file (version_id or timestamp); upload can create new version; download can specify version; optional “restore” as copy of previous version. Or last-write-wins with conflict detection (e.g. ETag) and user prompt.
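Versioning with ETag-style conflict detection, as described in the last bullet, can be sketched as follows. Every write appends a new version, and a write is rejected if the caller's `if_match` value does not name the current version. The ETag format and the in-memory `VERSIONS` store are assumptions made for this example.

```python
import hashlib
from typing import Optional


class ConflictError(Exception):
    """Raised when a write is based on a stale version."""


# Hypothetical version store: file_id -> list of (etag, data), oldest first.
VERSIONS: dict = {}


def put_version(file_id: str, data: bytes,
                if_match: Optional[str] = None) -> str:
    """Append a new version if `if_match` names the current one."""
    history = VERSIONS.setdefault(file_id, [])
    current = history[-1][0] if history else None
    # First write: if_match must be None. Later writes: it must equal the
    # latest ETag, otherwise someone else wrote in between.
    if if_match != current:
        raise ConflictError(f"expected {current!r}, got {if_match!r}")
    digest = hashlib.sha256(data).hexdigest()[:8]
    etag = f"{len(history) + 1}-{digest}"
    history.append((etag, data))
    return etag


def get_version(file_id: str, index: int = -1) -> bytes:
    """Return the latest version by default, or an older one by index."""
    return VERSIONS[file_id][index][1]
```

On a `ConflictError` the client can prompt the user, or retry against the new ETag to get last-write-wins behavior; "restore" is then just `put_version` with the bytes of an earlier version.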
Growth Diagram
Client (multi-device)
         |
         v
+-----------------+
|       API       |
+-----------------+
   |            |              +------------------+
   v            v              |   CDN / cache    |
DB (metadata)   Object storage |   (hot files)    |
   |            |              +------------------+
   v            v
Quota tracking  Versions (optional)
   |
   v
Sync: "changes since X" or webhooks

Patterns and Concerns to Introduce (practical scaling)
- Large file uploads: multipart upload for big files; resumable uploads if needed; don’t block API with single large request.
- Range requests: support HTTP Range for download so clients can resume or stream chunks; object storage usually supports this.
- Metadata consistency: list and permission checks must reflect current state; use DB transactions for share/create/delete; eventual consistency for blob vs metadata only if acceptable.
- Monitoring: CDN hit ratio, sync latency, quota usage, version count and storage impact.
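For the range-request bullet, the core of the work is turning a `Range` header into a byte interval. The sketch below handles the single-range forms (`bytes=a-b`, `bytes=a-`, `bytes=-n`) and returns `None` for anything else; treating multi-range requests as unsupported is a simplification chosen for this example.

```python
from typing import Optional, Tuple


def parse_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Parse a single-range header into inclusive (start, end) offsets.

    Returns None if the header is malformed, uses multiple ranges,
    or cannot be satisfied for a resource of `size` bytes.
    """
    if not header.startswith("bytes=") or "," in header:
        return None
    first, sep, last = header[len("bytes="):].partition("-")
    if not sep:
        return None
    try:
        if first == "":
            # Suffix form "bytes=-n": the last n bytes of the resource.
            n = int(last)
            if n <= 0:
                return None
            start, end = max(size - n, 0), size - 1
        else:
            start = int(first)
            # Open-ended form "bytes=a-": read to the end.
            end = int(last) if last else size - 1
    except ValueError:
        return None
    end = min(end, size - 1)  # clamp an end offset past the last byte
    if start < 0 or start > end:
        return None
    return start, end
```

A satisfiable range maps to a `206 Partial Content` response with `Content-Range: bytes {start}-{end}/{size}`; an unsatisfiable one maps to `416`. When the blob lives in object storage, the same offsets can usually be passed through to the storage read.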
Still Avoid (common over-engineering here)
- Multi-region storage and dedicated sync layer until traffic and geography justify them.
- Full audit logging and compliance pipeline until product or legal requires it.
- Complex conflict resolution (e.g. merge) unless the product explicitly needs it; versioning or last-write-wins is often enough.
Stage 3 — Advanced Scale (multi-region, sync layer, audit)
What Triggers Advanced Scale?
- Users and files are global; need multi-region storage or replication for latency and availability.
- Sync is a first-class feature (desktop/mobile clients); need dedicated sync layer (delta sync, conflict resolution, offline support).
- Compliance or security requires audit logging and access logs; who accessed what and when.
- Scale for very large files (e.g. video) and many concurrent uploads/downloads; optimize throughput and cost.
Components (common advanced additions)
- Multi-region storage: replicate blobs to multiple regions or use provider multi-region buckets; route reads to the nearest region; write to a primary and replicate, or allow multi-region writes with conflict handling. Metadata may be global, or regional with sync between regions.
- Dedicated sync layer — service that handles client sync protocol: delta (changes since cursor), conflict resolution (e.g. rename or version), and offline queue; talks to metadata API and object storage; scales independently.
- Audit / logging — log access (who downloaded what, when); log share creation and permission changes; retain for compliance; consider append-only log or dedicated audit store.
- Scale for large files and many users — chunked or multipart upload at scale; rate limiting and backpressure; object storage tuning (part size, concurrency); separate read path (CDN) from write path (upload pipeline).
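Chunked or multipart upload starts with splitting a file into parts. The sketch below shows that arithmetic: given a total size and a part size, it yields the part number, offset, and length for each part. The function name and the 1-based part numbering are assumptions of this example; real providers typically impose their own limits on part size and part count, which are not modeled here.

```python
from typing import List, Tuple


def plan_parts(total_size: int, part_size: int) -> List[Tuple[int, int, int]]:
    """Split an upload into (part_number, offset, length) tuples.

    Part numbers start at 1. Every part has `part_size` bytes except
    possibly the last one, which holds the remainder.
    """
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size > 0")
    parts = []
    offset = 0
    number = 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

Each tuple can be uploaded independently and retried on failure, which is what makes resumable uploads and bounded concurrency straightforward: a worker pool takes parts from the plan, and the upload is completed once every part has succeeded.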
Advanced Diagram (conceptual)
Clients (many devices, many regions)
         |
         v
+------------------+
| API (metadata,   |
|  auth, share)    |
+------------------+
         |             +------------------+
         v             | Sync layer       |
DB (metadata,          | (delta, conflict)|
quota, audit)          +------------------+
   |         |                  |
   v         v                  v
Object storage (multi-region or replicated)
         |
         v
CDN (download path)
         |
         v
Audit log (access, share events)

Patterns and Concerns at This Stage
- Consistency across regions: metadata and blob replication; define read-your-writes and list consistency; often eventual consistency for list, strong for single-file read after write.
- Sync protocol: cursor or token for “changes since”; include creates, updates, deletes; conflict resolution (server wins, or last-write-wins with version); offline queue and retry.
- Cost and performance: storage and egress cost by region; CDN cost; audit log retention; optimize hot path and archive cold data.
- SLO-driven ops: upload/download latency and success rate, sync delay, availability per region; error budgets and on-call.
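The cursor-based “changes since” idea in the sync protocol bullet can be sketched with an append-only change log. Each change gets a monotonically increasing sequence number, and a client's cursor is simply the last sequence number it has seen. The class and field names are hypothetical, and a real log would live in the metadata DB rather than in memory.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Change:
    seq: int
    file_id: str
    kind: str  # "create", "update", or "delete"


class ChangeLog:
    """Append-only log of file changes, read by cursor."""

    def __init__(self) -> None:
        self._changes: List[Change] = []

    def record(self, file_id: str, kind: str) -> Change:
        # Sequence numbers start at 1, so cursor 0 means "from the start".
        change = Change(len(self._changes) + 1, file_id, kind)
        self._changes.append(change)
        return change

    def changes_since(self, cursor: int,
                      limit: int = 100) -> Tuple[List[Change], int, bool]:
        """Return (changes with seq > cursor, next cursor, has_more)."""
        # seq is index + 1, so slicing at `cursor` skips what was seen.
        page = self._changes[cursor:cursor + limit]
        next_cursor = page[-1].seq if page else cursor
        has_more = next_cursor < len(self._changes)
        return page, next_cursor, has_more
```

A client stores `next_cursor` after applying each page and sends it with the next request. Because deletes are recorded as changes rather than removed from the log, an offline client that reconnects still learns about them.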
Summarizing the Evolution
MVP delivers file sharing with one API, one DB for metadata, object storage for blobs, upload/download, and simple sharing (link or user permission). That’s enough to ship a usable product.
As you grow, you add CDN or cache for hot files, sync or webhooks for multi-device, quota/billing, and versioning or conflict handling. You keep metadata consistent and enforce access control at every step.
At advanced scale, you add multi-region storage, a dedicated sync layer for first-class sync, and audit/logging for compliance. You scale for large files and many users without over-building on day one.
This approach gives you:
- Start Simple — API + DB + object storage, upload/download, simple sharing; ship and learn.
- Scale Intentionally — add CDN and sync when traffic and multi-device demand it; add versioning when product expects it.
- Add Complexity Only When Required — avoid multi-region and dedicated sync until scale and product justify them; keep durability and access control front and center.