References
This document serves as a curated directory of real-world system architectures, foundational distributed systems papers, and official company engineering blogs.
How to Study These Architectures
Do not get bogged down in the nitty-gritty details of each article. Instead, focus on the following:
- Identify Patterns: Look for shared principles, common technologies, and recurring patterns (e.g., how often Redis is used for caching, or Kafka for decoupling).
- Understand the "Why": Study the specific problems solved by each component, where the architecture excels, and where it falls short.
- Extract Lessons Learned: Pay attention to the engineering trade-offs and retrospective lessons shared by the authors.
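One way to make the pattern-spotting step concrete is to keep short notes per architecture and tally which technology shows up in which role. The following is a minimal sketch, not a prescribed method. The `notes` dictionary, the placeholder system names (`system_a`, and so on), and the helper names `tally_patterns` and `recurring` are all hypothetical and do not describe any real company's architecture.

```python
from collections import Counter

# Hypothetical study notes: placeholder system names mapped to the
# components noted while reading. This is NOT real architecture data.
notes = {
    "system_a": {"Redis": "caching", "Kafka": "decoupling"},
    "system_b": {"Redis": "caching", "Cassandra": "storage"},
    "system_c": {"Kafka": "decoupling", "Memcached": "caching"},
}


def tally_patterns(notes):
    """Count how often each (technology, role) pair appears across systems."""
    counts = Counter()
    for components in notes.values():
        for tech, role in components.items():
            counts[(tech, role)] += 1
    return counts


def recurring(counts, min_count=2):
    """Return pairs seen in at least min_count systems, most frequent first."""
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    for (tech, role), n in recurring(tally_patterns(notes)):
        print(f"{tech} for {role}: {n} systems")
```

Pairs that clear the `min_count` threshold are candidates for the recurring patterns worth understanding in depth. The threshold itself is an arbitrary choice to adjust as your notes grow.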
Foundational Systems
These papers and presentations detail the core distributed systems and databases that power modern infrastructure.
| Type | System | Reference(s) |
|---|---|---|
| Data processing | MapReduce - Distributed data processing from Google | research.google.com |
| Data processing | Spark - Distributed data processing engine, originally from UC Berkeley's AMPLab and now an Apache project | slideshare.net |
| Data processing | Storm - Distributed data processing from Twitter | slideshare.net |
| Data store | Bigtable - Distributed column-oriented database from Google | harvard.edu |
| Data store | HBase - Open source implementation of Bigtable | slideshare.net |
| Data store | Cassandra - Distributed column-oriented database from Facebook | slideshare.net |
| Data store | DynamoDB - Key-value and document database from Amazon | harvard.edu |
| Data store | MongoDB - Document-oriented database | slideshare.net |
| Data store | Spanner - Globally-distributed database from Google | research.google.com |
| Data store | Memcached - Distributed memory caching system | slideshare.net |
| Data store | Redis - In-memory data store with persistence and rich value types, often used for caching | slideshare.net |
| File system | Google File System (GFS) - Distributed file system | research.google.com |
| File system | Hadoop Distributed File System (HDFS) - Open source implementation of GFS | apache.org |
| Misc | Chubby - Lock service for loosely-coupled distributed systems from Google | research.google.com |
| Misc | Dapper - Distributed systems tracing infrastructure | research.google.com |
| Misc | Kafka - Pub/sub message queue from LinkedIn | slideshare.net |
| Misc | ZooKeeper - Centralized service for configuration, naming, and distributed synchronization | slideshare.net |
Company Architectures
Deep dives into how major tech companies handle massive scale, traffic spikes, and data persistence.
Engineering Blogs
Stay up to date with how industry leaders tackle evolving architectural challenges. These blogs are worth reading before interviewing with the respective companies, since interview questions may come from the same problem domains the engineers write about.
- Airbnb Engineering
- Atlassian Developers
- AWS Blog
- Bitly Engineering Blog
- Box Blogs
- Cloudera Developer Blog
- Dropbox Tech Blog
- eBay Tech Blog
- Engineering at Quora
- Etsy Code as Craft
- Evernote Tech Blog
- Facebook Engineering
- Flickr Code
- Foursquare Engineering Blog
- GitHub Engineering Blog
- Google Research Blog
- Groupon Engineering Blog
- Heroku Engineering Blog
- High Scalability
- HubSpot Engineering Blog
- Instagram Engineering
- Intel Software Blog
- Jane Street Tech Blog
- LinkedIn Engineering
- Microsoft Engineering
- Microsoft Python Engineering
- Netflix Tech Blog
- PayPal Developer Blog
- Pinterest Engineering Blog
- Reddit Blog
- Salesforce Engineering Blog
- Slack Engineering Blog
- Spotify Labs
- Stripe Engineering Blog
- Twilio Engineering Blog
- Twitter Engineering
- Uber Engineering Blog
- Yahoo Engineering Blog
- Yelp Engineering Blog
- Zynga Engineering Blog