Database systems have transformed dramatically from their inception in the 1960s to the cloud-based platforms of today. This evolution has been driven by growing demands for scalability, performance, flexibility, and the ability to handle ever-increasing volumes of data. Over the decades, the dominant database models have shifted from early navigational (hierarchical and network) databases, to the relational model, and more recently to NoSQL and cloud-based solutions. Each stage in this journey was influenced by the limitations of the previous generation and the emerging needs of the time.
Over the course of this 3-hour interactive session, we will:
- learn about modern storage technologies, and
- apply them to the design of a real system.
To participate, you will need:
- A laptop
- Basic Command-Line Skills: Comfortably navigate directories and execute simple terminal commands.
- Git Fundamentals: Familiarity with cloning repositories, committing changes, and pushing to remote branches.
- GitHub Account: If you don’t already have one, create a free account on GitHub.
- Access to draw.io: A tool for creating and collaborating on diagrams.
Below is a high-level breakdown of the 3-hour bootcamp session:
- Storage: a historical overview and evolution
- Key characteristics of different storage technologies
- Modern storage solutions
- Designing the Nearby Friends application
| Time (Approx.) | Activity |
|---|---|
| 16:00 (20 min) | Short intro & ice breaker |
| 16:20 (40 min) | Historical overview & modern storage technologies |
| 17:00 (30 min) | Designing the app, part 1 |
| 17:30 (15 min) | Pizza time 🍕 |
| 17:45 (60 min) | Designing the app, part 2 |
| 18:45 (15 min) | Summary & Closing Remarks |
Have you ever wondered how our data storage capabilities transformed from bulky magnetic tapes and basic hard disk drives (HDDs) to the lightning-fast NVMe solutions we rely on today? Since the dawn of computing, storage has been both a fundamental challenge and a catalyst for technological advancement. Early systems were constrained by expensive hardware and limited capacity - so engineers turned to Redundant Array of Independent Disks (RAID) to increase reliability, availability, and performance. By uniting multiple hard drives into a single logical unit, RAID laid the groundwork for more robust and fault-tolerant storage architectures.
As demand for speed and efficiency soared, Solid-State Drives (SSDs) heralded a major breakthrough with their superior data access, lower latency, and improved durability. The unceasing pursuit of higher performance then inspired the creation of Non-Volatile Memory Express (NVMe), a protocol that capitalizes on high-speed PCIe connections. This development propelled storage technology into a new era of capacity, throughput, and reliability — one that continues to evolve today.
When you think about the countless apps, services, and systems in use today, all of them rely on one unifying element: storage. It’s the bedrock that preserves the “state” of any application or platform, enabling everything from tiny web services to massive enterprise data centers. Building on the foundations of RAID, SSDs, and NVMe, today’s storage ecosystem has evolved into a diverse toolkit designed to meet every conceivable need. Whether you’re working with object storage for unstructured data, block storage for virtual machines, file storage for shared directories, or one of the many flavors of databases - SQL, NoSQL, or distributed - you’re tapping into decades of innovations in performance and reliability. Even data warehouses that drive analytics at scale and queue systems that orchestrate event-driven architectures owe their existence to these foundational breakthroughs. This talk will explore the remarkable ways data is stored, managed, and transformed today.
System design is a structured process for planning and implementing software solutions that meet user needs while accommodating growth, performance, and security considerations. It involves translating requirements into a cohesive architecture, determining how to break the system into logical components, choosing appropriate data stores (relational vs. NoSQL), and selecting communication protocols (REST, gRPC, etc.). Each decision carries its own trade-offs around cost, scalability, complexity, and reliability.
Imagine you are a software engineer at a big tech company. You’ve been presented with an exciting challenge: implement a Nearby Friends feature in an existing messaging application. This functionality must aggregate and display real-time location data for users who have opted in and granted permission. The system must be robust enough to handle large volumes of updates, efficient enough to provide near-instant feedback to users, and secure enough to protect sensitive location data.
- User Visibility: Users should be able to see a list of nearby friends on their mobile apps.
- Distance & Timestamp: Each friend in the list displays a distance value and a timestamp indicating when that distance was last updated.
- Frequent Updates: The nearby friends list should refresh every few seconds to maintain up-to-date location information.
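To make these requirements concrete, here is a minimal sketch of a single nearby-friend entry as it might appear in an API response. All field names are illustrative assumptions for discussion, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class NearbyFriend:
    # Field names are hypothetical; the real schema is a design decision.
    user_id: str
    display_name: str
    distance_meters: float  # distance from the requesting user
    updated_at: int         # Unix timestamp of the last location update

# One entry as it might be rendered in the friends list
friend = NearbyFriend("u123", "Alice", 850.0, 1714000000)
print(friend)
```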
- How quickly after a friend moves or changes location should that be reflected in the app?
- Are we aiming for updates within a second, or is a few seconds delay acceptable?
- Does it need “five nines” 99.999% uptime since it’s user-facing?
- What is the impact if it’s temporarily unavailable?
- What scale must the system handle?
- Approximately how many users do we expect to use Nearby Friends?
- What percentage might use this feature concurrently?
- How frequently will users send location updates?
- How often will users query the nearby friends list?
- How much data would be generated per day by all updates?
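A quick back-of-envelope estimate helps anchor these questions. Every number below is an assumption for illustration only; replace them with the figures you agree on during the session:

```python
# Assumed figures -- illustrative only, not given by the exercise.
total_users = 100_000_000          # users of the messaging app
nearby_friends_users = 10_000_000  # assume 10% opt in to Nearby Friends
update_interval_s = 30             # each active user sends a location every 30 s

updates_per_second = nearby_friends_users / update_interval_s
print(f"{updates_per_second:,.0f} location updates/second")  # ~333,333/s

bytes_per_update = 100             # user id + lat/lon + timestamp + overhead
daily_bytes = updates_per_second * 86_400 * bytes_per_update
print(f"~{daily_bytes / 1e12:.1f} TB of raw updates per day")
```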
- What are the main components/services in your design?
- How will responsibilities be divided?
- What does the mobile client do vs. the server?
- Does the client do any filtering or distance calculation on its own?
- What APIs or endpoints are needed?
- What would the request/response look like for each?
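As a starting point, the feature might need only a couple of endpoints. The paths and payload shapes below are hypothetical assumptions to seed the discussion, not a prescribed design:

```python
import json

# Hypothetical endpoints (names are assumptions, not a spec):
#   POST /v1/location        -- client pushes its current position
#   GET  /v1/nearby-friends  -- client fetches the current nearby list
#   (or a WebSocket channel that pushes updates instead of polling)

location_update = {               # example body of POST /v1/location
    "user_id": "u123",
    "lat": 37.7749,
    "lon": -122.4194,
    "timestamp": 1714000000,
}

nearby_response = {               # example body of GET /v1/nearby-friends
    "friends": [
        {"user_id": "u456", "distance_meters": 850.0, "updated_at": 1714000000}
    ]
}

print(json.dumps(location_update))
```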
- What architecture style will you use?
- What protocols and patterns will services and clients use to communicate?
- How will the system deliver frequent updates to clients?
- Are there existing technologies or SDKs that could help?
- Where will these services run and how?
- What kind of storage technology is appropriate for storing users’ location data?
- How will we implement geo-spatial queries in the data store?
- Is caching worthwhile given data changes so rapidly?
- How will the system cope if the data grows large?
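One way to reason about geo-spatial queries is to start with the naive approach: compute the great-circle (haversine) distance to every candidate and filter by radius. Production stores typically replace the full scan with a spatial index such as a geohash or quadtree (e.g., Redis GEOSEARCH or PostGIS), but this sketch shows the underlying math. The coordinates are made-up example data:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical friend locations: user_id -> (lat, lon)
locations = {
    "u456": (37.7755, -122.4180),  # a few hundred meters away
    "u789": (37.8044, -122.2712),  # roughly 13 km away
}

me = (37.7749, -122.4194)
radius_m = 5_000
nearby = {uid: haversine_m(*me, *pos)
          for uid, pos in locations.items()
          if haversine_m(*me, *pos) <= radius_m}
print(nearby)  # only u456 is within 5 km
```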
- How will the system ingest the continuous stream of location updates from users?
- How will out-of-order updates be handled?
- What are the trade-offs of using asynchronous processing?
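A common way to tolerate out-of-order delivery is last-write-wins keyed on the client-side timestamp: an incoming update is applied only if it is newer than the one already stored. Here is a minimal in-memory sketch of that idea (a real system would enforce this in the data store, e.g., via a conditional write):

```python
latest = {}  # user_id -> (timestamp, lat, lon)

def apply_update(user_id, timestamp, lat, lon):
    """Apply a location update only if it is newer than the stored one."""
    current = latest.get(user_id)
    if current is None or timestamp > current[0]:
        latest[user_id] = (timestamp, lat, lon)
        return True   # accepted
    return False      # stale update dropped

apply_update("u123", 100, 37.77, -122.41)
apply_update("u123", 90, 37.00, -122.00)   # arrives late -> ignored
print(latest["u123"])  # (100, 37.77, -122.41)
```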
| Characteristic | Description |
|---|---|
| performance | The amount of time it takes for the system to process a business request. |
| responsiveness | The amount of time it takes to get a response to the user. |
| availability | The amount of uptime of a system; usually measured in 9's (e.g., 99.9%). |
| fault tolerance | When fatal errors occur, other parts of the system continue to function. |
| scalability | A function of system capacity and growth over time; as the number of users or requests in the system increases, responsiveness, performance, and error rates remain constant. |
| elasticity | The system can expand and respond quickly to unexpected or anticipated extreme loads (e.g., going from 20 to 250,000 users instantly). |
| data integrity | The data across the system is correct and there is no data loss in the system. |
| data consistency | The data across the system is in sync and consistent across databases and tables. |
| adaptability | The ease with which a system can adapt to changes in environment and functionality. |
| concurrency | The ability of the system to process simultaneous requests, in most cases in the same order in which they were received; implied when scalability and elasticity are supported. |
| interoperability | The ability of the system to interface and interact with other systems to complete a business request. |
| extensibility | The ease with which a system can be extended with additional features and functionality. |
| deployability | The amount of ceremony involved with releasing the software, the frequency with which releases occur, and the overall risk of deployment. |
| testability | The ease of and completeness of testing. |
| abstraction | The level at which parts of the system are isolated from other parts of the system (both internal and external system interactions). |
| workflow | The ability of the system to manage complex workflows that require multiple parts (services) of the system to complete a business request. |
| configurability | The ability of the system to support multiple configurations, as well as support custom on-demand configurations and configuration updates. |
| recoverability | The ability of the system to start where it left off in the event of a system crash. |
| feasibility (implicit) | Considering timeframes, budgets, and developer skills when making architectural choices; tight timeframes and budgets make this a driving architectural characteristic. |
| security (implicit) | The ability of the system to restrict access to sensitive information or functionality. |
| maintainability (implicit) | The level of effort required to locate and apply changes to the system. |
| observability (implicit) | The ability of a system or a service to make available and stream metrics such as overall health, uptime, response times, performance, etc. |

