Modern Data Storage Technologies

Database systems have transformed dramatically from their inception in the 1960s to the cloud-based platforms of today. This evolution has been driven by growing demands for scalability, performance, flexibility, and the ability to handle ever-increasing volumes of data. Over the decades, the dominant database models have shifted from early navigational (hierarchical and network) databases, to the relational model, and more recently to NoSQL and cloud-based solutions. Each stage in this journey was influenced by the limitations of the previous generation and the emerging needs of the time.

Over the course of this 3-hour interactive session, we will:

  • learn about modern storage technologies and
  • apply them to a system we design together.

Prerequisites

  • A Laptop
  • Basic Command-Line Skills: Comfortably navigate directories and execute simple terminal commands.
  • Git Fundamentals: Familiarity with cloning repositories, committing changes, and pushing to remote branches.
  • GitHub Account: If you don’t already have one, create a free account on GitHub.
  • Access to draw.io: A tool for creating and collaborating on diagrams.

Agenda

High-level

Below is a high-level breakdown of the 3-hour bootcamp session:

  • Storage: A Historical overview and evolution
  • Key characteristics of different storage technologies
  • Modern storage solutions
  • Designing the Nearby Friends application

Schedule

| Time (approx.) | Activity |
| --- | --- |
| 16:00 (20 min) | Short intro & ice breaker |
| 16:20 (40 min) | Historical overview & modern storage solutions |
| 17:00 (30 min) | Designing the app, part 1 |
| 17:30 (15 min) | Pizza time 🍕 |
| 17:45 (60 min) | Designing the app, part 2 |
| 18:45 (15 min) | Summary & closing remarks |

Storage: a historical overview and its evolution

Have you ever wondered how our data storage capabilities transformed from bulky magnetic tapes and basic hard disk drives (HDDs) to the lightning-fast NVMe solutions we rely on today? Since the dawn of computing, storage has been both a fundamental challenge and a catalyst for technological advancement. Early systems were constrained by expensive hardware and limited capacity - so engineers turned to Redundant Array of Independent Disks (RAID) to increase reliability, availability, and performance. By uniting multiple hard drives into a single logical unit, RAID laid the groundwork for more robust and fault-tolerant storage architectures.

As demand for speed and efficiency soared, Solid-State Drives (SSDs) heralded a major breakthrough with their superior data access, lower latency, and improved durability. The unceasing pursuit of higher performance then inspired the creation of Non-Volatile Memory Express (NVMe), a protocol that capitalizes on high-speed PCIe connections. This development propelled storage technology into a new era of capacity, throughput, and reliability — one that continues to evolve today.

Modern storage solutions

When you think about the countless apps, services, and systems in use today, all of them rely on one unifying element: storage. It’s the bedrock that preserves the “state” of any application or platform, enabling everything from tiny web services to massive enterprise data centers. Building on the foundations of RAID, SSDs, and NVMe, today’s storage ecosystem has evolved into a diverse toolkit designed to meet every conceivable need. Whether you’re working with object storage for unstructured data, block storage for virtual machines, file storage for shared directories, or one of the many flavors of databases - SQL, NoSQL, or distributed - you’re tapping into decades of innovations in performance and reliability. Even data warehouses that drive analytics at scale and queue systems that orchestrate event-driven architectures owe their existence to these foundational breakthroughs. This talk explores the remarkable ways data is stored, managed, and transformed today.

Designing the Nearby Friends application

System design is a structured process for planning and implementing software solutions that meet user needs while accommodating growth, performance, and security considerations. It involves translating requirements into a cohesive architecture, determining how to break the system into logical components, choosing appropriate data stores (relational vs. NoSQL), and selecting communication protocols (REST, gRPC, etc.). Each decision carries its own trade-offs around cost, scalability, complexity, and reliability.

Imagine you are a software engineer at a big tech company. You’ve been presented with an exciting challenge: implement a Nearby Friends feature in an existing messaging application. This functionality must aggregate and display real-time location data for users who have opted in and granted permission. The system must be robust enough to handle large volumes of updates, efficient enough to provide near-instant feedback to users, and secure enough to protect sensitive location data.

Functional requirements

  • User Visibility: Users should be able to see a list of nearby friends on their mobile apps.
  • Distance & Timestamp: Each friend in the list displays a distance value and a timestamp indicating when that distance was last updated.
  • Frequent Updates: The nearby friends list should refresh every few seconds to maintain up-to-date location information.
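To make these requirements concrete, here is a sketch of what a nearby-friends response might look like, with a small staleness check on the per-friend timestamp. All field names and the payload shape are assumptions for illustration, not part of any real API.

```python
# Hypothetical response payload for the nearby-friends list.
# Field names are illustrative assumptions, not a real API contract.
from datetime import datetime, timezone

nearby_friends_response = {
    "friends": [
        {
            "user_id": "u-1042",
            "display_name": "Alice",
            "distance_m": 320,                     # distance in meters
            "updated_at": "2024-05-01T16:20:05Z",  # when the distance was last updated
        },
    ],
    "refresh_interval_s": 5,  # the client refreshes at this cadence
}

def is_stale(entry: dict, now: datetime, max_age_s: int = 60) -> bool:
    """Flag entries whose distance has not been updated recently."""
    updated = datetime.fromisoformat(entry["updated_at"].replace("Z", "+00:00"))
    return (now - updated).total_seconds() > max_age_s
```

A client could use `is_stale` to grey out friends whose location is no longer fresh instead of silently showing outdated distances.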

Further steps

Non-functional requirements

  • How quickly after a friend moves or changes location should that be reflected in the app?
  • Are we aiming for updates within a second, or is a few seconds delay acceptable?
  • Does it need “five nines” 99.999% uptime since it’s user-facing?
  • What is the impact if it’s temporarily unavailable?
  • What scale must the system handle?

Back-of-the-envelope estimation

  • Approximately how many users do we expect to use Nearby Friends?
  • What percentage might use this feature concurrently?
  • How frequently will users send location updates?
  • How often will users query the nearby friends list?
  • How much data would be generated per day by all updates? 
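One possible worked estimation, where every input number is an assumption to be challenged during the session rather than a given:

```python
# Back-of-the-envelope sketch; all inputs below are assumptions --
# plug in your own numbers during the session.
total_users = 1_000_000_000   # assumed installed base of the messaging app
nearby_users = 100_000_000    # assume 10% opt in to Nearby Friends
concurrent_ratio = 0.10       # assume 10% of opted-in users are online at once
update_interval_s = 30        # assume one location update every 30 seconds

concurrent_users = int(nearby_users * concurrent_ratio)
updates_per_second = concurrent_users / update_interval_s

update_bytes = 100            # assume ~100 B per update (user id, lat, lon, ts)
daily_updates = updates_per_second * 86_400
daily_volume_gb = daily_updates * update_bytes / 1e9

print(f"{updates_per_second:,.0f} updates/s, ~{daily_volume_gb:,.0f} GB/day")
```

Even with these rough inputs the shape of the answer (hundreds of thousands of writes per second, terabytes per day) already narrows the storage choices considerably.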

High-level system decomposition (components & APIs)

  • What are the main components/services in your design?
  • How will responsibilities be divided?
  • What does the mobile client do vs. the server?
  • Does the client do any filtering or distance calculation on its own?
  • What APIs or endpoints are needed?
  • What would the request/response look like for each?
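As one possible decomposition: the client only sends raw GPS fixes, while the server keeps the latest fix per user and computes distances. The sketch below models two hypothetical endpoints in memory; the paths, names, and payloads are illustrative assumptions, not a prescribed design.

```python
import math
from dataclasses import dataclass

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

@dataclass
class LocationUpdate:
    user_id: str
    lat: float
    lon: float
    ts: float  # unix epoch seconds of the GPS fix

class NearbyFriendsService:
    """Server keeps only the latest fix per user; the mobile client sends
    raw GPS fixes and renders the returned list."""

    def __init__(self) -> None:
        self._latest: dict[str, LocationUpdate] = {}

    def post_location(self, update: LocationUpdate) -> None:
        # e.g. POST /v1/users/{user_id}/location  (path is illustrative)
        self._latest[update.user_id] = update

    def get_nearby(self, user_id: str, friend_ids: list[str]) -> list[dict]:
        # e.g. GET /v1/users/{user_id}/nearby-friends  (path is illustrative)
        me = self._latest[user_id]
        result = []
        for fid in friend_ids:
            friend = self._latest.get(fid)
            if friend is None:
                continue  # friend has never shared a location
            result.append({
                "user_id": fid,
                "distance_m": haversine_m(me.lat, me.lon, friend.lat, friend.lon),
                "ts": friend.ts,
            })
        return result
```

Putting distance computation on the server keeps friends' raw coordinates off the client, which matters for the privacy requirement; the trade-off is more server work per query.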

Architectural & technology choices

  • What architecture style will you use?
  • What protocols and patterns will services and clients use to communicate?
  • How will the system deliver frequent updates to clients?
  • Are there existing technologies or SDKs that could help?
  • Where will these services run and how?
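For delivering frequent updates, one common pattern is pub/sub fan-out over persistent connections (e.g., WebSockets): each location update is pushed to every online friend who subscribed to it. This in-memory sketch models only the fan-out logic; the transport layer and its scaling are assumptions left out of scope.

```python
# In-memory model of pub/sub fan-out for location updates.
# Outboxes stand in for per-connection send buffers (e.g., WebSocket queues).
from collections import defaultdict

class LocationFanout:
    def __init__(self) -> None:
        # friend graph: for each user, the set of watchers interested in them
        self._subscribers: dict[str, set[str]] = defaultdict(set)
        self._outboxes: dict[str, list[tuple]] = defaultdict(list)

    def subscribe(self, watcher: str, friend: str) -> None:
        """Watcher wants to receive friend's location updates."""
        self._subscribers[friend].add(watcher)

    def publish(self, user: str, lat: float, lon: float, ts: float) -> None:
        """Push the update into every interested watcher's outbox."""
        for watcher in self._subscribers[user]:
            self._outboxes[watcher].append((user, lat, lon, ts))

    def drain(self, watcher: str) -> list[tuple]:
        """Deliver and clear a watcher's pending updates."""
        msgs, self._outboxes[watcher] = self._outboxes[watcher], []
        return msgs
```

The key design question this surfaces is where the subscription state lives once there are many fan-out servers, which is exactly where a message broker or in-memory pub/sub system (e.g., Redis pub/sub or Kafka) typically enters the picture.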

Data storage & modeling

  • What kind of storage technology is appropriate for storing users’ location data?
  • How will we implement geo-spatial queries in the data store?
  • Is caching worthwhile given data changes so rapidly?
  • How will we scale storage if the data grows large?

Asynchronous updates & messaging

  • How will the system ingest the continuous stream of location updates from users?
  • How will we handle out-of-order updates?
  • What are the trade-offs of using asynchronous processing?
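One common answer to out-of-order delivery is last-write-wins on the timestamp attached at the source: an incoming update is applied only if it is newer than what is already stored. A minimal sketch (the store layout is an assumption):

```python
# Last-write-wins: tolerate out-of-order or duplicate queue messages by
# comparing the source timestamp against the stored one.
def apply_update(store: dict, user_id: str, lat: float, lon: float, ts: float) -> bool:
    """Apply the update unless a newer (or equal) one is already stored.
    Returns True if the store changed."""
    current = store.get(user_id)
    if current is not None and current["ts"] >= ts:
        return False  # stale or duplicate message: drop it
    store[user_id] = {"lat": lat, "lon": lon, "ts": ts}
    return True
```

This makes the consumer idempotent, which is what lets a queue deliver at-least-once without corrupting the latest-location view; the cost is trusting client clocks (or switching to a server-assigned sequence number).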

Characteristics to choose from

| Characteristic | Description |
| --- | --- |
| performance | The amount of time it takes for the system to process a business request. |
| responsiveness | The amount of time it takes to get a response to the user. |
| availability | The amount of uptime of a system; usually measured in nines (e.g., 99.9%). |
| fault tolerance | When fatal errors occur, other parts of the system continue to function. |
| scalability | A function of system capacity and growth over time; as the number of users or requests increases, responsiveness, performance, and error rates remain constant. |
| elasticity | The system can expand and respond quickly to unexpected or anticipated extreme loads (e.g., going from 20 to 250,000 users instantly). |
| data integrity | The data across the system is correct and there is no data loss in the system. |
| data consistency | The data across the system is in sync and consistent across databases and tables. |
| adaptability | The ease with which a system can adapt to changes in environment and functionality. |
| concurrency | The ability of the system to process simultaneous requests, in most cases in the same order in which they were received; implied when scalability and elasticity are supported. |
| interoperability | The ability of the system to interface and interact with other systems to complete a business request. |
| extensibility | The ease with which a system can be extended with additional features and functionality. |
| deployability | The amount of ceremony involved with releasing the software, the frequency with which releases occur, and the overall risk of deployment. |
| testability | The ease and completeness of testing. |
| abstraction | The level at which parts of the system are isolated from other parts of the system (both internal and external interactions). |
| workflow | The ability of the system to manage complex workflows that require multiple parts (services) of the system to complete a business request. |
| configurability | The ability of the system to support multiple configurations, as well as custom on-demand configurations and configuration updates. |
| recoverability | The ability of the system to resume where it left off in the event of a crash. |
| feasibility (implicit) | Considering timeframes, budgets, and developer skills when making architectural choices; tight timeframes and budgets make this a driving characteristic. |
| security (implicit) | The ability of the system to restrict access to sensitive information or functionality. |
| maintainability (implicit) | The level of effort required to locate and apply changes to the system. |
| observability (implicit) | The ability of a system or service to expose and stream metrics such as overall health, uptime, response times, and performance. |

Meet your hosts

Michal Łukowicz
Fellow DevOps Engineer
Humble SRE engineer in love with Kafka, observability, automation and pretending that I can do Python.
Jakub Słyś
Senior Principal Software Engineer
From Telco to AI powering millisecond‑scale platforms: built LTE/LTE Advanced control‑plane software at Nokia, accelerated Sabre’s real‑time flight‑shopping engine, now shaping Pega’s AI‑driven Customer Decision Hub for Fortune 500 personalization.
Michał Woźniak
Principal Software Engineer
Started his IT journey with pre-AI voicebots at Vivic Labs, a startup partnered with Asseco, then moved to Pegasystems to learn more about Big Data processing and distributed databases, and to see a bit of big-corporate culture.
