Disk-Backed Message Processing System (Kafka-Inspired)

Overview

This project implements a Kafka-inspired message processing system in pure Java.
It focuses on backpressure, bounded concurrency, disk persistence, and graceful shutdown.

The purpose of this project is to explore and demonstrate real-world system design problems:

How to handle producer–consumer imbalance
How to prevent unbounded memory growth
How to use disk as a safety buffer under load
How to shut down concurrent systems without data loss

A simple in-memory implementation is included for comparison.

Technology Stack

Backend

Java
Maven
RESTful API design

Frontend

JavaScript / TypeScript
Modern frontend tooling (via npm)

Infrastructure

Docker
Docker Compose

Prerequisites

Ensure the following tools are installed:

Java 17 or higher
Maven
Node.js 18 or higher
npm
Docker & Docker Compose

Running the Project

Option 1: Run with Docker (Recommended)

This is the easiest and most consistent way to run the entire system.

docker-compose up --build

Once started:

Backend API: http://localhost:8080
Frontend UI: http://localhost:5173

Option 2: Run Locally (Without Docker)

Start the Backend

cd messaging-api
mvn clean package
java -jar target/*.jar

Start the Frontend

cd messaging-ui
npm install
npm run dev

Project Structure

.
├── messaging-api/        # Backend REST API (Java)
├── messaging-engine/     # Backend Message Processing Engine (Java)
├── messaging-ui/         # Frontend Web UI
├── docker-compose.yml    # Unified setup for local development
├── .project
└── README.md


messaging-engine/
└── src/
  └── org.main.project
  ├── kafka_style_sys
  │ ├── MainExecutorClass.java
  │ ├── worker
  │ │ └── WorkerThreadPool.java
  │ ├── service
  │ │ ├── DiskQueue.java
  │ │ ├── FileDiskQueue.java
  │ │ └── IndexMapService.java
  │ └── record
  │ └── DiskRecord.java
  │
  └── simple_msg_sys
  ├── MainClass.java
  └── inmemsys
  └── InMemorySystem.java

API Usage

The backend exposes REST endpoints that can be accessed via:

The provided Web UI
Tools like Postman or curl
Custom client applications

Typical use cases include:

Sending messages
Retrieving messages
Testing API behavior and flows

Key Concepts

Backpressure using Semaphore
Bounded ThreadPoolExecutor
Disk-backed queue with file persistence
Explicit locking with ReentrantLock and Condition
Graceful shutdown without task loss
Producer / consumer decoupling

Architecture Overview

Producer submits a task
A Semaphore enforces global capacity
If memory is available → task goes to the executor
If memory is full → task is written to disk
Drainer thread moves tasks from disk back into memory
Workers process tasks and release permits

Design Decisions

Disk-Backed Queue (Overflow Buffer)

Disk acts as a pressure buffer, not the primary queue (RAM-first, disk-overflow)
Append-only write path for high throughput and low fragmentation
Uses a record format (length-prefix + payload) to support deterministic replay
Crash-safe recovery:
- On startup, the system replays unread records from the last known read offset
- Partial/corrupt trailing records are detected and ignored safely
Thread-safe by design:
- Single-writer append path (or explicit locking if multiple producers)
- Separate read pointer for the drainer
Guarantees (typical configuration):
- Preserves FIFO ordering per queue
- Supports at-least-once delivery (exactly-once requires idempotency at consumer)

Backpressure via Semaphore (Producer Throttling)

Every enqueue must acquire a permit
A permit is released only after the task is fully processed (completion-ack)
Prevents unbounded memory growth and naturally throttles producers under load
Works as a single, central capacity control across:
- in-memory queue
- disk buffer
- executor backlog
Failure handling:
- If enqueue fails after acquiring a permit, the permit is released immediately
- Timeout-based acquisition can be used to avoid blocking forever

Bounded ThreadPoolExecutor (Controlled Concurrency)

Fixed worker count (predictable throughput)
Fixed in-memory queue size (bounded latency and RAM)
Avoids dangerous unbounded executors (newCachedThreadPool, unbounded queues)
Rejection strategy is explicit and intentional:
- AbortPolicy for strict fail-fast
- or a custom handler that routes overflow to disk buffer
Supports clean instrumentation:
- active threads, queue depth, completed task count, rejection count

Drainer Thread (Disk → Memory / Executor)

Independent from workers (separation of concerns)
Moves tasks from disk buffer into the bounded in-memory pipeline
Uses Condition.await() / signaling instead of polling:
- Signals when disk has new data
- Signals when capacity permits become available
CPU-efficient:
- Sleeps when there is nothing to drain or no capacity
- Wakes up only on meaningful events (new record / permit released / shutdown)
Safe batching (optional):
- Reads in small batches for throughput while respecting capacity bounds
- Avoids starving producers or executor queue

Graceful Shutdown (Deterministic, No Surprise Loss)

Shutdown sequence (recommended):

Stop accepting new tasks (flip an accepting=false gate)
Signal drainer to stop waiting and enter shutdown mode
Drain disk queue (respecting permits and executor capacity)
Stop drainer thread (join with timeout, then hard-stop fallback if needed)
Shut down executor:
- shutdown() then awaitTermination(...)
- shutdownNow() only as last resort
Flush/close file resources safely:
- ensure buffers are flushed (e.g., FileChannel.force(true) if required)
- persist read-offset/checkpoint
Final consistency check:
- permits should return to full capacity (no leaks)
- disk segments should be fully acknowledged or remain replayable

Result: deterministic shutdown, predictable behavior under load, and no silent data loss (when using at-least-once semantics + replayable disk buffer).

Possible Extensions

This project is intentionally minimal and can be extended in many directions, such as:

Persistent storage (e.g. PostgreSQL, MongoDB)
Authentication and authorization
Message queues or async processing
Rate limiting or throttling
Monitoring and logging
WebSocket support

Contributing

Contributions are welcome. To contribute:

Fork the repository
Create a feature branch
Make your changes with clear commits
Open a pull request with a concise description

Please ensure:

Code is clean and readable
Changes are well-scoped
Commits are meaningful

Feedback and Support

If you find an issue, have a suggestion, or want to discuss improvements, please open an issue in the repository.

Automated testing (unit and integration)
UI/UX improvements

Name		Name	Last commit message	Last commit date
Latest commit History 128 Commits
.github/workflows		.github/workflows
assets		assets
messaging-api		messaging-api
messaging-engine		messaging-engine
messaging-ui		messaging-ui
.gitignore		.gitignore
Dockerfile.backend		Dockerfile.backend
README.md		README.md
docker-compose.release.yml		docker-compose.release.yml
docker-compose.yml		docker-compose.yml
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Disk-Backed Message Processing System (Kafka-Inspired)

Overview

Technology Stack

Prerequisites

Running the Project

Project Structure

API Usage

Key Concepts

Architecture Overview

Design Decisions

Disk-Backed Queue (Overflow Buffer)

Backpressure via Semaphore (Producer Throttling)

Bounded ThreadPoolExecutor (Controlled Concurrency)

Drainer Thread (Disk → Memory / Executor)

Graceful Shutdown (Deterministic, No Surprise Loss)

Possible Extensions

Contributing

Feedback and Support

About

Uh oh!

Releases 1

Packages

Uh oh!

Uh oh!

Languages

ankitx-sharma/disk-backed-task-processing-sys

Folders and files

Latest commit

History

Repository files navigation

Disk-Backed Message Processing System (Kafka-Inspired)

Overview

Technology Stack

Prerequisites

Running the Project

Project Structure

API Usage

Key Concepts

Architecture Overview

Design Decisions

Disk-Backed Queue (Overflow Buffer)

Backpressure via Semaphore (Producer Throttling)

Bounded ThreadPoolExecutor (Controlled Concurrency)

Drainer Thread (Disk → Memory / Executor)

Graceful Shutdown (Deterministic, No Surprise Loss)

Possible Extensions

Contributing

Feedback and Support

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Languages

Packages