ChiragJS/DFS

Distributed File System (DFS)

A simplified Distributed File System inspired by GFS and HDFS. It uses a master node for metadata management and multiple chunk servers for distributed data storage with automatic replication.

Quick Start

# Build all binaries
go build -o bin/dfs-master ./cmd/master
go build -o bin/dfs-chunkserver ./cmd/chunkserver
go build -o bin/dfs-client ./cmd/client

# Start the master server
./bin/dfs-master

# Start chunk servers (in separate terminals)
./bin/dfs-chunkserver -port :9001 -master localhost:8000 -dir ./data/cs1
./bin/dfs-chunkserver -port :9002 -master localhost:8000 -dir ./data/cs2

# Upload a file
./bin/dfs-client put /path/to/local/file.txt remote-name.txt

# Download a file
./bin/dfs-client get remote-name.txt /path/to/download/dir

Features

✅ Implemented

| Feature | Description |
|---|---|
| File Upload | Chunked upload with automatic allocation |
| File Download | Reassembly from distributed chunks |
| Chunk Replication | Configurable replication factor (default: 2) |
| Heartbeat System | Bidirectional streaming for health monitoring |
| Dead Server Detection | Automatic detection and replication triggering |
| Replica Placement | Storage-aware server selection |
| Structured Logging | slog-based logging with levels |
| Client SDK | High-level API for file operations |
| CLI Tool | Command-line interface for put/get operations |
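
As an illustration of what "storage-aware server selection" could look like, here is a minimal sketch that prefers the chunk servers reporting the most free storage. The `ServerInfo` type and `pickReplicas` function are hypothetical names for this example and are not taken from the repository; the actual placement policy may weigh other factors.

```go
package main

import (
	"fmt"
	"sort"
)

// ServerInfo is a hypothetical view of what the master knows about one chunk server.
type ServerInfo struct {
	Addr   string
	FreeMB int64
}

// pickReplicas returns up to n server addresses, preferring servers with the most free storage.
func pickReplicas(servers []ServerInfo, n int) []string {
	if n < 0 {
		n = 0
	}
	// Copy first so the caller's slice order is left untouched.
	sorted := append([]ServerInfo(nil), servers...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].FreeMB > sorted[j].FreeMB
	})
	if n > len(sorted) {
		n = len(sorted)
	}
	out := make([]string, 0, n)
	for _, s := range sorted[:n] {
		out = append(out, s.Addr)
	}
	return out
}

func main() {
	servers := []ServerInfo{
		{Addr: ":9001", FreeMB: 500},
		{Addr: ":9002", FreeMB: 2000},
		{Addr: ":9003", FreeMB: 1200},
	}
	// With a replication factor of 2, the two emptiest servers are chosen.
	fmt.Println(pickReplicas(servers, 2)) // [:9002 :9003]
}
```

Sorting a copy keeps the function free of side effects, and asking for more replicas than there are servers simply returns every server.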

🔜 Planned

  • Write pipeline (chain replication)
  • Lease management for primary writes
  • Checksum verification
  • Graceful shutdown handling

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                           Client                                │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  DFSClient SDK                                          │    │
│  │  ┌───────────┐  ┌────────────┐  ┌────────────────────┐  │    │
│  │  │ Uploader  │  │ Downloader │  │ MasterClient       │  │    │
│  │  └───────────┘  └────────────┘  └────────────────────┘  │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ gRPC
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Master Server                            │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────────────┐  │
│  │ Metadata    │  │ Heartbeat    │  │ Chunk Placement        │  │
│  │ Manager     │  │ Manager      │  │ Manager                │  │
│  └─────────────┘  └──────────────┘  └────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ gRPC (Heartbeat + Tasks)
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                       Chunk Servers                             │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐        │
│  │ ChunkServer 1 │  │ ChunkServer 2 │  │ ChunkServer N │        │
│  │ :9001         │  │ :9002         │  │ :900N         │        │
│  └───────────────┘  └───────────────┘  └───────────────┘        │
└─────────────────────────────────────────────────────────────────┘

Component Diagram

flowchart TB
    classDef client fill:#fef08a,stroke:#eab308,color:#1c1917
    classDef master fill:#c4b5fd,stroke:#8b5cf6,color:#1c1917
    classDef chunk fill:#93c5fd,stroke:#3b82f6,color:#1c1917

    Client["Client<br/><br/>CLI / SDK<br/>File operations"]:::client --> |gRPC| Master

    subgraph Master["Master Server"]
        MM["Metadata Manager<br/>File → Chunk mapping"]:::master
        HM["Heartbeat Manager<br/>Monitor ChunkServers"]:::master
        CM["Chunk Placement<br/>Replica selection"]:::master
    end

    subgraph CSGroup["ChunkServers"]
        CS1["ChunkServer 1<br/>:9001"]:::chunk
        CS2["ChunkServer 2<br/>:9002"]:::chunk
        CSN["ChunkServer N<br/>:900N"]:::chunk
    end

    Master --> |Heartbeat| CS1
    Master --> |Heartbeat| CS2
    Master --> |Heartbeat| CSN

    Client --> |Read/Write| CS1
    Client --> |Read/Write| CS2
    Client --> |Read/Write| CSN

Write Sequence

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#c4b5fd', 'secondaryColor': '#93c5fd', 'tertiaryColor': '#fef08a', 'primaryTextColor': '#1c1917', 'lineColor': '#8b5cf6'}}}%%
sequenceDiagram
    participant C as Client
    participant M as Master
    participant CS1 as ChunkServer 1
    participant CS2 as ChunkServer 2

    C->>M: AllocateChunk(filename, index)
    M-->>C: ChunkID + ReplicaServers
    C->>CS1: UploadChunk(stream)
    CS1-->>C: Success
    
    Note over M,CS2: Replication via heartbeat
    M->>CS1: ReplicationTask
    CS1->>CS2: ReplicateChunk
    CS2-->>CS1: Ack

Read Sequence

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#c4b5fd', 'secondaryColor': '#93c5fd', 'tertiaryColor': '#fef08a', 'primaryTextColor': '#1c1917', 'lineColor': '#8b5cf6'}}}%%
sequenceDiagram
    participant C as Client
    participant M as Master
    participant CS as ChunkServer

    C->>M: GetFileInfo(filename)
    M-->>C: ChunkIDs + Locations
    C->>CS: DownloadChunk(chunkID)
    CS-->>C: ChunkData stream

Project Structure

dfs/
├── cmd/                    # Entry points
│   ├── master/main.go      # Master server binary
│   ├── chunkserver/main.go # Chunk server binary
│   └── client/main.go      # CLI client binary
│
├── internal/               # Private implementation
│   ├── master/             # Master server logic
│   ├── chunkserver/        # Chunk server logic
│   └── client/             # Client SDK
│       ├── dfsclient.go    # High-level SDK
│       ├── uploader/       # Upload handling
│       ├── downloader/     # Download handling
│       └── masterclient/   # Master communication
│
├── pkg/                    # Shared packages
│   └── logger/             # Structured logging
│
├── dfs/                    # Generated protobuf code
│   ├── masterpb/
│   └── chunkpb/
│
└── proto/                  # Protobuf definitions

CLI Usage

# Upload a file
./bin/dfs-client put <local-file> <remote-name>

# Download a file
./bin/dfs-client get <remote-name> <local-directory>

# Specify custom master address
./bin/dfs-client -master localhost:9000 put file.txt myfile.txt

Chunk Server Flags

./bin/dfs-chunkserver [flags]

Flags:
  -port string    Chunk server address (default ":9001")
  -master string  Master server address (default ":8000")
  -dir string     Storage directory (default "./data")

Configuration

| Parameter | Default | Description |
|---|---|---|
| REPLICATION_FACTOR | 2 | Number of replicas per chunk |
| CHUNK_SIZE | 64 MB | Size of each chunk |
| LIVE_THRESHOLD | 30s | A chunk server is considered dead after this long without a heartbeat |
| Heartbeat Interval | 5s | How often each chunk server sends a heartbeat |
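
To make the `CHUNK_SIZE` default concrete, the sketch below works out how many chunks a file occupies and which byte range each chunk covers. The helper names `chunkCount` and `chunkRange` are hypothetical, and the code assumes fixed-size chunks with a shorter final chunk, which the README implies but does not spell out.

```go
package main

import "fmt"

// chunkSize mirrors the CHUNK_SIZE default of 64 MB.
const chunkSize int64 = 64 << 20

// chunkCount returns how many chunks a file of the given size occupies.
func chunkCount(fileSize int64) int64 {
	if fileSize <= 0 {
		return 0
	}
	return (fileSize + chunkSize - 1) / chunkSize // ceiling division
}

// chunkRange returns the byte offset and length of the chunk at index.
// It returns (0, 0) when the index lies outside the file.
func chunkRange(fileSize, index int64) (offset, length int64) {
	offset = index * chunkSize
	if index < 0 || offset >= fileSize {
		return 0, 0
	}
	length = chunkSize
	if remaining := fileSize - offset; remaining < length {
		length = remaining // the last chunk may be shorter
	}
	return offset, length
}

func main() {
	size := int64(150 << 20) // a 150 MB file
	n := chunkCount(size)
	fmt.Println("chunks:", n) // chunks: 3
	for i := int64(0); i < n; i++ {
		off, l := chunkRange(size, i)
		fmt.Printf("chunk %d: offset=%d length=%d\n", i, off, l)
	}
}
```

A 150 MB file therefore splits into two full 64 MB chunks and one 22 MB chunk; with the default replication factor of 2, it would occupy roughly 300 MB across the cluster.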

Data Flow

Write Path

  1. Client requests chunk allocation from Master
  2. Master returns ChunkID and replica servers
  3. Client uploads directly to chunk server
  4. Master triggers replication via heartbeat
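
Step 4 can be sketched as a planning pass on the master: for each chunk with fewer live replicas than the replication factor, choose a live holder as the source and a live server without the chunk as the target. This is a rough illustration, not the repository's code; `ReplicationTask` and `planReplication` are hypothetical names, and the real master may choose targets differently (for example, by free storage).

```go
package main

import (
	"fmt"
	"sort"
)

// replicationFactor mirrors the REPLICATION_FACTOR default of 2.
const replicationFactor = 2

// ReplicationTask is a hypothetical task shape: Source copies ChunkID to Target.
type ReplicationTask struct {
	ChunkID string
	Source  string
	Target  string
}

// planReplication returns tasks for chunks that have fewer than
// replicationFactor live replicas. locations maps a chunk ID to the servers
// holding it; live lists the servers that are currently alive.
func planReplication(locations map[string][]string, live []string) []ReplicationTask {
	isLive := make(map[string]bool, len(live))
	for _, s := range live {
		isLive[s] = true
	}

	// Visit chunks in sorted order so the result is deterministic.
	ids := make([]string, 0, len(locations))
	for id := range locations {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	var tasks []ReplicationTask
	for _, id := range ids {
		holds := make(map[string]bool)
		source := ""
		for _, h := range locations[id] {
			if isLive[h] && !holds[h] {
				holds[h] = true
				if source == "" {
					source = h
				}
			}
		}
		if source == "" {
			continue // no live replica left to copy from
		}
		missing := replicationFactor - len(holds)
		for _, candidate := range live {
			if missing <= 0 {
				break
			}
			if holds[candidate] {
				continue
			}
			tasks = append(tasks, ReplicationTask{ChunkID: id, Source: source, Target: candidate})
			holds[candidate] = true
			missing--
		}
	}
	return tasks
}

func main() {
	locations := map[string][]string{
		"chunk-1": {":9001", ":9002"},
		"chunk-2": {":9001", ":9003"},
	}
	live := []string{":9001", ":9003"} // :9002 has stopped sending heartbeats
	for _, t := range planReplication(locations, live) {
		fmt.Printf("%s: %s -> %s\n", t.ChunkID, t.Source, t.Target)
	}
}
```

In this example only `chunk-1` is under-replicated, so a single task is produced asking `:9001` to copy it to `:9003`. In the design described here, such a task would be delivered to the source server over its heartbeat stream.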

Read Path

  1. Client requests file info from Master
  2. Master returns chunk locations
  3. Client downloads chunks directly from chunk servers
  4. Client reassembles file from chunks
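
The reassembly in step 4 amounts to ordering chunks by index and concatenating them. The sketch below assumes chunks carry a zero-based index and may arrive in any order; the `Chunk` type and `reassemble` function are hypothetical. It builds the result in memory for brevity, whereas a real client handling 64 MB chunks would more likely stream each chunk to the output file.

```go
package main

import (
	"fmt"
	"sort"
)

// Chunk is one downloaded piece of a file (hypothetical type).
type Chunk struct {
	Index int
	Data  []byte
}

// reassemble concatenates chunks in index order. It returns an error if an
// index is missing or duplicated, so a partial file is never returned.
func reassemble(chunks []Chunk) ([]byte, error) {
	ordered := append([]Chunk(nil), chunks...)
	sort.Slice(ordered, func(i, j int) bool { return ordered[i].Index < ordered[j].Index })

	// Validate the sequence 0, 1, 2, ... before copying any data.
	total := 0
	for i, c := range ordered {
		if c.Index != i {
			return nil, fmt.Errorf("expected chunk %d, found chunk %d", i, c.Index)
		}
		total += len(c.Data)
	}

	out := make([]byte, 0, total)
	for _, c := range ordered {
		out = append(out, c.Data...)
	}
	return out, nil
}

func main() {
	chunks := []Chunk{
		{Index: 2, Data: []byte("!")},
		{Index: 0, Data: []byte("hello, ")},
		{Index: 1, Data: []byte("world")},
	}
	data, err := reassemble(chunks)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(data)) // hello, world!
}
```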

Heartbeat System

Chunk servers maintain a bidirectional gRPC stream with the master:

ChunkServer → Master:

  • Server address
  • Free storage (MB)
  • List of stored chunks

Master → ChunkServer:

  • Replication tasks
  • Delete tasks

Development

# Build all
go build ./...

# Run tests
go test ./...

# Lint
go vet ./...

License

MIT
