A simplified distributed file system inspired by GFS and HDFS. It uses a master node for metadata management and multiple chunk servers for distributed data storage, with automatic replication.
# Build all binaries
go build -o bin/dfs-master ./cmd/master
go build -o bin/dfs-chunkserver ./cmd/chunkserver
go build -o bin/dfs-client ./cmd/client
# Start the master server
./bin/dfs-master
# Start chunk servers (in separate terminals)
./bin/dfs-chunkserver -port :9001 -master localhost:8000 -dir ./data/cs1
./bin/dfs-chunkserver -port :9002 -master localhost:8000 -dir ./data/cs2
# Upload a file
./bin/dfs-client put /path/to/local/file.txt remote-name.txt
# Download a file
./bin/dfs-client get remote-name.txt /path/to/download/dir

| Feature | Description |
|---|---|
| File Upload | Chunked upload with automatic allocation |
| File Download | Reassembly from distributed chunks |
| Chunk Replication | Configurable replication factor (default: 2) |
| Heartbeat System | Bidirectional streaming for health monitoring |
| Dead Server Detection | Automatic detection and replication triggering |
| Replica Placement | Storage-aware server selection |
| Structured Logging | slog-based logging with levels |
| Client SDK | High-level API for file operations |
| CLI Tool | Command-line interface for put/get operations |
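
The "storage-aware server selection" row can be pictured with a small sketch. The policy below (prefer the servers reporting the most free space, break ties by address) is an assumption made for illustration, and `serverInfo` and `pickReplicas` are hypothetical names; the project's actual placement logic may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// serverInfo is a hypothetical view of what the master knows about a chunk server.
type serverInfo struct {
	Addr   string
	FreeMB int64
}

// pickReplicas returns up to n server addresses, preferring servers with the
// most free storage. Ties are broken by address so the result is deterministic.
func pickReplicas(servers []serverInfo, n int) []string {
	// Work on a copy so the caller's slice order is left untouched.
	candidates := append([]serverInfo(nil), servers...)
	sort.Slice(candidates, func(i, j int) bool {
		if candidates[i].FreeMB != candidates[j].FreeMB {
			return candidates[i].FreeMB > candidates[j].FreeMB
		}
		return candidates[i].Addr < candidates[j].Addr
	})
	if n < 0 {
		n = 0
	}
	if n > len(candidates) {
		n = len(candidates)
	}
	out := make([]string, 0, n)
	for _, s := range candidates[:n] {
		out = append(out, s.Addr)
	}
	return out
}

func main() {
	servers := []serverInfo{
		{Addr: ":9001", FreeMB: 200},
		{Addr: ":9002", FreeMB: 800},
		{Addr: ":9003", FreeMB: 500},
	}
	// With the default replication factor of 2, two servers are chosen.
	fmt.Println(pickReplicas(servers, 2))
}
```

If fewer servers are alive than the replication factor, this sketch simply returns every available server rather than failing.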
- Write pipeline (chain replication)
- Lease management for primary writes
- Checksum verification
- Graceful shutdown handling
┌─────────────────────────────────────────────────────────────────┐
│                             Client                              │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                       DFSClient SDK                       │  │
│  │  ┌───────────┐   ┌────────────┐   ┌────────────────────┐  │  │
│  │  │ Uploader  │   │ Downloader │   │    MasterClient    │  │  │
│  │  └───────────┘   └────────────┘   └────────────────────┘  │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 │ gRPC
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                          Master Server                          │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────────────┐  │
│  │  Metadata   │  │  Heartbeat   │  │    Chunk Placement     │  │
│  │   Manager   │  │   Manager    │  │        Manager         │  │
│  └─────────────┘  └──────────────┘  └────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 │ gRPC (Heartbeat + Tasks)
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                          Chunk Servers                          │
│  ┌───────────────┐     ┌───────────────┐     ┌───────────────┐  │
│  │ ChunkServer 1 │     │ ChunkServer 2 │     │ ChunkServer N │  │
│  │     :9001     │     │     :9002     │     │     :900N     │  │
│  └───────────────┘     └───────────────┘     └───────────────┘  │
└─────────────────────────────────────────────────────────────────┘
flowchart TB
classDef client fill:#fef08a,stroke:#eab308,color:#1c1917
classDef master fill:#c4b5fd,stroke:#8b5cf6,color:#1c1917
classDef chunk fill:#93c5fd,stroke:#3b82f6,color:#1c1917
Client["Client<br/><br/>CLI / SDK<br/>File operations"]:::client --> |gRPC| Master
subgraph Master["Master Server"]
MM["Metadata Manager<br/>File → Chunk mapping"]:::master
HM["Heartbeat Manager<br/>Monitor ChunkServers"]:::master
CM["Chunk Placement<br/>Replica selection"]:::master
end
subgraph CSGroup["ChunkServers"]
CS1["ChunkServer 1<br/>:9001"]:::chunk
CS2["ChunkServer 2<br/>:9002"]:::chunk
CSN["ChunkServer N<br/>:900N"]:::chunk
end
Master --> |Heartbeat| CS1
Master --> |Heartbeat| CS2
Master --> |Heartbeat| CSN
Client --> |Read/Write| CS1
Client --> |Read/Write| CS2
Client --> |Read/Write| CSN
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#c4b5fd', 'secondaryColor': '#93c5fd', 'tertiaryColor': '#fef08a', 'primaryTextColor': '#1c1917', 'lineColor': '#8b5cf6'}}}%%
sequenceDiagram
participant C as Client
participant M as Master
participant CS1 as ChunkServer 1
participant CS2 as ChunkServer 2
C->>M: AllocateChunk(filename, index)
M-->>C: ChunkID + ReplicaServers
C->>CS1: UploadChunk(stream)
CS1-->>C: Success
Note over M,CS2: Replication via heartbeat
M->>CS1: ReplicationTask
CS1->>CS2: ReplicateChunk
CS2-->>CS1: Ack
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#c4b5fd', 'secondaryColor': '#93c5fd', 'tertiaryColor': '#fef08a', 'primaryTextColor': '#1c1917', 'lineColor': '#8b5cf6'}}}%%
sequenceDiagram
participant C as Client
participant M as Master
participant CS as ChunkServer
C->>M: GetFileInfo(filename)
M-->>C: ChunkIDs + Locations
C->>CS: DownloadChunk(chunkID)
CS-->>C: ChunkData stream
dfs/
├── cmd/                        # Entry points
│   ├── master/main.go          # Master server binary
│   ├── chunkserver/main.go     # Chunk server binary
│   └── client/main.go          # CLI client binary
│
├── internal/                   # Private implementation
│   ├── master/                 # Master server logic
│   ├── chunkserver/            # Chunk server logic
│   └── client/                 # Client SDK
│       ├── dfsclient.go        # High-level SDK
│       ├── uploader/           # Upload handling
│       ├── downloader/         # Download handling
│       └── masterclient/       # Master communication
│
├── pkg/                        # Shared packages
│   └── logger/                 # Structured logging
│
├── dfs/                        # Generated protobuf code
│   ├── masterpb/
│   └── chunkpb/
│
└── proto/                      # Protobuf definitions
# Upload a file
./bin/dfs-client put <local-file> <remote-name>
# Download a file
./bin/dfs-client get <remote-name> <local-directory>
# Specify custom master address
./bin/dfs-client -master localhost:9000 put file.txt myfile.txt

./bin/dfs-chunkserver [flags]
Flags:
-port string Chunk server address (default ":9001")
-master string Master server address (default ":8000")
-dir string Storage directory (default "./data")

| Parameter | Default | Description |
|---|---|---|
| `REPLICATION_FACTOR` | 2 | Number of replicas per chunk |
| `CHUNK_SIZE` | 64 MB | Size of each chunk |
| `LIVE_THRESHOLD` | 30s | Server considered dead after this |
| Heartbeat Interval | 5s | Chunk server heartbeat frequency |
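
To make the `CHUNK_SIZE` default concrete, here is a minimal sketch of how a file's byte range could be divided into 64 MB chunks. The `chunkSpan` type and `planChunks` function are hypothetical and exist only for this illustration; how the real uploader handles empty files or the final short chunk is not stated here, so that behaviour is an assumption.

```go
package main

import "fmt"

// chunkSize mirrors the CHUNK_SIZE default from the table above (64 MB).
const chunkSize int64 = 64 << 20

// chunkSpan describes one chunk's byte range within the source file.
// The type is hypothetical and used only for this illustration.
type chunkSpan struct {
	Index  int
	Offset int64
	Length int64
}

// planChunks splits a file of fileSize bytes into consecutive spans of at
// most size bytes each. The final span holds whatever remains.
// Assumption: an empty file (or a non-positive size) yields no spans.
func planChunks(fileSize, size int64) []chunkSpan {
	if fileSize <= 0 || size <= 0 {
		return nil
	}
	var spans []chunkSpan
	for off := int64(0); off < fileSize; off += size {
		n := size
		if rest := fileSize - off; rest < n {
			n = rest
		}
		spans = append(spans, chunkSpan{Index: len(spans), Offset: off, Length: n})
	}
	return spans
}

func main() {
	const mb = int64(1) << 20
	// A 150 MB file needs two full chunks plus one 22 MB remainder.
	for _, s := range planChunks(150*mb, chunkSize) {
		fmt.Printf("chunk %d: offset=%d MB length=%d MB\n", s.Index, s.Offset/mb, s.Length/mb)
	}
}
```

With the default replication factor of 2, those three chunks would occupy six chunk copies across the cluster.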
Upload flow:
- Client requests chunk allocation from Master
- Master returns ChunkID and replica servers
- Client uploads the chunk directly to a chunk server
- Master triggers replication through a task sent over the heartbeat stream

Download flow:
- Client requests file info from Master
- Master returns chunk IDs and their locations
- Client downloads chunks directly from chunk servers
- Client reassembles the file from the chunks
Chunk servers maintain a bidirectional gRPC stream with the master:
ChunkServer → Master:
- Server address
- Free storage (MB)
- List of stored chunks
Master → ChunkServer:
- Replication tasks
- Delete tasks
# Build all
go build ./...
# Run tests
go test ./...
# Lint
go vet ./...

MIT