Decentralized Retrieval-Augmented Generation (DRAG)

This repository presents a decentralized extension of Retrieval-Augmented Generation (RAG), addressing privacy, scalability, and security challenges in traditional RAG systems using IPFS, MQTT, and blockchain technologies.

This work is grounded in our research on decentralized learning systems, particularly Decentralized Retrieval-Augmented Generation (DRAG), as presented in [3].

DRAG enables users to interact with local knowledge bases while contributing to a global shared database, promoting knowledge democratization, trustless collaboration, and incentivized participation.


📄 Research Publication

This repository is based on our published research work:

Continuous Learning in Decentralized Retrieval-Augmented Generation (DRAG) and Data Management
F. A. Khan, C. Peiper, A. Jaberzadeh, M. A. Shaikh, et al.
Proceedings of the 4th Blockchain and Cryptocurrency Conference (B2C'25), 2025, pp. 45–48

🔗 Read the paper: https://www.researchgate.net/profile/Sergey-Yurish/publication/398276090_Blockchain_and_Cryptocurrency_B2C'_2025_Edited_by_Sergey_Y_Yurish/links/6930219f0e91876082c0d022/Blockchain-and-Cryptocurrency-B2C-2025-Edited-by-Sergey-Y-Yurish.pdf#page=46


🚀 DRAG Overview

DRAG enhances traditional RAG by decentralizing storage, communication, and computation layers, ensuring:

  • Privacy → Secure, decentralized data storage using IPFS
  • Scalability → Distributed knowledge contribution without central bottlenecks
  • Security → Blockchain-backed transparency and tamper-proof records
  • Incentivization → Reward mechanisms for contributors
  • Collaborative Learning → Continuous improvement from distributed nodes

🧠 Key Technologies

  • IPFS β†’ Decentralized storage layer
  • MQTT β†’ Lightweight, low-latency communication protocol
  • Blockchain β†’ Trustless validation and reward system
  • Qdrant β†’ High-performance vector database for semantic retrieval

⚖️ Traditional RAG vs. DRAG

The core difference is centralization versus decentralization.

Traditional RAG

Centralized architecture with a single knowledge base.

DRAG

Decentralized architecture with multiple nodes contributing to a global knowledge base.
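
To make the contrast concrete, the sketch below ranks a query against a single store (traditional RAG) and against the union of a local store and a shared global store (DRAG). It is only an illustration, not this repository's retrieval code: the Doc type, the toy 3-dimensional vectors, and the top_k helper are hypothetical, and a real deployment would query Qdrant rather than scan Python lists.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical record: a text chunk, its embedding, and the node it came from."""
    text: str
    vector: tuple
    source: str


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, docs, k=2):
    """Return the k documents most similar to the query vector."""
    return sorted(docs, key=lambda d: cosine(query, d.vector), reverse=True)[:k]


# Traditional RAG: one centrally held knowledge base.
central = [Doc("central note", (1.0, 0.0, 0.0), "central")]

# DRAG: a node's private store plus chunks contributed by other nodes.
local = [Doc("local note", (0.9, 0.1, 0.0), "local")]
global_store = [
    Doc("peer note A", (0.0, 1.0, 0.0), "node-a"),
    Doc("peer note B", (0.6, 0.8, 0.0), "node-b"),
]

query = (0.0, 1.0, 0.0)
traditional_hits = top_k(query, central, k=1)
drag_hits = top_k(query, local + global_store, k=2)
```

With these toy vectors the DRAG query can surface chunks from other nodes that the single central store does not contain, which is the practical effect of the shared knowledge base.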


🏗️ Architecture

The system consists of two primary node types:

1. Data Nodes

  • Provide domain-specific knowledge
  • Generate embeddings and contribute to the global vector database

2. Evaluator Nodes

  • Validate incoming contributions
  • Maintain data quality and integrity
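
The division of labor between the two node types can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's actual protocol: the Contribution message layout, the EXPECTED_DIM constant, and the validation rules in evaluate are all hypothetical. A data node serializes a chunk and its embedding (for example as an MQTT payload), and an evaluator node accepts or rejects it.

```python
import json
import math
from dataclasses import dataclass, asdict

EXPECTED_DIM = 4  # toy dimension for the example; real embedding models use hundreds


@dataclass
class Contribution:
    """Hypothetical message a data node might publish for evaluation."""
    node_id: str
    text: str
    embedding: list


def encode(contribution):
    """Serialize a contribution to JSON, e.g. as an MQTT message payload."""
    return json.dumps(asdict(contribution))


def evaluate(payload, seen_texts):
    """Return (accepted, reason) for a JSON payload, using illustrative rules only."""
    c = Contribution(**json.loads(payload))
    if not c.text.strip():
        return False, "empty text"
    if len(c.embedding) != EXPECTED_DIM:
        return False, "wrong embedding dimension"
    if not all(math.isfinite(x) for x in c.embedding):
        return False, "non-finite embedding value"
    if c.text in seen_texts:
        return False, "duplicate"
    seen_texts.add(c.text)
    return True, "ok"


seen = set()
good = encode(Contribution("data-node-1", "IPFS is content-addressed.", [0.1, 0.2, 0.3, 0.4]))
bad = encode(Contribution("data-node-2", "Too short a vector.", [0.1, 0.2]))

first = evaluate(good, seen)   # accepted
second = evaluate(good, seen)  # same text again, rejected as a duplicate
third = evaluate(bad, seen)    # rejected for its embedding size
```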

🔗 Blockchain-Based Incentive Layer

  • Records contributions transparently
  • Rewards high-quality data providers
  • Penalizes malicious or low-quality inputs

This mechanism ensures:

  • Higher data reliability
  • Sustainable ecosystem growth
  • Trustless collaboration
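
The record, reward, and penalize steps can be illustrated with a small in-memory ledger. This is a sketch under stated assumptions rather than the project's smart-contract logic: the Ledger class, the REWARD and PENALTY values, and the entry layout are hypothetical, and a hash chain stands in for an actual blockchain to show how altered records become detectable.

```python
import hashlib
import json

REWARD = 10    # hypothetical tokens for an accepted contribution
PENALTY = -5   # hypothetical deduction for a rejected one


class Ledger:
    """Append-only, hash-chained log standing in for on-chain records."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body):
        """SHA-256 over a canonical JSON encoding of an entry body."""
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, node_id, accepted):
        """Append one evaluation outcome, linked to the previous entry's hash."""
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {
            "node_id": node_id,
            "delta": REWARD if accepted else PENALTY,
            "prev_hash": prev_hash,
        }
        self.entries.append({**body, "hash": self._digest(body)})

    def balance(self, node_id):
        """Sum of all rewards and penalties recorded for one node."""
        return sum(e["delta"] for e in self.entries if e["node_id"] == node_id)

    def verify(self):
        """Recompute every hash; False means an entry was altered after the fact."""
        prev_hash = "0" * 64
        for e in self.entries:
            body = {k: e[k] for k in ("node_id", "delta", "prev_hash")}
            if e["prev_hash"] != prev_hash or e["hash"] != self._digest(body):
                return False
            prev_hash = e["hash"]
        return True


ledger = Ledger()
ledger.record("data-node-1", accepted=True)
ledger.record("data-node-1", accepted=True)
ledger.record("data-node-2", accepted=False)
```

Because each entry embeds the hash of the one before it, editing an earlier reward changes its hash and breaks every later link, so verify() reports the tampering.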

⚙️ Setup and Installation

1. Clone and Build

git clone https://github.com/bayesianinstitute/Decentralized-RAG
cd Decentralized-RAG
python setup.py sdist bdist_wheel
pip install .

2. Run Using Docker

Start all services:

docker compose up -d

Download model and run:

bash run.sh

🧩 Qdrant Setup (Vector Database)

Pull Image

docker pull qdrant/qdrant

Run Container

docker run -d -p 6333:6333 -p 6334:6334 \
    -v ./qdrant_data:/qdrant/storage \
    qdrant/qdrant

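Windows (run as a single line, because a trailing \ is not a line-continuation character in PowerShell or Command Prompt):

docker run -d --name qdrant_container -p 6333:6333 -p 6334:6334 -v C:/path/to/qdrant_data:/qdrant/storage qdrant/qdrant:latest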
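
Once the container is running, a collection can be created through Qdrant's REST API on port 6333. The sketch below only builds the request using the standard library and leaves sending it as an explicit step. The collection name drag_demo is hypothetical, the application may create its own collections on startup, and the vector size of 768 assumes the default output size of nomic-embed-text, so adjust it for a different embedding model.

```python
import json
import urllib.request

QDRANT_URL = "http://localhost:6333"  # REST port published by the docker run command


def build_create_collection(name, vector_size, distance="Cosine"):
    """Build (without sending) the REST request that creates a Qdrant collection."""
    body = json.dumps({"vectors": {"size": vector_size, "distance": distance}})
    return urllib.request.Request(
        url=f"{QDRANT_URL}/collections/{name}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request, timeout=5):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# "drag_demo" is a hypothetical name; 768 assumes nomic-embed-text embeddings.
request = build_create_collection("drag_demo", vector_size=768)
# send(request)  # uncomment once the Qdrant container is running
```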

🤖 Model Setup

Install Ollama

Follow: https://ollama.ai/

Pull LLM

ollama pull llama3:8b

Pull Embedding Model

ollama pull nomic-embed-text:latest
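
To check that the embedding model responds, you can call Ollama's local HTTP API directly. The sketch below assumes Ollama is listening on its default address (localhost:11434) and uses the /api/embeddings endpoint; the helper names are hypothetical and are not part of this repository.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_embedding_request(text, model="nomic-embed-text"):
    """Build (without sending) a request for one embedding vector."""
    body = json.dumps({"model": model, "prompt": text})
    return urllib.request.Request(
        url=f"{OLLAMA_URL}/api/embeddings",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embedding(raw_response):
    """Extract the vector from the endpoint's JSON reply."""
    return json.loads(raw_response)["embedding"]


def embed(text):
    """Send the request to a running Ollama server and return the vector."""
    with urllib.request.urlopen(build_embedding_request(text), timeout=30) as response:
        return parse_embedding(response.read().decode("utf-8"))


request = build_embedding_request("What is decentralized RAG?")
# Canned reply used to exercise the parser; not real model output.
sample_vector = parse_embedding('{"embedding": [0.1, -0.2, 0.3]}')
```

Calling embed("some text") against a running server should return a list of floats whose length is the dimension to use for the Qdrant collection.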

▶️ Running the Application

Configure Node Type

Edit main.py:

  • admin → Global coordinator node
  • data → Knowledge contributor node

Start Application

python main.py --data-dir data --nodetype admin
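
For readers unfamiliar with the two flags, the sketch below shows one way such a command line could be parsed with argparse. It is an assumption about the interface, not a copy of main.py: the defaults and help texts are hypothetical, and only the flag names and the admin and data values come from this README.

```python
import argparse
from pathlib import Path


def build_parser():
    """Parser mirroring the two flags used in the start command above."""
    parser = argparse.ArgumentParser(description="Start a DRAG node (illustrative sketch).")
    parser.add_argument(
        "--data-dir",
        type=Path,
        default=Path("data"),  # hypothetical default
        help="Directory holding the node's local documents.",
    )
    parser.add_argument(
        "--nodetype",
        choices=["admin", "data"],
        default="data",  # hypothetical default
        help="admin = global coordinator, data = knowledge contributor.",
    )
    return parser


# Equivalent to: python main.py --data-dir data --nodetype admin
args = build_parser().parse_args(["--data-dir", "data", "--nodetype", "admin"])
```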

🌐 IPFS Installation

Follow official docs: https://docs.ipfs.tech/install/

Windows example (PowerShell):

wget https://dist.ipfs.tech/kubo/v0.23.0/kubo_v0.23.0_windows-amd64.zip -Outfile kubo_v0.23.0.zip
Expand-Archive -Path kubo_v0.23.0.zip -DestinationPath .\kubo
cd .\kubo
.\install.bat
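
After installation, a quick way to confirm that IPFS works is to add a file and read back its content identifier (CID). The sketch below wraps the ipfs add command from Python; it assumes the ipfs binary is on PATH and the repository has been initialized with ipfs init. The helper names are hypothetical and do not reflect how this project talks to IPFS.

```python
import subprocess


def build_add_command(path):
    """Command that adds a file to IPFS; -Q prints only the final hash (CID)."""
    return ["ipfs", "add", "-Q", str(path)]


def add_file(path):
    """Run the command against a local Kubo install and return the CID."""
    result = subprocess.run(
        build_add_command(path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ipfs exits with an error
    )
    return result.stdout.strip()


command = build_add_command("notes.txt")  # "notes.txt" is a placeholder file name
```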

📌 Conclusion

DRAG demonstrates how decentralized architectures can significantly enhance Retrieval-Augmented Generation systems by:

  • Reducing hallucination and retrieval errors
  • Preserving user privacy
  • Enabling continuous decentralized learning
  • Incentivizing high-quality knowledge contributions

By integrating blockchain, distributed storage, and vector search, DRAG provides a scalable and secure foundation for next-generation AI systems.


🔗 Repository

GitHub: https://github.com/bayesianinstitute/Decentralized-RAG


📚 References

[1] Ollama Docker Hub: https://hub.docker.com/r/ollama/ollama

[2] IPFS Documentation: https://docs.ipfs.tech

[3] F. A. Khan, C. Peiper, A. Jaberzadeh, M. A. Shaikh, et al., "Continuous learning in decentralized retrieval-augmented generation (DRAG) and data management," in Proceedings of the 4th Blockchain and Cryptocurrency Conference (B2C'25), 2025, pp. 45–48. Available: https://www.researchgate.net/profile/Sergey-Yurish/publication/398276090_Blockchain_and_Cryptocurrency_B2C'_2025_Edited_by_Sergey_Y_Yurish/links/6930219f0e91876082c0d022/Blockchain-and-Cryptocurrency-B2C-2025-Edited-by-Sergey-Y-Yurish.pdf#page=46