This project is a Retrieval-Augmented Generation (RAG) as a Service backend built with Node.js, Express.js, and TypeScript, with MongoDB (via Mongoose) for data persistence. The service manages collections, resources (documents), and chunks of information to support RAG operations. It integrates with external services for data crawling, vector embeddings (via LangChain), and potentially agent-based processing.
The core entities are:
- Collection: A logical grouping of resources. Each collection can have its own settings, such as the encoder to use for vectorizing its resources, chunk size, and chunk overlap.
- Resource: A document or piece of content within a collection. Resources are broken down into multiple chunks for processing.
- Chunk: Smaller, digestible parts of a resource, used for vector storage and retrieval.
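Since each collection carries a chunk size and chunk overlap, it may help to see what those two settings do. The sketch below is a minimal fixed-window character splitter, assuming both settings are measured in characters; the `ChunkSettings` shape and `splitIntoChunks` name are hypothetical, and the project may instead rely on LangChain's text splitters (such as `RecursiveCharacterTextSplitter`), which also try to break on natural boundaries like paragraphs.

```typescript
// Hypothetical settings shape; the real Collection schema's field names may differ.
interface ChunkSettings {
  chunkSize: number;    // maximum characters per chunk
  chunkOverlap: number; // characters shared by consecutive chunks
}

// Splits text into fixed-size windows where each window starts
// (chunkSize - chunkOverlap) characters after the previous one.
function splitIntoChunks(text: string, { chunkSize, chunkOverlap }: ChunkSettings): string[] {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be in [0, chunkSize)");
  }
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text, so the tail
    // is not emitted again as a smaller, fully overlapping chunk.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

For example, `splitIntoChunks("abcdefghij", { chunkSize: 4, chunkOverlap: 1 })` yields `["abcd", "defg", "ghij"]`: each chunk repeats the last character of the previous one, so text near a boundary stays retrievable from either side.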
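Retrieval over chunks typically means ranking stored chunk vectors by similarity to a query embedding. In this service that search is presumably delegated to Qdrant, given the `QDRANT_*` settings; the in-memory sketch below only illustrates the idea using cosine similarity and a top-k cut. The `EmbeddedChunk` type and function names are hypothetical.

```typescript
// Hypothetical in-memory chunk record; in the service, vectors would live in the vector store.
interface EmbeddedChunk {
  id: string;
  text: string;
  vector: number[];
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vector dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat similarity with a zero vector as 0 instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every chunk against the query and returns the k best matches, highest first.
function topK(
  query: number[],
  chunks: EmbeddedChunk[],
  k: number,
): Array<{ chunk: EmbeddedChunk; score: number }> {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A linear scan like this is fine for illustration; a dedicated vector database uses approximate nearest-neighbour indexes so the search does not have to score every chunk.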
The project uses TypeScript and ts-node-dev for development.
To run the project in development mode (with live reloading):

```bash
npm run dev
```

To compile the TypeScript code to JavaScript:

```bash
npm run build
```

To start the compiled JavaScript application:

```bash
npm start
```

To run unit tests:

```bash
npm test
```

To run tests in watch mode:

```bash
npm run test:watch
```

To run a specific RAG synchronization job:

```bash
npm run rag-sync
```

- Language: TypeScript
- Framework: Express.js
- ORM/ODM: Mongoose (for MongoDB)
- Project Structure: Follows a typical Node.js/Express project structure with separate directories for `config`, `consumer`, `error`, `job`, `middleware`, `models`, `route`, `service`, `type`, and `utility`.
- Authentication: API key-based authentication is used for routes.
- Environment Variables: Uses `dotenv` for managing environment variables, including:
  - `QDRANT_URL`: The URL of the Qdrant service.
  - `QDRANT_API_KEY`: The API key for authenticating with Qdrant.
  - `QDRANT_COLLECTION_NAME`: The name of the collection to use in Qdrant.
- Logging: Uses `winston`.
- Queueing: Uses `amqplib` for RabbitMQ integration.
- HTTP Client: Uses `axios`.
- Web Scraping: Uses `puppeteer-extra` and `cheerio`.
- Vector Embeddings/LLM Integration: Uses the `langchain` and `openai` libraries.
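API key authentication on routes is commonly done with an Express middleware that runs before the route handlers. The sketch below is one way it could look, assuming the key arrives in a hypothetical `x-api-key` header; to stay self-contained it uses small structural stand-ins (`RequestLike`, `ResponseLike`, `NextLike`) modelled on the parts of Express's request, response, and `next` that it touches, rather than importing `express`. All names here are illustrative, not the project's actual middleware.

```typescript
// Structural stand-ins for the pieces of Express's req/res/next used below.
interface RequestLike {
  headers: Record<string, string | string[] | undefined>;
}
interface ResponseLike {
  status(code: number): { json(body: unknown): unknown };
}
type NextLike = (err?: unknown) => void;

// Hypothetical header name. Node lowercases incoming header names in req.headers.
const API_KEY_HEADER = "x-api-key";

// Compares equal-length strings without returning at the first mismatch,
// which avoids leaking how many leading characters matched.
function safeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

function isValidKey(provided: string | undefined, validKeys: string[]): boolean {
  if (provided === undefined) return false;
  for (const key of validKeys) {
    if (safeEqual(provided, key)) return true;
  }
  return false;
}

// Returns middleware that calls next() for a known key and answers 401 otherwise.
function requireApiKey(validKeys: string[]) {
  return (req: RequestLike, res: ResponseLike, next: NextLike): void => {
    const raw = req.headers[API_KEY_HEADER];
    const provided = Array.isArray(raw) ? raw[0] : raw;
    if (isValidKey(provided, validKeys)) {
      next();
      return;
    }
    res.status(401).json({ error: "Invalid or missing API key" });
  };
}
```

In an Express app this would be mounted with `app.use(...)` ahead of the protected routers; where the valid keys come from (for example, a hypothetical `API_KEY` environment variable or a database lookup) is an assumption left open here.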
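Because the Qdrant connection depends on three environment variables, validating them once at startup gives a clear error instead of a failure deep inside a request. The sketch below assumes all three are required (a local Qdrant without authentication might not need `QDRANT_API_KEY`); the `QdrantConfig` type and `loadQdrantConfig` function are hypothetical. It takes the environment as a parameter so it can be tested without touching `process.env`.

```typescript
// Hypothetical config shape for the Qdrant variables.
interface QdrantConfig {
  url: string;
  apiKey: string;
  collectionName: string;
}

const REQUIRED_QDRANT_VARS = ["QDRANT_URL", "QDRANT_API_KEY", "QDRANT_COLLECTION_NAME"] as const;

// Reads the Qdrant settings and throws one error listing every missing or empty variable.
function loadQdrantConfig(env: Record<string, string | undefined>): QdrantConfig {
  const missing = REQUIRED_QDRANT_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    url: env["QDRANT_URL"] as string,
    apiKey: env["QDRANT_API_KEY"] as string,
    collectionName: env["QDRANT_COLLECTION_NAME"] as string,
  };
}
```

At startup this would be called as `loadQdrantConfig(process.env)` after `dotenv` has loaded the `.env` file (for example via `dotenv.config()`), so a misconfigured deployment fails immediately with the names of all missing variables.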
- Implement additional features related to RAG (e.g., actual embedding generation, retrieval logic).
- Add comprehensive tests for the newly created services and routes.