# RAG System Setup Guide

## Overview

The Edge LLM platform now includes a simple RAG (Retrieval-Augmented Generation) system that allows you to upload documents to enhance AI responses with relevant context.

## Features

- 📁 **Document Upload**: Support for PDF, TXT, DOCX, and MD files
- 🔍 **Semantic Search**: Find relevant information from your documents
- ⚙️ **Configurable Retrieval**: Adjust how many document chunks to use for context
- 🎯 **Easy Integration**: Toggle RAG on/off in the Assistant Studio

## Installation

### Backend Dependencies

Install the required Python packages:

```bash
pip install -r requirements.txt
```

The RAG system requires these additional packages:

- `langchain`: LangChain framework
- `pypdf`: PDF processing
- `python-docx`: Word document processing
- `faiss-cpu`: Vector similarity search
- `sentence-transformers`: Text embeddings
- `unstructured`: Document parsing

### Frontend

No additional frontend dependencies are needed. The Documents tab is included in the main build.

## Usage

### 1. Access the Documents Tab

1. Open Assistant Studio
2. Navigate to the **Documents** tab (next to Parameters and Instructions)

### 2. Upload Documents

1. Click "Click to upload documents" in the upload area
2. Select PDF, TXT, DOCX, or MD files
3. Files are processed and chunked automatically
4. Uploaded documents appear in the "Uploaded Documents" section

### 3. Configure RAG

1. **Enable RAG**: Toggle the "Enable RAG" switch (only available when documents are uploaded)
2. **Retrieval Count**: Adjust the slider to set how many document chunks to retrieve (1-10)
   - 1-3: Focused responses with minimal context
   - 4-7: Balanced responses with moderate context
   - 8-10: Comprehensive responses with extensive context

### 4. Chat with RAG Enhancement

Once RAG is enabled:

1. Ask questions normally in the chat
2. The system automatically searches your uploaded documents
3. Relevant information is added to the AI's context
4. The AI incorporates document information into responses when relevant

## API Endpoints

### Document Management

- `POST /rag/upload` - Upload multiple documents
- `GET /rag/documents` - List uploaded documents
- `DELETE /rag/documents/{doc_id}` - Delete a document
- `POST /rag/search` - Search through documents

A hedged Python client example appears at the end of the Technical Details section below.

### Enhanced Generation

The existing `/generate` endpoint now supports RAG when:

- Documents are uploaded to the RAG system
- The request includes RAG configuration (handled automatically by the frontend)

## Technical Details

### Document Processing

1. Files are uploaded and temporarily stored
2. LangChain loaders extract text content
3. Text is split into chunks (1000 characters with 200-character overlap)
4. Chunks are embedded using `sentence-transformers/all-MiniLM-L6-v2`
5. Embeddings are stored in a FAISS vector database

A sketch of this ingestion flow follows the RAG Pipeline list below.

### RAG Pipeline

1. The user query is embedded using the same model
2. Similarity search finds relevant document chunks
3. Retrieved chunks are added to the system prompt
4. The AI generates a response using the document context
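
For concreteness, here is a minimal sketch of the ingestion flow described above, written against the classic LangChain APIs. The file path is hypothetical, and import paths vary across LangChain versions (newer releases move these classes into `langchain_community` and `langchain_text_splitters`):

```python
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# 1-2. Load and extract text from an uploaded file (path is hypothetical)
docs = PyPDFLoader("uploads/example.pdf").load()

# 3. Split into overlapping chunks, matching the sizes described above
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# 4-5. Embed each chunk and build an in-memory FAISS index
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
vector_store = FAISS.from_documents(chunks, embeddings)
```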
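
A corresponding sketch of the retrieval step, reusing `vector_store` from the snippet above; the prompt template is illustrative, not necessarily what `rag_system.py` produces:

```python
def build_rag_prompt(vector_store, query: str, k: int = 4) -> str:
    # 1-2. similarity_search embeds the query with the same model and
    # returns the k most similar chunks (the "Retrieval Count" slider)
    hits = vector_store.similarity_search(query, k=k)

    # 3. Concatenate retrieved chunks into the context given to the model
    context = "\n\n".join(doc.page_content for doc in hits)
    return (
        "Use the following document excerpts when they are relevant:\n\n"
        f"{context}\n\nQuestion: {query}"
    )

# 4. The augmented prompt is then sent to the model for generation
prompt = build_rag_prompt(vector_store, "What does chapter 2 cover?", k=4)
```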
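
Finally, a hypothetical client for the Document Management endpoints listed earlier, using the `requests` library. The base URL, multipart field name, and JSON body shapes are assumptions; check the backend code for the exact contract:

```python
import requests

BASE = "http://localhost:8000"  # assumed backend address

# POST /rag/upload - upload a document as multipart form data
with open("notes.md", "rb") as f:
    resp = requests.post(f"{BASE}/rag/upload", files=[("files", ("notes.md", f))])
print(resp.json())

# GET /rag/documents - list uploaded documents
print(requests.get(f"{BASE}/rag/documents").json())

# POST /rag/search - search through documents (body shape is an assumption)
hits = requests.post(f"{BASE}/rag/search", json={"query": "roadmap", "k": 4})
print(hits.json())

# DELETE /rag/documents/{doc_id} - delete a document, using an ID
# returned by the listing call above, e.g.:
# requests.delete(f"{BASE}/rag/documents/{doc_id}")
```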

## Limitations & Notes

- **In-Memory Storage**: Documents are stored in memory and do not persist across restarts
- **CPU Only**: Uses CPU-based embeddings for compatibility
- **File Size**: Large files may take time to process
- **Language**: Optimized for English content

## Troubleshooting

### "RAG system not available" Error

- Ensure the LangChain dependencies are installed
- Check that `rag_system.py` is in the correct location
- Verify that the embedding model downloaded successfully

### Documents Not Uploading

- Check the file format (PDF, TXT, DOCX, and MD are supported)
- Ensure the file size is reasonable (<50 MB recommended)
- Check the browser console for error messages

### Poor RAG Performance

- Try adjusting the retrieval count
- Ensure the documents contain relevant information
- Check that the document text was extracted correctly

## Future Improvements

- Persistent vector storage (ChromaDB, Pinecone)
- GPU acceleration for embeddings
- More document formats (PPT, HTML, etc.)
- Advanced chunking strategies
- Custom embedding models
- Query expansion and reranking