---
title: OSINT Investigation Assistant
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: RAG-powered OSINT investigation assistant with 344+ tools
license: mit
---

# 🔍 OSINT Investigation Assistant

A RAG-powered AI assistant that helps investigators develop structured methodologies for open-source intelligence (OSINT) investigations. Built with LangChain, Supabase PGVector, and Hugging Face Inference Providers.

## ✨ Features

- **🎯 Structured Methodologies**: Generate step-by-step investigation plans tailored to your query
- **🛠️ 344+ OSINT Tools**: Get recommendations from a curated database of OSINT tools
- **🔍 Context-Aware Retrieval**: Semantic search finds the most relevant tools for your investigation
- **🚀 API Access**: Built-in REST API for integration with external applications
- **💬 Chat Interface**: User-friendly conversational interface
- **🔌 MCP Support**: Can be extended to work with AI agents via the Model Context Protocol (MCP)

## 🏗️ Architecture

```
┌──────────────────────────────────────┐
│     Gradio UI + API Endpoints        │
└──────────────┬───────────────────────┘
               │
┌──────────────▼───────────────────────┐
│     LangChain RAG Pipeline           │
│  • Query Understanding               │
│  • Tool Retrieval (PGVector)         │
│  • Response Generation (LLM)         │
└──────────────┬───────────────────────┘
               │
    ┌──────────┴──────────┐
    │                     │
┌───▼───────────┐   ┌─────▼────────────┐
│   Supabase    │   │  HF Inference    │
│  PGVector DB  │   │   Providers      │
│  (344 tools)  │   │  (Llama 3.1)     │
└───────────────┘   └──────────────────┘
```

## 🚀 Quick Start

### Local Development

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd osint-llm
   ```

2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Set up environment variables**
   ```bash
   cp .env.example .env
   # Edit .env with your credentials
   ```

   Required variables:
   - `SUPABASE_CONNECTION_STRING`: Your Supabase PostgreSQL connection string
   - `HF_TOKEN`: Your Hugging Face API token

4. **Run the application**
   ```bash
   python app.py
   ```

   The app will be available at `http://localhost:7860`.

### Hugging Face Spaces Deployment

1. **Create a new Space** on Hugging Face
2. **Push this repository** to your Space
3. **Set environment variables** in Space settings:
   - `SUPABASE_CONNECTION_STRING`
   - `HF_TOKEN`
4. **Deploy** - The Space will automatically build and launch

## 📚 Usage

### Chat Interface

Simply ask your investigation questions:

```
"How do I investigate a suspicious domain?"
"What tools can I use to verify an image's authenticity?"
"How can I trace the origin of a social media account?"
```

The assistant will provide:

1. Investigation overview
2. Step-by-step methodology
3. Recommended tools with descriptions and URLs
4. Best practices and safety considerations
5. Expected outcomes

### Tool Search

Use the "Tool Search" tab to search directly for OSINT tools by category or purpose.

### API Access

This app automatically exposes REST API endpoints for external integration.

**Python Client:**
```python
from gradio_client import Client

client = Client("your-space-url")
result = client.predict(
    "How do I investigate a domain?",
    api_name="/investigate"
)
print(result)
```

**JavaScript Client:**
```javascript
import { Client } from "@gradio/client";

const client = await Client.connect("your-space-url");
const result = await client.predict("/investigate", {
  message: "How do I investigate a domain?"
});
console.log(result.data);
```

**cURL:**
```bash
curl -X POST "https://your-space.hf.space/gradio_api/call/investigate" \
  -H "Content-Type: application/json" \
  -d '{"data": ["How do I investigate a domain?"]}'
```

On Gradio 5.x the REST routes live under the `/gradio_api` prefix. The POST request returns an `event_id`; fetch the result with a GET request to `/gradio_api/call/investigate/<event_id>`.

**Available Endpoints:**
- `/gradio_api/call/investigate` - Main investigation assistant
- `/gradio_api/call/search_tools` - Direct tool search
- `/gradio_api/openapi.json` - OpenAPI specification

## 🗄️ Database

The app uses Supabase with the PGVector extension to store and retrieve OSINT tools.

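
To make the retrieval step concrete, here is a minimal sketch in plain Python of ranking rows by cosine similarity, which is roughly what a PGVector nearest-neighbor query does on the database side. It assumes each row carries an `embedding` vector alongside `name` and `category`, as in the schema in this section. The helper names, tool names, and 3-dimensional vectors are invented for illustration and are not taken from the app's source.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=5):
    """Return the k rows whose embeddings are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, row["embedding"]), row) for row in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:k]]


# Hypothetical toy rows; real sentence-transformer embeddings are much longer.
rows = [
    {"name": "tool-a", "category": "Geolocation", "embedding": [0.9, 0.1, 0.0]},
    {"name": "tool-b", "category": "Archiving & Preservation", "embedding": [0.0, 1.0, 0.0]},
    {"name": "tool-c", "category": "Geolocation", "embedding": [0.8, 0.2, 0.1]},
]
query = [1.0, 0.0, 0.0]

for row in top_k(query, rows, k=2):
    print(row["name"], row["category"])
```

In the real app this ranking would presumably happen inside PostgreSQL through the vector store, with `k` set by `RETRIEVAL_K`; the sketch only shows the idea.
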
**Database Schema:**
```sql
CREATE TABLE bellingcat_tools (
    id BIGINT PRIMARY KEY,
    name TEXT,
    category TEXT,
    content TEXT,
    url TEXT,
    cost TEXT,
    details TEXT,
    embedding VECTOR,
    created_at TIMESTAMP WITH TIME ZONE
);
```

**Tool Categories:**
- Archiving & Preservation
- Social Media Investigation
- Image & Video Analysis
- Domain & Network Investigation
- Geolocation
- Data Extraction
- Verification & Fact-Checking
- And more...

## 🛠️ Technology Stack

- **UI/API**: [Gradio](https://gradio.app/) - Automatic API generation
- **RAG Framework**: [LangChain](https://langchain.com/) - Retrieval pipeline
- **Vector Database**: [Supabase](https://supabase.com/) with PGVector extension
- **Embeddings**: HuggingFace sentence-transformers
- **LLM**: [Hugging Face Inference Providers](https://huggingface.co/docs/inference-providers/) - Llama 3.1
- **Language**: Python 3.9+

## 📁 Project Structure

```
osint-llm/
├── app.py                  # Main Gradio application
├── requirements.txt        # Python dependencies
├── .env.example            # Environment variables template
├── README.md               # This file
└── src/
    ├── __init__.py
    ├── vectorstore.py      # Supabase PGVector connection
    ├── rag_pipeline.py     # LangChain RAG logic
    ├── llm_client.py       # Inference Provider client
    └── prompts.py          # Investigation prompt templates
```

## ⚙️ Configuration

### Environment Variables

See `.env.example` for all available configuration options.

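
As a rough illustration of how the variables documented in this section might be read at startup, the sketch below validates the two required variables and applies the documented defaults to the optional ones. `load_config` and the returned key names are hypothetical and not taken from the app's source. Only the variable names and default values come from this README.

```python
import os

# Defaults as documented in this README's list of optional variables.
DEFAULTS = {
    "LLM_MODEL": "meta-llama/Llama-3.1-8B-Instruct",
    "LLM_TEMPERATURE": "0.7",
    "LLM_MAX_TOKENS": "2000",
    "RETRIEVAL_K": "5",
    "EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2",
}
REQUIRED = ("SUPABASE_CONNECTION_STRING", "HF_TOKEN")


def load_config(env=None):
    """Build a settings dict from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env

    # Fail early with a clear message if a required variable is unset or empty.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )

    # Environment values are strings, so numeric settings are converted here.
    values = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    return {
        "supabase_connection_string": env["SUPABASE_CONNECTION_STRING"],
        "hf_token": env["HF_TOKEN"],
        "llm_model": values["LLM_MODEL"],
        "llm_temperature": float(values["LLM_TEMPERATURE"]),
        "llm_max_tokens": int(values["LLM_MAX_TOKENS"]),
        "retrieval_k": int(values["RETRIEVAL_K"]),
        "embedding_model": values["EMBEDDING_MODEL"],
    }


# Example with placeholder credentials passed in explicitly instead of os.environ.
example = load_config({
    "SUPABASE_CONNECTION_STRING": "postgresql://placeholder",
    "HF_TOKEN": "hf_placeholder",
    "RETRIEVAL_K": "8",
})
print(example["llm_model"], example["retrieval_k"])
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.
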
**Required:**
- `SUPABASE_CONNECTION_STRING` - PostgreSQL connection string
- `HF_TOKEN` - Hugging Face API token

**Optional:**
- `LLM_MODEL` - Model to use (default: meta-llama/Llama-3.1-8B-Instruct)
- `LLM_TEMPERATURE` - Generation temperature (default: 0.7)
- `LLM_MAX_TOKENS` - Max tokens to generate (default: 2000)
- `RETRIEVAL_K` - Number of tools to retrieve (default: 5)
- `EMBEDDING_MODEL` - Embedding model (default: sentence-transformers/all-MiniLM-L6-v2)

### Supported LLM Models

- `meta-llama/Llama-3.1-8B-Instruct` (recommended)
- `meta-llama/Meta-Llama-3-8B-Instruct`
- `Qwen/Qwen2.5-72B-Instruct`
- `mistralai/Mistral-7B-Instruct-v0.3`

## 💰 Cost Considerations

### Hugging Face Inference Providers

- Free tier: $0.10/month credits
- PRO tier: $2.00/month credits + pay-as-you-go
- Typical cost: ~$0.001-0.01 per query
- Recommended budget: $10-50/month for moderate usage

### Supabase

- Free tier sufficient for most use cases
- PGVector operations are standard database queries

### Hugging Face Spaces

- Free CPU hosting available
- GPU upgrade: ~$0.60/hour (optional, not required)

## 🔮 Future Enhancements

- [ ] MCP server integration for AI agent tool use
- [ ] Multi-turn conversation with memory
- [ ] User authentication and query logging
- [ ] Additional tool databases and sources
- [ ] Export methodologies as PDF/markdown
- [ ] Tool usage examples and tutorials
- [ ] Community-contributed tool reviews

## 🤝 Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

## 📄 License

MIT License - See LICENSE file for details

## 🙏 Acknowledgments

- Tool data sourced from [Bellingcat's Online Investigation Toolkit](https://www.bellingcat.com/)
- Built with support from the OSINT community

## 📞 Support

For issues or questions:
- Open an issue on GitHub
- Check the [Hugging Face Spaces documentation](https://huggingface.co/docs/hub/spaces)
- Review the [Gradio documentation](https://gradio.app/docs/)

---

Built with ❤️ for the OSINT community