ArdaKaratas committed · verified
Commit 14d8ebc · 1 Parent(s): cc12fcd

Upload 10 files

Files changed (10)
  1. .gitignore +47 -0
  2. README.md +119 -0
  3. agent.py +72 -0
  4. app.py +296 -0
  5. code_interpreter.py +67 -0
  6. image_processing.py +77 -0
  7. metadata.jsonl +0 -0
  8. requirements.txt +11 -0
  9. system_prompt.txt +24 -0
  10. tools.py +115 -0
.gitignore ADDED
@@ -0,0 +1,47 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ env/
+ venv/
+ ENV/
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Environment variables
+ .env
+ .env.local
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Logs
+ *.log
+
+ # Temporary files
+ *.tmp
+ *.bak
+
README.md ADDED
@@ -0,0 +1,119 @@
+ # 🤖 GAIA Agent
+
+ A sophisticated AI agent designed to solve GAIA (General AI Assistants) benchmark questions using multiple tools and capabilities.
+
+ ## Overview
+
+ This agent is built to tackle the GAIA benchmark, which tests AI systems on real-world tasks requiring reasoning, multi-modal understanding, web browsing, and tool usage. The agent combines multiple tools to provide accurate answers to complex questions.
+
+ ## Features
+
+ The GAIA Agent has access to the following tools:
+
+ - **Web Search** (DuckDuckGo): Search the web for the latest information
+ - **Code Interpreter**: Execute Python code for calculations and data processing
+ - **Image Processing**: Analyze images from URLs
+ - **Weather Information**: Get weather data for any location
+ - **Hub Statistics**: Fetch model statistics from the Hugging Face Hub
+
+ ## Architecture
+
+ - **Framework**: smolagents
+ - **Model**: OpenRouter API (meta-llama/llama-3.3-70b-instruct:free) via LiteLLM
+ - **Planning**: enabled, with a planning step every 3 agent steps (`planning_interval=3`)
+ - **Base Tools**: the smolagents base tools are loaded alongside the custom tools (`add_base_tools=True`)
+
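+ This configuration lives in `agent.py`; condensed, the wiring looks like this (see that file for the full version):
+
+ ```python
+ import os
+ from smolagents import CodeAgent, LiteLLMModel
+ from tools import DuckDuckGoSearchTool, WeatherInfoTool, HubStatsTool
+ from code_interpreter import CodeInterpreterTool
+ from image_processing import ImageProcessingTool
+
+ model = LiteLLMModel(
+     model_id="openrouter/meta-llama/llama-3.3-70b-instruct:free",
+     api_key=os.environ.get("OPENROUTER_API_KEY"),
+     temperature=0.5,
+ )
+
+ gaia_agent = CodeAgent(
+     tools=[DuckDuckGoSearchTool(), WeatherInfoTool(), HubStatsTool(),
+            CodeInterpreterTool(), ImageProcessingTool()],
+     model=model,
+     add_base_tools=True,   # also load the smolagents base tools
+     planning_interval=3,   # insert a planning step every 3 steps
+ )
+ ```
+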
+ ## Project Structure
+
+ ```
+ agent_hugging/
+ ├── agent.py              # Main agent implementation
+ ├── app.py                # Gradio interface for interaction
+ ├── code_interpreter.py   # Python code execution tool
+ ├── image_processing.py   # Image analysis tool
+ ├── tools.py              # Custom tools (search, weather, hub stats)
+ ├── system_prompt.txt     # System prompt for the agent
+ ├── requirements.txt      # Python dependencies
+ └── README.md             # This file
+ ```
+
+ ## Setup
+
+ 1. **Install dependencies:**
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 2. **Set environment variables:**
+    ```bash
+    export OPENROUTER_API_KEY="your-api-key-here"
+    export HF_TOKEN="your-huggingface-token"   # Optional, for Hugging Face Hub operations
+    export HF_USERNAME="ArdaKaratas"           # Optional, defaults to ArdaKaratas
+    export HF_SPACE_NAME="agent_hugging"       # Optional, defaults to agent_hugging
+    ```
+
+    Get a free API key from: https://openrouter.ai/keys
+    Get your Hugging Face token from: https://huggingface.co/settings/tokens
+    (Prefer a `.env` file? See the sketch at the end of this section.)
+
+ 3. **Run the agent:**
+    ```bash
+    python agent.py
+    ```
+
+ 4. **Launch the Gradio interface:**
+    ```bash
+    python app.py
+    ```
+
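+ As an alternative to exporting the variables in step 2 by hand, you can keep them in a local `.env` file (already excluded by `.gitignore`). `python-dotenv` is listed in `requirements.txt`, but the uploaded code does not call it itself, so this is only a sketch of what you would add near the top of `app.py` or `agent.py`:
+
+ ```python
+ from dotenv import load_dotenv  # provided by the python-dotenv package
+
+ # Reads key=value pairs from .env into os.environ (e.g. OPENROUTER_API_KEY=...)
+ # before the rest of the code looks them up.
+ load_dotenv()
+ ```
+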
+ ## Usage
+
+ ### Testing a Single Question
+
+ Use the "Test Single Question" tab in the Gradio interface to:
+ - Enter a question manually
+ - Fetch a random question from the benchmark
+ - Get the agent's answer
+
+ ### Submitting All Answers
+
+ Use the "Submit All Answers" tab to:
+ 1. Enter your Hugging Face username
+ 2. Optionally provide your Space code link
+ 3. Click "Process & Submit All Questions"
+ 4. View the submission status and results
+
+ ### Viewing Questions
+
+ Use the "View All Questions" tab to browse all GAIA benchmark questions.
+
+ ## API Integration
+
+ The app connects to the scoring API at `https://agents-course-unit4-scoring.hf.space`.
+
+ Endpoints:
+ - `GET /questions`: Retrieve all questions
+ - `GET /random-question`: Get a random question
+ - `POST /submit`: Submit answers for scoring
+
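+ `app.py` wraps these endpoints with `requests`; stripped to the essentials, fetching the questions and submitting answers looks like this (error handling omitted, payload fields as used in `process_all_questions`):
+
+ ```python
+ import requests
+
+ API_URL = "https://agents-course-unit4-scoring.hf.space"
+
+ # Fetch every benchmark question
+ questions = requests.get(f"{API_URL}/questions", timeout=15).json()
+
+ # Submit one answer per task
+ payload = {
+     "username": "your-hf-username",
+     "agent_code": "https://huggingface.co/spaces/your-username/your-space/tree/main",
+     "answers": [
+         {"task_id": q["task_id"], "submitted_answer": "your answer here"}
+         for q in questions
+     ],
+ }
+ result = requests.post(f"{API_URL}/submit", json=payload, timeout=60).json()
+ print(result.get("score"), result.get("message"))
+ ```
+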
+ ## Metadata.jsonl Support
+
+ The project includes `metadata.jsonl`, which contains GAIA benchmark questions and their correct answers. This file is used for:
+
+ 1. **Testing & Validation**: Compare the agent's answers with the correct answers from the metadata
+ 2. **Debugging**: See expected answers when testing the agent
+ 3. **Development**: Understand question patterns and expected answer formats
+
+ **Note**: In production, the agent generates its own answers. The metadata is only used for comparison and validation purposes.
+
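+ The lookup that `app.py` performs against this file is essentially the following (the field names `Question` and `Final answer` are the ones `get_answer_from_metadata` expects; shown here as a standalone helper):
+
+ ```python
+ import json
+
+ def lookup_expected_answer(question: str, path: str = "metadata.jsonl"):
+     """Return the reference answer for a question, or None if it is not listed."""
+     with open(path, "r", encoding="utf-8") as fh:
+         for line in fh:
+             record = json.loads(line)  # one JSON object per line
+             if record.get("Question") == question:
+                 return record.get("Final answer")
+     return None
+ ```
+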
+ ## Notes
+
+ - The agent returns answers directly, without a "FINAL ANSWER" prefix
+ - Answers are compared using exact match
+ - Make sure your Space is public for verification
+ - The code interpreter has security restrictions to prevent dangerous operations
+ - Use the "Compare with metadata.jsonl" checkbox in the test interface to see how your agent's answers compare to the correct answers
+
+ ## License
+
+ This project is part of the Hugging Face AI Agents Course - Unit 4 Final Assignment.
+
agent.py ADDED
@@ -0,0 +1,72 @@
+ """
+ GAIA Agent - Main Agent Implementation
+ A sophisticated agent for solving GAIA benchmark questions using multiple tools.
+ """
+
+ import os
+ from smolagents import CodeAgent, LiteLLMModel
+ from tools import DuckDuckGoSearchTool, WeatherInfoTool, HubStatsTool
+ from code_interpreter import CodeInterpreterTool
+ from image_processing import ImageProcessingTool
+
+ # Get API key from environment
+ openrouter_api_key = os.environ.get("OPENROUTER_API_KEY")
+ if not openrouter_api_key:
+     print("⚠️ WARNING: OPENROUTER_API_KEY not found!")
+     print("Get a FREE API key: https://openrouter.ai/keys")
+     openrouter_api_key = "dummy"
+
+ # Initialize the model
+ model = LiteLLMModel(
+     model_id="openrouter/meta-llama/llama-3.3-70b-instruct:free",
+     api_key=openrouter_api_key,
+     temperature=0.5,
+ )
+
+ # Initialize tools
+ search_tool = DuckDuckGoSearchTool()
+ weather_info_tool = WeatherInfoTool()
+ hub_stats_tool = HubStatsTool()
+ code_interpreter_tool = CodeInterpreterTool()
+ image_processing_tool = ImageProcessingTool()
+
+ # Create the GAIA agent with all tools
+ gaia_agent = CodeAgent(
+     tools=[
+         search_tool,
+         weather_info_tool,
+         hub_stats_tool,
+         code_interpreter_tool,
+         image_processing_tool,
+     ],
+     model=model,
+     add_base_tools=True,
+     planning_interval=3,
+ )
+
+ def run_agent(question: str) -> str:
+     """
+     Run the GAIA agent on a question and return the answer.
+
+     Args:
+         question: The GAIA benchmark question to answer
+
+     Returns:
+         The agent's answer as a string
+     """
+     try:
+         response = gaia_agent.run(question)
+         return str(response)  # the agent may return a non-string final answer
+     except Exception as e:
+         return f"Error processing question: {str(e)}"
+
+ if __name__ == "__main__":
+     # Example usage
+     test_question = "What is the capital of France?"
+     print("=" * 80)
+     print("GAIA Agent Test")
+     print("=" * 80)
+     print(f"Question: {test_question}")
+     answer = run_agent(test_question)
+     print(f"Answer: {answer}")
+
app.py ADDED
@@ -0,0 +1,296 @@
+ """
+ GAIA Agent - Gradio Interface
+ Main application interface for interacting with the GAIA agent and submitting answers.
+ """
+
+ import os
+ import gradio as gr
+ import requests
+ import json
+ from agent import run_agent
+
+ # Constants
+ DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
+ METADATA_FILE = "metadata.jsonl"
+
+ # Hugging Face configuration
+ HF_USERNAME = os.getenv("HF_USERNAME", "ArdaKaratas")
+ HF_SPACE_NAME = os.getenv("HF_SPACE_NAME", "agent_hugging")
+ HF_TOKEN = os.getenv("HF_TOKEN") or os.getenv("HUGGINGFACE_HUB_TOKEN")
+
+ def get_space_url():
+     """Get the Hugging Face Space URL."""
+     space_id = os.getenv("SPACE_ID", f"{HF_USERNAME}/{HF_SPACE_NAME}")
+     return f"https://huggingface.co/spaces/{space_id}/tree/main"
+
+ def fetch_questions():
+     """Fetch all questions from the API."""
+     try:
+         response = requests.get(f"{DEFAULT_API_URL}/questions", timeout=15)
+         response.raise_for_status()
+         return response.json()
+     except Exception as e:
+         # Return None on failure so every caller can handle errors uniformly
+         print(f"Error fetching questions: {e}")
+         return None
+
+ def fetch_random_question():
+     """Fetch a random question for testing."""
+     try:
+         response = requests.get(f"{DEFAULT_API_URL}/random-question", timeout=15)
+         response.raise_for_status()
+         question_data = response.json()
+         return question_data.get("question", ""), question_data.get("task_id", "")
+     except Exception as e:
+         return "", f"Error fetching random question: {str(e)}"
+
+ def get_answer_from_metadata(question: str):
+     """Get the correct answer from metadata.jsonl if available."""
+     if not os.path.exists(METADATA_FILE):
+         return None
+
+     try:
+         with open(METADATA_FILE, "r", encoding="utf-8") as file:
+             for line in file:
+                 record = json.loads(line)
+                 if record.get("Question") == question:
+                     return record.get("Final answer", None)
+     except Exception:
+         pass
+
+     return None
+
+ def test_single_question(question: str, compare_with_metadata: bool = False):
+     """Test the agent on a single question."""
+     if not question.strip():
+         return "Please enter a question or fetch a random one."
+
+     try:
+         answer = run_agent(question)
+
+         # Compare with metadata if requested
+         if compare_with_metadata:
+             correct_answer = get_answer_from_metadata(question)
+             if correct_answer:
+                 comparison = "\n\n" + "=" * 50 + "\n"
+                 comparison += f"✅ Agent Answer: {answer}\n"
+                 comparison += f"📋 Correct Answer (from metadata): {correct_answer}\n"
+                 if answer.strip().lower() == correct_answer.strip().lower():
+                     comparison += "🎉 Match!"
+                 else:
+                     comparison += "❌ No match"
+                 comparison += "\n" + "=" * 50
+                 return answer + comparison
+
+         return answer
+     except Exception as e:
+         return f"Error: {str(e)}"
+
+ def process_all_questions(username: str, space_code: str, use_agent: bool = True):
+     """Process all questions and submit answers."""
+     if not username:
+         return "Please enter your Hugging Face username.", None
+
+     if not space_code:
+         space_code = get_space_url()
+
+     # Fetch questions
+     questions_data = fetch_questions()
+     if questions_data is None:
+         return "Failed to fetch questions.", None
+
+     if not questions_data:
+         return "No questions found.", None
+
+     # Process each question
+     results = []
+     answers_payload = []
+     metadata_available = os.path.exists(METADATA_FILE)
+
+     for item in questions_data:
+         task_id = item.get("task_id")
+         question = item.get("question")
+
+         if not task_id or not question:
+             continue
+
+         # Get answer
+         answer = None
+         answer_source = ""
+
+         if use_agent:
+             # Run the agent
+             try:
+                 answer = run_agent(question)
+                 answer_source = "Agent"
+             except Exception as e:
+                 answer = f"Error: {str(e)}"
+                 answer_source = "Error"
+         else:
+             # Use metadata (for testing/debugging only)
+             answer = get_answer_from_metadata(question)
+             if answer:
+                 answer_source = "Metadata"
+             else:
+                 answer = "Answer not found in metadata"
+                 answer_source = "Not found"
+
+         if answer:
+             answers_payload.append({
+                 "task_id": task_id,
+                 "submitted_answer": answer
+             })
+
+         # Add comparison info if metadata is available
+         result_row = {
+             "Task ID": task_id,
+             "Question": question[:80] + "..." if len(question) > 80 else question,
+             "Answer": answer[:80] + "..." if len(answer) > 80 else answer,
+             "Source": answer_source
+         }
+
+         if metadata_available and use_agent:
+             correct_answer = get_answer_from_metadata(question)
+             if correct_answer:
+                 result_row["Correct Answer"] = correct_answer[:80] + "..." if len(correct_answer) > 80 else correct_answer
+                 result_row["Match"] = "✅" if answer.strip().lower() == correct_answer.strip().lower() else "❌"
+
+         results.append(result_row)
+
+     if not answers_payload:
+         return "No answers generated.", None
+
+     # Submit answers
+     submission_data = {
+         "username": username,
+         "agent_code": space_code,
+         "answers": answers_payload
+     }
+
+     try:
+         response = requests.post(
+             f"{DEFAULT_API_URL}/submit",
+             json=submission_data,
+             timeout=60
+         )
+         response.raise_for_status()
+         result_data = response.json()
+
+         status = (
+             f"✅ Submission Successful!\n\n"
+             f"Username: {result_data.get('username', 'N/A')}\n"
+             f"Score: {result_data.get('score', 'N/A')}%\n"
+             f"Correct: {result_data.get('correct_count', '?')}/{result_data.get('total_attempted', '?')}\n"
+             f"Message: {result_data.get('message', 'No message')}"
+         )
+
+         return status, results
+     except Exception as e:
+         return f"❌ Submission failed: {str(e)}", results
+
+ # Gradio interface
+ with gr.Blocks(title="GAIA Agent", theme=gr.themes.Soft()) as app:
+     gr.Markdown("# 🤖 GAIA Agent - Benchmark Question Solver")
+     gr.Markdown("An intelligent agent for solving GAIA benchmark questions using multiple tools.")
+
+     with gr.Tabs():
+         # Tab 1: Test Single Question
+         with gr.Tab("🧪 Test Single Question"):
+             gr.Markdown("### Test the agent on a single question")
+
+             with gr.Row():
+                 question_input = gr.Textbox(
+                     label="Question",
+                     placeholder="Enter a GAIA benchmark question...",
+                     lines=3
+                 )
+
+             compare_checkbox = gr.Checkbox(
+                 label="Compare with metadata.jsonl (if available)",
+                 value=False
+             )
+
+             with gr.Row():
+                 fetch_random_btn = gr.Button("🎲 Fetch Random Question", variant="secondary")
+                 test_btn = gr.Button("🚀 Test Agent", variant="primary")
+
+             answer_output = gr.Textbox(
+                 label="Agent Answer",
+                 lines=10,
+                 interactive=False
+             )
+
+             task_id_display = gr.Textbox(
+                 label="Task ID",
+                 visible=False
+             )
+
+             fetch_random_btn.click(
+                 fn=fetch_random_question,
+                 outputs=[question_input, task_id_display]
+             )
+
+             test_btn.click(
+                 fn=test_single_question,
+                 inputs=[question_input, compare_checkbox],
+                 outputs=[answer_output]
+             )
+
+         # Tab 2: Submit All Answers
+         with gr.Tab("📤 Submit All Answers"):
+             gr.Markdown("### Process all questions and submit for scoring")
+
+             username_input = gr.Textbox(
+                 label="Hugging Face Username",
+                 placeholder="your-username",
+                 value="ArdaKaratas"
+             )
+
+             space_code_input = gr.Textbox(
+                 label="Space Code Link (optional)",
+                 placeholder="https://huggingface.co/spaces/your-username/your-space/tree/main",
+                 value="https://huggingface.co/spaces/ArdaKaratas/agent_hugging/tree/main"
+             )
+
+             use_agent_checkbox = gr.Checkbox(
+                 label="Use Agent (uncheck to use metadata.jsonl answers - testing only)",
+                 value=True
+             )
+
+             submit_btn = gr.Button("📊 Process & Submit All Questions", variant="primary")
+
+             status_output = gr.Textbox(
+                 label="Submission Status",
+                 lines=5,
+                 interactive=False
+             )
+
+             results_table = gr.Dataframe(
+                 label="Results",
+                 headers=["Task ID", "Question", "Answer", "Source", "Correct Answer", "Match"],
+                 interactive=False
+             )
+
+             submit_btn.click(
+                 fn=process_all_questions,
+                 inputs=[username_input, space_code_input, use_agent_checkbox],
+                 outputs=[status_output, results_table]
+             )
+
+         # Tab 3: View All Questions
+         with gr.Tab("📋 View All Questions"):
+             gr.Markdown("### Browse all GAIA benchmark questions")
+
+             view_questions_btn = gr.Button("🔍 Load Questions", variant="primary")
+
+             questions_display = gr.JSON(
+                 label="Questions"
+             )
+
+             view_questions_btn.click(
+                 fn=fetch_questions,
+                 outputs=[questions_display]
+             )
+
+ if __name__ == "__main__":
+     app.launch(share=False)
+
code_interpreter.py ADDED
@@ -0,0 +1,67 @@
+ """
+ Code Interpreter Tool
+ Allows the agent to execute Python code for calculations and data processing.
+ """
+
+ import io
+ import sys
+ from smolagents import Tool
+ from contextlib import redirect_stdout, redirect_stderr
+
+ class CodeInterpreterTool(Tool):
+     name = "code_interpreter"
+     description = "Executes Python code and returns the output. Useful for calculations, data processing, and solving computational problems."
+     inputs = {
+         "code": {
+             "type": "string",
+             "description": "The Python code to execute. Should be a complete, runnable code snippet."
+         }
+     }
+     output_type = "string"
+
+     def forward(self, code: str) -> str:
+         """
+         Execute Python code safely and return the output.
+
+         Args:
+             code: Python code string to execute
+
+         Returns:
+             Output from code execution or error message
+         """
+         # Security: Block dangerous operations
+         dangerous_keywords = [
+             "import os", "import sys", "import subprocess",
+             "__import__", "eval", "exec", "open(",
+             "file(", "input(", "raw_input(",
+         ]
+
+         code_lower = code.lower()
+         for keyword in dangerous_keywords:
+             if keyword.lower() in code_lower:
+                 return f"Error: Potentially dangerous operation '{keyword}' is not allowed."
+
+         try:
+             # Capture stdout and stderr
+             stdout_capture = io.StringIO()
+             stderr_capture = io.StringIO()
+
+             # Redirect output
+             with redirect_stdout(stdout_capture), redirect_stderr(stderr_capture):
+                 # Execute the code
+                 exec(code, {"__builtins__": __builtins__})
+
+             stdout_output = stdout_capture.getvalue()
+             stderr_output = stderr_capture.getvalue()
+
+             if stderr_output:
+                 return f"Error: {stderr_output}"
+
+             if stdout_output:
+                 return stdout_output.strip()
+             else:
+                 return "Code executed successfully (no output)."
+
+         except Exception as e:
+             return f"Error executing code: {str(e)}"
+
image_processing.py ADDED
@@ -0,0 +1,77 @@
+ """
+ Image Processing Tool
+ Provides basic image analysis capabilities for GAIA benchmark questions.
+ """
+
+ from smolagents import Tool
+ from PIL import Image
+ import requests
+ from io import BytesIO
+
+ class ImageProcessingTool(Tool):
+     name = "image_processing"
+     description = "Analyzes images from URLs. Can extract basic information about images including dimensions, format, and basic properties."
+     inputs = {
+         "image_url": {
+             "type": "string",
+             "description": "URL of the image to analyze"
+         },
+         "task": {
+             "type": "string",
+             "description": "What to do with the image: 'info' (get basic info), 'describe' (describe the image)",
+             "default": "info"
+         }
+     }
+     output_type = "string"
+
+     def forward(self, image_url: str, task: str = "info") -> str:
+         """
+         Process an image from a URL.
+
+         Args:
+             image_url: URL of the image
+             task: What task to perform ('info' or 'describe')
+
+         Returns:
+             Information about the image
+         """
+         try:
+             # Download the image
+             response = requests.get(image_url, timeout=10)
+             response.raise_for_status()
+
+             # Open image
+             image = Image.open(BytesIO(response.content))
+
+             if task == "info":
+                 # Return basic image information
+                 info = {
+                     "format": image.format,
+                     "mode": image.mode,
+                     "size": image.size,
+                     "width": image.width,
+                     "height": image.height,
+                 }
+
+                 return (
+                     f"Image Information:\n"
+                     f"Format: {info['format']}\n"
+                     f"Mode: {info['mode']}\n"
+                     f"Dimensions: {info['width']}x{info['height']} pixels\n"
+                     f"Size: {info['size']}"
+                 )
+             elif task == "describe":
+                 # Basic description based on image properties
+                 return (
+                     f"This is a {image.format} image with dimensions {image.width}x{image.height} pixels. "
+                     f"Color mode: {image.mode}. "
+                     f"For detailed visual description, use a vision model."
+                 )
+             else:
+                 return f"Unknown task: {task}. Use 'info' or 'describe'."
+
+         except requests.exceptions.RequestException as e:
+             return f"Error downloading image: {str(e)}"
+         except Exception as e:
+             return f"Error processing image: {str(e)}"
+
metadata.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
requirements.txt ADDED
@@ -0,0 +1,11 @@
+ smolagents
+ litellm
+ gradio
+ requests
+ pillow
+ huggingface-hub
+ langchain-community
+ datasets
+ pandas
+ python-dotenv
+
system_prompt.txt ADDED
@@ -0,0 +1,24 @@
+ You are a sophisticated AI agent designed to solve GAIA (General AI Assistants) benchmark questions.
+
+ Your goal is to answer questions accurately by:
+ 1. Understanding the question thoroughly
+ 2. Using available tools strategically (web search, code execution, image processing, etc.)
+ 3. Combining information from multiple sources when needed
+ 4. Providing clear, concise, and accurate answers
+
+ Key principles:
+ - Think step by step before answering
+ - Use tools when you need additional information
+ - Verify your answers when possible
+ - Be precise and avoid speculation
+ - If you're uncertain, indicate that in your response
+
+ Available tools:
+ - Web search for current information
+ - Code interpreter for calculations and data processing
+ - Image processing for visual analysis
+ - Weather information for location-based queries
+ - Hugging Face Hub statistics for model information
+
+ Remember: Your answers should be direct and factual. Focus on accuracy over verbosity.
+
tools.py ADDED
@@ -0,0 +1,115 @@
+ """
+ Custom Tools for GAIA Agent
+ Includes web search, weather info, and Hugging Face Hub statistics.
+ """
+
+ from smolagents import Tool, DuckDuckGoSearchTool
+ from huggingface_hub import list_models
+ import random
+
+ # Export tools
+ __all__ = [
+     'DuckDuckGoSearchTool',
+     'WeatherInfoTool',
+     'HubStatsTool',
+     'search_tool',
+     'weather_info_tool',
+     'hub_stats_tool'
+ ]
+
+ # Initialize the DuckDuckGo search tool
+ search_tool = DuckDuckGoSearchTool()
+
+
+ class WeatherInfoTool(Tool):
+     name = "weather_info"
+     description = "Fetches weather information for a given location. Useful for questions about weather conditions."
+     inputs = {
+         "location": {
+             "type": "string",
+             "description": "The location to get weather information for (e.g., 'Paris', 'New York', 'London')."
+         }
+     }
+     output_type = "string"
+
+     def forward(self, location: str) -> str:
+         """
+         Get weather information for a location.
+
+         Args:
+             location: City or location name
+
+         Returns:
+             Weather information string
+         """
+         # Note: this is a simplified implementation that returns randomly chosen dummy data.
+         # In production, you would integrate with a real weather API.
+         weather_conditions = [
+             {"condition": "Sunny", "temp_c": 22, "humidity": 60},
+             {"condition": "Cloudy", "temp_c": 18, "humidity": 70},
+             {"condition": "Rainy", "temp_c": 15, "humidity": 85},
+             {"condition": "Clear", "temp_c": 25, "humidity": 55},
+             {"condition": "Windy", "temp_c": 20, "humidity": 65}
+         ]
+
+         data = random.choice(weather_conditions)
+         return (
+             f"Weather in {location}:\n"
+             f"Condition: {data['condition']}\n"
+             f"Temperature: {data['temp_c']}°C\n"
+             f"Humidity: {data['humidity']}%"
+         )
+
+
+ # Initialize the weather tool
+ weather_info_tool = WeatherInfoTool()
+
+
+ class HubStatsTool(Tool):
+     name = "hub_stats"
+     description = "Fetches model statistics from Hugging Face Hub. Useful for questions about AI models and their popularity."
+     inputs = {
+         "author": {
+             "type": "string",
+             "description": "The username or organization name on Hugging Face Hub (e.g., 'meta-llama', 'Qwen', 'mistralai')."
+         }
+     }
+     output_type = "string"
+
+     def forward(self, author: str) -> str:
+         """
+         Get the most popular models from a Hugging Face author.
+
+         Args:
+             author: Hugging Face username or organization
+
+         Returns:
+             Information about the most downloaded models
+         """
+         try:
+             # List models from the specified author, sorted by downloads (descending)
+             models = list(list_models(
+                 author=author,
+                 sort="downloads",
+                 direction=-1,
+                 limit=5
+             ))
+
+             if models:
+                 result = f"Top models by {author}:\n\n"
+                 for i, model in enumerate(models[:5], 1):
+                     result += (
+                         f"{i}. {model.id}\n"
+                         f"   Downloads: {model.downloads:,}\n"
+                         f"   Likes: {model.likes}\n\n"
+                     )
+                 return result.strip()
+             else:
+                 return f"No models found for author/organization '{author}'."
+         except Exception as e:
+             return f"Error fetching models for {author}: {str(e)}"
+
+
+ # Initialize the Hub stats tool
+ hub_stats_tool = HubStatsTool()
+