# FaraCUA Backend

The backend server for FaraCUA - a Computer Use Agent (CUA) demo powered by Microsoft's Fara-7B vision-language model and Modal for serverless GPU inference.

## Overview

This backend provides:

- **WebSocket API** - Real-time communication with the React frontend for streaming agent actions
- **REST API** - Model listing, random question generation, and trace storage
- **FARA Agent Integration** - Runs the Fara agent with Playwright for browser automation
- **Modal Integration** - Proxies requests to Modal's vLLM endpoint and trace storage

## Architecture

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     WebSocket      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     HTTP      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Frontend  β”‚ ◄───────────────► β”‚   Backend   β”‚ ◄───────────► β”‚    Modal    β”‚
β”‚   (React)   β”‚                    β”‚  (FastAPI)  β”‚               β”‚   (vLLM)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜               β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                         β”‚
                                         β”‚ Playwright
                                         β–Ό
                                   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                   β”‚   Browser   β”‚
                                   β”‚  (Headless) β”‚
                                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

## Files

| File | Description |
|------|-------------|
| `server.py` | Main FastAPI server with WebSocket and REST endpoints |
| `modal_fara_vllm.py` | Modal deployment for vLLM inference and trace storage |
| `pyproject.toml` | Python dependencies |
| `.env.example` | Example environment configuration |

## Setup

### 1. Install Dependencies

```bash
# Using uv (recommended)
uv sync

# Or using pip
pip install -e .
```

### 2. Install Playwright

```bash
playwright install chromium
```
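
To verify the headless browser works before running the agent, a quick standalone check using Playwright's sync API (this is just a verification sketch, not part of the backend itself):

```python
# Confirm headless Chromium launches and can render a page.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())  # expect "Example Domain"
    browser.close()
```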

### 3. Deploy Modal Endpoints

```bash
modal deploy backend/modal_fara_vllm.py
```

This deploys:
- **vLLM Server** - GPU-accelerated inference for Fara-7B at `https://<workspace>--fara-vllm-serve.modal.run`
- **Trace Storage** - Endpoint for storing task traces at `https://<workspace>--fara-vllm-store-trace.modal.run`
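
To sanity-check the deployed endpoint before wiring up the backend, a minimal sketch assuming the vLLM server exposes the standard OpenAI-compatible `/v1` API and that Modal proxy auth reads `Modal-Key` / `Modal-Secret` headers (check your deploy output and Modal's proxy-auth docs if either assumption doesn't hold):

```python
# Smoke test for the Modal vLLM endpoint via the OpenAI-compatible API.
import os

from openai import OpenAI  # pip install openai

client = OpenAI(
    # Assumes the OpenAI-compatible routes live under /v1 on the deploy URL.
    base_url=os.environ["FARA_ENDPOINT_URL"].rstrip("/") + "/v1",
    api_key=os.environ.get("FARA_API_KEY", "not-needed"),
    # Assumed Modal proxy auth headers; see modal.com/settings/proxy-auth-tokens.
    default_headers={
        "Modal-Key": os.environ["MODAL_TOKEN_ID"],
        "Modal-Secret": os.environ["MODAL_TOKEN_SECRET"],
    },
)

resp = client.chat.completions.create(
    model=os.environ.get("FARA_MODEL_NAME", "microsoft/Fara-7B"),
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=16,
)
print(resp.choices[0].message.content)
```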

### 4. Configure Environment

Copy `.env.example` to `.env` and fill in your values:

```bash
cp .env.example .env
```

Required variables:

| Variable | Description |
|----------|-------------|
| `FARA_MODEL_NAME` | Model name (default: `microsoft/Fara-7B`) |
| `FARA_ENDPOINT_URL` | Modal vLLM endpoint URL (from deploy output) |
| `FARA_API_KEY` | API key (default: `not-needed` for Modal) |
| `MODAL_TOKEN_ID` | Modal proxy auth token ID |
| `MODAL_TOKEN_SECRET` | Modal proxy auth token secret |
| `MODAL_TRACE_STORAGE_URL` | Modal trace storage endpoint URL |

Get Modal proxy auth tokens at: https://modal.com/settings/proxy-auth-tokens
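
A filled-in `.env` might look like the following. All values are placeholders; substitute the URLs from your own `modal deploy` output and your own tokens:

```bash
FARA_MODEL_NAME=microsoft/Fara-7B
FARA_ENDPOINT_URL=https://<workspace>--fara-vllm-serve.modal.run
FARA_API_KEY=not-needed
MODAL_TOKEN_ID=ak-...
MODAL_TOKEN_SECRET=as-...
MODAL_TRACE_STORAGE_URL=https://<workspace>--fara-vllm-store-trace.modal.run
```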

### 5. Run the Server

```bash
# Development mode
uvicorn backend.server:app --host 0.0.0.0 --port 8000 --reload

# Or directly
python -m backend.server
```

## API Endpoints

### WebSocket

- `ws://localhost:8000/ws` - Real-time agent communication
  - **Receives**: `user_task`, `stop_task`, `ping`
  - **Sends**: `heartbeat`, `agent_start`, `agent_progress`, `agent_complete`, `agent_error`
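
A minimal client sketch using the `websockets` library. It assumes messages are JSON objects with a `type` field matching the names above; the other payload fields (`task`, etc.) are hypothetical, so check `server.py` for the actual contract:

```python
# Submit a task over the WebSocket and stream agent events until completion.
import asyncio
import json

import websockets  # pip install websockets


async def run_task() -> None:
    async with websockets.connect("ws://localhost:8000/ws") as ws:
        # Hypothetical payload shape; only the event type names are documented.
        await ws.send(json.dumps({"type": "user_task", "task": "Find the weather in Seattle"}))
        async for raw in ws:
            event = json.loads(raw)
            print(event.get("type"), event)
            if event.get("type") in ("agent_complete", "agent_error"):
                break


asyncio.run(run_task())
```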

### REST

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/health` | Health check |
| GET | `/api/models` | List available models |
| GET | `/api/random-question` | Get a random example task |
| POST | `/api/traces` | Store a trace (proxies to Modal) |
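
For quick sanity checks against a locally running server (response shapes are whatever the server returns; nothing beyond the routes above is guaranteed):

```python
# Hit the documented GET endpoints and print the JSON responses.
import requests

BASE = "http://localhost:8000"

print(requests.get(f"{BASE}/api/health", timeout=5).json())
print(requests.get(f"{BASE}/api/models", timeout=5).json())
print(requests.get(f"{BASE}/api/random-question", timeout=5).json())
```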

## Trace Storage

Task traces are automatically uploaded to Modal volumes for research purposes. Traces include:

- Task instruction and model used
- Step-by-step agent actions with screenshots
- Token usage and timing metrics
- User evaluation (success/failed)

Duplicate traces (same ID and instruction) are automatically overwritten to capture the latest evaluation.
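
A sketch of posting a trace through the backend proxy. The field names below are hypothetical, inferred from the bullets above; see `server.py` and `modal_fara_vllm.py` for the real schema:

```python
# Store a trace via the backend, which proxies it to Modal.
import requests

trace = {
    "id": "demo-trace-001",                      # hypothetical field names
    "instruction": "Find the weather in Seattle",
    "model": "microsoft/Fara-7B",
    "steps": [],           # step-by-step agent actions with screenshots
    "token_usage": {},     # token and timing metrics
    "evaluation": "success",
}

resp = requests.post("http://localhost:8000/api/traces", json=trace, timeout=10)
resp.raise_for_status()
```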

## Docker

The backend is designed to run in Docker alongside the frontend. See the root `Dockerfile` for the combined deployment.

```bash
# Build from root
docker build -t fara-cua .

# Run with env file
docker run -d --name fara-cua -p 7860:7860 --env-file backend/.env fara-cua
```

## Development

### Running Locally

For local development, you can run the backend separately:

```bash
cd backend
uvicorn server:app --host 0.0.0.0 --port 8000 --reload
```

Make sure the frontend is configured to connect to `http://localhost:8000`.

### Testing Modal Endpoints

```bash
# Test vLLM endpoint
modal run backend/modal_fara_vllm.py::test

# Check deployment status
modal app list
```

## License

See the root LICENSE file for license information.