# FaraCUA Backend
The backend server for FaraCUA - a Computer Use Agent (CUA) demo powered by Microsoft's Fara-7B vision-language model and Modal for serverless GPU inference.
## Overview

This backend provides:

- **WebSocket API** - Real-time communication with the React frontend for streaming agent actions
- **REST API** - Model listing, random question generation, and trace storage
- **Fara Agent Integration** - Runs the Fara agent with Playwright for browser automation
- **Modal Integration** - Proxies requests to Modal's vLLM endpoint and trace storage
## Architecture

```
┌──────────────┐   WebSocket    ┌──────────────┐     HTTP      ┌──────────────┐
│   Frontend   │ ─────────────► │   Backend    │ ────────────► │    Modal     │
│   (React)    │                │  (FastAPI)   │               │    (vLLM)    │
└──────────────┘                └──────────────┘               └──────────────┘
                                       │
                                       │ Playwright
                                       ▼
                                ┌──────────────┐
                                │   Browser    │
                                │  (Headless)  │
                                └──────────────┘
```
## Files

| File | Description |
|---|---|
| `server.py` | Main FastAPI server with WebSocket and REST endpoints |
| `modal_fara_vllm.py` | Modal deployment for vLLM inference and trace storage |
| `pyproject.toml` | Python dependencies |
| `.env.example` | Example environment configuration |
## Setup

### 1. Install Dependencies

```bash
# Using uv (recommended)
uv sync

# Or using pip
pip install -e .
```
### 2. Install Playwright

```bash
playwright install chromium
```
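To verify the browser install independently of the agent code, a minimal smoke test can launch headless Chromium and take a screenshot:

```python
# Smoke test for the Playwright/Chromium install; standalone, not agent code.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")
    page.screenshot(path="smoke_test.png")
    print("Page title:", page.title())
    browser.close()
```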
### 3. Deploy Modal Endpoints

```bash
modal deploy backend/modal_fara_vllm.py
```

This deploys:

- **vLLM Server** - GPU-accelerated inference for Fara-7B at `https://<workspace>--fara-vllm-serve.modal.run`
- **Trace Storage** - Endpoint for storing task traces at `https://<workspace>--fara-vllm-store-trace.modal.run`
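For orientation, a Modal deployment of this shape typically looks like the sketch below. This is illustrative, not the contents of `modal_fara_vllm.py`; the GPU type, image contents, and vLLM flags are assumptions:

```python
# Illustrative sketch only -- not the actual modal_fara_vllm.py.
# GPU type, image contents, and vLLM flags are assumptions.
import subprocess
import modal

app = modal.App("fara-vllm")
image = modal.Image.debian_slim().pip_install("vllm")
traces = modal.Volume.from_name("fara-traces", create_if_missing=True)

@app.function(image=image, gpu="A100")
@modal.web_server(port=8000, startup_timeout=600)
def serve():
    # Start an OpenAI-compatible vLLM server inside the container.
    subprocess.Popen(["vllm", "serve", "microsoft/Fara-7B", "--port", "8000"])

@app.function(image=image, volumes={"/traces": traces})
@modal.fastapi_endpoint(method="POST")
def store_trace(trace: dict):
    # Persist the JSON trace to the mounted volume.
    import json, uuid
    path = f"/traces/{trace.get('id', uuid.uuid4().hex)}.json"
    with open(path, "w") as f:
        json.dump(trace, f)
    traces.commit()  # flush volume changes so other containers see them
    return {"stored": path}
```

The function names `serve` and `store_trace` are what produce the `...-serve.modal.run` and `...-store-trace.modal.run` URLs shown above.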
### 4. Configure Environment

Copy `.env.example` to `.env` and fill in your values:

```bash
cp .env.example .env
```

Required variables:

| Variable | Description |
|---|---|
| `FARA_MODEL_NAME` | Model name (default: `microsoft/Fara-7B`) |
| `FARA_ENDPOINT_URL` | Modal vLLM endpoint URL (from deploy output) |
| `FARA_API_KEY` | API key (default: `not-needed` for Modal) |
| `MODAL_TOKEN_ID` | Modal proxy auth token ID |
| `MODAL_TOKEN_SECRET` | Modal proxy auth token secret |
| `MODAL_TRACE_STORAGE_URL` | Modal trace storage endpoint URL |

Get Modal proxy auth tokens at: https://modal.com/settings/proxy-auth-tokens
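The backend might read these settings roughly as follows (an illustrative pattern assuming `python-dotenv`; `server.py` may use a different mechanism):

```python
# Illustrative settings loading; variable names match the table above,
# the loading mechanism is an assumption.
import os
from dotenv import load_dotenv  # python-dotenv

load_dotenv()  # reads .env into the process environment

MODEL_NAME = os.getenv("FARA_MODEL_NAME", "microsoft/Fara-7B")
ENDPOINT_URL = os.environ["FARA_ENDPOINT_URL"]  # required, no default
API_KEY = os.getenv("FARA_API_KEY", "not-needed")
MODAL_TOKEN_ID = os.environ["MODAL_TOKEN_ID"]
MODAL_TOKEN_SECRET = os.environ["MODAL_TOKEN_SECRET"]
TRACE_STORAGE_URL = os.environ["MODAL_TRACE_STORAGE_URL"]
```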
### 5. Run the Server

```bash
# Development mode
uvicorn backend.server:app --host 0.0.0.0 --port 8000 --reload

# Or directly
python -m backend.server
```
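If you are exploring the code, the `app` object that uvicorn loads has roughly this shape (a toy stand-in for `server.py`, not the real implementation; the message schema shown is assumed):

```python
# Toy stand-in illustrating the app that `uvicorn backend.server:app` loads.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.get("/api/health")
def health():
    return {"status": "ok"}

@app.websocket("/ws")
async def ws_handler(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            msg = await ws.receive_json()
            if msg.get("type") == "ping":
                await ws.send_json({"type": "heartbeat"})
            # the real server also handles user_task / stop_task here
    except WebSocketDisconnect:
        pass
```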
## API Endpoints

### WebSocket

`ws://localhost:8000/ws` - Real-time agent communication

- Receives: `user_task`, `stop_task`, `ping`
- Sends: `heartbeat`, `agent_start`, `agent_progress`, `agent_complete`, `agent_error`
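A minimal client interaction, assuming the `websockets` package and a `{"type": ..., "task": ...}` message shape (payload fields beyond `type` are assumptions; check `server.py` for the exact schema):

```python
# Minimal WebSocket client sketch; payload fields beyond "type" are assumed.
import asyncio
import json
import websockets

async def run_task(task: str):
    async with websockets.connect("ws://localhost:8000/ws") as ws:
        await ws.send(json.dumps({"type": "user_task", "task": task}))
        async for raw in ws:  # stream events until the agent finishes
            event = json.loads(raw)
            print(event.get("type"), event)
            if event.get("type") in ("agent_complete", "agent_error"):
                break

asyncio.run(run_task("What's the weather in Seattle?"))
```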
### REST

| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/health` | Health check |
| GET | `/api/models` | List available models |
| GET | `/api/random-question` | Get a random example task |
| POST | `/api/traces` | Store a trace (proxies to Modal) |
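The GET endpoints can be exercised with any HTTP client; the response shapes noted in the comments are assumptions:

```python
# Quick REST smoke test against a locally running backend.
import requests

BASE = "http://localhost:8000"
print(requests.get(f"{BASE}/api/health").json())           # e.g. {"status": "ok"}
print(requests.get(f"{BASE}/api/models").json())           # available model names
print(requests.get(f"{BASE}/api/random-question").json())  # a random example task
```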
## Trace Storage
Task traces are automatically uploaded to Modal volumes for research purposes. Traces include:
- Task instruction and model used
- Step-by-step agent actions with screenshots
- Token usage and timing metrics
- User evaluation (success/failed)
Duplicate traces (same ID and instruction) are automatically overwritten to capture the latest evaluation.
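A trace upload through the proxy might look like the following; the field names are inferred from the list above and should be checked against `server.py`:

```python
# Hypothetical trace payload -- field names inferred from the list above.
import requests

trace = {
    "id": "demo-001",
    "instruction": "What's the weather in Seattle?",
    "model": "microsoft/Fara-7B",
    "steps": [
        {"action": "goto", "screenshot": "<base64 png>", "tokens": 512},
    ],
    "evaluation": "success",  # or "failed"
}
resp = requests.post("http://localhost:8000/api/traces", json=trace)
print(resp.status_code)
```

Re-posting the same `id` and `instruction` overwrites the stored trace, per the note above.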
## Docker

The backend is designed to run in Docker alongside the frontend. See the root `Dockerfile` for the combined deployment.

```bash
# Build from root
docker build -t fara-cua .

# Run with env file
docker run -d --name fara-cua -p 7860:7860 --env-file backend/.env fara-cua
```
## Development

### Running Locally

For local development, you can run the backend separately:

```bash
cd backend
uvicorn server:app --host 0.0.0.0 --port 8000 --reload
```

Make sure the frontend is configured to connect to `http://localhost:8000`.
### Testing Modal Endpoints

```bash
# Test vLLM endpoint
modal run backend/modal_fara_vllm.py::test

# Check deployment status
modal app list
```
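You can also hit the deployed vLLM endpoint directly, bypassing the backend. The sketch below assumes the endpoint exposes the standard OpenAI-compatible API that `vllm serve` provides, that `FARA_ENDPOINT_URL` is the bare base URL, and that Modal proxy auth tokens go in the `Modal-Key`/`Modal-Secret` headers:

```python
# Direct call to the Modal vLLM endpoint, bypassing the backend.
import os
import requests

resp = requests.post(
    f"{os.environ['FARA_ENDPOINT_URL']}/v1/chat/completions",
    headers={
        "Modal-Key": os.environ["MODAL_TOKEN_ID"],
        "Modal-Secret": os.environ["MODAL_TOKEN_SECRET"],
    },
    json={
        "model": "microsoft/Fara-7B",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```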
## License

See the root `LICENSE` file for license information.