Isaac-0.2-1B by Perceptron

Introducing the 1B-parameter variant of Isaac-0.2, the hybrid-reasoning vision-language model.

This release brings major upgrades: optional reasoning via thinking traces, perceptive tool calling (including our new Focus system), stronger grounding, better OCR, better desktop use, and improved structured output, all while remaining fast, compact, and deployable.

Extending the efficient frontier of perception

Isaac 0.2 extends what we started with Isaac 0.1: small models that outperform systems 10× larger on visual reasoning and perception tasks, all running on commodity GPUs or edge devices. From robotics to media search to industrial inspection, Isaac 0.2 delivers high-accuracy perception without the heavy compute footprint.


What's New in Isaac 0.2

  • Reasoning via Thinking Traces: Short, structured reasoning traces improve multi-step decisions, small-object understanding, and ambiguous spatial tasks.

  • Perceptive Tool Calling + Focus (Zoom & Crop): Isaac 0.2 can trigger tool calls to focus (i.e., zoom and crop) and re-query the model on a smaller region, dramatically improving fine-grained perception (see the sketch after this list).

  • Structured Outputs: More reliable structured output generation for consistent JSON and predictable downstream integration.

  • Complex OCR: Improved text recognition across cluttered, low-resolution, or distorted regions β€” enabling accurate extraction from documents, diagrams, labels, screens, and dense real-world scenes.

  • Desktop Use: Better performance on everyday desktop and mobile workflows such as UI understanding and navigation, making Isaac faster and more capable for agentic use cases.
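
The Focus mechanism is easiest to picture as a crop-and-re-query loop: pick a region of interest, zoom into it, and ask the model again at higher effective resolution. The sketch below illustrates that idea only; it is not Isaac's built-in tool-call protocol, and the focus_crop helper, the normalized [x0, y0, x1, y1] box convention, and the file names are assumptions made for this example.

# Illustrative sketch of the Focus (zoom & crop) idea -- not the model's
# actual tool-calling protocol. The helper and box convention are assumed.
from PIL import Image

def focus_crop(image: Image.Image, box: tuple[float, float, float, float], pad: float = 0.05) -> Image.Image:
    """Crop a normalized [x0, y0, x1, y1] region (with a little padding) out of an image."""
    w, h = image.size
    x0, y0, x1, y1 = box
    left = max(0.0, x0 - pad) * w
    top = max(0.0, y0 - pad) * h
    right = min(1.0, x1 + pad) * w
    bottom = min(1.0, y1 + pad) * h
    return image.crop((int(left), int(top), int(right), int(bottom)))

# Typical loop: query the full image first, take a region the model points at,
# then re-run generation (see Usage below) on the cropped view.
full = Image.open("street_scene.jpg")
crop = focus_crop(full, (0.62, 0.10, 0.78, 0.30))  # e.g. a distant traffic signal
crop.save("street_scene_focus.png")

When Isaac 0.2 drives this loop itself via perceptive tool calling, the model requests the focus region and the cropped view is fed back for a second, higher-detail pass.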

Performance Benchmarks


Chatting with Isaac in 🤗 Transformers

Learn more at our Hugging Face example repo, where we demo extracting and rendering points.

pip install perceptron

Usage

import torch
from transformers import AutoModelForCausalLM, AutoProcessor
from transformers.image_utils import load_image
from transformers.utils.import_utils import is_torch_cuda_available

# Convert a simple document spec (text and image items) into chat messages
# plus the loaded PIL images, in the order they appear.
def document_to_messages(document: list[dict]):
    messages, images = [], []
    for item in document:
        if not (content := item.get("content")):
            continue
        role = item.get("role", "user")
        if item.get("type") == "image":
            images.append(load_image(content))
            messages.append({"role": role, "content": "<image>"})
        elif item.get("type") == "text":
            messages.append({"role": role, "content": content})
    return messages, images

hf_path = "PerceptronAI/Isaac-0.2-1B"
# Use CUDA + bfloat16 when available, otherwise fall back to CPU + float32.
device, dtype = ("cuda", torch.bfloat16) if is_torch_cuda_available() else ("cpu", torch.float32)

# Load model/processor from the checkpoint
processor = AutoProcessor.from_pretrained(hf_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    hf_path, trust_remote_code=True, vision_attn_implementation="flash_attention_2"
)
model = model.to(device=device, dtype=dtype)
model.eval()

# Prepare input for generation
document = [
    {
        "type": "text",
        "content": "<hint>BOX</hint>",
        "role": "user",
    },
    {
        "type": "image",
        "content": "https://raw.githubusercontent.com/perceptron-ai-inc/perceptron/refs/heads/main/huggingface/assets/example.webp",
        "role": "user",
    },
    {
        "type": "text",
        "content": "Determine whether it is safe to cross the street. Look for signage and moving traffic.",
        "role": "user",
    },
]
messages, images = document_to_messages(document)
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=text, images=images, return_tensors="pt")

# Generate text using the model
generated_ids = model.generate(
    tensor_stream=inputs["tensor_stream"].to(next(model.parameters()).device),
    max_new_tokens=256,
    do_sample=False,
)
generated_text = processor.tokenizer.decode(
    generated_ids[0], skip_special_tokens=False
)
print(f"\nFull generated output:\n{generated_text}")
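
The same pipeline works with local files: load_image (used inside document_to_messages) accepts file paths as well as URLs. A minimal variation follows, assuming the model and processor loaded above; the image path and prompt are placeholders.

# Variation: query a local image instead of a URL.
local_document = [
    {"type": "text", "content": "<hint>BOX</hint>", "role": "user"},
    {"type": "image", "content": "path/to/your_image.jpg", "role": "user"},  # placeholder path
    {"type": "text", "content": "List every readable sign in the image.", "role": "user"},
]
messages, images = document_to_messages(local_document)
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, images=images, return_tensors="pt")
generated_ids = model.generate(
    tensor_stream=inputs["tensor_stream"].to(next(model.parameters()).device),
    max_new_tokens=256,
    do_sample=False,
)
print(processor.tokenizer.decode(generated_ids[0], skip_special_tokens=False))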