SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass (3DV 2026)
This repository contains the official PyTorch implementation of SceneGen, introduced in SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass.
The training code, inference code, and pretrained models have all been released! Feel free to reach out for discussions!
Resources
Project Page · Paper · Code · Checkpoints
News
- [2025.11] Evaluation code has been released.
- [2025.11] Glad to share that SceneGen has been accepted to 3DV 2026.
- [2025.9] Our training code and data processing code are released.
- [2025.8] The inference code and checkpoints are released.
- [2025.8] Our pre-print paper has been released on arXiv.
Installation & Pretrained Models
Prerequisites
- Hardware: An NVIDIA GPU with at least 16GB of memory is necessary. The code has been verified on NVIDIA A100 and RTX 3090 GPUs.
- Software:
- The CUDA Toolkit is needed to compile certain submodules. The code has been tested with CUDA version 12.1.
- Python version 3.8 or higher is required.
Installation Steps
Clone the repo:
```bash
git clone https://github.com/Mengmouxu/SceneGen.git
cd SceneGen
```
Install the dependencies: create a new conda environment named `scenegen` and install the dependencies:
```bash
. ./setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast --demo
```
The detailed usage of `setup.sh` can be found by running `. ./setup.sh --help`.
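After installation, a quick sanity check along these lines can confirm the environment sees your GPU (a minimal sketch; the `xformers` and `flash_attn` imports simply mirror the optional `setup.sh` flags above and will fail harmlessly if you skipped those flags):
```python
# Minimal environment sanity check (illustrative only).
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # should be True on a supported GPU
print("CUDA version:", torch.version.cuda)           # e.g. 12.1
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

# Optional extensions installed via setup.sh flags.
try:
    import xformers    # noqa: F401
    import flash_attn  # noqa: F401
    print("xformers / flash-attn import OK")
except ImportError as e:
    print("Optional extension missing:", e)
```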
Pretrained Models
- First, create a directory in the SceneGen folder to store the checkpoints:
  ```bash
  mkdir -p checkpoints
  ```
- Download the pretrained models for SAM2-Hiera-Large and VGGT-1B from SAM2 and VGGT, then place them in the `checkpoints` directory. (SAM2 installation and its checkpoints are required for interactive generation with segmentation.)
- Download our pretrained SceneGen model from here and place it in the `checkpoints` directory as follows:
  ```
  SceneGen/
  ├── checkpoints/
  │   ├── sam2-hiera-large
  │   ├── VGGT-1B
  │   └── scenegen
  │       ├── ckpts
  │       └── pipeline.json
  └── ...
  ```
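If you prefer to fetch the SAM2 and VGGT weights programmatically, a sketch like the following works with `huggingface_hub`; the repo IDs are our assumption of where these models are hosted, so adjust them to the official pages linked above:
```python
# Illustrative checkpoint download (repo IDs are assumptions, not verified by this repo).
from huggingface_hub import snapshot_download

snapshot_download(repo_id="facebook/sam2-hiera-large", local_dir="checkpoints/sam2-hiera-large")
snapshot_download(repo_id="facebook/VGGT-1B", local_dir="checkpoints/VGGT-1B")
# The SceneGen weights come from the link above; place them under checkpoints/scenegen manually.
```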
Inference
We provide two scripts for inference: inference.py for batch processing and interactive_demo.py for an interactive Gradio demo.
Interactive Demo
This script launches a Gradio web interface for interactive scene generation.
- Features: It uses SAM2 for interactive image segmentation, allows for adjusting various generation parameters, and supports scene generation from single or multiple images.
- Usage:
  ```bash
  python interactive_demo.py
  ```
Quick Start Guide
Step 1: Input & Segment
- Upload your scene image.
- Use the mouse to draw bounding boxes around objects.
- Click "Run Segmentation" to segment objects.
Note: for multi-image generation, maintain a consistent object annotation order across all images.
Step 2: Manage Cache
- Click "Add to Cache" when satisfied with the segmentation.
- Repeat Steps 1-2 for multiple images.
- Use "Delete Selected" or "Clear All" to manage cached images.
Step 3: Generate Scene
- Adjust generation parameters (optional).
- Click "Generate 3D Scene".
- Download the generated GLB file when ready.
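To sanity-check a downloaded result outside the browser, something like the sketch below loads and inspects the GLB with `trimesh` (an illustrative choice, not a SceneGen dependency; the filename is a placeholder):
```python
# Quick inspection of a generated scene (illustrative; "scene.glb" is a placeholder name).
import trimesh

scene = trimesh.load("scene.glb")       # a GLB usually loads as a trimesh.Scene
print("objects in scene:", len(scene.geometry))
print("scene bounds:\n", scene.bounds)  # axis-aligned bounding box of all objects
scene.show()                            # opens an interactive viewer if available
```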
Pre-segmented Image Inference
This script processes a directory of pre-segmented images.
- Input: The input folder structure should be similar to `assets/masked_image_test`, containing segmented scene images.
- Visualization: For scenes with ground-truth data, you can use the `--gradio` flag to launch a Gradio interface that visualizes both the ground truth and the generated model.
- Usage:
  ```bash
  python inference.py --gradio
  ```
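The contents of `assets/masked_image_test` define the expected layout; a short, purely illustrative sketch like this previews which image files a batch run would pick up:
```python
# Preview the pre-segmented inputs under the example directory (illustrative only).
from pathlib import Path

input_root = Path("assets/masked_image_test")
for path in sorted(input_root.rglob("*")):
    if path.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp"}:
        print(path.relative_to(input_root))
```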
Dataset
To train and evaluate SceneGen, we use the 3D-FUTURE dataset. Please refer to the GitHub repository for detailed preprocessing and data handling instructions.
Training
With the processed 3D-FUTURE dataset prepared and the pretrained ss_flow_img_dit_L_16l8_fp16.safetensors checkpoint from TRELLIS placed in the checkpoints/scenegen/ckpts directory, you can train SceneGen using the following command:
bash scripts/train.sh
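Before launching, it can help to confirm the TRELLIS checkpoint is where the training script expects it; a minimal check, using the path layout described above:
```python
# Verify the TRELLIS prior checkpoint is in place before training (illustrative check).
from pathlib import Path

ckpt = Path("checkpoints/scenegen/ckpts/ss_flow_img_dit_L_16l8_fp16.safetensors")
assert ckpt.is_file(), f"Missing TRELLIS checkpoint: {ckpt}"
print(f"Found {ckpt} ({ckpt.stat().st_size / 1e6:.1f} MB)")
```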
Evaluation
To generate the 3D scenes on the 3D-FUTURE test set using the SceneGen model, use the following command:
bash scenegen_eval.sh
which will use the scenegen_eval.py script to generate the normalized scenes.
To evaluate the trained SceneGen model on the 3D-FUTURE test set, use the following command:
cd evalscene
bash eval_scenegen.sh
Make sure the processed 3D-FUTURE dataset and the rendered images are in place as described in the Dataset section, and that the evaluation config in evalscene/configs/test/scene_evaluation_scenegen.yaml is set correctly. The evaluation script then computes metrics between the normalized generated scenes and the ground truth.
Some packages used in the evaluation require additional installation: please install torchmetrics, lpips, clip, and probreg via pip.
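For reference, this is the flavor of image-space metric those packages provide; the snippet is a standalone sketch with random tensors, not the repo's evaluation pipeline:
```python
# Standalone LPIPS example (illustrative; not SceneGen's evaluation code).
import torch
import lpips

loss_fn = lpips.LPIPS(net="alex")          # perceptual distance network
img0 = torch.rand(1, 3, 256, 256) * 2 - 1  # LPIPS expects inputs in [-1, 1]
img1 = torch.rand(1, 3, 256, 256) * 2 - 1
print("LPIPS distance:", loss_fn(img0, img1).item())
```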
Citation
If you use this code and data for your research or project, please cite:
@inproceedings{meng2026scenegen,
author = {Meng, Yanxu and Wu, Haoning and Zhang, Ya and Xie, Weidi},
title = {SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass},
booktitle = {International Conference on 3D Vision 2026},
year = {2026},
}
Acknowledgements
Many thanks to the code bases from TRELLIS, DINOv2, and VGGT.
Contact
If you have any questions, please feel free to contact [email protected] and [email protected].