EgoX: Egocentric Video Generation from a Single Exocentric Video
This repository provides model weights of EgoX, a video-to-video generation model that synthesizes egocentric (first-person) videos from a single exocentric (third-person) video.
EgoX is built on top of a large-scale video diffusion backbone and enables exo-to-ego viewpoint transformation without requiring multi-view inputs.
For detailed results, implementation details, and demo videos, please refer to our paper and project repository.
Usage
Please refer to the Quick Start section for instructions on running inference and required preprocessing steps.
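The repository's Quick Start covers the actual inference commands and preprocessing. As a general, hypothetical illustration of one common preprocessing step for video-to-video diffusion models — conditioning on a fixed number of frames sampled uniformly from the input exocentric clip — here is a minimal NumPy sketch (the function name and frame count are assumptions for illustration, not EgoX's actual API):

```python
import numpy as np

def sample_frames(video: np.ndarray, num_frames: int) -> np.ndarray:
    """Uniformly sample `num_frames` frames from a video array.

    video: (T, H, W, C) array of frames, T >= 1.
    Returns an array of shape (num_frames, H, W, C).
    """
    total = video.shape[0]
    # Evenly spaced positions over [0, total - 1], rounded to valid frame indices.
    indices = np.linspace(0, total - 1, num_frames).round().astype(int)
    return video[indices]

# Example: reduce a 120-frame clip to 16 conditioning frames.
clip = np.zeros((120, 480, 832, 3), dtype=np.uint8)
frames = sample_frames(clip, 16)
print(frames.shape)  # (16, 480, 832, 3)
```

Video diffusion backbones such as the Wan2.1 base model operate on a fixed temporal length, so some form of frame selection or resampling like this is typically applied before encoding the conditioning video.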
Citation
If you find this model or code useful in your research, please cite our paper:
@misc{kang2025egoxegocentricvideogeneration,
title={EgoX: Egocentric Video Generation from a Single Exocentric Video},
author={Taewoong Kang and Kinam Kim and Dohyeon Kim and Minho Park and Junha Hyung and Jaegul Choo},
year={2025},
eprint={2512.08269},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.08269},
}
Acknowledgement
This work builds upon the valuable open-source efforts of 4DNeX and EgoExo4D. We sincerely appreciate their contributions to the computer vision and robotics communities.
Model: DAVIAN-Robotics/EgoX
Base model: Wan-AI/Wan2.1-I2V-14B-480P-Diffusers