EgoX: Egocentric Video Generation from a Single Exocentric Video
This repository provides model weights of EgoX, a video-to-video generation model that synthesizes egocentric (first-person) videos from a single exocentric (third-person) video.
EgoX is built on top of a large-scale video diffusion backbone and enables exo-to-ego viewpoint transformation without requiring multi-view inputs.
For detailed results, implementation details, and demo videos, please refer to our paper and project repository.
Usage
Please refer to the Quick Start section for instructions on running inference and required preprocessing steps.
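The repository's Quick Start covers the actual inference commands and preprocessing. As a general, hypothetical illustration of one common preprocessing step for video-to-video diffusion models — conditioning on a fixed number of frames sampled uniformly from the input exocentric clip — here is a minimal NumPy sketch (the function name and frame count are assumptions for illustration, not EgoX's actual API):

```python
import numpy as np

def sample_frames(video: np.ndarray, num_frames: int) -> np.ndarray:
    """Uniformly sample `num_frames` frames from a video array.

    video: (T, H, W, C) array of frames, T >= 1.
    Returns an array of shape (num_frames, H, W, C).
    """
    total = video.shape[0]
    # Evenly spaced positions over [0, total - 1], rounded to valid frame indices.
    indices = np.linspace(0, total - 1, num_frames).round().astype(int)
    return video[indices]

# Example: reduce a 120-frame clip to 16 conditioning frames.
clip = np.zeros((120, 480, 832, 3), dtype=np.uint8)
frames = sample_frames(clip, 16)
print(frames.shape)  # (16, 480, 832, 3)
```

Video diffusion backbones such as the Wan2.1 base model operate on a fixed temporal length, so some form of frame selection or resampling like this is typically applied before encoding the conditioning video.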
Citation
If you find this model or code useful in your research, please cite our paper:
@misc{kang2025egoxegocentricvideogeneration,
title={EgoX: Egocentric Video Generation from a Single Exocentric Video},
author={Taewoong Kang and Kinam Kim and Dohyeon Kim and Minho Park and Junha Hyung and Jaegul Choo},
year={2025},
eprint={2512.08269},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.08269},
}
Acknowledgement
This work builds upon the valuable open-source efforts of 4DNeX and EgoExo4D. We sincerely appreciate their contributions to the computer vision and robotics communities.
Model: DAVIAN-Robotics/EgoX
Base model: Wan-AI/Wan2.1-I2V-14B-480P-Diffusers