Abstract
PlenopticDreamer enables consistent multi-view video re-rendering through synchronized generative hallucinations, leveraging camera-guided retrieval and progressive training mechanisms for improved temporal coherence and visual fidelity.
Camera-controlled generative video re-rendering methods, such as ReCamMaster, have achieved remarkable progress. However, despite their success in the single-view setting, these methods often struggle to maintain consistency across multi-view scenarios. Ensuring spatio-temporal coherence in hallucinated regions remains challenging due to the inherent stochasticity of generative models. To address this, we introduce PlenopticDreamer, a framework that synchronizes generative hallucinations to maintain spatio-temporal memory. The core idea is to train a multi-in-single-out video-conditioned model in an autoregressive manner, aided by a camera-guided video retrieval strategy that adaptively selects salient videos from previous generations as conditional inputs. In addition, our training incorporates progressive context-scaling to improve convergence, self-conditioning to enhance robustness against long-range visual degradation caused by error accumulation, and a long-video conditioning mechanism to support extended video generation. Extensive experiments on the Basic and Agibot benchmarks demonstrate that PlenopticDreamer achieves state-of-the-art video re-rendering, delivering superior view synchronization, high-fidelity visuals, accurate camera control, and diverse view transformations (e.g., third-person to third-person, and head-view to gripper-view in robotic manipulation). Project page: https://research.nvidia.com/labs/dir/plenopticdreamer/
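A minimal sketch of what the camera-guided video retrieval described above could look like, assuming camera-pose proximity as the saliency criterion. The abstract does not specify the actual selection rule, so the distance heuristic and all names here (`pose_distance`, `retrieve_condition_videos`, `memory_bank`) are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: from previously generated videos, pick the clips whose
# camera trajectories lie closest to the target trajectory, and use them as
# conditional inputs for the next autoregressive generation step.
# NOTE: the pose-distance saliency criterion is an assumption; the paper's
# abstract only says "adaptively selects salient videos".
import numpy as np

def pose_distance(T_a: np.ndarray, T_b: np.ndarray, lam: float = 1.0) -> float:
    """Distance between two 4x4 camera-to-world poses: translation gap
    plus a weighted rotation-angle gap."""
    t_gap = np.linalg.norm(T_a[:3, 3] - T_b[:3, 3])
    R_rel = T_a[:3, :3].T @ T_b[:3, :3]
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(t_gap + lam * np.arccos(cos_theta))

def retrieve_condition_videos(target_poses, memory_bank, k=3):
    """Select the k previously generated videos closest to the target
    camera trajectory (mean nearest-pose distance).

    target_poses: list of 4x4 poses for the view to be rendered next.
    memory_bank:  list of (video_tensor, poses) from earlier generations.
    """
    scored = []
    for video, poses in memory_bank:
        d = np.mean([min(pose_distance(tp, p) for p in poses)
                     for tp in target_poses])
        scored.append((d, video))
    scored.sort(key=lambda s: s[0])
    return [video for _, video in scored[:k]]  # conditions for the model
```

In an autoregressive rollout, the retrieved clips, together with the target camera trajectory, would then be fed to the multi-in-single-out model to generate the next view while preserving spatio-temporal memory.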
Community
The following papers were recommended by the Semantic Scholar API:
- ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation (2025)
- Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation (2025)
- StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation (2025)
- SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time (2025)
- PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention (2025)
- CETCAM: Camera-Controllable Video Generation via Consistent and Extensible Tokenization (2025)
- Light-X: Generative 4D Video Rendering with Camera and Illumination Control (2025)