IPO-LDM: Depth-aided 360-degree Indoor RGB Panorama Outpainting via Latent Diffusion Model

SCSE, Nanyang Technological University; VGG, University of Oxford;

Masked input

IPO-LDM output

3D scenes (interactive)

Our IPO-LDM model not only effectively generates semantically meaningful content and plausible appearances containing many objects, such as beds, sofas, and TVs, but also provides multiple and diverse solutions to this ill-posed problem. (Feel free to play with them!)


Generating complete 360° panoramas from narrow-field-of-view images is an active research topic, as omnidirectional RGB data is not readily available. Existing GAN-based approaches struggle to produce high-quality output and generalize poorly across different mask types. In this paper, we present our 360° indoor RGB panorama outpainting model using latent diffusion models (LDM), called IPO-LDM. We introduce a new bi-modal latent diffusion structure that utilizes both RGB and depth panoramic data during training, but works surprisingly well to outpaint normal depth-free RGB images during inference. We further propose a novel technique of introducing progressive camera rotations during each diffusion denoising step, which leads to substantial improvement in achieving panorama wraparound consistency. Results show that our IPO-LDM not only significantly outperforms state-of-the-art methods on RGB panorama outpainting, but can also produce multiple and diverse well-structured results for different types of masks.

Model Designs

The overall pipeline of our proposed IPO-LDM method

  • IPO-LDM is fine-tuned from existing pretrained diffusion models. Note that the VQ-based encoder-decoders for RGB-D images are pre-trained in advance and kept fixed in the rest of our framework (marked as “locked”).
  • During training, no masks are used, and depth information aids the completion of RGB panorama synthesis.
  • During inference, depth information is no longer needed for masked RGB panorama outpainting.

I. Latent Diffusion Outpainting

Since the partially visible regions are unchanged during perceptual image compression, we extend RePaint to latent-space outpainting in order to perform our task on 512×1024 panoramas. Note that the 360° wraparound consistency is still preserved in both the pixel and latent domains, which is important for our setting.
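The RePaint-style step above can be sketched as follows. This is a minimal illustration, not the released implementation: `denoise_fn` stands in for one reverse-diffusion step of the LDM, and the mask convention (1 = known latent region, 0 = region to outpaint) is an assumption for the sketch.

```python
import numpy as np

def repaint_latent_step(z_t, z0_known, mask, t, denoise_fn, alphas_cumprod, rng):
    """One RePaint-style outpainting step, performed in latent space.

    mask == 1 marks known (visible) latent positions; 0 marks regions to outpaint.
    """
    a_t = alphas_cumprod[t]
    # Known region: re-noise the encoded visible latents forward to step t,
    # so they match the noise level of the current sample.
    noise = rng.standard_normal(z0_known.shape)
    z_known = np.sqrt(a_t) * z0_known + np.sqrt(1.0 - a_t) * noise
    # Unknown region: one reverse-diffusion step from the model.
    z_unknown = denoise_fn(z_t, t)
    # Composite: keep the known content, fill the masked content from the model.
    return mask * z_known + (1.0 - mask) * z_unknown
```

Because the compositing happens on latents rather than pixels, the visible regions pass through the fixed VQ encoder-decoder unchanged, which is what makes the extension of RePaint to latent space valid here.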

II. Two-end Alignment Mechanism

Since 360° panoramas are meant to be wraparound consistent, we apply a circular-shift data augmentation, called camera-rotation, to the panorama image dataset to enhance the model's performance. During inference, we propose a novel two-end alignment mechanism that combines naturally with our latent diffusion outpainting process. At each iteration, we apply the camera-rotation operation to rotate both the latent vectors and the masks by 90° before performing an outpainting step (we choose 90° so that the panorama returns to its initial position after four steps; many other angles are possible).
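A minimal sketch of the camera-rotation operation and the two-end alignment loop, assuming width-major panorama tensors; the function names and the `outpaint_step` callback are illustrative, not the authors' actual API.

```python
import numpy as np

def camera_rotate(x, deg):
    """Circularly shift a panorama (or its latent) along the width axis.

    Shifting by W * deg / 360 columns corresponds to rotating the camera by
    `deg` degrees about the vertical axis; wraparound is preserved exactly.
    """
    w = x.shape[-1]
    shift = int(round(w * deg / 360.0))
    return np.roll(x, shift, axis=-1)

def aligned_outpaint(z, mask, outpaint_step, n_iters):
    """Two-end alignment: rotate latents and masks by 90° before each
    outpainting step, so the two ends of the panorama are periodically
    brought together as interior content."""
    for _ in range(n_iters):
        z, mask = camera_rotate(z, 90), camera_rotate(mask, 90)
        z = outpaint_step(z, mask)
    return z
```

After any multiple of four iterations the accumulated rotation is 360°, so the panorama is back in its original orientation, which is why 90° is a convenient choice.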

III. Bi-modal Latent Diffusion Model

We design a bi-modal latent diffusion structure that introduces depth information while generating high-quality RGB output, with depth needed only during training. We train two separate VQ models for RGB and depth images, then concatenate their latents. Reconstructed RGB-D images can be obtained by decoupling $$z_{rgbd}$$ and decoding each part.
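The coupling and decoupling of the two latents can be sketched as below. This is an illustrative arrangement (channel-wise concatenation, with a channels-first layout assumed), not the released code.

```python
import numpy as np

def couple(z_rgb, z_depth):
    """Form z_rgbd by concatenating the RGB and depth VQ latents
    along the channel axis; diffusion then runs on the joint latent."""
    return np.concatenate([z_rgb, z_depth], axis=0)

def decouple(z_rgbd, c_rgb):
    """Split z_rgbd back into RGB and depth latents, each of which is
    passed to its own fixed VQ decoder."""
    return z_rgbd[:c_rgb], z_rgbd[c_rgb:]
```

Because the two VQ autoencoders are trained separately and kept fixed, dropping the depth branch at inference only changes which latents are decoded, not how the diffusion model was trained on the joint latent.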

Results Showcase

We show the capacity of IPO-LDM on two challenging image completion tasks: 1) RGB panorama outpainting and 2) depth estimation.

RGB Panorama Outpainting

IPO-LDM effectively generates semantically meaningful content and plausible appearances on various masks with multiple and diverse solutions.

Depth Estimation

RGB input

Depth GT

IPO-LDM output

Given complete RGB images, our IPO-LDM can correspondingly generate accurate absolute depth maps.

Coming Soon

Code will be available soon.


This website is adapted from GLIGEN.


@article{ipoldm,
  title={IPO-LDM: Depth-aided 360-degree Indoor RGB Panorama Outpainting via Latent Diffusion Model},
  author={Tianhao Wu and Chuanxia Zheng and Tat-Jen Cham},
}