SHINOBI

Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild

University of Tübingen, Google Research

SHINOBI reconstructs shape, illumination and materials from in-the-wild image collections.

We present SHINOBI, an end-to-end framework for the reconstruction of shape, material, and illumination from object images captured with varying lighting, pose, and background. Inverse rendering of an object based on unconstrained image collections is a long-standing challenge in computer vision and graphics and requires a joint optimization over shape, radiance, and pose. We show that an implicit shape representation based on a multiresolution hash encoding enables faster and more robust shape reconstruction with joint camera alignment optimization that outperforms prior work. Further, to enable the editing of illumination and object reflectance (i.e. material), we jointly optimize BRDF and illumination together with the object's shape. Our method is class-agnostic and works on in-the-wild image collections of objects to produce relightable 3D assets for use cases such as AR/VR, movies, and games.

Overview

SHINOBI is a category-agnostic technique to jointly reconstruct 3D shape and material properties of objects from unconstrained in-the-wild image collections. This data regime poses multiple challenges: images are captured in different environments using a variety of devices, resulting in varying backgrounds, illumination, camera poses, and intrinsics. Conventional structure-from-motion techniques like COLMAP fail to reconstruct image collections under these challenging conditions. Recent methods like SAMURAI [2] and NeRS [3] can be initialized from very coarse view directions but still yield low-quality reconstructions for many challenging scenes. Additionally, optimization takes more than 12 hours in the case of SAMURAI.

In contrast, we propose a pipeline based on multiresolution hash grids [4], which allows us to process more rays in a shorter time. This lets us improve reconstruction quality while keeping a competitive optimization time (~4 hours). However, naive integration of multiresolution hash grids is not well suited to camera pose estimation due to discontinuities in the gradients with respect to the input positions. We propose several components that work together to stabilize the camera pose optimization and encourage sharp features:

  • Hybrid Multiresolution Hash Encoding with resolution level annealing
  • Optimized camera parameterization and constrained camera multiplex using a projection-based loss over all camera proposals for a given view
  • Per-view importance weighting, which exploits the observation that some views are more useful for optimization than others
  • Patch-based alignment losses to aid in camera alignment and reconstruction of high-frequency details
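To make the camera multiplex item more concrete, the sketch below shows one way several camera proposals for a single view could be combined into one loss term. It is an illustration only: the softmax-over-negative-losses weighting, the `temperature` parameter, and the function names `multiplex_weights` and `multiplexed_loss` are assumptions made for this example, not details given above.

```python
import math


def multiplex_weights(proposal_losses, temperature=1.0):
    """Turn per-proposal losses for one view into normalized weights.

    Assumption for this sketch: a softmax over negative losses, so a
    proposal with a lower loss receives a larger weight.
    """
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    scaled = [-loss / temperature for loss in proposal_losses]
    # Subtract the maximum before exponentiating for numerical stability.
    offset = max(scaled)
    exps = [math.exp(s - offset) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multiplexed_loss(proposal_losses, temperature=1.0):
    """Weighted sum of the proposal losses using the weights above."""
    weights = multiplex_weights(proposal_losses, temperature)
    return sum(w * loss for w, loss in zip(weights, proposal_losses))


if __name__ == "__main__":
    # Hypothetical losses of three camera proposals for the same view.
    losses = [0.2, 1.0, 3.0]
    print(multiplex_weights(losses))
    print(multiplexed_loss(losses))
```

With this weighting, poorly aligned proposals contribute less to the total loss as their error grows, and a lower `temperature` concentrates the weight on the best proposal more sharply.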

Method

In the figure below we visualize the SHINOBI optimization pipeline. Two resolution-annealed encoding branches, the multiresolution hash grid and the Fourier embedding, are used to learn a neural volume conditioned on the input coordinates and illumination. Our patch-based losses and regularization scheme enable robust optimization of camera parameters jointly with the shape, material, and per-image illumination.
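Resolution annealing can be sketched as a coarse-to-fine mask over encoding levels. The example below is a minimal illustration under stated assumptions: it uses a cosine window that opens one level after another as training progresses, a scheme known from earlier coarse-to-fine positional encodings, and applies it to a 1D Fourier embedding. The exact schedule used by SHINOBI is not stated here, and the names `level_weights`, `annealed_fourier_embedding`, and `progress` are hypothetical.

```python
import math


def level_weights(progress, num_levels):
    """Per-level weights in [0, 1], coarsest level first.

    `progress` is the training progress in [0, 1]. Levels are switched on
    one after another with a smooth cosine ramp (an assumed schedule).
    At progress 0 every level is masked; a real pipeline would usually
    keep the raw coordinates or the coarsest level active.
    """
    alpha = progress * num_levels
    weights = []
    for level in range(num_levels):
        x = min(max(alpha - level, 0.0), 1.0)
        weights.append(0.5 * (1.0 - math.cos(math.pi * x)))
    return weights


def annealed_fourier_embedding(x, num_freqs, progress):
    """Fourier features of a scalar x, scaled by the annealing weights."""
    features = []
    for level, weight in enumerate(level_weights(progress, num_freqs)):
        freq = (2.0 ** level) * math.pi
        features.append(weight * math.sin(freq * x))
        features.append(weight * math.cos(freq * x))
    return features


if __name__ == "__main__":
    for progress in (0.0, 0.25, 0.5, 1.0):
        print(progress, level_weights(progress, num_levels=4))
```

The same weights could be applied to the feature vectors of the individual hash grid levels, so that fine levels, which produce the most discontinuous gradients, only start to influence the camera poses once the coarse alignment has settled.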

Results

The parametric material model allows for controlled editing of the object’s appearance. The illumination can also be adjusted, e.g. for realistic composites. Mesh extraction allows further editing and integration into the standard graphics pipeline, including real-time rendering. SHINOBI can help in obtaining relightable 3D assets for e-commerce applications as well as 3D AR and VR for entertainment and education.

Applications

We show a scene featuring objects from the NAVI dataset [1] in a new, consistent illumination environment, as would be required for AR and VR applications.

Comparison to SAMURAI

We reconstruct the PBR material parameters basecolor, metallic, and roughness. Compared to SAMURAI [2], high-frequency details are better preserved while optimization time is reduced to roughly a third. Here we show the "bald eagle" object from the NAVI dataset [1].
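For readers unfamiliar with the basecolor/metallic/roughness parameterization, the sketch below shows how such parameters are commonly converted into a diffuse color and a specular reflectance at normal incidence (F0) in the widely used metallic-roughness convention. Whether SHINOBI follows exactly this convention is an assumption; the dielectric F0 of 0.04 and the function name `split_basecolor` are illustrative choices for this example.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def split_basecolor(basecolor, metallic, dielectric_f0=0.04):
    """Split an RGB basecolor into diffuse color and specular F0.

    Assumed convention: metals have no diffuse component and use the
    basecolor as specular color; dielectrics keep the basecolor as
    diffuse color and reflect a small constant fraction (dielectric_f0).
    """
    diffuse = tuple(c * (1.0 - metallic) for c in basecolor)
    specular_f0 = tuple(lerp(dielectric_f0, c, metallic) for c in basecolor)
    return diffuse, specular_f0


if __name__ == "__main__":
    # Hypothetical material values, not taken from any reconstruction.
    print(split_basecolor((0.8, 0.5, 0.2), metallic=0.0))
    print(split_basecolor((0.8, 0.5, 0.2), metallic=1.0))
```

Roughness does not enter this split; it controls the width of the specular lobe when the material is rendered.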

Reconstructed Assets

Example results from the NAVI dataset [1].

Citation

@misc{engelhardt2023-shinobi,
  author    = {Engelhardt, Andreas and Raj, Amit and Boss, Mark and Zhang, Yunzhi and Kar, Abhishek and Li, Yuanzhen and Sun, Deqing and Barron, Jonathan T. and Lensch, Hendrik P.A. and Jampani, Varun},
  title     = {{SHINOBI}: {Sh}ape and {I}llumination using {N}eural {O}bject Decomposition via {B}RDF Optimization {I}n-the-wild},
  booktitle = {preprint},
  year      = {2023}
}

Acknowledgements

This work has been partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC number 2064/1 – Project number 390727645 and SFB 1233, TP 02 - Project number 276693517. It was supported by the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039A.

References

[1] V. Jampani et al., NAVI: Category-agnostic image collections with high-quality 3D shape and pose annotations, in NeurIPS, 2023.

[2] M. Boss et al., SAMURAI: Shape And Material from Unconstrained Real-world Arbitrary Image collections, in NeurIPS, 2022.

[3] J. Zhang, G. Yang, S. Tulsiani, and D. Ramanan, NeRS: Neural reflectance surfaces for sparse-view 3D reconstruction in the wild, in NeurIPS, 2021.

[4] T. Müller, A. Evans, C. Schied, and A. Keller, Instant neural graphics primitives with a multiresolution hash encoding, ACM Trans. Graph., 2022.