¹Westlake University  ²ETH Zurich  ³Zhejiang University
USB-NeRF corrects rolling shutter distortions and recovers an accurate camera motion trajectory simultaneously under the framework of NeRF, by modeling the physical image formation process of a rolling shutter camera.
We propose to represent the 3D scene with NeRF and model the camera motion trajectory with a differentiable continuous-time cubic B-Spline in the \(\mathbf{SE}(3)\) space. Given a sequence of rolling shutter images, we optimize the camera motion trajectory (i.e., estimate the control poses of the cubic B-Spline) and learn the implicit 3D representation simultaneously.
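To make the trajectory model concrete, the following is a minimal sketch (not the authors' code) of cubic B-Spline pose interpolation with the cumulative basis of the Spline Fusion formulation (Lovegrove et al.), assuming uniformly spaced knots. For brevity it interpolates rotation on \(\mathbf{SO}(3)\) and translation in \(\mathbb{R}^3\) separately, a common simplification of a full \(\mathbf{SE}(3)\) spline; in USB-NeRF the control poses are the variables optimized jointly with the NeRF weights.

```python
# Sketch of uniform cubic B-Spline pose interpolation (cumulative basis).
# Not the authors' implementation; rotation and translation are blended
# separately here for simplicity.
import numpy as np
from scipy.spatial.transform import Rotation as R

def cumulative_basis(u):
    """Cumulative uniform cubic B-spline basis B~_1..B~_3 for u in [0, 1)."""
    return np.array([
        (5.0 + 3.0 * u - 3.0 * u**2 + u**3) / 6.0,
        (1.0 + 3.0 * u + 3.0 * u**2 - 2.0 * u**3) / 6.0,
        u**3 / 6.0,
    ])

def interpolate_pose(ctrl_R, ctrl_t, u):
    """Pose at normalized time u from 4 consecutive control poses.

    ctrl_R: list of 4 scipy Rotations, ctrl_t: (4, 3) array of translations.
    """
    b = cumulative_basis(u)
    # Rotation: start at the first control rotation and accumulate the
    # scaled relative rotations in the Lie algebra (rotation vectors).
    rot = ctrl_R[0]
    for j in range(1, 4):
        delta = (ctrl_R[j - 1].inv() * ctrl_R[j]).as_rotvec()
        rot = rot * R.from_rotvec(b[j - 1] * delta)
    # Translation: the same cumulative blending applied to the differences.
    t = ctrl_t[0].copy()
    for j in range(1, 4):
        t = t + b[j - 1] * (ctrl_t[j] - ctrl_t[j - 1])
    return rot, t
```

In the bundle adjustment, each image row's capture time is mapped to a normalized spline time u, and gradients flow back to the control poses through this interpolation.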
Different from global shutter cameras (left figure), each scanline/image row of a rolling shutter camera (right figure) is captured at a different timestamp. Assuming the readout direction of the RS camera is from top to bottom, this process can be mathematically modeled as (assuming infinitesimal exposure time):
\[ [\mathbf{I}^r(\mathbf{x})]_i = [\mathbf{I}^g_i(\mathbf{x})]_i, \quad i = 0, 1, ..., H-1, \]
where \(\mathbf{I}^r(\mathbf{x})\) is the rolling shutter image, \([\mathbf{I}(\mathbf{x})]_i\) denotes an operator which extracts the \(i^{th}\) row from image \(\mathbf{I}(\mathbf{x})\), and \(\mathbf{I}^g_i(\mathbf{x})\) is the global shutter image captured at the same pose as the \(i^{th}\) row of \(\mathbf{I}^r(\mathbf{x})\). We denote the pose of the \(i^{th}\) row of \(\mathbf{I}^r(\mathbf{x})\) as \(\mathbf{T}_{c_i}^w\). Thus, given the 3D representation from NeRF and the known poses \(\mathbf{T}_{c_i}^w\) for \(i=0,1,...,(H-1)\), where \(H\) is the height of the image, we can easily render the corresponding rolling shutter image \(\mathbf{I}^r(\mathbf{x})\). From the above derivations, we can see that \(\mathbf{I}^r(\mathbf{x})\) is a function of \(\mathbf{\theta}\) (i.e., the weights of the MLP network) and of \(\mathbf{T}_{c_i}^w\) for \(i=0,1,...,(H-1)\). Furthermore, \(\mathbf{I}^r(\mathbf{x})\) is differentiable with respect to both \(\mathbf{T}_{c_i}^w\) and \(\mathbf{\theta}\). This lays the foundation for our bundle adjustment formulation with a sequence of rolling shutter images.
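The row-wise formation model translates directly into a rendering loop. Below is a minimal sketch (not the authors' implementation) of synthesizing a rolling shutter image row by row; `pose_at_time` and `render_row` are assumed callables standing in for the spline query and the NeRF volume rendering of one scanline, respectively.

```python
# Sketch: synthesize a rolling shutter image by rendering each scanline at its
# own capture time/pose, mirroring [I^r(x)]_i = [I^g_i(x)]_i.
import torch

def render_rolling_shutter_image(pose_at_time, render_row, H, t_start, t_readout):
    """Render the H rows at their individual capture times and stack them.

    pose_at_time(t) -> camera-to-world pose T_{c_i}^w queried from the spline.
    render_row(i, T) -> (W, 3) tensor of volume-rendered colors for row i.
    """
    rows = []
    for i in range(H):
        t_i = t_start + i * t_readout        # capture time of scanline i
        T_ci_w = pose_at_time(t_i)           # pose of scanline i
        rows.append(render_row(i, T_ci_w))   # row i of the virtual GS image
    return torch.stack(rows, dim=0)          # (H, W, 3) rolling shutter image
```

As long as `pose_at_time` and `render_row` are differentiable, the synthesized image remains differentiable with respect to the spline control poses and the NeRF weights, which is what the bundle adjustment exploits.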
We evaluate our method against other state-of-the-art methods in terms of rolling shutter effect removal on both synthetic and real datasets. Our method achieves better performance than all prior learning-free and learning-based methods.
Our method can take advantage of multi-view information rather than only two views, and thus performs better than existing methods in terms of novel view image synthesis.
The experimental results demonstrate that our method is able to recover the camera motion trajectory from a sequence of rolling shutter images.
Please cite us if you find our work useful: