Photon-Starved Scene Inference using Single Photon Cameras
Proc. ICCV 2021
Scene understanding under low-light conditions is a challenging problem because the camera captures only a small number of photons, resulting in a low signal-to-noise ratio (SNR). Single-photon cameras (SPCs) are an emerging sensing modality capable of capturing images with high sensitivity. Despite having minimal read noise, images captured by SPCs in photon-starved conditions still suffer from strong shot noise, which prevents reliable scene inference. We propose photon scale-space, a collection of high-SNR images spanning a wide range of photons-per-pixel (PPP) levels (but with the same scene content), as guides to train inference models on low photon flux images. We develop training techniques that push images with different illumination levels closer to each other in feature representation space. The key idea is that having a spectrum of different brightness levels during training enables effective guidance and increases robustness to shot noise, even in extreme noise cases. Based on the proposed approach, we demonstrate, via simulations and real experiments with a SPAD camera, high performance on various inference tasks such as image classification and monocular depth estimation under ultra low-light, down to <1.
Photon scale-space is a hierarchy of images, each with a different flux level, but sharing the same scene content. Successive images in the hierarchy have similar flux, so that high-flux images can guide the low-flux images during a training procedure.
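To make the hierarchy concrete, the following is a minimal sketch that builds several images of the same scene at different average PPP levels. It assumes a plain Poisson photon-arrival model; the function name `photon_scale_space` and its parameters are hypothetical and chosen for illustration, and this is not the authors' data pipeline.

```python
import numpy as np

def photon_scale_space(flux, ppp_levels, rng=None):
    """Sample one noisy image per PPP level, all sharing the same scene.

    flux:       non-negative array with positive mean (scene radiance, arbitrary units).
    ppp_levels: target average photons-per-pixel values, e.g. [0.5, 8, 128].
    Assumption: photon counts follow a Poisson distribution (shot noise only).
    """
    rng = np.random.default_rng() if rng is None else rng
    flux = np.asarray(flux, dtype=np.float64)
    # Normalize to mean 1 so that multiplying by ppp sets the average photon count.
    normalized = flux / flux.mean()
    images = []
    for ppp in ppp_levels:
        counts = rng.poisson(normalized * ppp)
        # Divide by ppp so every level has comparable brightness and differs only in noise.
        images.append(counts / ppp)
    return images
```

Under this model the relative noise falls roughly as one over the square root of the PPP level, which is why neighbouring levels in the hierarchy look similar while the extremes look very different.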
We use photon scale-space to develop a meta network architecture called the Photon Net, in which a network is trained on multiple input images that share the same scene content but have varying noise levels, in order to push them together in feature space. The proposed approach is modular and versatile, lending itself to a wide range of inference tasks such as classification and depth estimation.
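One way to picture "pushing images together in feature space" is a penalty on the distance between features of successive flux levels. The sketch below is an illustration under that assumption only: the text here does not specify the loss, and the name `adjacent_feature_loss` and the mean-squared-distance choice are hypothetical.

```python
import numpy as np

def adjacent_feature_loss(features):
    """Mean squared distance between features of successive flux levels.

    features: list of same-shaped arrays, one per level, ordered from
              lowest to highest flux (e.g. outputs of a shared encoder).
    Assumption: a squared-error penalty; the actual training objective may differ.
    """
    features = [np.asarray(f, dtype=np.float64) for f in features]
    if len(features) < 2:
        raise ValueError("need features from at least two flux levels")
    # Compare each level with the next-brighter one, so guidance flows down the hierarchy.
    gaps = [np.mean((low - high) ** 2) for low, high in zip(features[:-1], features[1:])]
    return float(np.mean(gaps))
```

In a training loop, a term like this would typically be added to the task loss (classification or depth) with a weighting factor, so that the network is encouraged to produce similar features for the same scene regardless of noise level.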
Real SPAD images of the CUB-200-2011 dataset captured using the SwissSPAD2 camera. SPCs capture a sequence of binary images such as (S1) with heavy shot noise. N-sum images (SN) are the average of N binary images.
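The relationship between binary frames and N-sum images can be sketched in simulation. The example assumes an idealized sensor (Poisson photon arrivals, no dark counts, unit quantum efficiency), so a pixel fires with probability 1 - exp(-flux) per frame; the function names are hypothetical and this does not model the SwissSPAD2 specifically.

```python
import numpy as np

def simulate_binary_frames(flux, n_frames, rng=None):
    """Simulate a stack of binary single-photon frames.

    flux: expected photons per pixel per frame (non-negative array).
    Assumption: a pixel reads 1 if at least one photon arrives, which
    under Poisson arrivals happens with probability 1 - exp(-flux).
    """
    rng = np.random.default_rng() if rng is None else rng
    p_detect = 1.0 - np.exp(-np.asarray(flux, dtype=np.float64))
    return rng.random((n_frames,) + p_detect.shape) < p_detect

def n_sum_image(binary_frames, n):
    """Average the first n binary frames (the N-sum image described in the caption)."""
    return binary_frames[:n].mean(axis=0)
```

With `n = 1` the result is a single binary frame (S1); larger `n` averages out shot noise at the cost of a longer acquisition.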
Results with a real SPAD sensor: image classification on the CUB-200 dataset for S1 test images, with prediction probabilities output by both Student-Teacher Learning and Photon Net (ours). The classification output is highlighted in red for a wrong prediction and in green for a correct prediction.
Comparison of our method with Joint Denoising, which uses a denoiser with a depth estimation architecture. The figure shows example depth outputs of our approach and the baseline. Photon Net outperforms the baseline approach both qualitatively and quantitatively at multiple noise levels.