3D Scene Inference from Transient Histograms
Proc. ECCV 2022
Time-critical and resource-constrained 3D scene inference using transient histograms as a primitive scene representation.
Time-resolved image sensors that capture light at pico-to-nanosecond timescales were once limited to niche applications but are now rapidly becoming mainstream in consumer devices. We propose low-cost and low-power imaging modalities that capture scene information from minimal time-resolved image sensors with as few as one pixel. The key idea is to flood illuminate large scene patches (or the entire scene) with a pulsed light source and measure the time-resolved reflected light by integrating over the entire illuminated area. The one-dimensional measured temporal waveform, called a transient, encodes the distances and albedos of all visible scene points and is thus an aggregate proxy for the scene's 3D geometry. We explore the viability and limitations of transient waveforms on their own for recovering scene information, and also when combined with traditional RGB cameras. We show that plane estimation can be performed from a single transient and that, using only a few more, it is possible to recover a depth map of the whole scene. We also show two proof-of-concept hardware prototypes that demonstrate the feasibility of our approach for compact, mobile, and budget-limited applications.
Many applications can significantly benefit from having 3D sensor data, yet capturing this data is not always possible due to the sensors being mechanically complex and bulky, high-powered, and costly.
Part of the issue is that current-generation sensors can only measure the depth of a single scene point at a time, leading to slow scanning and complex mechanical designs.
Instead of scanning every scene point, in this work we ask the following question: “What would happen if we purposefully diffused both the detector's and the light source's fields of view so that they cover the whole scene?”
To motivate this idea, we show some simulated examples. On the left are some basic shapes; on the right, we show the resulting waveform, called a transient histogram, that we would measure. Notice that each shape corresponds to a different transient. Intuitively, the transient histogram captures a "signature" of both the object's shape and its orientation.
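To build intuition for why different shapes produce different transients, here is a toy simulation (not the paper's exact forward model): each visible scene point contributes a return delayed by its round-trip time of flight, weighted by its albedo and inverse-square falloff, and the returns are binned into a histogram. The 100 ps bin width and the simple radiometry are illustrative assumptions.

```python
import numpy as np

C = 3e8              # speed of light (m/s)
BIN_WIDTH = 100e-12  # 100 ps time bins (assumed, not from the paper)

def transient_histogram(depths, albedos, n_bins=128):
    """Toy transient histogram for a flood-illuminated scene patch:
    each point's return is delayed by its round-trip time of flight
    and weighted by albedo and 1/d^2 falloff."""
    hist = np.zeros(n_bins)
    tof = 2.0 * depths / C                        # round-trip time per point
    bins = np.floor(tof / BIN_WIDTH).astype(int)  # time bin per point
    weights = albedos / depths**2                 # simplified radiometry
    ok = bins < n_bins
    np.add.at(hist, bins[ok], weights[ok])        # accumulate returns
    return hist

# A fronto-parallel wall at 1 m produces a single sharp peak,
# while a tilted plane spanning 0.8-1.2 m smears over many bins.
flat = transient_histogram(np.full(1000, 1.0), np.ones(1000))
tilted = transient_histogram(np.linspace(0.8, 1.2, 1000), np.ones(1000))
```

The contrast between `flat` (one occupied bin) and `tilted` (a spread of bins) is exactly the shape/orientation "signature" described above.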
We propose two methods to estimate planar scene parameters from a single transient and demonstrate them using an inexpensive, repurposed proximity sensor.
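One simple way to recover plane parameters from a single transient is analysis-by-synthesis: simulate the transient a candidate plane would produce and search for the candidate that best matches the measurement. The sketch below is a hypothetical illustration of that idea, not the paper's estimators; it fits only a distance and a single tilt angle, and the forward model, bin width, and grid ranges are all assumptions.

```python
import numpy as np

C = 3e8              # speed of light (m/s)
BIN_WIDTH = 100e-12  # 100 ps time bins (assumed)
N_BINS = 256

def plane_transient(dist, tilt_deg, n=400):
    """Toy normalized transient for a 1 m-wide plane at distance `dist`
    (m), tilted by `tilt_deg` about one axis, with a co-located
    pulsed source and detector."""
    x = np.linspace(-0.5, 0.5, n)                     # patch coordinate (m)
    depths = dist + x * np.tan(np.radians(tilt_deg))  # per-point depth
    bins = np.floor(2.0 * depths / C / BIN_WIDTH).astype(int)
    w = 1.0 / (depths**2 + 1e-9)                      # inverse-square falloff
    ok = (bins >= 0) & (bins < N_BINS)
    hist = np.zeros(N_BINS)
    np.add.at(hist, bins[ok], w[ok])
    return hist / hist.sum()

def fit_plane(measured):
    """Grid search over (distance, tilt) minimizing L2 error
    between simulated and measured transients."""
    best, best_err = None, np.inf
    for d in np.linspace(0.5, 2.0, 61):
        for t in np.linspace(0.0, 60.0, 31):
            err = np.sum((plane_transient(d, t) - measured) ** 2)
            if err < best_err:
                best, best_err = (d, t), err
    return best

obs = plane_transient(1.2, 30.0)   # synthetic "measurement"
d_hat, t_hat = fit_plane(obs)
```

In practice the search could be replaced by gradient-based optimization or a learned regressor; the point is that a single 1D waveform already constrains both the plane's distance and its orientation.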
Dense depth estimation is very challenging with transients alone, since a single transient contains no spatial information. To overcome this challenge, we instead use a small array of transients.
Our method produces a depth estimate at every scene point given only a few transients. If an accompanying RGB image is available, we can use it to refine our estimate in an optional post-processing step.
Here we compare against leading monocular depth estimation (MDE) techniques. While these techniques produce visually appealing results, they often exhibit large depth errors. Overall, our method produces substantially smaller depth errors.
In fact, we show that with an array of transients as small as 4×3, we can achieve accuracy similar to state-of-the-art MDE techniques while requiring one tenth the power, far less bandwidth, and orders of magnitude less compute.
We further demonstrate these capabilities using a custom hardware prototype. Observe how our method recovers more detailed depth than simply upsampling a low-resolution depth map.