Video from a Single Coded Exposure Photograph using a Learned Over-Complete Dictionary
Proc. ICCV 2011
High-speed compressive imaging (up to 1000 fps) using a conventional 30 fps camera, an LCoS (liquid crystal on silicon) light modulator, and large-scale dictionary learning. Applications in consumer and scientific imaging.
Cameras face a fundamental tradeoff between spatial and temporal resolution: digital still cameras can capture images with high spatial resolution, while most high-speed video cameras suffer from low spatial resolution. It is hard to overcome this tradeoff without incurring a significant increase in hardware cost.
In this project, we propose techniques for sampling, representing, and reconstructing the space-time volume in order to overcome this tradeoff. Our approach has two important distinctions from previous work: (1) we achieve sparse representation of videos by learning an over-complete dictionary on video patches, and (2) we adhere to practical constraints on the sampling scheme imposed by the architectures of present image sensors. Consequently, our sampling scheme can be implemented on image sensors with a straightforward modification to the control unit.
To demonstrate the power of our approach, we have implemented a prototype imaging system with per-pixel coded exposure control using a liquid crystal on silicon (LCoS) device. Using both simulations and experiments on a wide range of scenes, we show that our method can effectively reconstruct a video from a single coded image while maintaining high spatial resolution.
This project was done in collaboration with Yasunobu Hitomi and Tomoo Mitsunaga of Sony Corporation.
Due to image sensor hardware limitations, spatial resolution decreases as the frame rate increases, which degrades image quality.
The goal of our work is to design an imaging system that captures videos with both high spatial and high temporal resolution. In this project, we focus on two problems: 1) sampling, and 2) representation of space-time volumes, with the aim of designing practical compressive video acquisition systems.
For maximum flexibility in designing sampling schemes, it is important to have pixel-wise exposure control. At the same time, we design sampling functions that adhere to the practical constraints imposed by the architectures of present image sensors, as illustrated in the sketch below.
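As a concrete illustration, the following is a minimal sketch of one such constrained sampling function and the resulting image formation, under the assumption that each pixel is exposed exactly once per frame integration, for a single continuous interval of fixed length, with an arbitrary start time. The function names and parameters (coded_shutter, bump_len, etc.) are illustrative, not taken from the actual system.

    import numpy as np

    def coded_shutter(h, w, T, bump_len, seed=0):
        # Per-pixel binary shutter S of shape (T, h, w). Assumed constraint
        # (illustrative): each pixel has a single continuous "on" interval
        # of fixed length bump_len, starting at a random sub-frame.
        rng = np.random.default_rng(seed)
        starts = rng.integers(0, T - bump_len + 1, size=(h, w))
        t = np.arange(T)[:, None, None]                # (T, 1, 1)
        S = (starts <= t) & (t < starts + bump_len)    # broadcasts to (T, h, w)
        return S.astype(np.float32)

    def coded_exposure(S, E):
        # Forward model: the single captured image is the time integral of
        # the space-time volume E (T, h, w), gated per pixel by the shutter S.
        return (S * E).sum(axis=0)

    # Example: one 30 fps integration period split into 36 sub-frames
    # corresponds to roughly 1000 fps; sizes here are only illustrative.
    S = coded_shutter(h=64, w=64, T=36, bump_len=4)
    E = np.random.rand(36, 64, 64).astype(np.float32)  # stand-in scene volume
    I = coded_exposure(S, E)                           # (64, 64) coded image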
We propose learning an over-complete dictionary from a large collection of videos and representing any given video as a sparse linear combination of elements from the dictionary. The redundant nature of such dictionaries leads to highly sparse representations.
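As a rough illustration of how a space-time patch is recovered from a single coded image, the sketch below solves the per-patch sparse coding problem with orthogonal matching pursuit (via scikit-learn). This is a minimal sketch under assumed patch layout and solver choice, not the exact implementation: the dictionary D is assumed to be pre-learned (e.g., by K-SVD), and k, p, T are illustrative parameters.

    import numpy as np
    from sklearn.linear_model import orthogonal_mp

    def reconstruct_patch(y, S_patch, D, k=10):
        # y       : (p*p,) coded-exposure values of one image patch
        # S_patch : (T, p, p) shutter restricted to that patch
        # D       : (T*p*p, n_atoms) learned over-complete dictionary of
        #           space-time video patches (assumed pre-learned)
        # k       : assumed sparsity level (illustrative)
        T, p, _ = S_patch.shape
        # Measurement matrix Phi: coded pixel i integrates spatial
        # location i over the sub-frames during which its shutter is open.
        Phi = np.zeros((p * p, T * p * p), dtype=np.float32)
        St = S_patch.reshape(T, p * p)
        for i in range(p * p):
            Phi[i, i::p * p] = St[:, i]
        A = Phi @ D                            # effective (measured) dictionary
        alpha = orthogonal_mp(A, y, n_nonzero_coefs=k)
        return (D @ alpha).reshape(T, p, p)    # recovered space-time patch

The key point the sketch captures is that the sensing operator and the learned dictionary compose into a single effective dictionary A = Phi D, so standard sparse solvers apply unchanged.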
While we have not yet fabricated a CMOS image sensor chip with per-pixel exposure control, we have constructed an emulation imaging system with an LCoS device to achieve pixel-wise exposure control. We show video reconstruction results for a variety of motions, ranging from simple linear translation to complex fluid motion and muscle deformation.