Spike-Based Anytime Perception
Proc. WACV 2023
Moving beyond frame-based visual recognition, targeting applications with tight latency and power constraints
In many emerging computer vision applications, it is critical to adhere to stringent latency and power constraints. The current neural network paradigm of frame-based, floating-point inference is often ill-suited to these resource-constrained applications. Spike-based perception – enabled by spiking neural networks (SNNs) – is one promising alternative. Unlike conventional artificial neural networks (ANNs), spiking networks exhibit smooth tradeoffs between latency, power, and accuracy. SNNs are the archetype of an “anytime algorithm” whose accuracy improves smoothly over time. This property allows SNNs to adapt their computational investment in response to changing resource constraints. Unfortunately, mainstream algorithms for training SNNs (i.e., those based on ANN-to-SNN conversion) tend to produce models that are inefficient in practice. To mitigate this problem, we propose a set of principled optimizations that reduce latency and power consumption by 1–2 orders of magnitude in converted SNNs. These optimizations leverage a set of novel efficiency metrics designed for anytime algorithms. We also develop a state-of-the-art simulator, SaRNN, which can simulate SNNs using commodity GPU hardware and neuromorphic platforms. We hope that the proposed optimizations, metrics, and tools will facilitate the future development of spike-based vision systems.
We start by defining a pair of metrics: Pareto latency and Pareto power. Instead of measuring the performance of the model at a single operating point, these metrics integrate over the entire accuracy-time or accuracy-power tradeoff curve.
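The sketch below shows one plausible way to "integrate over the tradeoff curve". This is an assumption, not the paper's exact definition. Each operating point is a (cost, accuracy) pair, where cost is elapsed time for a Pareto-latency-style number or power for a Pareto-power-style number. The function then integrates, over accuracy levels from 0 up to the best accuracy reached, the smallest cost that achieves at least that accuracy. The name `pareto_cost` is hypothetical.

```python
from typing import Sequence


def pareto_cost(costs: Sequence[float], accuracies: Sequence[float]) -> float:
    """Area to the left of the Pareto (best-accuracy-so-far) curve.

    Each (cost, accuracy) pair is one operating point of an anytime model,
    e.g. (elapsed time, accuracy) or (power, accuracy). For every accuracy
    level a in [0, best accuracy], take the smallest cost that reaches at
    least a, and integrate that cost over a. Lower is better.
    """
    if len(costs) != len(accuracies) or not costs:
        raise ValueError("need matching, non-empty sequences")
    area = 0.0
    best = 0.0  # best accuracy reachable at or below the current cost
    for cost, acc in sorted(zip(costs, accuracies)):
        if acc > best:
            # Accuracy levels in (best, acc] first become reachable at this cost.
            area += cost * (acc - best)
            best = acc
    return area
```

For example, `pareto_cost([1, 2, 3, 4], [0.5, 0.4, 0.8, 0.9])` gives 1.8 up to floating-point rounding. The dip at cost 2 contributes nothing, because a cheaper operating point already reached 0.5.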
We then use these metrics to optimize SNNs trained via ANN conversion. There are three phases in the SNN's life cycle: (1) ANN training, (2) ANN-to-SNN conversion, and (3) SNN inference.
Phase 1 involves training a conventional ANN for a computer vision task. We propose fine-tuning the ANN to increase activation and weight sparsity; this sparsity is preserved when the ANN is later converted to an SNN. We also demonstrate a novel method for improving activation sparsity using batch normalization: decreasing the batch normalization beta (shift) parameter pushes more pre-activations below zero, where the ReLU outputs zero, and therefore improves sparsity.
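To make the batch normalization effect concrete, the following sketch assumes each batch normalization layer feeds directly into a ReLU. If the normalized pre-activations are roughly standard normal, the zero fraction is about Φ(-β/|γ|). That value rises as β decreases. The helper names are hypothetical. The sketch only measures sparsity and does not reproduce the paper's fine-tuning procedure.

```python
import random
from statistics import NormalDist, fmean, pvariance


def bn_relu(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch-normalize a 1-D batch of pre-activations, then apply ReLU."""
    mean = fmean(xs)
    var = pvariance(xs, mu=mean)
    return [max(0.0, gamma * (x - mean) / (var + eps) ** 0.5 + beta) for x in xs]


def sparsity(values):
    """Fraction of activations that are exactly zero."""
    return sum(v == 0.0 for v in values) / len(values)


def expected_sparsity(gamma=1.0, beta=0.0):
    """P(gamma * z + beta <= 0) for z ~ N(0, 1), i.e. the ReLU zero
    fraction when normalized pre-activations are roughly standard normal."""
    return NormalDist().cdf(-beta / abs(gamma))


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(3.0, 2.0) for _ in range(10_000)]
    for beta in (0.5, 0.0, -0.5):
        print(beta, sparsity(bn_relu(xs, beta=beta)), expected_sparsity(beta=beta))
```

The analytic column is about 0.31, 0.50, and 0.69 for β = 0.5, 0, and -0.5, and the measured sparsity should land close to it. In a real network, β could be lowered during fine-tuning. The magnitude of the shift trades sparsity against accuracy.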
Here we show examples of activation sparsity improvements in a convolutional MNIST model.
We then consider phase 2, ANN-to-SNN conversion. This conversion involves an activation scaling step. Previous work performed this scaling at the granularity of entire layers; we show that it can instead be done at the granularity of individual neurons.
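Below is a sketch of layer-wise versus neuron-wise scaling. It uses the percentile-based data normalization widely applied in ANN-to-SNN conversion. The percentile rule and all function names are assumptions for illustration, not the paper's procedure. Activations are indexed as `acts[sample][neuron]`. Because ReLU is positively homogeneous, dividing each output row by its neuron's scale and multiplying each input column by the previous layer's scale preserves the network's function on normalized activations.

```python
from typing import List, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Linear-interpolation percentile, p in [0, 100]."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def layer_scale(acts: Sequence[Sequence[float]], p: float = 99.9) -> float:
    """Layer-wise: one scale shared by every neuron in the layer."""
    return percentile([a for row in acts for a in row], p)


def neuron_scales(acts: Sequence[Sequence[float]], p: float = 99.9) -> List[float]:
    """Neuron-wise: one scale per neuron; dead neurons (scale 0) fall back to 1.0."""
    scales = [percentile(col, p) for col in zip(*acts)]
    return [s if s > 0.0 else 1.0 for s in scales]


def rescale(weights, biases, in_scales, out_scales):
    """W'[i][j] = W[i][j] * in_scales[j] / out_scales[i]; b'[i] = b[i] / out_scales[i].

    Maps normalized inputs a_j / in_scales[j] to normalized outputs
    a_i / out_scales[i]. For the first layer, use in_scales of 1.0.
    """
    new_w = [[w * s_in / s_out for w, s_in in zip(row, in_scales)]
             for row, s_out in zip(weights, out_scales)]
    new_b = [b / s_out for b, s_out in zip(biases, out_scales)]
    return new_w, new_b
```

With `acts = [[0.0, 2.0], [1.0, 4.0], [0.5, 6.0]]` and `p=100`, the layer-wise scale is 6.0 for both neurons, while the neuron-wise scales are `[1.0, 6.0]`. Under the layer-wise scale, the first neuron's normalized activations never exceed 1/6, so its spiking neuron fires rarely and conveys its value slowly. This is the kind of inefficiency that finer-grained scaling can address.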
We propose an algorithm, built on our Pareto metrics, that formally optimizes the activation scaling factors. This formal optimization is guaranteed to perform at least as well as past heuristic scaling methods (e.g., Spike-Norm).
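The "at least as well as" guarantee can be illustrated with any search that starts from the heuristic factors and only accepts improvements. The sketch below uses a greedy coordinate search. This is a swapped-in technique for illustration, not the paper's optimizer. `objective` is a hypothetical callback that, in practice, would simulate the converted SNN on held-out data and return its Pareto latency or Pareto power. Here a toy function stands in for it. The guarantee holds only with respect to that objective on the evaluation data.

```python
from typing import Callable, List, Sequence, Tuple


def refine_scales(
    init_scales: Sequence[float],
    objective: Callable[[List[float]], float],
    multipliers: Sequence[float] = (0.5, 0.8, 1.25, 2.0),
    max_rounds: int = 5,
) -> Tuple[List[float], float]:
    """Greedy coordinate search over per-layer (or per-neuron) scale factors.

    Starts from heuristic factors and accepts only strict improvements,
    so the returned objective is never worse than objective(init_scales).
    """
    best = list(init_scales)
    best_val = objective(best)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(best)):
            for m in multipliers:
                cand = list(best)
                cand[i] = best[i] * m
                val = objective(cand)
                if val < best_val:
                    best, best_val, improved = cand, val, True
        if not improved:
            break  # no single-coordinate move helps; stop early
    return best, best_val


if __name__ == "__main__":
    target = [1.0, 2.0]  # hypothetical optimum, for illustration only

    def toy_pareto_latency(scales):
        return sum((s - t) ** 2 for s, t in zip(scales, target))

    heuristic = [1.0, 1.0]  # stands in for e.g. Spike-Norm factors
    print(toy_pareto_latency(heuristic), refine_scales(heuristic, toy_pareto_latency))
```

Because every accepted move strictly lowers the objective, the result can only match or beat the starting heuristic. A stronger optimizer changes how much better it gets, not this lower bound.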
Lastly, we consider phase 3, SNN inference. We propose a high-entropy initialization scheme that significantly reduces latency compared to a standard initialization. In this plot, the y-axis shows Pareto latency relative to a model with a standard, low-entropy initialization.
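The intuition can be sketched with integrate-and-fire neurons. This sketch makes three assumptions that the text above does not state: "initialization" refers to the neurons' initial membrane potentials, "high-entropy" means drawing them uniformly from [0, threshold), and neurons reset by subtraction. With all potentials starting at zero, every neuron needs several steps before its first spike, so early firing rates underestimate the input. Random initial potentials make the population's spike count unbiased from the first steps. All names are hypothetical.

```python
import random
from typing import List


def if_spike_counts(x: float, v0: List[float], steps: int, threshold: float = 1.0) -> List[int]:
    """Integrate-and-fire neurons with reset-by-subtraction and constant input x.

    Assumes 0 <= x <= threshold, so at most one spike per step is needed.
    """
    counts = []
    for v in v0:
        n = 0
        for _ in range(steps):
            v += x
            if v >= threshold:
                n += 1
                v -= threshold
        counts.append(n)
    return counts


def rate_error(x: float, v0: List[float], steps: int, threshold: float = 1.0) -> float:
    """|population-mean firing rate - x / threshold| after `steps` time steps."""
    counts = if_spike_counts(x, v0, steps, threshold)
    rate = sum(counts) / (len(counts) * steps)
    return abs(rate - x / threshold)


if __name__ == "__main__":
    rng = random.Random(0)
    n, x, steps = 1000, 0.3, 2
    zero_init = [0.0] * n
    high_entropy_init = [rng.random() for _ in range(n)]  # uniform in [0, 1)
    print(rate_error(x, zero_init, steps), rate_error(x, high_entropy_init, steps))
```

After two steps, the zero-initialized population has not spiked at all, giving an error of 0.3. The randomly initialized population's mean rate is already close to 0.3. A smaller early error means the anytime accuracy curve rises sooner, which is what a lower Pareto latency rewards.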
Our optimizations reduce the SNN's latency and power usage by 1–2 orders of magnitude.