Abstract
The resolution of CMOS image sensors keeps increasing, posing a fundamental challenge to sensor throughput and efficiency. Inspired by selective attention in human vision, we introduce a saliency step that continuously selects salient pixels, reducing output volume, power consumption, and latency. To minimize the overhead of this step, we integrate three methods: image down-sampling, model reduction, and minimal padding, which together preserve object-detection accuracy on the selected pixels. We demonstrate our approach on several datasets, achieving a 70.5% reduction in output volume on BDD100K, which translates to 4.3× and 3.4× reductions in power consumption and latency, respectively.
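To make the idea concrete, below is a minimal sketch of saliency-driven pixel selection combined with down-sampling. It uses a block-level temporal difference as a stand-in saliency measure; the paper's actual saliency step, block size, and threshold are not specified here, so all of those choices are illustrative assumptions.

```python
import numpy as np

def select_salient_pixels(prev_frame, curr_frame, block=8, threshold=12.0):
    """Flag pixels in blocks whose mean temporal difference exceeds a
    threshold, emulating selective readout of salient pixels.
    (Hypothetical saliency proxy; the paper's method may differ.)"""
    diff = np.abs(curr_frame.astype(np.float32) - prev_frame.astype(np.float32))
    h, w = diff.shape
    assert h % block == 0 and w % block == 0, "frame must tile into blocks"
    # Down-sample the difference map to block level to cut per-pixel overhead.
    block_means = diff.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    # Up-sample the block decision back to pixel resolution.
    mask = np.repeat(np.repeat(block_means > threshold, block, axis=0), block, axis=1)
    return mask  # True where pixels should be read out

# Usage: a bright square appears on an otherwise static background.
prev = np.zeros((64, 64), dtype=np.uint8)
curr = prev.copy()
curr[16:32, 16:32] = 200          # new salient region
mask = select_salient_pixels(prev, curr)
reduction = 1.0 - mask.mean()     # fraction of pixels suppressed
```

Only the blocks covering the changed region are read out, so the static background contributes no output volume, which is the mechanism behind the reported reductions.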