NCOD: Near-Optimum Video Compression for Object Detection

Abstract

Since for a considerable portion of the captured video the target is a machine learning task, rather than a human audience, transmission of videos in such applications requires efficient video compression tailored for machine vision. However, existing compression solutions are optimized for human vision. This paper presents a methodology to optimize an existing video compression standard, HEVC, for a machine vision task, Object Detection (OD). To this end, (1) a dataset of compressed videos, including several compression-ratios and their corresponding OD performance is collected to enable modeling, (2) A trade-off point (knee-point) between bitrate and OD performance is defined, that finds the point after which no major improvements will be achieved, (3) an extensive set of features were extracted and studied to model this point, via a practical machine learning method. The resulting solution can predict the knee-point with MAE=1.28, resulting in a ΔRecall of only 0.012 and bitrate reduction of 86.56%, compared to OD with very high-quality video.