Deep Occlusion Reasoning for Multi-Camera Multi-Target Detection

Project page for the Deep-Occlusion Multi-Camera Multi-People tracker.

The arXiv paper and the code are available.

Abstract

People detection in single 2D images has improved greatly in recent years. However, comparatively little of this progress has percolated into multi-camera multi-people tracking algorithms, whose performance still degrades severely when scenes become very crowded. In this work, we introduce a new architecture that combines Convolutional Neural Nets and Conditional Random Fields to explicitly model the ambiguities that arise in such scenes. One of its key ingredients is a set of high-order CRF terms that model potential occlusions and give our approach its robustness even when many people are present. Our model is trained end-to-end and we show that it outperforms several state-of-the-art algorithms on challenging scenes.

Detection Results

We provide a two-minute people detection video illustrating the output of our method on the new, challenging, and crowded ETHZ dataset.

Deep-Occlusion reasoning + K-Shortest Path Tracking. Also available with top view.

Gaussian Mixture Networks

Technical details on learning the Gaussian mixture networks used in our discriminative model.

Visualization of the output of the discriminative model and the learned Gaussian classes.

Efficient Implementation of Mean-Fields

We provide a short document that reviews the key elements of an efficient implementation of Mean-Field inference with our high-order potentials.