Daimler Multi-Cue Occluded Pedestrian Classification Benchmark

Markus Enzweiler and Dariu M. Gavrila
E-Mail: dariu.gavrila@daimler.com
July 01, 2010

(C) 2010 by Daimler AG


  1. Introduction
  2. License Agreement
  3. Datasets
  4. Matlab Interface
  5. Contact


Introduction

This README describes the Daimler Multi-Cue Occluded Pedestrian Classification Benchmark introduced in the publication:

M. Enzweiler, A. Eigenstetter, B. Schiele and D. M. Gavrila,
Multi-Cue Pedestrian Classification with Partial Occlusion Handling,
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.

This dataset contains a collection of pedestrian (non-occluded and partially occluded) and non-pedestrian images. It is made publicly available to academic and non-academic entities for research purposes.

License Agreement

This dataset is made freely available to academic and non-academic entities for research purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use, copy, and distribute the data given that you agree:

  1. That the dataset comes "AS IS", without express or implied warranty. Although every effort has been made to ensure accuracy, Daimler does not accept any responsibility for errors or omissions.
  2. That you include a reference to the above publication in any published work that makes use of the dataset.
  3. That if you have altered the content of the dataset or created derivative work, prominent notices are made so that any recipients know that they are not receiving the original data.
  4. That you may not use or distribute the dataset or any derivative work for commercial purposes as, for example, licensing or selling the data, or using the data with a purpose to procure a commercial gain.
  5. That this license agreement is retained with all copies of the dataset.
  6. That all rights not expressly granted to you are reserved by Daimler.


Datasets

Our training and test samples consist of manually labeled pedestrian and non-pedestrian bounding boxes in images captured from a vehicle-mounted calibrated stereo camera rig in an urban environment. For each manually labeled pedestrian, we created additional samples by geometric jittering. Non-pedestrian samples were the result of a shape detection pre-processing step with a relaxed threshold setting, i.e. containing a bias towards more difficult patterns.

Dense stereo is computed using the semi-global matching algorithm (H. Hirschmueller, Stereo processing by semi-global matching and mutual information, IEEE PAMI, 30(2):328-341, 2008). To compute dense optical flow, we use structure- and motion-adaptive regularized flow (A. Wedel et al., Structure- and motion-adaptive regularization for high accuracy optic flow, ICCV, 2009).

Training and test samples have a resolution of 48 x 96 pixels with a 12-pixel border around the pedestrians. Note that the experiments in our paper (see above) were done on 36 x 84 pixel images with a border of 6 pixels, i.e. crops of the provided dataset, with a three-component layout corresponding to head, torso, and legs. For publication of the dataset, we chose to provide images with a larger border and without a pre-defined component layout, to allow for higher flexibility in the selection of components.
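As a minimal sketch of how the smaller 36 x 84 crops could be obtained from a provided 48 x 96 sample, the following Matlab snippet removes 6 pixels on every side. The symmetric (centered) crop is an assumption derived from the border shrinking from 12 to 6 pixels, and the random image is only a stand-in for a real sample.

```matlab
% Provided resolution: 48 pixels wide, 96 pixels high.
W = 48; H = 96;

% Pixels removed on each side (assumption: the crop is centered).
border = 6;

% Stand-in for one sample already reshaped to an H x W image (hypothetical data).
img = rand(H, W);

% Keep rows 7..90 and columns 7..42.
cropped = img(border+1:H-border, border+1:W-border);

% cropped is 84 rows x 36 columns, i.e. a 36 x 84 pixel image.
croppedSize = size(cropped);
```

The paper's head, torso, and legs layout is not specified in this README, so it is left out of the sketch.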

Datasets are provided as Matlab .mat files, each containing an N (rows) x M (cols) matrix, where N is the number of samples and M is the vectorized dimension of the images (48*96 = 4608). Images were vectorized using a row-wise scheme (note that Matlab typically uses column-wise ordering). Samples are aligned across image cues, so that the n-th sample in the intensity data corresponds to the n-th sample in the stereo and flow data.
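The row-wise ordering matters when turning a matrix row back into an image. The sketch below illustrates this on a synthetic image (hypothetical data, not taken from the benchmark): because Matlab's reshape fills column by column, the row is reshaped to 48 x 96 and then transposed to give a 96 x 48 (rows x cols) image.

```matlab
% Image width and height in pixels.
W = 48; H = 96;

% Synthetic H x W test image whose pixel value equals its row index.
img = repmat((1:H)', 1, W);

% Row-wise vectorization, as used in the dataset: 1 x 4608 vector.
sampleRow = reshape(img', 1, W*H);

% Correct recovery: reshape to W x H (column-wise fill), then transpose.
recovered = reshape(sampleRow, W, H)';

% Reshaping directly to H x W scrambles the pixels.
wrong = reshape(sampleRow, H, W);

recoveredMatches = isequal(recovered, img);
wrongMatches = isequal(wrong, img);
```

Here recoveredMatches is true and wrongMatches is false, which is why the transpose is needed when displaying samples.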

                                 Pedestrians      Non-Pedestrians
  Training Set                   52112 samples    32465 samples
  Test Set (Non-Occluded)        25608 samples    16235 samples
  Test Set (Partially Occluded)  11160 samples    16235 samples


In intensity images, each pixel encodes gray-level intensity. In stereo images, each pixel encodes the estimated depth in meters. In flow images, each pixel encodes the estimated (horizontal) sub-pixel optical flow between two temporally aligned images. Note that, to prevent flow values from becoming negative, an offset of 127 has been added to the estimated flow value, i.e. a value of 127 corresponds to zero flow.
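To work with signed flow values, the offset can be subtracted again. The following is a small sketch on made-up values. The cast to double is a precaution in case the matrices are stored in an unsigned integer type, where subtraction would saturate at zero; the storage type is an assumption and is not stated in this README.

```matlab
% Hypothetical stored flow values (offset of 127 already added).
storedFlow = [120.5 127 133.25];

% Convert to double first, then remove the offset to get signed flow.
signedFlow = double(storedFlow) - 127;

% A stored value of 127 maps back to zero flow.
zeroFlowIndex = find(signedFlow == 0);
```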

Matlab Interface

Here is Matlab sample code to load and visualize the data:

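clear all;
close all;

%% load
% NOTE: the file names below are assumptions (one .mat file per variable);
% adjust them to match the files in your copy of the dataset.
load('pedOccludedTestIntensity.mat');
load('pedOccludedTestStereo.mat');
load('pedOccludedTestFlow.mat');

%% visualize (the 3rd sample)
whichSample = 3;

% Rows are stored row-wise: reshape to 48 x 96, then transpose to 96 x 48.
figure;
imshow(reshape(pedOccludedTestIntensity(whichSample,:), 48,96)',[]);

figure;
imshow(reshape(pedOccludedTestStereo(whichSample,:), 48,96)',[]);
colormap hot;

figure;
imshow(reshape(pedOccludedTestFlow(whichSample,:), 48,96)',[]);
colormap hot;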

The resulting figures should look like this:

[Figure: Intensity, Stereo, and Flow images of the selected sample]


Contact

Please direct questions regarding the dataset and benchmarking procedure to Prof. Dr. Dariu Gavrila or Markus Enzweiler.