Daimler Monocular Pedestrian Detection Benchmark


Markus Enzweiler and Dariu M. Gavrila
E-Mail: dariu.gavrila@daimler.com
November 25, 2008

(C) 2008 by Daimler AG

Contents

  1. Introduction
  2. License Agreement
  3. Datasets
  4. Ground Truth
  5. Camera Parameters
  6. Ground Truth File Format
  7. Benchmarking Procedure
  8. Ground Truth Parser (Matlab)
  9. Contact

Introduction

This README describes the Daimler Monocular Pedestrian Detection Benchmark introduced in the publication

M. Enzweiler and D. M. Gavrila. Monocular Pedestrian Detection: Survey and Experiments.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.

This dataset contains a collection of pedestrian and non-pedestrian images including ground truth annotations. It is made publicly available to academic and non-academic entities for research purposes.


License Agreement

This dataset is made freely available to academic and non-academic entities for research purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use, copy, and distribute the data given that you agree:

  1. That the dataset comes "AS IS", without express or implied warranty. Although every effort has been made to ensure accuracy, Daimler does not accept any responsibility for errors or omissions.
  2. That you include a reference to the above publication in any published work that makes use of the dataset.
  3. That if you have altered the content of the dataset or created derivative work, prominent notices are made so that any recipients know that they are not receiving the original data.
  4. That you may not use or distribute the dataset or any derivative work for commercial purposes as, for example, licensing or selling the data, or using the data with a purpose to procure a commercial gain.
  5. That this license agreement is retained with all copies of the dataset.
  6. That all rights not expressly granted to you are reserved by Daimler.

Datasets

This dataset contains a collection of pedestrian and non-pedestrian images in 8-bit PGM format, split into training and test data.

Training images were recorded at various times of day and locations, with no constraints on illumination, pedestrian pose or clothing, except that pedestrians are fully visible in an upright position. 15660 pedestrian (positive) samples are provided as training examples in two resolutions: 18x36 and 48x96 pixels. These samples were obtained by manually extracting 3915 rectangular position labels from video images. The original aspect ratios of the pedestrians were preserved, resulting in a variable number of border pixels in the horizontal direction. Four pedestrian samples were created from each label by means of mirroring and randomly shifting the bounding boxes by a few pixels in horizontal and vertical directions, to account for localization errors in the application system. Pedestrian labels have a minimum height of 72 pixels, so that no upscaling is involved in generating the different training sample resolutions. Further, we provide 6744 full images not containing any pedestrians, from which negative samples for training can be extracted.
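The mirroring-and-shifting step described above can be sketched as follows. This is a minimal illustration in Python; the exact shift range and sampling scheme used to build the benchmark are not documented here, so the max_shift value and the alternating-mirror policy are assumptions.

```python
import random

def augment_label(img, box, n_samples=4, max_shift=2):
    """Create training samples from one pedestrian label by mirroring and
    randomly shifting the bounding box (illustrative sketch; max_shift and
    the alternating mirror are assumptions, not the benchmark's recipe).

    img -- 2D list of pixel rows, e.g. as read from a PGM file
    box -- (min_x, min_y, max_x, max_y) in pixel coordinates
    """
    min_x, min_y, max_x, max_y = box
    samples = []
    for i in range(n_samples):
        # shift the box by a few pixels to model localization errors
        dx = random.randint(-max_shift, max_shift)
        dy = random.randint(-max_shift, max_shift)
        crop = [row[min_x + dx:max_x + dx]
                for row in img[min_y + dy:max_y + dy]]
        if i % 2 == 1:
            # mirror every other sample horizontally
            crop = [row[::-1] for row in crop]
        samples.append(crop)
    return samples
```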

The test dataset consists of an independent image sequence comprising 21790 PGM images (640x480 pixels) with 56492 manual labels, including 259 trajectories of fully visible pedestrians, captured from a moving vehicle in a 27 minute drive through urban traffic.

For installation, simply extract the provided .tgz archives. This will create the folders

Data/TestData
and
Data/TrainingData


Ground Truth

Ground truth is given by manual annotations of bounding box locations for fully-visible pedestrians, pedestrian groups, partially occluded pedestrians, bicyclists and motorcyclists. For performance evaluation, we differentiate between the scenarios of generic pedestrian detection (evaluation in 2D) and pedestrian detection from a moving vehicle (evaluation in 3D). We provide two different ground truth files - one for each case.

2D Ground Truth

Ground truth is given by the bounding box locations of pedestrians in pixel coordinates, with the top-left pixel of the image at (0,0). Fully-visible pedestrians of at least 72 pixels height are marked as required (confidence = 1.0, see below). Smaller or partially occluded pedestrians, as well as bicyclists and motorcyclists, are considered optional (confidence = 0.0, see below). Please refer to the paper for details on the evaluation criteria and procedure. 2D ground truth is provided in Database format in the file

GroundTruth/GroundTruth2D.db
Specification of the Database format is given below.
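As an illustration of how 2D ground truth boxes might be matched against detections, the following Python sketch computes the intersection-over-union of two boxes in the (min_x, min_y, max_x, max_y) convention used above. The actual evaluation criteria are defined in the TPAMI paper referenced in the Introduction; this measure is only a common stand-in.

```python
def box_overlap(a, b):
    """Intersection-over-union of two 2D boxes given as
    (min_x, min_y, max_x, max_y). Illustrative matching measure only;
    the benchmark's evaluation criterion is defined in the paper."""
    # width and height of the intersection rectangle (0 if disjoint)
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0
```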

3D Ground Truth

2D ground truth has been projected to 3D using camera calibration and the assumption of pedestrians standing on the ground. Fully-visible pedestrians within the sensor coverage area, +-4m in lateral direction and 10m-25m in front of the vehicle (12m-27m in front of the camera), are marked as required (confidence = 1.0, see below). Pedestrians not in the sensor coverage area or partially occluded pedestrians, as well as bicyclists and motorcyclists, are considered optional (confidence = 0.0, see below). Please refer to the paper for details on the evaluation criteria and procedure. 3D ground truth is provided in Database format in the file

GroundTruth/GroundTruth3D.db
Specification of the Database format is given below.
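The sensor coverage criterion above can be expressed as a small predicate. This is an illustrative helper, not part of the benchmark distribution; the function name and argument layout are assumptions, and the object class codes follow the Database format specification below.

```python
def is_required_3d(x, z, fully_visible, obj_class):
    """Decide whether a 3D ground truth object counts as required
    (confidence 1.0) rather than optional (confidence 0.0) under the
    sensor coverage criterion: +-4m lateral, 12m-27m from the camera
    (10m-25m in front of the vehicle). Illustrative sketch only.

    x, z -- object position on the ground plane in world coordinates (m)
    obj_class -- 0 denotes a fully-visible pedestrian (see format spec)
    """
    in_coverage = abs(x) <= 4.0 and 12.0 <= z <= 27.0
    return fully_visible and obj_class == 0 and in_coverage
```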


Camera Parameters

Camera parameters necessary to project from 2D to 3D (ground-plane constraint) are provided in the file:
Calibration/camera_parameters.txt

The camera is mounted inside the vehicle below the rear-view mirror. The world coordinate system origin is below the camera on the ground plane: x points to the right, y points down, and z points in the driving direction. The camera is 1.17m above the ground and 2.0m behind the front of the vehicle. 3D coordinates in the 3D ground truth file denote the central point of the pedestrian on the ground plane (y=0) in world coordinates. Because of the camera-vehicle offset, the considered 3D detection range of 10m-25m in front of the vehicle corresponds to a distance of 12m-27m from the camera in the world coordinate system.
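Projecting a 2D detection to 3D under the ground-plane constraint can be sketched with a pinhole model: the foot point of the pedestrian's bounding box is back-projected onto the plane y=0. The sketch below assumes zero camera tilt, which is a simplification; the parameter names fu, fv, u0, v0 are placeholders, and the actual calibration values (including any tilt) are in Calibration/camera_parameters.txt.

```python
def foot_point_to_world(u, v, fu, fv, u0, v0, cam_height=1.17):
    """Back-project an image foot point (u, v) of a pedestrian onto the
    ground plane, assuming an untilted pinhole camera (simplification).

    fu, fv  -- focal lengths in pixels (placeholder names)
    u0, v0  -- principal point in pixels (placeholder names)
    Returns (x, z) in world coordinates (meters), with the camera
    cam_height above the ground-plane origin.
    """
    if v <= v0:
        raise ValueError("foot point must lie below the horizon")
    # similar triangles: depth from camera height and image offset
    z = fv * cam_height / (v - v0)
    # lateral offset scales linearly with depth
    x = (u - u0) * z / fu
    return x, z
```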


Ground Truth File Format

2D and 3D ground truth files are provided in the ASCII Database format. Specification is given below.

: sequence separator, initiates an image sequence entry
seq_id string identifier describing the sequence
absolute_path path to directory containing sequence images
nr_images length of sequence, or -1 if unknown

  ; image (frame) separator, initiates an image frame entry
  image_name image file name
  image_width image_height image size
  0 nr_of_objects the leading 0 can be ignored; nr_of_objects is the number of objects in the image

    # object_class 2D object separator, initiates an object entry in 2D (image) coordinates;
object class:
0=fully-visible pedestrian
1=bicyclist
2=motorcyclist
10=pedestrian group
255=partially visible pedestrian, bicyclist, motorcyclist
    obj_id unique_id object ID to identify trajectories of the same physical object;
additional ID unique to each object entry
    confidence confidence value indicating if this ground truth object is required (1.0) or optional (0.0)
    min_x min_y max_x max_y 2D bounding box coordinates (integer values, top-left in the image is 0,0)
    0 ignore this entry
    ... (end of 2D object entry, more objects to follow)

    § object_class 3D object separator, initiates an object entry in 3D (world) coordinates; object class:
0=fully-visible pedestrian
1=bicyclist
2=motorcyclist
10=pedestrian group
255=partially visible pedestrian, bicyclist, motorcyclist
    obj_id unique_id object ID to identify trajectories of the same physical object;
additional ID unique to each object entry
    confidence confidence value indicating if this ground truth object is required (1.0) or optional (0.0)
    obj_min_x obj_min_y obj_min_z
    obj_max_x obj_max_y obj_max_z
3D bounding box coordinates (float values)
Here, obj_min_x = obj_max_x and obj_min_z = obj_max_z, denoting the central point of the
pedestrian in x and z. obj_min_y = obj_max_y = 0 (on the ground-plane).
    ... (end of 3D object entry, more objects to follow)
  ... (end of image frame entry, more images to follow)
... (end of image sequence entry, more sequences to follow)
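A minimal reader for the format above can be sketched as follows. The Matlab parser in GroundTruthParser/ is the reference implementation; this Python sketch handles 2D (#) entries only, skips 3D (§) entries, and assumes each field group occupies its own line exactly as laid out in the specification.

```python
def parse_ground_truth_2d(path):
    """Minimal sketch of a parser for the ASCII Database format (2D
    entries only; assumes one field group per line). Returns a list of
    (image_name, objects) tuples, one per image frame entry."""
    frames = []
    with open(path, encoding="latin-1") as f:
        lines = [ln.strip() for ln in f if ln.strip()]
    i = 0
    while i < len(lines):
        tok = lines[i].split()
        if tok[0] == ":":       # sequence entry: seq_id, path, nr_images
            i += 4
        elif tok[0] == ";":     # image frame entry
            frames.append((lines[i + 1], []))
            i += 4              # separator, image_name, size, object count
        elif tok[0] == "#":     # 2D object entry
            frames[-1][1].append({
                "class": int(tok[1]),
                "obj_id": int(lines[i + 1].split()[0]),
                "confidence": float(lines[i + 2]),
                "box": tuple(int(v) for v in lines[i + 3].split()),
            })
            i += 5              # separator + 3 field lines + trailing "0"
        else:                   # anything unhandled, e.g. 3D entries
            i += 1
    return frames
```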

Benchmarking Procedure

Authors who wish to evaluate pedestrian detectors on this dataset are encouraged to follow the benchmarking procedure and criteria as detailed in the publication given above.

The authors would like to hear about other publications that make use of the benchmark dataset, in order to include corresponding references on the benchmark website.


Ground Truth Parser (Matlab)

For convenience, a Matlab parser for 2D and 3D ground truth is provided in the directory

GroundTruthParser
See
GroundTruthParser/example.m
for details on how to interpret the Database format.

Note that this software is provided "as is" without warranty of any kind.


Contact

Please direct questions regarding the dataset and benchmarking procedure to Prof. Dr. Dariu Gavrila or Markus Enzweiler.