This README describes the Daimler Monocular Pedestrian Detection Benchmark introduced in the publication
M. Enzweiler and D. M. Gavrila. Monocular Pedestrian Detection: Survey and Experiments.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.
This dataset contains a collection of pedestrian and non-pedestrian images, including ground truth annotations. It is made freely available to academic and non-academic entities for research purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use, copy, and distribute the data given that you agree:
This dataset contains a collection of pedestrian and non-pedestrian images in 8-bit PGM format, split into training and test data.
Training images were recorded at various times of day and locations, with no constraints on illumination, pedestrian pose or clothing, except that pedestrians are fully visible in an upright position. 15660 pedestrian (positive) samples are provided as training examples in two resolutions: 18x36 and 48x96 pixels. These samples were obtained by manually extracting 3915 rectangular position labels from video images. The original aspect ratios of the pedestrians were preserved, resulting in a variable amount of border pixels in the horizontal direction. Four pedestrian samples were created from each label by mirroring and by randomly shifting the bounding boxes by a few pixels in the horizontal and vertical directions, to account for localization errors in the application system. Pedestrian labels have a minimum height of 72 pixels, so that no upscaling is involved for the different training sample resolutions. Further, we provide 6744 full images not containing any pedestrians, from which negative samples for training can be extracted.
The test dataset consists of an independent image sequence comprising 21790 PGM images (640x480 pixels) with 56492 manual labels, including 259 trajectories of fully visible pedestrians, captured from a moving vehicle during a 27-minute drive through urban traffic.
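Since the samples and test frames are plain 8-bit PGM files, they can be loaded without any imaging library. The following Python sketch (the function name and file paths are hypothetical; substitute paths from the extracted archives) handles both the binary P5 and ASCII P2 variants of the format:

```python
import re

def read_pgm(path):
    """Read an 8-bit PGM image (binary P5 or ASCII P2).

    Returns (width, height, pixels), where pixels is a flat list of
    integers in row-major order.
    """
    with open(path, "rb") as f:
        data = f.read()
    # PGM header: magic number, width, height, maxval; '#' lines are comments.
    header = []
    pos = 0
    while len(header) < 4:
        m = re.match(rb"\s*(#[^\n]*\n|\S+)", data[pos:])
        tok = m.group(1)
        pos += m.end()
        if not tok.startswith(b"#"):
            header.append(tok)
    magic, width, height = header[0], int(header[1]), int(header[2])
    if magic == b"P5":
        # A single whitespace byte separates the header from the raw pixels.
        pixels = list(data[pos + 1 : pos + 1 + width * height])
    elif magic == b"P2":
        pixels = [int(t) for t in data[pos:].split()]
    else:
        raise ValueError("not a PGM file: %r" % magic)
    return width, height, pixels
```

A call such as `read_pgm("Data/TestData/img_00000.pgm")` (name hypothetical) would then return the frame as a 640x480 pixel list.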
For installation, simply extract the provided .tgz archives. This will create the folders
Data/TestData
Data/TrainingData
Ground truth is given by manual annotations of bounding box locations for fully-visible pedestrians, pedestrian groups, partially occluded pedestrians, bicyclists and motorcyclists. For performance evaluation, we differentiate between the scenarios of generic pedestrian detection (evaluation in 2D) and pedestrian detection from a moving vehicle (evaluation in 3D). We provide two different ground truth files - one for each case.
Ground truth is given by bounding box location of pedestrians in pixel coordinates with the top-left pixel in the image being (0,0). Fully-visible pedestrians of at least 72 pixels height are marked as required (confidence = 1.0, see below). Smaller or partially occluded pedestrians, as well as bicyclists and motorcyclists, are considered optional (confidence = 0.0, see below). Please refer to the paper for details on the evaluation criteria and procedure. 2D ground truth is provided in Database format in the file
GroundTruth/GroundTruth2D.db
The specification of the Database format is given below.
2D ground truth has been projected to 3D using camera calibration and the assumption of pedestrians standing on the ground. Fully-visible pedestrians within the sensor coverage area, +-4m in lateral direction and 10m-25m in front of the vehicle (12m-27m in front of the camera), are marked as required (confidence = 1.0, see below). Pedestrians not in the sensor coverage area or partially occluded pedestrians, as well as bicyclists and motorcyclists, are considered optional (confidence = 0.0, see below). Please refer to the paper for details on the evaluation criteria and procedure. 3D ground truth is provided in Database format in the file
GroundTruth/GroundTruth3D.db
The specification of the Database format is given below.
Calibration/camera_parameters.txt
The camera is mounted inside the vehicle below the rear-view mirror. The world coordinate system origin is on the ground plane directly below the camera. x points to the right, y points down, and z points in the driving direction. The camera is 1.17m above the ground and 2.0m behind the front of the vehicle. 3D coordinates in the 3D ground truth file denote the central point of the pedestrian on the ground plane (y=0) in world coordinates. Because of the camera-vehicle offset, the considered 3D detection range of 10m-25m in front of the vehicle corresponds to a distance of 12m-27m from the camera in the world coordinate system.
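The two geometric relations above (the 2.0m vehicle-camera offset, and the ground-plane assumption used to project 2D labels to 3D) can be sketched as follows. This is an illustrative sketch only: the focal length and principal point below are hypothetical placeholders, and the real intrinsics must be read from Calibration/camera_parameters.txt.

```python
def vehicle_to_camera_z(z_vehicle, cam_offset=2.0):
    """Distance in front of the vehicle -> z in the world coordinate system.

    The camera sits 2.0 m behind the vehicle front (from this README),
    so e.g. 10 m in front of the vehicle is 12 m from the camera.
    """
    return z_vehicle + cam_offset

def ground_plane_distance(v_foot, f_y=1000.0, c_y=240.0, cam_height=1.17):
    """Estimate the world z of a pedestrian from the image row of its feet.

    Assumes a pinhole camera aligned with the road and a pedestrian
    standing on the ground plane (y = 0), so z = f_y * cam_height / (v - c_y).
    f_y and c_y are HYPOTHETICAL values; read the real ones from
    Calibration/camera_parameters.txt. cam_height = 1.17 m is the
    mounting height stated in this README.
    """
    return f_y * cam_height / (v_foot - c_y)
```

For example, `vehicle_to_camera_z(10.0)` yields 12.0, matching the 10m-25m vehicle range mapping to 12m-27m from the camera.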
2D and 3D ground truth files are provided in the ASCII Database format. Specification is given below.
:                             | sequence separator, initiates an image sequence entry
seq_id                        | string identifier describing the sequence
absolute_path                 | path to the directory containing the sequence images
nr_images                     | length of the sequence, or -1 if unknown
;                             | image (frame) separator, initiates an image frame entry
image_name                    | image file name
image_width image_height      | image size
0 nr_of_objects               | ignore the first entry; number of objects in the image
# object_class                | 2D object separator, initiates an object entry in 2D (image) coordinates;
                              | object class: 0 = fully-visible pedestrian, 1 = bicyclist, 2 = motorcyclist,
                              | 10 = pedestrian group, 255 = partially visible pedestrian, bicyclist or motorcyclist
obj_id unique_id              | object ID identifying trajectories of the same physical object; additional ID unique to each object entry
confidence                    | confidence value indicating whether this ground truth object is required (1.0) or optional (0.0)
min_x min_y max_x max_y       | 2D bounding box coordinates (integer values, top-left pixel in the image is (0,0))
0                             | ignore this entry
...                           | (end of 2D object entry, more objects may follow)
§ object_class                | 3D object separator, initiates an object entry in 3D (world) coordinates;
                              | object classes as listed above
obj_id unique_id              | object ID identifying trajectories of the same physical object; additional ID unique to each object entry
confidence                    | confidence value indicating whether this ground truth object is required (1.0) or optional (0.0)
obj_min_x obj_min_y obj_min_z | 3D bounding box coordinates (float values); obj_min_x = obj_max_x and obj_min_z = obj_max_z denote the central point of the pedestrian in x and z; obj_min_y = obj_max_y = 0 (on the ground plane)
...                           | (end of 3D object entry, more objects may follow)
...                           | (end of image frame entry, more images may follow)
...                           | (end of image sequence entry, more sequences may follow)
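The field order above can be turned into a small parser. The Python sketch below reads the 2D format only and infers the exact line layout from the specification; it is a rough illustration, and the shipped Matlab parser (GroundTruthParser/example.m) remains the authoritative reference.

```python
def parse_db2d(lines):
    """Minimal parser for the 2D ASCII Database format.

    Takes an iterable of lines and returns a list of image records:
    {'image': name, 'width': w, 'height': h, 'objects': [...]}.
    Sequence-level fields (seq_id, path, nr_images) are read but not kept.
    """
    it = iter(l.strip() for l in lines if l.strip())
    images = []
    for line in it:
        if line == ":":                           # sequence entry
            seq_id = next(it)
            seq_path = next(it)
            nr_images = int(next(it))
        elif line == ";":                         # image frame entry
            name = next(it)
            w, h = map(int, next(it).split())
            _, n_obj = next(it).split()           # "0 nr_of_objects"
            objs = []
            for _ in range(int(n_obj)):
                cls = int(next(it).split()[1])    # "# object_class"
                obj_id, uniq = map(int, next(it).split())
                conf = float(next(it))            # required (1.0) / optional (0.0)
                x0, y0, x1, y1 = map(int, next(it).split())
                next(it)                          # trailing "0", ignored
                objs.append({"class": cls, "obj_id": obj_id, "unique_id": uniq,
                             "confidence": conf, "bbox": (x0, y0, x1, y1)})
            images.append({"image": name, "width": w, "height": h,
                           "objects": objs})
    return images
```

Usage would be `parse_db2d(open("GroundTruth/GroundTruth2D.db"))`, then filtering objects by class and confidence for evaluation.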
Authors who wish to evaluate pedestrian detectors on this dataset are encouraged to follow the benchmarking procedure and criteria as detailed in the publication given above.
The original authors would like to hear about other publications that make use of the benchmark dataset, in order to include the corresponding references on the benchmark website.
For convenience, a Matlab parser for 2D and 3D ground truth is provided in the directory
GroundTruthParser
See GroundTruthParser/example.m for details on how to interpret the Database format.
Note that this software is provided "as is" without warranty of any kind.
Please direct questions regarding the dataset and benchmarking procedure to Prof. Dr. Dariu Gavrila or Markus Enzweiler.