Towards Improving Calibration in Object Detection Under Domain Shift
(Accepted @NeurIPS2022)

 

Muhammad Akhtar Munir1,2, Muhammad Haris Khan2, Muhammad Saquib Sarfraz3,4, Mohsen Ali1

 

1Information Technology University of the Punjab, 2Mohamed bin Zayed University of Artificial Intelligence, 3Karlsruhe Institute of Technology, 4Mercedes-Benz Tech Innovation

Abstract

The increasing use of deep neural networks in safety-critical applications requires the trained models to be well-calibrated. Most current calibration techniques address classification problems and focus on improving calibration on in-domain predictions. Little to no attention has been paid to the calibration of visual object detectors, which occupy a similar space and importance in many decision-making systems. In this paper, we study the calibration of current object detection models, particularly under domain shift. To this end, we first introduce a plug-and-play train-time calibration loss for object detection. It can be used as an auxiliary loss function to improve a detector’s calibration. Second, we devise a new uncertainty quantification mechanism for object detection, which can implicitly calibrate the commonly used self-training based domain adaptive detectors. We include both single-stage and two-stage object detectors in our study. We demonstrate that our loss improves calibration for both in-domain and out-of-domain detections by notable margins. Finally, we show the utility of our techniques in calibrating domain adaptive object detectors in diverse domain shift scenarios.

Highlights

In this paper, we study the calibration of object detection models for both in-domain and out-of-domain detections. We observe that:

(1) Detection models demonstrate poor calibration for both in-domain and out-of-domain detections.

(2) Unsupervised domain adaptive detection models are rather miscalibrated when compared to their predictive accuracy in a target domain.

Towards developing well-calibrated object detection models for both in-domain and out-of-domain scenarios, we propose a new plug-and-play loss formulation, termed train-time calibration for detection (TCD). It can be used alongside task-specific loss functions during the training phase and acts as a regularizer for detections.
In addition, we develop an implicit calibration technique for self-training based domain adaptive detectors.
Finally, we empirically show that this technique is complementary to our loss function and that both can be used during adaptation to further boost calibration under challenging domain shift detection scenarios.

We validate the effectiveness of our loss function towards improving the calibration of different DNN-based object detection paradigms and different domain adaptive detection models under challenging domain shift scenarios.

Method

We propose a train-time calibration loss for object detection (TCD loss) that targets both in-domain and out-of-domain scenarios. In addition to minimizing the difference between confidence and accuracy, as in classification calibration, our loss function is designed to minimize the difference between confidence and intersection over union (IoU), which improves calibration for detectors. Further, we introduce an implicit calibration mechanism for improving self-training based domain adaptive detectors. It features a new uncertainty quantification mechanism for object detection and leverages the estimated uncertainty to modulate the one-hot encoded pseudo-targets.
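To make the two terms concrete, the following is a minimal, simplified sketch of a calibration penalty of this kind. It is not the authors' implementation: the function name `calibration_gap_loss`, its arguments, the batch-level averaging, and the equal 0.5 weighting of the two terms are all assumptions made for illustration. It uses plain Python floats, whereas an actual training loss would be computed on differentiable tensors.

```python
from statistics import fmean
from typing import Sequence


def calibration_gap_loss(
    confidences: Sequence[float],
    is_correct: Sequence[bool],
    ious: Sequence[float],
) -> float:
    """Hypothetical, simplified calibration penalty for one mini-batch.

    confidences: predicted score of each positive detection, in [0, 1]
    is_correct:  whether each predicted class matches the ground truth
    ious:        IoU of each predicted box with its matched ground-truth box
    """
    if not (len(confidences) == len(is_correct) == len(ious)):
        raise ValueError("all inputs must have one entry per detection")
    if not confidences:
        return 0.0

    mean_conf = fmean(confidences)
    mean_acc = fmean([1.0 if ok else 0.0 for ok in is_correct])
    mean_iou = fmean(ious)

    # Classification term: gap between average confidence and accuracy.
    cls_gap = abs(mean_conf - mean_acc)
    # Localization term: gap between average confidence and average IoU.
    det_gap = abs(mean_conf - mean_iou)
    # Equal weighting of the two terms is an assumption of this sketch.
    return 0.5 * (cls_gap + det_gap)
```

Used as an auxiliary term, such a penalty would be added to the detector's task-specific loss with some weighting factor, so that overconfident detections (high score, wrong class or poorly localized box) increase the total loss.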

Figure: Reliability diagrams. Top row: DNN-based detector (FCOS) trained using the task-specific loss. Bottom row: Ours, trained with the proposed TCD loss added.
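For readers unfamiliar with reliability diagrams, the sketch below shows the binning that typically underlies them, together with the expected calibration error derived from the bins. It assumes each detection comes with a confidence score and a 0/1 correctness flag (for example, obtained by matching to ground truth at some IoU threshold). The function names and the choice of 10 equal-width bins are illustrative assumptions; detection-specific calibration measures may additionally bin over box properties, which this sketch does not do.

```python
from typing import List, Sequence, Tuple

Bin = Tuple[float, float, int]  # (mean confidence, accuracy, count)


def reliability_bins(
    confidences: Sequence[float],
    is_correct: Sequence[bool],
    num_bins: int = 10,
) -> List[Bin]:
    """Group detections into equal-width confidence bins."""
    sums = [[0.0, 0.0, 0] for _ in range(num_bins)]
    for conf, ok in zip(confidences, is_correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        sums[idx][0] += conf
        sums[idx][1] += 1.0 if ok else 0.0
        sums[idx][2] += 1
    return [
        (s[0] / s[2], s[1] / s[2], s[2]) if s[2] else (0.0, 0.0, 0)
        for s in sums
    ]


def expected_calibration_error(bins: Sequence[Bin]) -> float:
    """Count-weighted average of |accuracy - confidence| over the bins."""
    total = sum(count for _, _, count in bins)
    if total == 0:
        return 0.0
    return sum(count / total * abs(acc - conf) for conf, acc, count in bins)
```

In a reliability diagram, each non-empty bin is drawn as a bar whose height is the bin accuracy; a well-calibrated detector produces bars that follow the diagonal, where accuracy equals confidence.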

Datasets


The Cityscapes dataset features images of road and street scenes and offers 2975 and 500 examples for training and validation, respectively. It comprises the following categories: {person, rider, car, truck, bus, train, motorbike, and bicycle}.

The Foggy Cityscapes dataset is constructed from Cityscapes by simulating foggy weather at three levels of fog density, using the depth maps provided with Cityscapes.
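As a rough illustration of how depth maps can drive fog simulation, the sketch below applies a standard exponential attenuation model to a single pixel intensity: light from the scene is attenuated with distance and replaced by a constant airlight. This is an assumption about the general technique, not a description of the dataset's exact pipeline; the function names, the airlight value, and the attenuation coefficient `beta` are hypothetical.

```python
import math


def transmittance(depth_m: float, beta: float) -> float:
    """Fraction of scene light that survives fog over depth_m metres."""
    return math.exp(-beta * depth_m)


def add_fog(clear: float, depth_m: float, beta: float, airlight: float = 1.0) -> float:
    """Blend one clear-weather intensity in [0, 1] toward the airlight value."""
    t = transmittance(depth_m, beta)
    # Near pixels (t close to 1) keep their colour; far pixels fade to airlight.
    return clear * t + airlight * (1.0 - t)
```

Under this model, a larger `beta` corresponds to denser fog, so different fog levels can be produced by rendering the same image and depth map with different attenuation coefficients.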

The Sim10k dataset is a collection of 10K synthesized images and their corresponding bounding box annotations.

The KITTI dataset resembles Cityscapes in that it features wide-view images of road scenes, except that KITTI images were captured with a different camera setup.
Following existing works, we consider only the car class for experiments when adapting from KITTI or Sim10k.

MSCOCO is a large-scale dataset for object detection containing 80 classes.

The PASCAL VOC 2012 dataset contains 20 classes, which are shared with a subset of the 80 MSCOCO classes.

Qualitative Results

Visual depiction of calibration results for out-of-domain detections with a one-stage detector (left column) and the same one-stage detector trained with our TCD loss (right column).

Publication

Paper: Towards Improving Calibration in Object Detection Under Domain Shift (PDF)
Code: https://github.com/akhtarvision/tcd_calib
Contact: Muhammad Akhtar Munir (akhtar.munir@itu.edu.pk)

 

Authors’ Information:

Muhammad Akhtar Munir, PhD Student, Intelligent Machines Lab, ITU, Lahore, Pakistan
Email: akhtar.munir@itu.edu.pk
Web: https://akhtarvision.github.io/

Dr. Muhammad Haris Khan, Assistant Professor, MBZUAI, Abu Dhabi, UAE
Email: muhammad.haris@mbzuai.ac.ae
Web: https://mbzuai.ac.ae/pages/muhammad-haris/

Dr. Muhammad Saquib Sarfraz, Senior Scientist & Lecturer, KIT, Karlsruhe, Germany
Email: muhammad.sarfraz@kit.edu
Web: https://ssarfraz.github.io/

Dr. Mohsen Ali, Associate Professor, Intelligent Machines Lab, ITU, Lahore, Pakistan
Email: mohsen.ali@itu.edu.pk
Web: https://im.itu.edu.pk/