Synergizing between Self-Training and Adversarial Learning for Domain Adaptive Object Detection (Accepted at NeurIPS 2021)


Muhammad Akhtar Munir1, Muhammad Haris Khan2, Muhammad Saquib Sarfraz3,4, Mohsen Ali1


1Information Technology University of the Punjab, 2Mohamed bin Zayed University of Artificial Intelligence, 3Karlsruhe Institute of Technology, 4Daimler TSS


We study adapting trained object detectors to unseen domains that manifest significant variations in object appearance, viewpoints and backgrounds. Most current methods align domains using either image-level or instance-level feature alignment in an adversarial fashion. This often suffers from the presence of unwanted background and, since the target domain is unlabelled, lacks class-specific alignment. A common remedy to promote class-level alignment is to use high-confidence predictions on the new domain as pseudo-labels. These high-confidence predictions are often fallacious because the model is poorly calibrated under domain shift. In this paper, we propose leveraging the model's predictive uncertainty to strike the right balance between adversarial feature alignment and class-level alignment. Specifically, we measure predictive uncertainty on both the class assignments and the bounding-box predictions. Certain predictions are used to generate pseudo-labels for self-supervision, whereas uncertain ones are used to generate tiles for an adversarial feature alignment stage. This synergy between tiling around uncertain object regions and generating pseudo-labels from certain object regions allows us to capture both image-level and instance-level context during the model adaptation stage.
We perform extensive experiments covering various domain shift scenarios. Our approach improves upon existing state-of-the-art methods with visible margins.


Our key contributions include the following:
(1) We introduce a new uncertainty-guided framework that strikes the right balance between self-training and adversarial feature alignment for adapting object detection methods. Both pseudo-labelling for self-training and tiling for adversarial alignment are impactful due to their simplicity, generality and ease of implementation.
(2) We propose a method for estimating object detection uncertainty by taking into account variations in both the localization and confidence predictions across Monte-Carlo dropout inferences.
(3) We show that, by selecting pseudo-labels with low uncertainty and using relatively uncertain regions for adversarial alignment, it is possible to counter the poor calibration caused by domain shift and thereby improve the model's generalization across domains.
(4) Unlike most of the previous methods, we build on computationally efficient one-stage anchor-less object detectors and achieve state-of-the-art results with notable margins across various adaptation scenarios.
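Contribution (2) above can be sketched in a few lines. The following is a minimal illustration, not the paper's exact estimator: the function name `detection_uncertainty`, the stub data, and the specific combination (summing the standard deviation of the confidence with the mean per-coordinate standard deviation of the box) are illustrative assumptions.

```python
import numpy as np

def detection_uncertainty(confidences, boxes):
    """Combine confidence and localization variability across
    Monte-Carlo dropout inferences into one uncertainty score.

    confidences: (T,) class confidence of the matched detection in each
                 of T stochastic forward passes (dropout kept active).
    boxes:       (T, 4) corresponding box predictions (x1, y1, x2, y2).
    """
    conf_unc = confidences.std()        # spread of the class confidence
    box_unc = boxes.std(axis=0).mean()  # mean per-coordinate box spread
    return conf_unc + box_unc

rng = np.random.default_rng(0)
# A "certain" detection: nearly identical across 10 dropout passes.
stable_conf = 0.9 + 0.01 * rng.standard_normal(10)
stable_box = np.array([10, 10, 50, 50]) + 0.5 * rng.standard_normal((10, 4))
# An "uncertain" detection: confidence and box fluctuate heavily.
noisy_conf = 0.5 + 0.2 * rng.standard_normal(10)
noisy_box = np.array([10, 10, 50, 50]) + 8.0 * rng.standard_normal((10, 4))

u_low = detection_uncertainty(stable_conf, stable_box)
u_high = detection_uncertainty(noisy_conf, noisy_box)
assert u_low < u_high
```

In practice the per-pass detections must first be matched across inferences (e.g. by IoU) before their statistics can be aggregated; that matching step is omitted here for brevity.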


We propose leveraging the model's predictive uncertainty to strike the right balance between adversarial feature alignment and self-training. To this end, we introduce uncertainty-guided pseudo-label selection (UGPL) for self-training and uncertainty-guided tiling (UGT) for adversarial alignment. The former generates accurate pseudo-labels to improve feature discriminability for class-specific alignment, while the latter extracts tiles on uncertain, object-like regions for effective domain alignment.
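The UGPL/UGT split described above amounts to routing each target-domain detection by its uncertainty score. The sketch below is a simplified illustration: the thresholds `tau_low` and `tau_high`, the dictionary structure, and the function name `route_detections` are assumptions, not the paper's implementation.

```python
def route_detections(detections, tau_low, tau_high):
    """Split target-domain detections by uncertainty: certain ones become
    pseudo-labels for self-training (UGPL), relatively uncertain but still
    object-like ones become tiles for adversarial alignment (UGT), and the
    rest are discarded."""
    pseudo_labels, tile_regions = [], []
    for det in detections:
        if det["uncertainty"] <= tau_low:
            pseudo_labels.append(det)          # confident -> pseudo-label
        elif det["uncertainty"] <= tau_high:
            tile_regions.append(det["box"])    # uncertain -> tile for UGT
    return pseudo_labels, tile_regions

dets = [
    {"box": (10, 10, 50, 50), "uncertainty": 0.05},   # e.g. clear pedestrian
    {"box": (60, 20, 120, 80), "uncertainty": 0.40},  # e.g. car under fog
    {"box": (0, 0, 5, 5), "uncertainty": 0.95},       # likely background
]
pl, tiles = route_detections(dets, tau_low=0.1, tau_high=0.6)
assert len(pl) == 1 and len(tiles) == 1
```

The key design point is that the two branches are complementary: confident detections sharpen class-specific features via self-training, while the uncertain tiles give the adversarial discriminator object-centric regions to align rather than whole images dominated by background.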

Figure: An illustration on which detections will be considered as pseudo-labels and which for extracting tiles. More certain detections, such as pedestrians are taken as pseudo-labels, whereas relatively uncertain ones, like cars under fog, are used for extracting tiles.

Figure: Overall architecture of our method. Fundamentally, it is a one-stage detector with an adversarial feature alignment stage. We propose uncertainty-guided self-training with pseudo-labels (UGPL) and uncertainty-guided adversarial alignment via tiling (UGT), shown in dotted boxes. UGPL produces accurate pseudo-labels in the target image, which are used in tandem with ground-truth labels in the source image for training. UGT extracts tiles around possibly object-like regions in the target image, which are used with tiles randomly extracted around ground-truth labels in the source domain for adversarial feature alignment.


Cityscapes dataset features images of road and street scenes and offers 2975 and 500 examples for training and validation, respectively. It comprises the following categories: {person, rider, car, truck, bus, train, motorbike, bicycle}.

Foggy Cityscapes dataset is constructed from Cityscapes by simulating fog at three intensity levels using the depth maps provided with Cityscapes.

Sim10k dataset is a collection of 10K synthesized images with corresponding bounding-box annotations.

KITTI dataset bears resemblance to Cityscapes in that it features images of road scenes with a wide field of view, except that KITTI images were captured with a different camera setup.
Following existing works, we consider only the car class when adapting from KITTI or Sim10k.

Qualitative Results


Paper Synergizing between Self-Training and Adversarial Learning for Domain Adaptive Object Detection (PDF)
Code Coming Soon
Contact: Muhammad Akhtar Munir


Authors’ Information:

Muhammad Akhtar Munir, PhD Student, Intelligent Machines Lab, ITU, Lahore, Pakistan

Dr. Muhammad Haris Khan, Assistant Professor, MBZUAI, Abu Dhabi, UAE

Dr. Muhammad Saquib Sarfraz, Senior Scientist & Lecturer, KIT, Karlsruhe, Germany

Dr. Mohsen Ali, Assistant Professor, Intelligent Machines Lab, ITU, Lahore, Pakistan



@article{munir2021synergizing,
  title={Synergizing between Self-Training and Adversarial Learning for Domain Adaptive Object Detection},
  author={Munir, Muhammad Akhtar and Khan, Muhammad Haris and Sarfraz, M Saquib and Ali, Mohsen},
  journal={arXiv preprint arXiv:2110.00249},
  year={2021}
}



NeurIPS (To be updated)

@inproceedings{munir2021synergizing_neurips,
  author = {Munir, Muhammad Akhtar and Khan, Muhammad Haris and Sarfraz, Muhammad Saquib and Ali, Mohsen},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {},
  pages = {},
  publisher = {},
  title = {Synergizing between Self-Training and Adversarial Learning for Domain Adaptive Object Detection},
  url = {},
  volume = {},
  year = {2021}
}