| Paper | Leveraging Orientation for Weakly Supervised Object Detection with Application to Firearm Localization |
| Data Set | Will be available soon |
| Contact | Mohsen Ali (mohsen.ali@itu.edu.pk) |
Javed Iqbal, Muhammad Akhtar Munir, Arif Mahmood, Afsheen Rafaqat Ali, and Mohsen Ali, Leveraging Orientation for Weakly Supervised Object Detection with Application to Firearm Localization, Neurocomputing (2021).
Automatic detection of firearms is important for enhancing the security and safety of people; however, it is a challenging task owing to the wide variations in the shape, size, and appearance of firearms. Moreover, most generic object detectors process axis-aligned rectangular areas, even though a thin, long rifle may cover only a small percentage of that area while the rest contains irrelevant details that suppress the required object signatures. To handle these challenges, we propose a weakly supervised Orientation Aware Object Detection (OAOD) algorithm, which learns to detect oriented object bounding boxes (OBB) while using Axis Aligned Bounding Boxes (AABB) for training. The proposed OAOD differs from existing oriented object detectors, which strictly require OBB annotations during training, and these may not always be available. The goal of training on AABB and detecting OBB is achieved by employing a multistage scheme, with Stage-1 predicting the AABB and Stage-2 predicting the OBB. Between the two stages, an oriented proposal generation module and object-aligned RoI pooling are designed to extract features based on the predicted orientation and to make these features orientation invariant. A diverse and challenging dataset consisting of eleven thousand images, manually annotated for firearm classification and localization, is also proposed for firearm detection. The proposed ITU Firearm dataset (ITUF) contains a wide range of guns and rifles. The OAOD algorithm is evaluated on the ITUF dataset and compared with current state-of-the-art object detectors, including fully supervised oriented object detectors. OAOD outperforms both types of object detectors by a significant margin. The experimental results (mAP: 88.3 on AABB and mAP: 77.5 on OBB) demonstrate the effectiveness of the proposed algorithm for firearm detection.
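To put a number on the point about thin objects inside axis-aligned boxes, the following is a small illustrative calculation, not taken from the paper. The helper name `aabb_fill_ratio` and the 100 x 10 pixel rectangle standing in for a rifle silhouette are assumptions made purely for illustration.

```python
import math

def aabb_fill_ratio(length, thickness, angle_deg):
    """Fraction of the tightest axis-aligned box covered by a rotated rectangle.

    The rectangle has size length x thickness and is rotated by angle_deg
    about its centre. Hypothetical helper, for illustration only.
    """
    theta = math.radians(angle_deg)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    # Extent of the rotated rectangle along the image axes.
    aabb_w = length * c + thickness * s
    aabb_h = length * s + thickness * c
    return (length * thickness) / (aabb_w * aabb_h)

# Assumed 100 x 10 pixel "rifle" at a few orientations.
for angle in (0, 15, 45):
    print(angle, round(aabb_fill_ratio(100, 10, angle), 3))
```

Under these assumptions, the rectangle fills its axis-aligned box completely at 0 degrees but only about 17% of it at 45 degrees, which is the situation an oriented box is meant to avoid.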
Our proposed system predicts both axis-aligned and object-aligned bounding boxes, while being trained only on axis-aligned bounding boxes and orientation information in a weakly supervised fashion. The main contributions of this work are:
• We propose a weakly supervised deep learning architecture to predict Oriented Bounding Boxes (OBB) without using OBB annotations while training.
• Orientation classification and regression modules are proposed to predict orientation from the axis-aligned region proposals.
• An Oriented Proposal Generation (OPG) module is proposed to generate Oriented Region Proposals (ORP), followed by Object Aligned RoI-pooling (OARoI-Pooling) to pool target object features while discarding background noise. This setup yields features that are independent of the object’s orientation, which simplifies classification and bounding box regression and thus improves the accuracy of the classifier and bounding box regressor in the last stage.
• An extensive firearm dataset, ITU-Firearm (ITUF), is also proposed, consisting of 13,647 annotated firearm instances in 10,973 images.
• Our method achieves state-of-the-art performance compared to existing methods on the proposed ITUF dataset.
The architecture of the proposed OAOD algorithm is shown below: (a) deep features are computed; (b) Stage-1 outputs object and orientation classifications along with the respective AABB and orientation offsets; (c) in Stage-2, the OPG module generates Oriented Region Proposals (ORP) using the predictions from Stage-1; (d) OARoI-Pooling is applied to pool orientation-independent features, followed by bounding box regression; (e) the final regression outputs are then used to generate the OBB using the inverse transformation.
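Step (e) can be illustrated with a minimal sketch. The exact parameterization used by OAOD is not given on this page, so the code assumes a box described by its centre, width, height, and a rotation angle about that centre; the "inverse transformation" is then taken to be the rotation of the box corners back into image coordinates. The function name `obb_corners` is hypothetical.

```python
import math

def obb_corners(cx, cy, w, h, angle_deg):
    """Corners of a w x h box centred at (cx, cy), rotated by angle_deg.

    Assumption: the box was regressed in a frame rotated about its own
    centre, so rotating the corner offsets maps it back to the image.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets from the centre in the box-aligned frame.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Standard 2D rotation of each offset, then translation to the centre.
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]

# Example: a long, thin box rotated by 30 degrees (values are made up).
for x, y in obb_corners(200.0, 150.0, 120.0, 20.0, 30.0):
    print(round(x, 1), round(y, 1))
```

Note that in image coordinates the y axis usually points downward, so a positive angle in this sketch appears as a clockwise rotation on screen.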

We have collected a large dataset of images containing firearms, named ITUF. The axis-aligned bounding box (AABB) of each firearm in each image has been hand-annotated. The dataset has been divided into training and testing splits; for the testing split, OBBs were also manually annotated to enable comparison with existing OBB-predicting algorithms. To the best of our knowledge, ITUF is the first large firearm dataset in the public domain. ITUF captures varied scenes (indoor, outdoor, different lighting conditions) and scenarios (firearms pointed, carried, lying on tables/ground/racks) and contains various makes and models of firearms (from pistols to AK-47). This diversity makes ITUF a challenging and realistic dataset for the firearm detection task.
ITUF was collected from the web using search keywords such as weapons, wars, pistol, movie names, firearms, types of firearms, sniper, shooter, corps, guns and rifles. The results were cleaned to remove duplicates, synthetic images, and images not containing firearms. The final dataset consists of 10,973 fully annotated images with 13,647 firearm instances.
We have divided firearms into two classes: the ‘Gun’ class includes different variations of pistols and revolvers, whereas the ‘Rifle’ class ranges from hunting rifles to the AK-47 (including small machine guns). The AABB of each firearm in every image is tagged by an annotator, along with a class label and an angle representing the orientation of the firearm. Orientation is annotated as the angle of the line joining the muzzle and the back tip (hammer or butt) of the firearm. Orientations are quantized into 8 bins as shown below.
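Independently of the bin layout in the figure, the annotation scheme can be sketched in code. This is a minimal illustration under stated assumptions, not the annotation tool used for ITUF: the angle is measured from the back tip towards the muzzle, the bins are assumed to be equal-width and to start at 0 degrees, and the covered range (`full_range`) is a parameter because the page does not say whether the bins span 180 or 360 degrees. The function names are hypothetical.

```python
import math

def orientation_angle(muzzle, back_tip):
    """Angle in degrees of the line from the back tip to the muzzle.

    Points are (x, y) pixel coordinates. With the image y axis pointing
    down, the angle increases clockwise on screen.
    """
    dx = muzzle[0] - back_tip[0]
    dy = muzzle[1] - back_tip[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0

def orientation_bin(angle_deg, num_bins=8, full_range=360.0):
    """Index of the equal-width bin containing angle_deg (assumed layout)."""
    bin_width = full_range / num_bins
    # The final modulo guards against floating-point wrap-around at the top edge.
    return int((angle_deg % full_range) // bin_width) % num_bins

# Example with made-up keypoints: muzzle to the right of and below the back tip.
angle = orientation_angle(muzzle=(180, 140), back_tip=(100, 100))
print(round(angle, 1), orientation_bin(angle))
```

If the actual bins treat opposite directions as the same orientation, passing `full_range=180.0` gives eight bins of 22.5 degrees each instead of 45 degrees.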
