PUBLICATIONS

WEAKLY-SUPERVISED DOMAIN ADAPTATION FOR BUILT-UP REGION SEGMENTATION IN AERIAL AND SATELLITE IMAGERY

Built-up area estimation is an important component in understanding the human impact on the environment, the effect of public policy, and urban population analysis in general. The diverse nature of aerial and satellite imagery (capturing different geographical locations, terrains and weather conditions) and the lack of labeled data covering this diversity make it difficult for machine learning algorithms to generalize on such tasks, especially across multiple domains. Re-training for a new domain is both computationally and labor expensive, mainly due to the cost of collecting the pixel-level labels required for the segmentation task. Domain adaptation algorithms have been proposed to enable algorithms trained on images of one domain (source) to work on images from another dataset (target). Unsupervised domain adaptation is a popular choice since it allows the trained model to adapt without requiring any ground-truth information of the target domain. However, because aerial and satellite imagery lacks the strong spatial context and structure of ground imagery, applying existing unsupervised domain adaptation methods results in sub-optimal adaptation. We thoroughly study the limitations of existing domain adaptation methods and propose a weakly-supervised adaptation strategy in which we assume image-level labels are available for the target domain. More specifically, we design a built-up area segmentation network (as an encoder-decoder) with an image classification head added to guide the adaptation. The devised system is able to address the problem of visual differences in multiple satellite and aerial imagery datasets, ranging from high resolution (HR) to very high resolution (VHR), by investigating the latent space as well as the structured output space.

A realistic and challenging HR dataset is created by hand-tagging 73.4 sq-km of Rwanda, capturing a variety of built-up structures over different terrain. The developed dataset is spatially rich compared to existing datasets and covers diverse built-up scenarios including built-up areas in forests and deserts, mud houses, and tin and colored rooftops. Extensive experiments are performed by adapting from single-source domain datasets, such as the Massachusetts Buildings Dataset, to segment the target domain. We achieve high gains, ranging from 11.6% to 52% in IoU, over the existing state-of-the-art methods.

JAVED IQBAL AND MOHSEN ALI

ACCEPTED IN EUROPEAN CONFERENCE ON COMPUTER VISION (ECCV) 2020
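
The gains above are reported in IoU (intersection over union). As a minimal sketch of how that metric can be computed for a binary built-up mask, the snippet below uses plain nested lists of 0/1 values. The function name `binary_iou` and the empty-mask convention are assumptions made for illustration and are not taken from the paper.

```python
def binary_iou(pred, target):
    """IoU between two binary masks given as equally sized nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two completely empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 2x3 masks: 1 marks a built-up pixel.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(binary_iou(pred, target))  # 2 overlapping pixels / 4 pixels in the union = 0.5
```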

LOCALIZING FIREARM CARRIERS BY IDENTIFYING HUMAN-OBJECT PAIRS

Visual identification of gunmen in a crowd is a challenging problem that requires resolving the association of a person with an object (firearm). We present a novel approach to this problem by defining human-object interaction (and non-interaction) bounding boxes. In a given image, humans and firearms are detected separately. Each detected human is paired with each detected firearm, allowing us to create a paired bounding box that contains both the object and the human. A network is trained to classify these paired bounding boxes according to whether or not the human is carrying the identified firearm. Extensive experiments were performed to evaluate the effectiveness of the algorithm, including exploiting the full pose of the human, hand key-points, and their association with the firearm. Knowledge of spatially localized features, obtained using multi-size proposals with adaptive average pooling, is key to the success of our method. We have also extended an existing firearm detection dataset by adding more images and tagging the human-firearm pairs in the extended dataset (including bounding boxes for firearms and gunmen). The experimental results (AP = 78.5) demonstrate the effectiveness of the proposed method.

ABDUL BASIT, M. AKHTAR MUNIR, MOHSEN ALI AND ARIF MAHMOOD

ACCEPTED IN ICIP2020
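
The pairing step described in the abstract above can be illustrated with a short sketch. It assumes detections arrive as `(x1, y1, x2, y2)` boxes and that the paired bounding box is the smallest axis-aligned box enclosing both. The helper names `union_box` and `make_pairs` are hypothetical, and the classifier that labels each pair is not shown.

```python
from itertools import product


def union_box(a, b):
    """Smallest axis-aligned box (x1, y1, x2, y2) that encloses boxes a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def make_pairs(humans, firearms):
    """Pair every detected human with every detected firearm."""
    return [
        {"human": h, "firearm": f, "paired_box": union_box(h, f)}
        for h, f in product(humans, firearms)
    ]


# Toy detections (pixel coordinates are made up for illustration).
humans = [(10, 20, 60, 200), (150, 30, 210, 220)]
firearms = [(50, 90, 100, 120)]

pairs = make_pairs(humans, firearms)
for pair in pairs:
    print(pair["paired_box"])
# (10, 20, 100, 200)
# (50, 30, 210, 220)
```

With H humans and F firearms this produces H x F candidate pairs, each of which would then be passed to the carrying / not-carrying classifier.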

EXPLOITING GEOMETRIC CONSTRAINTS ON DENSE TRAJECTORIES FOR MOTION SALIENCY

The existing approaches for salient motion segmentation are unable to explicitly learn geometric cues and often give false detections on prominent static objects. We exploit multiview geometric constraints to avoid such mistakes. To handle nonrigid backgrounds such as the sea, we also propose a robust fusion mechanism between motion and appearance-based features. We find dense trajectories, covering every pixel in the video, and propose trajectory-based epipolar distances to distinguish between background and foreground regions. Trajectory epipolar distances are data-independent and can be readily computed given a few feature correspondences in the images. We show that by combining epipolar distances with optical flow, a powerful motion network can be learned. To enable the network to leverage both sources of information, we propose a simple mechanism that we call input-dropout. We outperform the previous motion network on the DAVIS-2016 dataset by 5.2% in mean IoU score. By robustly fusing our motion network with an appearance network using the proposed input-dropout, we also outperform the previous methods on the DAVIS-2016, 2017 and Segtrackv2 datasets.

MUHAMMAD FAISAL, IJAZ AKHTER, MOHSEN ALI AND RICHARD HARTLEY

ACCEPTED IN WACV-2020
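
To give a feel for the geometric cue mentioned in the abstract above, the sketch below computes the standard point-to-epipolar-line distance for one correspondence between two frames, given a fundamental matrix `F`. This is the textbook two-view quantity, not necessarily the exact trajectory-based distance defined in the paper. The function name and the toy matrix are assumptions for illustration.

```python
import math


def epipolar_distance(F, x1, x2):
    """Distance from point x2 (frame 2) to the epipolar line of x1 (frame 1).

    F is a 3x3 fundamental matrix as nested lists; x1 and x2 are (x, y) pixels.
    """
    p1 = (x1[0], x1[1], 1.0)  # homogeneous coordinates of x1
    # Epipolar line l = F * p1, stored as (a, b, c) with a*x + b*y + c = 0.
    a, b, c = (sum(F[r][k] * p1[k] for k in range(3)) for r in range(3))
    # No guard for the degenerate case a == b == 0 in this sketch.
    return abs(a * x2[0] + b * x2[1] + c) / math.hypot(a, b)


# Toy fundamental matrix for a camera translating purely along the x-axis:
# the epipolar line of (x, y) is then the horizontal line through y.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]

print(epipolar_distance(F, (5.0, 7.0), (9.0, 7.0)))   # 0.0, consistent with static background
print(epipolar_distance(F, (5.0, 7.0), (9.0, 10.0)))  # 3.0, violates the epipolar constraint
```

A point that belongs to the rigid background stays close to its epipolar line, while an independently moving point generally does not, which is why such distances help separate foreground from background.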

MULTI-FOCUS IMAGE FUSION USING CONTENT ADAPTIVE BLURRING

Multi-focus image fusion has emerged as an important research area in information fusion. It aims at increasing the depth-of-field by extracting focused regions from multiple partially focused images and merging them to produce a composite image in which all objects are in focus. In this paper, a novel multi-focus image fusion algorithm is presented in which the task of detecting the focused regions is achieved using a Content Adaptive Blurring (CAB) algorithm. The proposed algorithm induces non-uniform blur in a multi-focus image depending on its underlying content. In particular, it analyzes the local image quality in a neighborhood and determines whether or not blur should be induced without losing local image quality. In CAB, pixels belonging to the blurred regions receive little or no blur at all, whereas the focused regions receive significant blur. The absolute difference of the original image and the CAB-blurred image yields an initial segmentation map, which is further refined using morphological operators and graph-cut techniques to improve the segmentation accuracy. Quantitative and qualitative evaluations and comparisons with the current state-of-the-art on two publicly available datasets demonstrate the strength of the proposed algorithm.

MUHAMMAD SHAHID FARID, ARIF MAHMOOD, SOMAYA ALI AL-MAADEED

INFORMATION FUSION 2019

AUTOMATIC IMAGE TRANSFORMATION FOR INDUCING AFFECT

Current image transformation and recoloring algorithms try to introduce artistic effects in photographed images, based on the user's input of target image(s) or selection of pre-designed filters. In this paper we present an automatic image-transformation method that transforms the source image such that it induces an emotional affect on the viewer, as desired by the user. Our method can handle a much more diverse set of images than previous methods. A discussion of the failure cases and the reasoning behind them is provided, indicating the inherent limitation of color-transfer based methods when used for emotion assignment. Note: Project Page is titled “Emotional Filters: AUTOMATIC IMAGE TRANSFORMATION FOR INDUCING AFFECT”.

AFSHEEN RAFAQAT ALI, MOHSEN ALI

BRITISH MACHINE VISION CONFERENCE (BMVC) 2017

DECONSTRUCTING BINARY CLASSIFIERS IN COMPUTER VISION

This paper develops the novel notion of deconstructive learning and proposes a practical model for deconstructing a broad class of binary classifiers commonly used in vision applications. Specifically, the problem studied in this paper is: given an image-based binary classifier C as a black-box oracle, how much can we learn of its internal workings by simply querying it? In particular, we demonstrate that it is possible to ascertain the type of kernel function used by the classifier and the number of support vectors using only image queries, and to ascertain the unknown feature space as well.

MOHSEN ALI, JEFFREY HO

ASIAN CONFERENCE ON COMPUTER VISION (ACCV) 2014

LEARNING FROM SCALE-INVARIANT EXAMPLES FOR DOMAIN ADAPTATION IN SEMANTIC SEGMENTATION

Self-supervised learning approaches for unsupervised domain adaptation (UDA) of semantic segmentation models suffer from the challenges of predicting and selecting reasonably good quality pseudo-labels. In this paper, we propose a novel approach of exploiting the scale-invariance property of the semantic segmentation model for self-supervised domain adaptation. Our algorithm is based on the reasonable assumption that, in general, regardless of the size of the object and stuff (given context), the semantic labeling should be unchanged. We show that this constraint is violated over the images of the target domain, and hence could be used to transfer labels between differently scaled patches. Specifically, we show that the semantic segmentation model produces output with high entropy when presented with scaled-up patches of the target domain, in comparison to when presented with original-size images. These scale-invariant examples are extracted from the most confident images of the target domain. A dynamic class-specific entropy thresholding mechanism is presented to filter out unreliable pseudo-labels. Furthermore, we also incorporate the focal loss to tackle the problem of class imbalance in self-supervised learning. Extensive experiments have been performed, and the results indicate that by exploiting the scale-invariant labeling, we outperform existing self-supervised state-of-the-art domain adaptation methods. Specifically, we achieve a lead of 1.3% and 3.8% for GTA5 to Cityscapes and SYNTHIA to Cityscapes, respectively, with the VGG16-FCN8 baseline network.
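
The entropy-based filtering of pseudo-labels mentioned above can be sketched for a single pixel as follows. The sketch assumes the per-class entropy thresholds are already known. How the paper computes them dynamically is not described in the abstract, so `thresholds`, `IGNORE` and the function names here are hypothetical.

```python
import math

IGNORE = -1  # hypothetical marker for pixels that receive no pseudo-label


def entropy(probs):
    """Shannon entropy (natural log) of one pixel's class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pseudo_label(probs, class_thresholds):
    """Keep the arg-max class only if the entropy is under that class's threshold."""
    label = max(range(len(probs)), key=probs.__getitem__)
    return label if entropy(probs) <= class_thresholds[label] else IGNORE


# Assumed thresholds: class 0 tolerates more uncertainty than class 1.
thresholds = [0.5, 0.3]

print(pseudo_label([0.95, 0.05], thresholds))  # 0: low entropy, label kept
print(pseudo_label([0.55, 0.45], thresholds))  # -1: high entropy, filtered out
print(pseudo_label([0.90, 0.10], thresholds))  # 0: entropy ~0.325 is under 0.5
print(pseudo_label([0.10, 0.90], thresholds))  # -1: same entropy exceeds 0.3 for class 1
```

The last two calls show why a class-specific threshold matters: the same level of uncertainty is accepted for one class and rejected for another.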

MLSL: MULTI-LEVEL SELF-SUPERVISED LEARNING FOR DOMAIN ADAPTATION WITH SPATIALLY INDEPENDENT AND SEMANTICALLY CONSISTENT LABELING

Most recent deep semantic segmentation algorithms suffer from large generalization errors, even when powerful hierarchical representation models based on convolutional neural networks are employed. This can be attributed to limited training data and the large distribution gap between the train and test domain datasets. In this paper, we propose a multi-level self-supervised learning model for domain adaptation of semantic segmentation. Exploiting the idea that an object (and most of the stuff given context) should be labeled consistently regardless of its location, we generate spatially independent and semantically consistent (SISC) pseudo-labels by segmenting multiple sub-images using the base model and designing an aggregation strategy. Image-level pseudo weak-labels (PWL) are computed to guide domain adaptation by capturing global context similarity between the source and target domains at the latent space level. This helps the latent space learn the representation even when there are very few pixels belonging to the domain category (a small object, for example) compared to the rest of the image. Our multi-level self-supervised learning (MLSL) outperforms existing state-of-the-art (self or adversarial learning) algorithms. Specifically, keeping all settings the same and employing MLSL, we obtain an mIoU gain of 5.1% on GTA-V to Cityscapes adaptation and 4.3% on SYNTHIA to Cityscapes adaptation compared to the existing state-of-the-art method.

JAVED IQBAL AND MOHSEN ALI

ACCEPTED IN WACV2020

TWIN-NET DESCRIPTOR: TWIN NEGATIVE MINING WITH QUAD LOSS FOR PATCH BASED MATCHING

Local keypoint matching is an important step for computer vision based tasks. In recent years, Deep Convolutional Neural Network (CNN) based strategies have been employed to learn descriptor generation to enhance keypoint matching accuracy. Recent state-of-the-art works in this direction primarily rely upon a triplet based loss function (and its variations) utilizing three samples: an anchor, a positive and a negative. In this work we propose a novel “Twin Negative Mining” based sampling strategy coupled with a Quad loss function to train a deep neural network based pipeline (Twin-Net) for generating a robust descriptor that provides increased discriminatory power to differentiate between patches that do not correspond to each other. Our sampling strategy and choice of loss function are aimed at placing an upper bound such that the descriptors of two patches representing the same location are, at worst, no more dissimilar than the descriptors of two similar-looking patches that do not belong to the same 3D location. This increases the generalization capability of the network, which outperforms its existing counterparts when trained over the same datasets. Twin-Net outputs a 128-dimensional descriptor and uses L2 distance as the similarity metric, and hence conforms to classical descriptor matching pipelines such as that of SIFT. Our results on the Brown and HPatches datasets demonstrate Twin-Net’s consistently better performance as well as better discriminatory and generalization capability compared to the state-of-the-art.

AMAN IRSHAD, REHAN HAFIZ, MOHSEN ALI, YONGJU CHO, AND JEONGIL SEO

IEEE ACCESS
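
For context, the triplet based loss that the abstract above contrasts against can be sketched as below, using L2 distance between descriptors. This is the standard triplet margin loss and not the Quad loss proposed in the paper, whose exact form is not given in the abstract. The margin value and the 2-dimensional toy descriptors are assumptions for illustration (Twin-Net descriptors are 128-dimensional).

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull the positive closer than the negative by a margin."""
    return max(0.0, l2_distance(anchor, positive) - l2_distance(anchor, negative) + margin)


anchor = (0.0, 0.0)
positive = (3.0, 4.0)       # distance 5 from the anchor
easy_negative = (6.0, 8.0)  # distance 10: already far enough away
hard_negative = (0.0, 4.0)  # distance 4: closer than the positive

print(triplet_margin_loss(anchor, positive, easy_negative))  # 0.0
print(triplet_margin_loss(anchor, positive, hard_negative))  # 2.0
```

Only the hard negative contributes a non-zero loss, which is why the choice of negatives during sampling has a strong effect on training.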

MOVING OBJECT DETECTION IN COMPLEX SCENE USING SPATIOTEMPORAL STRUCTURED-SPARSE RPCA

Moving object detection is a fundamental step in various computer vision applications. Robust principal component analysis (RPCA)-based methods have often been employed for this task. However, the performance of these methods deteriorates in the presence of dynamic background scenes, camera jitter, camouflaged moving objects, and/or variations in illumination. This is because of an underlying assumption that the elements in the sparse component are mutually independent, so the spatiotemporal structure of the moving objects is lost. To address this issue, we propose a spatiotemporal structured-sparse RPCA algorithm for moving object detection, where we impose spatial and temporal regularization on the sparse component in the form of graph Laplacians. Each Laplacian corresponds to a multi-feature graph constructed over superpixels in the input matrix. We enforce the sparse component to act as eigenvectors of the spatial and temporal graph Laplacians while minimizing the RPCA objective function. These constraints incorporate a spatiotemporal subspace structure within the sparse component. Thus, we obtain a novel objective function for separating moving objects in the presence of complex backgrounds. The proposed objective function is solved using batch optimization based on a linearized alternating direction method of multipliers. Moreover, we also propose an online optimization algorithm for real-time applications. We evaluated both the batch and online solutions using six publicly available data sets that included most of the aforementioned challenges. Our experiments demonstrated the superior performance of the proposed algorithms compared with the current state-of-the-art methods.

SAJID JAVED, ARIF MAHMOOD, SOMAYA AL-MAADEED, THIERRY BOUWMANS, AND SOON KI JUNG (SENIOR MEMBER, IEEE)

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 28, NO. 2, FEBRUARY 2019

MORE FOR LESS: INSIGHTS INTO CONVOLUTIONAL NETS FOR 3D POINT CLOUD RECOGNITION

With the recent breakthroughs in commodity 3D imaging solutions such as depth sensing, photogrammetry, stereoscopic vision and structured light, 3D shape recognition is becoming an increasingly important problem. A longstanding question is what the format of the 3D shape should be (such as voxel, mesh, point cloud, etc.) and what could be a good generic feature representation for shape recognition. This question is particularly important in the context of convolutional neural networks (CNNs), whose efficacy and complexity depend upon the choice of input shape format and the design of the network. It has been seen that both 3D voxel representations and collections of rendered views as 2D images have produced competitive results. Similarly, it has been seen that networks with a few million parameters and networks with several hundred million parameters have similar performance. In this work we compare these solutions and provide an analysis of the factors that increase the number of parameters without significantly improving accuracy. On the basis of this analysis we propose a representation method (point cloud to 2D grid) and an architecture that result in far fewer parameters for the CNN while retaining competitive accuracy.

USAMA SHAFIQ, MURTAZA TAJ, MOHSEN ALI

INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) 2017

USING SATELLITE IMAGERY FOR GOOD: DETECTING COMMUNITIES IN DESERT AND MAPPING VACCINATION ACTIVITIES

Deep convolutional neural networks (CNNs) have outperformed existing object recognition and detection algorithms. This paper describes a deep learning approach that analyzes a geo-referenced satellite image and efficiently detects built structures in it. A Fully Convolutional Network (FCN) is trained on low-resolution Google Earth satellite imagery for this purpose. The detected built communities are then correlated with the vaccination activity.

ANZA SHAKEEL, MOHSEN ALI

ARXIV 2017

HIGH-LEVEL CONCEPTS FOR AFFECTIVE UNDERSTANDING OF IMAGES

This paper aims to bridge the affective gap between image content and the emotional response it elicits in the viewer by using High-Level Concepts (HLCs). In contrast to previous work that relied solely on low-level features or used a convolutional neural network (CNN) as a black box, we use HLCs generated by pre-trained CNNs in an explicit way to investigate the relations/associations between these HLCs and a (small) set of Ekman's emotional classes. Experimental results demonstrate that our results are comparable to existing methods, with a clear view of the association between HLCs and emotional classes that is ostensibly missing in most existing work.

AFSHEEN RAFAQAT ALI, USMAN SHAHID, MOHSEN ALI, JEFFREY HO

WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) 2017