The Computer Vision and Machine Learning (CVML) research group addresses a wide range of problems, including Affective Computing, Remote Sensing, Object Detection, Semantic Segmentation, and Anomaly Detection. The group explores and implements novel techniques using computer vision and machine/deep learning algorithms. Our research covers both the theoretical and practical aspects of deep learning, with an emphasis on solving locally relevant problems through learning methods. CVML aims to design and apply intelligent algorithms that perform computer vision tasks efficiently.

Faculty Members

Ph.D. in Computer Science, University of Florida, USA

Ph.D. in Computer Science, Center for Research in Computer Vision, University of Central Florida, USA



Anomaly Detection

In addition to proposing a new anomaly detection method, we introduce a new large-scale, first-of-its-kind dataset of 128 hours of video. It consists of 1,900 long, untrimmed real-world surveillance videos containing 13 realistic anomalies, such as Abuse, Arrest, Arson, Assault, Accident, Burglary, Explosion, Fighting, Robbery, and Shooting, as well as normal activities.
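The CVPR 2018 method scores short video segments and trains with a multiple-instance ranking objective: the highest-scoring segment of an anomalous video should outrank the highest-scoring segment of a normal video. A simplified sketch of that ranking loss is below; it is our own illustration, and the smoothness/sparsity weights are arbitrary values, not the paper's tuned settings.

```python
import numpy as np

def mil_ranking_loss(anom_scores, norm_scores, lam1=8e-5, lam2=8e-5):
    """Multiple-instance ranking loss over per-segment anomaly scores.

    anom_scores / norm_scores: 1-D arrays of scores in [0, 1] for the
    segments of one anomalous and one normal video (the two "bags").
    lam1, lam2: smoothness / sparsity weights (arbitrary values here).
    """
    # Hinge: the top anomalous segment should outrank the top normal one.
    hinge = max(0.0, 1.0 - anom_scores.max() + norm_scores.max())
    # Temporal smoothness: adjacent anomalous segments should score similarly.
    smooth = np.sum(np.diff(anom_scores) ** 2)
    # Sparsity: anomalies occupy only a few segments of a long video.
    sparse = np.sum(anom_scores)
    return hinge + lam1 * smooth + lam2 * sparse

a = np.array([0.1, 0.9, 0.8, 0.1])   # anomalous video: a clear score peak
n = np.array([0.1, 0.2, 0.1, 0.1])   # normal video: uniformly low scores
print(mil_ranking_loss(a, n))
```

When the anomalous bag's peak clearly outranks the normal bag's, the hinge term shrinks; swapping the two bags drives the loss up, which is exactly the ranking behavior the training encourages.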
Please check out our CVPR 2018 paper:
PI: Dr. Waqas Sultani

Object Detection

Automatic detection of firearms is important for enhancing the security and safety of people; however, it is a challenging task owing to the wide variations in the shape, size, and appearance of firearms. Viewing-angle variations and occlusions by the weapon's carrier and surrounding people further increase the difficulty of the task. Moreover, existing object detectors process rectangular areas, although a thin, long rifle may actually cover only a small percentage of such an area, with the rest containing irrelevant details that suppress the required object signatures. To handle these challenges, we propose an Orientation Aware Object Detector (OAOD), which achieves improved firearm detection and localization performance. The OAOD algorithm is evaluated on the ITUF dataset and compared with current state-of-the-art object detectors.
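As a rough illustration of why axis-aligned rectangles suit thin, elongated objects poorly, the toy sketch below (our own example, not part of OAOD) computes what fraction of a tight axis-aligned bounding box is actually covered by a rotated rectangle of rifle-like proportions:

```python
import math

def coverage_ratio(w, h, theta_deg):
    """Fraction of the tight axis-aligned bounding box covered by a
    w x h rectangle rotated by theta_deg degrees."""
    t = math.radians(theta_deg)
    # Dimensions of the tight axis-aligned box around the rotated rectangle.
    bb_w = w * abs(math.cos(t)) + h * abs(math.sin(t))
    bb_h = w * abs(math.sin(t)) + h * abs(math.cos(t))
    return (w * h) / (bb_w * bb_h)

# A rifle-like 10:1 rectangle fills its box when axis-aligned...
print(coverage_ratio(200, 20, 0))
# ...but at 45 degrees most of the box is irrelevant background.
print(coverage_ratio(200, 20, 45))
```

At 45 degrees the coverage drops to roughly 17%, so over 80% of the rectangular region an ordinary detector processes would contain background rather than the firearm, which is the motivation for an orientation-aware box.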

Please check out our work and arXiv paper:
Paper: Orientation Aware Object Detection (OAOD) with Application to Firearms,
OAOD Web-Link,
GitHub: OAOD with Application to Firearms

PI: Dr. Mohsen Ali


Moving Object Segmentation in Videos

Video object segmentation aims to cluster the pixels in a video into objects or background. In this project, instead of treating deep learning as a black box and endlessly iterating on the network design, we focus on providing more intelligent and informative input. We exploit the geometric constraints between frames to recover what is shared across them. These constraints have been studied extensively and, unlike data-dependent learning, they admit closed-form solutions (for given correspondences between the images).
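A classic example of such a closed-form constraint is the fundamental matrix relating two frames, recoverable directly from eight or more point correspondences via the normalized eight-point algorithm. The sketch below is our own NumPy illustration of that standard technique, not the project's code:

```python
import numpy as np

def normalize(pts):
    """Hartley normalization: centroid to origin, mean distance sqrt(2)."""
    c = pts.mean(axis=0)
    d = np.sqrt(((pts - c) ** 2).sum(axis=1)).mean()
    s = np.sqrt(2) / d
    T = np.array([[s, 0, -s * c[0]],
                  [0, s, -s * c[1]],
                  [0, 0, 1.0]])
    ph = np.column_stack([pts, np.ones(len(pts))])
    return (T @ ph.T).T, T

def eight_point(pts1, pts2):
    """Closed-form fundamental matrix from >= 8 correspondences (N, 2)."""
    p1, T1 = normalize(pts1)
    p2, T2 = normalize(pts2)
    # Each correspondence contributes one row of the linear system A f = 0,
    # obtained by expanding x2^T F x1 = 0.
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1)),
    ])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint a fundamental matrix must satisfy.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    return T2.T @ F @ T1  # undo the normalization
```

Unlike a learned model, this solution needs no training data: given correct correspondences, a single SVD recovers the constraint exactly, which is the sense in which these geometric relations are closed-form.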
In collaboration with: Dr. Ijaz, Postdoctoral Researcher at NUS
Work in progress
PI: Dr. Mohsen Ali

Domain Adaptation of Semantic Segmentation

Semantic segmentation is a challenging problem because it requires pixel-level annotations. Deep Convolutional Neural Networks (DCNNs) achieve impressive results on semantic segmentation, but labeled training data remains limited for many real-world applications. Domain adaptation for semantic segmentation tries to adapt to the target-domain data distribution, without access to target labels, so that segmentation works effectively in real-world scenarios. Generative Adversarial Networks are also incorporated to learn the distributions of the source and target data simultaneously and to minimize the difference between them.
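A common ingredient in such adaptation pipelines, including self-supervised variants like MLSL, is pseudo-labeling: confident target-domain predictions are kept as training labels while uncertain pixels are ignored. The sketch below is our own minimal illustration of that step (the threshold and the 255 ignore value are conventional choices, not values from the papers):

```python
import numpy as np

IGNORE = 255  # label value commonly excluded from the segmentation loss

def pseudo_labels(probs, threshold=0.9):
    """Turn per-pixel softmax probabilities (C, H, W) into pseudo-labels.

    Pixels whose most likely class falls below `threshold` are marked
    IGNORE so they contribute nothing when retraining on the target domain.
    """
    conf = probs.max(axis=0)        # per-pixel confidence
    labels = probs.argmax(axis=0)   # per-pixel most likely class
    labels[conf < threshold] = IGNORE
    return labels

# Toy 2x2 "image" with 3 classes: only confident pixels keep a label.
probs = np.array([
    [[0.95, 0.40], [0.20, 0.10]],
    [[0.03, 0.35], [0.70, 0.85]],
    [[0.02, 0.25], [0.10, 0.05]],
])
print(pseudo_labels(probs, threshold=0.8))
```

Here two pixels are confident enough to keep their predicted class, while the two ambiguous ones are masked out; retraining on such filtered labels gradually pulls the model toward the target distribution without any ground-truth target annotations.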

Please check out our work and recent papers:
Weakly-supervised domain adaptation for built-up region segmentation in aerial and satellite imagery [ISPRS Journal of Photogrammetry and Remote Sensing 2020],
MLSL: Multi-Level Self-Supervised Learning for Domain Adaptation with Spatially Independent and Semantically Consistent Labeling [WACV 2020],

PI: Dr. Mohsen Ali


Remote Sensing

We at ITU study satellite and aerial imagery to develop tools that will assist government and non-government organizations in analyzing urban population, road structure, urban and rural structural development, agricultural regions, animal migrations, and the destruction caused by natural disasters. Primarily, we have studied satellite and aerial images to detect residential areas using computer vision and machine learning techniques.

Please check out our work and recent papers:
Weakly-supervised domain adaptation for built-up region segmentation in aerial and satellite imagery [ISPRS Journal of Photogrammetry and Remote Sensing 2020],
Destruction from sky: Weakly supervised approach for destruction detection in satellite imagery [ISPRS Journal of Photogrammetry and Remote Sensing 2020],
Deep built-structure counting in satellite imagery using attention based re-weighting [ISPRS Journal of Photogrammetry and Remote Sensing 2019],

PI: Dr. Mohsen Ali


Affective Computing

Affective Computing is a research field that analyzes multimedia affectively in order to build emotionally intelligent machines capable of better human-machine interaction. Exciting applications of affective computing include affective content recommendation, abstraction, and affective description generation. Affective content analysis also helps us understand why specific content evokes a particular emotion in its viewers.

PI: Dr. Mohsen Ali


Medical Image Analysis

Vision impairment/loss is one of the major health problems in the world; according to the World Health Organization (WHO), more than 285 million people suffer from vision impairment, of whom more than 39 million are blind. In this project, we predict blindness from retinal-layer images containing Choroidal Neovascularization (CNV) and Diabetic Macular Edema (DME); DRUSEN indicates vision weakness, and NORMAL requires observation only. Predicting blindness early enables anti-Vascular Endothelial Growth Factor (VEGF) treatment, which halts the retinal disease and keeps the retina working properly for vision.

I = {(x1, y1), (x2, y2), …, (xn, yn)},   xi ∈ R^d
yi ∈ {0: CNV, 1: DME, 2: DRUSEN, 3: NORMAL}
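Under this formulation (feature vectors xi ∈ R^d, each with one of four labels), a minimal baseline is multinomial logistic regression. The sketch below is our own toy illustration on synthetic data standing in for image features, not the project's model:

```python
import numpy as np

CLASSES = {0: "CNV", 1: "DME", 2: "DRUSEN", 3: "NORMAL"}

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_regression(X, y, n_classes=4, lr=0.5, steps=300):
    """Multinomial logistic regression via batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]              # one-hot targets
    for _ in range(steps):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - Y) / n       # gradient of the cross-entropy
    return W

# Synthetic, well-separated 4-class data (one cluster per retinal condition).
rng = np.random.default_rng(1)
centers = np.array([[4, 0], [0, 4], [-4, 0], [0, -4]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
y = np.repeat(np.arange(4), 50)
W = train_softmax_regression(X, y)
pred = (X @ W).argmax(axis=1)
print("train accuracy:", (pred == y).mean())
```

In practice the features would come from a deep network rather than 2-D clusters, but the label set {0: CNV, 1: DME, 2: DRUSEN, 3: NORMAL} and the argmax prediction carry over unchanged.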

PI: Dr. Mohsen Ali

Relevant projects can be found here