Destruction from Sky: A Weakly Supervised Approach for Destruction Detection in Satellite Imagery
Architecture of the proposed Weakly Supervised Attention Network (WSAN). Given an input training image, we divide it into fixed-size non-overlapping patches and obtain visual features for each patch through DenseNet. For each patch feature, the attention unit generates a weight. After multiplying each attention weight with its corresponding input feature vector and taking the mean, we pass the result to the classifier unit, which classifies the whole large image as either the destructed or the non-destructed class.
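The attention pooling step described above can be sketched as follows. This is a minimal illustration, not the trained model: `w` and `b` stand in for the learned parameters of the attention unit, and the softmax normalization over patches is an assumption about how the weights are produced.

```python
import numpy as np

def attention_pool(patch_feats, w, b):
    """Weight each patch feature by an attention score, then average.

    patch_feats: (num_patches, feat_dim) array of per-patch visual features
                 (e.g. DenseNet outputs)
    w, b: illustrative stand-ins for the trained attention unit's parameters
    Returns the pooled feature vector (fed to the classifier unit) and the
    per-patch attention weights.
    """
    scores = patch_feats @ w + b                  # one scalar score per patch
    attn = np.exp(scores - scores.max())
    attn = attn / attn.sum()                      # softmax over patches
    # multiply each feature vector by its attention weight, then take the mean
    pooled = (attn[:, None] * patch_feats).mean(axis=0)
    return pooled, attn
```

At test time, the same per-patch attention weights can be thresholded to localize destruction, which is what makes the image-level supervision "weak" rather than pixel-level.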
Hard Negative Mining process. To reduce the false-positive rate, we retrain the WSAN, treating high destruction-score patches in destruction images as positive patches. For negative patches, we use patches from non-destructed images together with low-confidence destruction-score patches from destruction images.
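The patch-relabeling rule behind hard negative mining can be sketched as below. The thresholds `hi` and `lo` are illustrative placeholders, not values from the paper, and mid-confidence patches in destruction images are simply left unlabelled.

```python
def mine_patches(patch_scores, image_label, hi=0.8, lo=0.2):
    """Assign patch-level pseudo-labels for retraining the WSAN.

    patch_scores: per-patch destruction scores from the first-stage model
    image_label: 1 if the image has the 'destruction' label, else 0
    hi, lo: illustrative confidence thresholds (assumed, not from the paper)
    Returns (positive_indices, negative_indices).
    """
    pos, neg = [], []
    for i, s in enumerate(patch_scores):
        if image_label == 0:
            neg.append(i)        # every patch of a non-destructed image is negative
        elif s >= hi:
            pos.append(i)        # confident destruction patches become positives
        elif s <= lo:
            neg.append(i)        # low-score patches become hard negatives
    return pos, neg
```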
Abstract:
Natural and man-made disasters cause huge damage to built infrastructure and result in the loss of human lives. Rehabilitation efforts and rescue operations are hampered by the unavailability of accurate and timely information regarding the location of damaged infrastructure and the extent of the damage. In this paper, we model destruction in satellite imagery with a deep learning model, employing a weakly supervised approach. In stark contrast to previous approaches, instead of solving the problem as change detection (using pre- and post-event images), we identify destruction itself using a single post-event image. To overcome the challenge of collecting the pixel-level ground truth mostly used during training, we assume only image-level labels, indicating whether destruction is present (at any location) in a given image or not. The proposed attention-based mechanism learns to identify image patches containing destruction automatically, under a sparsity constraint. Furthermore, to reduce false positives and improve segmentation quality, a hard negative mining technique is proposed that yields a considerable improvement over the baseline. To validate our approach, we have collected a new dataset containing destruction and non-destruction images from Indonesia, Yemen, Japan, and Pakistan. On the test dataset, we obtained excellent destruction detection results, with a pixel-level accuracy of 93% and a patch-level accuracy of 91%. The source code and dataset will be made publicly available.
Qualitative Results:
WSAN: Weakly Supervised Attention Network; HNM: Hard Negative Mining; CRF: Conditional Random Field. (a) shows parts of large input test images, (b) shows the output mask generated by our WSAN model at a certain threshold, (c) shows the output mask after retraining through hard negative mining, and (d) shows the final output mask after applying a CRF. Finally, (e) shows the images with the ground truth mask.
Input image overlaid with the ground truth mask and the predicted mask from our three-step approach. Sky blue shows the intersection of the predicted and ground truth masks, green shows false positives, and purple shows false negatives.
Quantitative Results:
Table 1: Pixel-level accuracy, precision, recall, F1 score, and IoU score for different components of the proposed approach.
Table 2: Patch-level accuracy, precision, recall, and F1 score for different components of the proposed approach.
Links:
arXiv (WSAN.pdf)
GitHub (WSAN)
Dataset (Destruction)
Keywords:
Semantic Segmentation, WSAN, Weakly Supervised Learning, Attention Network, CRF, Hard Negative Mining
BIBTEX:
——
Contact: mscs16045@itu.edu.pk