A DATASET AND BENCHMARK FOR MALARIA LIFE-CYCLE CLASSIFICATION IN THIN BLOOD SMEAR IMAGES

Publication

Qazi Ammar Arshad, Mohsen Ali, Saeed-ul Hassan, Chen Chen, Ayisha Imran, Ghulam Rasul, and Waqas Sultani, A Dataset and Benchmark for Malaria Life-Cycle Classification in Thin Blood Smear Images, arXiv preprint arXiv:2102.08708 (2021).

Abstract

Malaria microscopy, the microscopic examination of stained blood slides to detect the Plasmodium parasite, is considered the gold standard for detecting malaria, a life-threatening disease. Detecting the Plasmodium parasite requires a skilled examiner and may take 10 to 15 minutes to go through the whole slide. Due to a lack of skilled medical professionals in underdeveloped or resource-deficient regions, many cases are misdiagnosed, which results in unavoidable medical complications. We propose to complement medical professionals with a deep learning-based method that automatically detects (localizes) Plasmodium parasites in photographs of stained film. To handle the unbalanced nature of the dataset, we adopt a two-stage approach: the first stage is trained to detect blood cells and classify them as either healthy or infected, and the second stage is trained to classify each infected cell further into its malaria life-cycle stage. To facilitate research in machine learning-based malaria microscopy, we introduce a new large-scale microscopic image malaria dataset. Thirty-eight thousand cells are tagged in 345 microscopic images of different Giemsa-stained slides of blood samples. Extensive experiments are performed on this dataset using different convolutional neural networks. Our experiments and analysis reveal that the two-stage approach works better than the one-stage approach for malaria detection. To ensure the usability of our approach, we have also developed a mobile app that will be used by local hospitals for investigation and educational purposes. The dataset, its annotations, and the implementation code will be released upon publication of the paper.

Contribution

In this paper, we present a two-stage approach for malaria detection and malaria life-cycle-stage classification. In the first stage, given an image captured by a microscope camera or mobile phone, cells are efficiently segmented using morphological operations and the watershed algorithm. Deep convolutional features are then extracted for malaria versus non-malaria classification. Once the infected cells are detected, the second stage employs another deep classifier for malaria life-cycle-stage classification.
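The two-stage flow described above can be sketched as a simple cascade. This is a minimal illustration, not the released implementation: `classify_cells`, `is_infected`, and `predict_stage` are hypothetical names, and the two callables stand in for the trained binary and multi-class networks.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

Cell = TypeVar("Cell")  # any representation of one segmented cell

# Life-cycle stages annotated in the dataset.
STAGES = ("Ring", "Trophozoite", "Schizont", "Gametocyte")


def classify_cells(
    cells: Iterable[Cell],
    is_infected: Callable[[Cell], bool],
    predict_stage: Callable[[Cell], str],
) -> List[Optional[str]]:
    """Return one label per cell: None for healthy, else a life-cycle stage."""
    labels: List[Optional[str]] = []
    for cell in cells:
        if not is_infected(cell):
            # Stage 1 rejected the cell, so stage 2 never sees it.
            labels.append(None)
            continue
        stage = predict_stage(cell)
        if stage not in STAGES:
            raise ValueError(f"unexpected stage label: {stage!r}")
        labels.append(stage)
    return labels


# Toy stand-ins for the two classifiers (hypothetical rules on a made-up
# "parasite area" number, used only to exercise the control flow).
toy_cells = [0, 5, 40]
print(classify_cells(
    toy_cells,
    is_infected=lambda area: area > 0,
    predict_stage=lambda area: "Ring" if area < 10 else "Schizont",
))
```

Because healthy cells are filtered out first, the stage classifier only has to separate the four infected classes, which is one way to read the motivation for splitting the task in two.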
Currently, no large public microscopic image dataset with malaria annotations is available, especially one from a developing country. We collected samples from a local hospital in Lahore, Pakistan, and had them annotated by an expert. Our dataset contains the P. vivax malaria species in four different life-cycle stages: Ring, Schizont, Trophozoite, and Gametocyte. To make our approach usable in practical scenarios, we have also developed a user-friendly mobile app. Extensive experiments are conducted to handle the bias introduced by the class imbalance in the dataset. The experimental results indicate the efficacy of our approach.

Method

Flow diagram of the proposed malaria detection approach. The process starts with an image captured with a mobile phone or microscope camera, followed by pre-processing and binary mask generation. After the cells are segmented using the watershed algorithm, individual cells are fed into our binary classification network, which separates healthy and infected cells. Finally, the infected cells are fed into another multi-class classification model for life-cycle-stage detection.
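The mask generation and watershed steps can be sketched roughly as follows. This is an assumption-laden illustration using NumPy, SciPy, and scikit-image, not the authors' code: the fixed `threshold`, the 3x3 opening, the `min_distance` value, and the helper names `segment_cells` and `bounding_boxes` are all choices made here for the example, and it assumes stained cells are darker than the background.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed


def segment_cells(gray: np.ndarray, threshold: float, min_distance: int = 10) -> np.ndarray:
    """Label each cell in a grayscale image; 0 marks background."""
    # Binary mask: assume cells are darker than the background.
    mask = gray < threshold
    # Morphological clean-up: remove small specks, then fill holes in cells.
    mask = ndi.binary_opening(mask, structure=np.ones((3, 3)))
    mask = ndi.binary_fill_holes(mask)
    # The distance to the background peaks near each cell centre.
    distance = ndi.distance_transform_edt(mask)
    regions, _ = ndi.label(mask)
    peaks = peak_local_max(distance, min_distance=min_distance, labels=regions)
    # One integer marker per peak seeds the watershed.
    markers = np.zeros(gray.shape, dtype=np.int32)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    # Flood the inverted distance map so touching cells are split apart.
    return watershed(-distance, markers, mask=mask)


def bounding_boxes(labels: np.ndarray):
    """Return (row_min, col_min, row_max, col_max) for each labeled cell."""
    return [
        (s[0].start, s[1].start, s[0].stop, s[1].stop)
        for s in ndi.find_objects(labels)
        if s is not None
    ]


# Synthetic check: two overlapping dark disks on a bright background.
yy, xx = np.mgrid[:80, :110]
image = np.ones((80, 110))
for cy, cx in [(40, 40), (40, 68)]:
    image[(yy - cy) ** 2 + (xx - cx) ** 2 <= 20 ** 2] = 0.3
cell_labels = segment_cells(image, threshold=0.5)
print(bounding_boxes(cell_labels))
```

Each bounding box can then be used to crop a single cell for the binary classifier. On real smears the threshold would likely need to be estimated per image rather than fixed.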

IML Malaria Dataset

We have collected a new malaria dataset captured with an XSZ-107 series microscope. Globally, 49% of malaria cases are reported to be due to P. vivax; therefore, we concentrate on collecting images of microscopic slides infected by this species of Plasmodium. Blood smears were prepared by spreading a thin blood film on the slide and leaving it to dry. After the blood smear was fixed with methanol, the slide was stained with Giemsa solution for 15 to 20 minutes. Finally, cover-slips were permanently fixed on the slides to keep them safe. Our thin-film blood slides were prepared by trained experts in labs. Images were captured with a camera mounted on the microscope at 100x objective magnification. Images were captured and annotated by an expert hematologist at a local institute. The dataset contains 345 images, each containing 111 blood cells on average, both healthy and infected by the parasite. We annotate the infected cells into one class per life-cycle stage, that is, Ring, Trophozoite, Schizont, and Gametocyte, as shown in the figure below.

In addition, a bounding box is annotated for each cell. The following figure shows some example images from our dataset.
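For readers planning to work with per-cell annotations of this kind, one possible in-memory representation is sketched below. The released annotation format is not described here, so the `CellAnnotation` schema, its field names, and the `"Healthy"` label string are assumptions made purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple

STAGES = ("Ring", "Trophozoite", "Schizont", "Gametocyte")


@dataclass(frozen=True)
class CellAnnotation:
    """One annotated cell (hypothetical schema, not the released format)."""

    image_id: str
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
    label: str  # "Healthy" or one of STAGES

    def __post_init__(self) -> None:
        x_min, y_min, x_max, y_max = self.box
        if not (x_min < x_max and y_min < y_max):
            raise ValueError(f"degenerate box: {self.box}")
        if self.label != "Healthy" and self.label not in STAGES:
            raise ValueError(f"unknown label: {self.label!r}")

    @property
    def is_infected(self) -> bool:
        return self.label in STAGES


def class_counts(annotations: Iterable[CellAnnotation]) -> Counter:
    """Count cells per label, useful for inspecting class imbalance."""
    return Counter(a.label for a in annotations)


# Made-up records, only to show the intended usage.
records = [
    CellAnnotation("img_001", (10, 20, 50, 60), "Ring"),
    CellAnnotation("img_001", (70, 15, 110, 58), "Healthy"),
]
print(class_counts(records))

# The quoted figures are mutually consistent: 345 images with about 111
# cells each give roughly thirty-eight thousand cells.
print(345 * 111)  # 38295
```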

BibTeX

@article{arshad2021dataset,
title={A Dataset and Benchmark for Malaria Life-Cycle Classification in Thin Blood Smear Images},
author={Arshad, Qazi Ammar and Ali, Mohsen and Hassan, Saeed-ul and Chen, Chen and Imran, Ayisha and Rasul, Ghulam and Sultani, Waqas},
journal={arXiv preprint arXiv:2102.08708},
year={2021}
}