Estimation of BMI from facial images using semantic segmentation based region-aware pooling

Paper: Estimation of BMI from facial images using semantic segmentation based region-aware pooling
Data Set: VIP-attribute, VisualBMI, and Bollywood datasets
Contact: Waqas Sultani


Nadeem Yousaf, Sarfaraz Hussein and Waqas Sultani, Estimation of BMI from Facial Images using Semantic Segmentation based Region-Aware Pooling, Computers in Biology and Medicine 2021.



Body mass index (BMI) conveys important information about a person's life, such as health and socio-economic conditions. Large-scale automatic estimation of BMI can help predict several societal behaviors such as health, job opportunities, friendships, and popularity. Recent works have employed either hand-crafted geometrical face features or face-level deep convolutional neural network features for face-to-BMI prediction. Although useful, hand-crafted geometrical face features lack generalizability, and face-level deep features lack the detailed local information that is essential for accurate BMI prediction. In this paper, we propose to use deep features that are pooled from different face regions (eye, nose, eyebrow, lips, etc.) and demonstrate that this explicit pooling from face regions can significantly boost the performance of BMI prediction. To localize face regions accurately at the pixel level, we propose to use face semantic segmentation in our framework. Extensive experiments are performed using different Convolutional Neural Network (CNN) backbones, including FaceNet and VGG-face, on three publicly available datasets: VisualBMI, Bollywood, and VIP-attribute. Experimental results demonstrate that, compared to recent works, the proposed Reg-GAP gives a percentage improvement of 22.4% on VIP-attribute, 3.3% on VisualBMI, and 63.09% on the Bollywood dataset.



In this work, we addressed the main challenge of predicting BMI from facial regions. We modified a face semantic segmentation network to generate masks for the various face regions. We also modified FaceNet and VGGFace to apply these masks at deeper layers and generate features for each face region, so that our model can attend to the regions that contribute most to the BMI prediction. These features are then fed to our regression module, which is designed for BMI prediction.

To summarize, our method has the following key contributions:

  • We modified the BiSeNet module to generate masks for each region.
  • We designed a module to pre-process these masks.
  • We modified VGGFace and FaceNet by applying region masks at the last convolutional layers.
  • We applied Reg-GAP to extract region-specific features.
  • We also applied plain GAP as a baseline to measure the improvement.
  • We performed a t-SNE analysis to report the robustness of features in classification tasks as well.
  • We defined a regression module that predicts the final BMI.



The pipeline of the proposed approach. Given the input image, we crop the face region employing face detection. Each face region mask is obtained through face semantic segmentation. After that, face region masks are element-wise multiplied with the convolution feature maps to give high weights to different face regions. Global average pooling is then applied to each masked convolution map separately. Finally, we employ the regression module to obtain the BMI prediction.
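
The masking and pooling steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `region_aware_gap` and `plain_gap`, the tensor layout (channels first), and the 512-channel 7x7 feature map in the usage lines are all hypothetical. Each region mask is multiplied element-wise with every feature channel, the result is averaged over all spatial positions, and the per-region vectors are concatenated into one descriptor that a regression module could consume.

```python
import numpy as np

def region_aware_gap(features, masks):
    """Pool one feature vector per face region and concatenate them.

    features: (C, H, W) convolutional feature maps (assumed layout).
    masks:    (R, H, W) binary region masks at the same resolution.
    Returns a flat descriptor of length R * C.
    """
    # Broadcast to (R, C, H, W): each mask weights every channel.
    masked = masks[:, None, :, :] * features[None, :, :, :]
    # Global average pooling over the spatial dimensions, per region.
    pooled = masked.mean(axis=(2, 3))  # shape (R, C)
    return pooled.reshape(-1)

def plain_gap(features):
    """Baseline: global average pooling over the whole face."""
    return features.mean(axis=(1, 2))  # shape (C,)

# Usage with synthetic data (shapes are illustrative assumptions).
rng = np.random.default_rng(0)
features = rng.random((512, 7, 7))
masks = (rng.random((8, 7, 7)) > 0.5).astype(np.float32)
descriptor = region_aware_gap(features, masks)
print(descriptor.shape)  # (4096,)
```

Here the average is taken over all spatial positions, following the description of applying global average pooling to each masked map. Dividing by the mask area instead would be a different variant that the text does not describe.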


Face Semantic Segmentation

Examples of face semantic segmentation. The first and third columns show face images from the VIP-attribute and VisualBMI datasets, while the second and fourth columns show the resulting face semantic segmentation.

Given the face semantic segmentation, we extract different face regions and explicitly pool features from those regions. The regions are (left to right): ear, eyes, eyebrow, hair, lips, neck, nose, and skin. The bottom two rows show the binary masks obtained from segmentation, and the top two rows show the corresponding region feature maps. The first and third row samples are from the VisualBMI dataset, while the second and fourth row samples are from the VIP-attribute dataset.
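
As a rough sketch of how per-region binary masks could be derived from a segmentation output, the example below splits an integer label map into one mask per region and downsamples the masks to a feature-map resolution. The label ids in `REGION_IDS`, the helper names, and the nearest-neighbour resizing are assumptions made for illustration; the actual label mapping and mask pre-processing used in the paper are not specified here.

```python
import numpy as np

# Hypothetical label ids; a real face parsing network defines its own mapping.
REGION_IDS = {"ear": 1, "eyes": 2, "eyebrow": 3, "hair": 4,
              "lips": 5, "neck": 6, "nose": 7, "skin": 8}

def label_map_to_masks(label_map, region_ids=REGION_IDS):
    """Turn an (H, W) integer label map into (R, H, W) binary masks."""
    masks = [label_map == idx for idx in region_ids.values()]
    return np.stack(masks).astype(np.float32)

def resize_nearest(masks, out_h, out_w):
    """Nearest-neighbour resize of (R, H, W) masks to (R, out_h, out_w)."""
    _, h, w = masks.shape
    rows = (np.arange(out_h) * h) // out_h  # source row for each output row
    cols = (np.arange(out_w) * w) // out_w  # source column for each output column
    return masks[:, rows[:, None], cols[None, :]]

# Usage: a toy 4x4 label map whose top-left 2x2 block is labelled "nose".
label_map = np.zeros((4, 4), dtype=np.int64)
label_map[:2, :2] = REGION_IDS["nose"]
masks = label_map_to_masks(label_map)      # shape (8, 4, 4)
small = resize_nearest(masks, 2, 2)        # shape (8, 2, 2)
```

Nearest-neighbour sampling keeps the masks strictly binary, which is one simple way to match a mask to the spatial size of a late convolutional layer.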


  title={Estimation of BMI from facial images using semantic segmentation based region-aware pooling},
  author={Yousaf, Nadeem and Hussein, Sarfaraz and Sultani, Waqas},
  journal={Computers in Biology and Medicine},