Exploiting Geometric Constraints on Dense Trajectories for Motion Saliency

 

 

Abstract

Existing approaches for salient motion segmentation are unable to explicitly learn geometric cues and often produce false detections on prominent static objects. We exploit multiview geometric constraints to avoid such mistakes. To handle nonrigid backgrounds such as the sea, we also propose a robust fusion mechanism between motion- and appearance-based features. We compute dense trajectories covering every pixel in the video and propose trajectory-based epipolar distances to distinguish between background and foreground regions. Trajectory epipolar distances are data-independent and can be readily computed given a few feature correspondences between the images. We show that by combining epipolar distances with optical flow, a powerful motion network can be learned. To enable the network to leverage both sources of information, we propose a simple mechanism we call input-dropout. We outperform the previous motion network on the DAVIS-2016 dataset by 5.2% in mean IoU score. By robustly fusing our motion network with an appearance network using the proposed input-dropout, we also outperform the previous methods on the DAVIS-2016, DAVIS-2017, and SegTrackV2 datasets.
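The trajectory-based epipolar distance rests on standard two-view geometry: given a fundamental matrix F estimated from a few feature correspondences, a tracked background point must lie on (or near) its epipolar line, while an independently moving foreground point generally violates this constraint. The sketch below illustrates the idea with a symmetric point-to-epipolar-line distance in NumPy; the function name and the symmetric-distance choice are our own illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def epipolar_distance(F, x1, x2):
    """Symmetric epipolar distance for point correspondences.

    F  : 3x3 fundamental matrix mapping points in image 1 to
         epipolar lines in image 2 (l2 = F @ x1).
    x1 : (N, 2) tracked points in image 1.
    x2 : (N, 2) corresponding points in image 2.
    Returns an (N,) array; near-zero for points that obey the
    background's rigid motion, large for independently moving points.
    """
    ones = np.ones((x1.shape[0], 1))
    x1h = np.hstack([x1, ones])   # homogeneous coordinates
    x2h = np.hstack([x2, ones])

    l2 = x1h @ F.T                # epipolar lines in image 2
    l1 = x2h @ F                  # epipolar lines in image 1

    # |x2^T F x1| is the common numerator of both point-to-line distances.
    num = np.abs(np.sum(x2h * l2, axis=1))
    d2 = num / np.linalg.norm(l2[:, :2], axis=1)
    d1 = num / np.linalg.norm(l1[:, :2], axis=1)
    return 0.5 * (d1 + d2)
```

For dense trajectories this distance can be evaluated per pixel and per frame pair, which is what makes the cue data-independent: it needs only F, not any learned appearance model.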

 

Overview

 

Quantitative Results

EpO-Net vs. Mp-Net on DAVIS-2016 dataset.


Comparison of our motion network (EpO) and fusion network (EpO+) with the state of the art on DAVIS-2016, using intersection over union J, F-measure F, and temporal stability T. Best and second-best scores are shown in bold and underlined, respectively. AGS uses eye-gaze data to train its network, whereas we only exploit information present in the videos themselves by enforcing geometric constraints.

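The input-dropout used to fuse EpO with the appearance stream is described on this page only at a high level. The sketch below shows one plausible reading, in which one input modality is randomly zeroed during training so the fused network cannot over-rely on either cue; the function name, signature, and drop probability are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def input_dropout(motion_feat, appearance_feat, p_drop=0.25, rng=None):
    """Illustrative input-dropout for two-stream fusion: with probability
    p_drop, zero out one of the two input streams so the fusion network
    learns to stay robust when either the motion cue (e.g. on nonrigid
    backgrounds) or the appearance cue is unreliable.
    """
    rng = rng or np.random.default_rng()
    motion_feat = motion_feat.copy()
    appearance_feat = appearance_feat.copy()
    if rng.random() < p_drop:
        if rng.random() < 0.5:
            motion_feat[...] = 0.0       # drop the motion stream
        else:
            appearance_feat[...] = 0.0   # drop the appearance stream
    return motion_feat, appearance_feat
```

At test time both streams are kept, so the mechanism acts purely as a training-time regularizer on the fusion.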

 

Real Background Synthetic Foreground (RBSF) Dataset

We create our own synthetic dataset, called RBSF (Real Background, Synthetic Foreground), by overlaying 20 different foreground objects performing various movements onto 5 different real background videos. We downloaded both the foreground and background videos from YouTube and composited them using a video editing tool. The foreground objects include a running dog, cat, camel, jumping human, football player, dancing girl, etc. The background videos contain scenes capturing riversides, a shipyard, house outdoors, house indoors, and playgrounds. The videos are stored in 720p (720×1280) resolution, and the foreground objects are fairly large (30% to 50% of the frame). The reasonably fast motion of the foreground objects allows us to compute accurate optical flow and long trajectories, and the dataset is easy to scale up. We use this dataset only to train the EpO (motion) network, using only optical flow and epipolar distances (ED) and no appearance information; the repeated backgrounds therefore do not affect the learning process. For our requirements the dataset is large enough: it consists of 100 videos and 19,797 frames in total, and we observe that generating more data does not improve results, thanks to the well-constrained epipolar distances. Frames from a few sequences from RBSF are shown below.

RBSF Dataset

A few example sequences from our RBSF dataset are shown.

 

Dataset Link

Coming Soon…

GitHub Repository

To download the code and pre-trained models for EpO-Net, please visit our GitHub repository.

Other Links

You can find our publication here.

bibtex:

@article{faisal2019exploiting,
  title={Exploiting Geometric Constraints on Dense Trajectories for Motion Saliency},
  author={Faisal, Muhammad and Akhter, Ijaz and Ali, Mohsen and Hartley, Richard},
  journal={arXiv preprint arXiv:1909.13258},
  year={2019}
}