The results of the background matting algorithm
Using a handheld smartphone camera, we capture two images of a scene: one with the subject and one without. We employ a deep network with an adversarial loss to recover the alpha matte and foreground color, then composite the result onto a novel background.


We propose a method for creating a matte – the per-pixel foreground color and alpha – of a person by taking photos or videos in an everyday setting with a handheld camera. Most existing matting methods require a green screen background or a manually created trimap to produce a good matte. Automatic, trimap-free methods are appearing, but are not of comparable quality. In our trimap-free approach, we ask the user to take an additional photo of the background without the subject at the time of capture. This step requires a small amount of foresight but is far less time-consuming than creating a trimap. We train a deep network with an adversarial loss to predict the matte. We first train a matting network with a supervised loss on ground truth data with synthetic composites. To bridge the domain gap to real imagery with no labeling, we train another matting network guided by the first network and by a discriminator that judges the quality of composites. We demonstrate results on a wide variety of photos and videos and show significant improvement over the state of the art.
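At the core of this task is the standard matting equation, which blends foreground over background per pixel using the alpha matte. A minimal NumPy sketch (array names and shapes are illustrative, not the paper's implementation):

```python
import numpy as np

def composite(fg, alpha, bg):
    """Matting equation: C = alpha * F + (1 - alpha) * B.

    fg, bg: float arrays of shape (H, W, 3) in [0, 1]
    alpha:  float array of shape (H, W, 1) in [0, 1]
    """
    return alpha * fg + (1.0 - alpha) * bg

# Recomposite a predicted foreground and matte onto a novel background.
fg = np.full((4, 4, 3), 0.8)      # predicted foreground color
alpha = np.full((4, 4, 1), 0.5)   # predicted alpha matte
bg = np.zeros((4, 4, 3))          # novel background
out = composite(fg, alpha, bg)    # every pixel is 0.5*0.8 + 0.5*0 = 0.4
```

Given a predicted matte and foreground, this single equation is all that is needed to place the subject on any new background.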


[Paper Arxiv], to appear in CVPR 2020

@inproceedings{BMSengupta20,
  title={Background Matting: The World is Your Green Screen},
  author={Soumyadip Sengupta and Vivek Jayaram and Brian Curless and Steve Seitz and Ira Kemelmacher-Shlizerman},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}

Blog Post, with a simplified explanation of the method and discussion

Microsoft Virtual Stage: background matting with Kinect for virtual presentations! The demo was unveiled at MSBuild'20.


Inference code released on GitHub.
Training code: Coming soon ...

Background Matting v2

Check out the Background Matting v2.0 project: REAL-TIME and better quality (30fps at FHD and 60fps at 4K).

Captured videos for Background Matting

We capture 50 videos of subjects performing different motions, with both fixed and handheld cameras, in indoor and outdoor settings. We also capture the background as the subject leaves the scene. We will soon release this data to help future research on background matting.

Comparison with existing methods

We show qualitative comparisons with background subtraction, semantic segmentation (DeepLabv3+), and alpha matting techniques. For alpha matting algorithms, we compare with the state of the art: (i) trimap-based methods Context-Aware Matting (CAM) and Index Matting (IM), where the trimap is automatically created from segmentation, and (ii) the automatic matting algorithm Late Fusion Matting (LFM). Our algorithm is first trained on the synthetic-composite Adobe dataset with supervision (Ours Adobe) and then on unlabelled real data with self-supervision and an adversarial loss (Ours Real). We also show that training on real data improves matting quality.
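The supervised first stage can be summarized by losses on the predicted alpha and foreground plus a compositional loss that recomposites the predictions over the known background. A toy NumPy sketch of this idea (function names and loss weighting are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def l1(a, b):
    """Mean absolute error between two arrays."""
    return np.abs(a - b).mean()

def supervised_matting_loss(pred_alpha, pred_fg, gt_alpha, gt_fg, bg):
    """L1 losses on alpha and foreground, plus a compositional loss:
    the image recomposited from the predictions should match the image
    recomposited from the ground truth over the same background."""
    comp_pred = pred_alpha * pred_fg + (1 - pred_alpha) * bg
    comp_gt = gt_alpha * gt_fg + (1 - gt_alpha) * bg
    return l1(pred_alpha, gt_alpha) + l1(pred_fg, gt_fg) + l1(comp_pred, comp_gt)

# Sanity check: perfect predictions give zero loss.
gt_alpha = np.full((2, 2, 1), 0.5)
gt_fg = np.ones((2, 2, 3))
bg = np.zeros((2, 2, 3))
loss = supervised_matting_loss(gt_alpha, gt_fg, gt_alpha, gt_fg, bg)
```

In the second stage, ground-truth labels are unavailable for real footage, so the first network's outputs act as a teacher signal and a discriminator judges how realistic the recomposited images look.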


The authors thank their labmates at the UW GRAIL lab, Ellie Bridge and Andrey Ryabstev, for their support with data capture and for helpful discussions. This work was supported by NSF/Intel Visual and Experiential Computing Award #1538618, the UW Reality Lab, Facebook, Google, and Futurewei.


Soumyadip Sengupta