Netflix open-sources VOID, a video tool that removes objects — and their effects

What is VOID?
Netflix researchers have reportedly released VOID (Video Object and Interaction Deletion), a model and pipeline that goes beyond painting over a person or object in a clip: it tries to erase the object along with the physical interactions that follow from it. Want a guitarist gone and their guitar to clatter to the floor naturally? VOID aims to do that rather than just remove the silhouette. The claim is striking: editing that respects downstream physics, not just pixels.
How it works
VOID is built on top of CogVideoX and fine‑tuned for interaction‑aware video inpainting. The repo includes two transformer checkpoints (Pass 1 for base inpainting, Pass 2 for a warped‑noise refinement); you can run Pass 1 alone or chain both for better temporal consistency. Mask generation is a multi‑stage process: Gemini (via Google’s API) and SAM2 are used to produce a quadmask video, a format that assigns each pixel to one of four semantic regions (0 = primary object, 63 = overlap, 127 = affected region, 255 = background). Note: the background prompt in prompt.json should describe the clean scene after removal, not the removal itself.
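To make the quadmask encoding concrete, here is a minimal Python sketch that labels the pixels of one grayscale mask frame using the four values listed above. It is not code from the VOID repo: the names QUADMASK_REGIONS, nearest_region and region_fractions are hypothetical, the frame is a hand-made toy, and snapping to the nearest level is an assumption on our part (video compression can nudge pixel values), not documented VOID behavior.

```python
from collections import Counter

# The four quadmask levels as described in the article.
# The label strings are illustrative names, not identifiers from the repo.
QUADMASK_REGIONS = {
    0: "primary_object",
    63: "overlap",
    127: "affected_region",
    255: "background",
}


def nearest_region(value: int) -> str:
    """Snap a grayscale value to the closest quadmask level and return its label.

    Snapping is an assumption: it tolerates small deviations from the exact levels.
    """
    level = min(QUADMASK_REGIONS, key=lambda v: abs(v - value))
    return QUADMASK_REGIONS[level]


def region_fractions(frame):
    """Return the share of pixels per region for one non-empty frame (rows of ints)."""
    counts = Counter(nearest_region(p) for row in frame for p in row)
    total = sum(counts.values())
    return {name: counts.get(name, 0) / total for name in QUADMASK_REGIONS.values()}


# Toy 4x4 frame: an object (0), a sliver of overlap (63), and an affected region (127)
# surrounded by background (255).
frame = [
    [255, 255, 255, 255],
    [255, 127, 127, 255],
    [255, 0, 63, 255],
    [255, 0, 0, 255],
]

fractions = region_fractions(frame)
for name, share in fractions.items():
    print(f"{name}: {share:.4f}")
```

A summary like this can serve as a quick sanity check on a generated mask, for example to spot a frame where the affected region is unexpectedly empty.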
Requirements and practical limits
This is not a one‑click magic wand. The project recommends a GPU with 40GB+ VRAM (A100 class) and calls out explicit setup steps (hf downloads, ffmpeg or imageio‑ffmpeg, SAM2 install, GEMINI_API_KEY). The published examples reportedly include objects falling or rolling after removal, but real‑world results will vary, so consider this an impressive research demo more than a turnkey consumer tool. Ethical questions pop up immediately: content moderation, deepfake risks, and where automated “undo” edits belong in creativity and privacy.
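A few of those prerequisites can be checked before attempting a run. The sketch below is an illustration, not a script shipped with VOID: preflight is a hypothetical helper, and it covers only the GEMINI_API_KEY environment variable and the presence of either an ffmpeg binary or the imageio-ffmpeg Python package. GPU memory and the SAM2 install are left out because checking them depends on how your environment is set up.

```python
import importlib.util
import os
import shutil


def preflight(env=None):
    """Return a list of human-readable problems; an empty list means these checks passed.

    Hypothetical helper: it checks only what the article names explicitly.
    """
    env = os.environ if env is None else env
    problems = []

    # The mask-generation stage calls Gemini, so an API key must be available.
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")

    # Either a system ffmpeg binary or the imageio-ffmpeg package is enough.
    has_ffmpeg = shutil.which("ffmpeg") is not None
    has_imageio_ffmpeg = importlib.util.find_spec("imageio_ffmpeg") is not None
    if not (has_ffmpeg or has_imageio_ffmpeg):
        problems.append("neither ffmpeg nor imageio-ffmpeg was found")

    return problems


if __name__ == "__main__":
    issues = preflight()
    if issues:
        for issue in issues:
            print(f"missing: {issue}")
    else:
        print("basic prerequisites look fine")
```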
Community and next steps
Netflix included demos and a Hugging Face Gradio demo; the repo encourages community builds and PRs. Expect forks, tools, and eyebrow‑raising applications — both helpful (cleaning shots, VFX) and concerning (misuse). So what now? Researchers will iterate, the community will experiment, and policy discussions will need to catch up. And yes — someone will try to delete themselves from a concert clip. What could possibly go wrong?
Sources: github.com/netflix, Hacker News