defuddle: Get the main content of any page as Markdown

What it is
Defuddle strips the fluff. Feed it a URL or HTML and it finds the main article, tossing comments, sidebars, headers, footers and other noise so you’re left with readable content as HTML or Markdown. According to its README, Defuddle was created for the Obsidian Web Clipper, but it’s built to run anywhere: in the browser, in Node.js, or from the command line. Beware: the README also warns that it’s “very much a work in progress.” So proceed with curiosity, not blind faith.
How it works
Defuddle aims to be more forgiving and more consistent than other extractors: it removes fewer elements it is unsure about, gives special handling to footnotes, math and code blocks, and extracts richer metadata (including schema.org data). It even uses a page’s mobile styles as a heuristic to guess which elements are unnecessary, which is clever and a bit cheeky. The author positions it as an alternative to Mozilla Readability that differs in both behavior and output.
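To make the mobile-styles idea concrete, here is a toy sketch of the general technique. It is not Defuddle’s implementation: the SimpleRule shape, the selectorsHiddenOnMobile function and the 375px default width are all hypothetical, and real CSS parsing is skipped entirely. The assumption is simply that anything a site hides on a narrow screen is probably not the main content.

```typescript
// Toy model of a stylesheet rule (hypothetical shape, not a real CSS parser).
interface SimpleRule {
  // max-width of the enclosing media query in pixels, or null when the rule
  // is not inside a max-width media query
  maxWidthPx: number | null;
  selector: string;
  declarations: Record<string, string>;
}

// Collect selectors that the page hides on a narrow (mobile) viewport.
// A `max-width: N` query matches when the viewport is at most N pixels wide,
// so it applies to a phone of width W whenever N >= W.
function selectorsHiddenOnMobile(
  rules: SimpleRule[],
  mobileWidthPx: number = 375 // assumed phone width, chosen for illustration
): string[] {
  const hidden = new Set<string>();
  for (const rule of rules) {
    const appliesOnMobile =
      rule.maxWidthPx !== null && rule.maxWidthPx >= mobileWidthPx;
    const hidesElement = rule.declarations["display"] === "none";
    if (appliesOnMobile && hidesElement) {
      hidden.add(rule.selector);
    }
  }
  return Array.from(hidden);
}
```

Rules outside any media query are ignored on purpose: an element that is hidden everywhere tells you nothing specific about what the site considers optional on a small screen.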
Why it matters
Why should you care? Because manual cleanup of clipped articles is a drag. If you save notes to Obsidian, feed a personal archive, or build an RSS-to-Markdown pipeline, predictable, clean Markdown output saves time and sanity. Want reliable metadata too, such as author, published date and main image? Defuddle promises that. It’s a small tool for a common frustration, and when it works, it’s the little jolt of relief every information worker needs.
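For a sense of what the last step of such a pipeline might look like, here is a minimal sketch that turns an extracted page into a Markdown note with YAML front matter. The ExtractedPage interface and its field names are assumptions made for illustration, as are toNote and yamlQuote; check the project’s documentation for the actual result shape before relying on any of them.

```typescript
// Assumed subset of what an extractor returns; field names are illustrative.
interface ExtractedPage {
  title: string;
  author?: string;
  published?: string;
  image?: string;
  content: string; // the article body, already converted to Markdown
}

// Quote a value as a YAML double-quoted scalar.
// Line breaks are flattened to spaces to keep each field on one line.
function yamlQuote(value: string): string {
  const escaped = value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\r?\n/g, " ");
  return '"' + escaped + '"';
}

// Build a note: front matter with whatever metadata is present, then the body.
function toNote(page: ExtractedPage, sourceUrl: string): string {
  const fields: Array<[string, string | undefined]> = [
    ["title", page.title],
    ["author", page.author],
    ["published", page.published],
    ["image", page.image],
    ["source", sourceUrl],
  ];
  const lines = fields
    .filter(
      (pair): pair is [string, string] =>
        typeof pair[1] === "string" && pair[1] !== ""
    )
    .map(([key, value]) => `${key}: ${yamlQuote(value)}`);
  return ["---", ...lines, "---", "", page.content.trim(), ""].join("\n");
}
```

Missing metadata is skipped rather than written as an empty field, so a page with no author simply produces a shorter front matter block.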
Getting started
Install it with npm (npm install defuddle), or run it ad hoc with npx. It ships a browser API, a Node-friendly module (works with JSDOM, linkedom, etc.), and a CLI with options for JSON output, Markdown output, debug mode and writing to a file. One caveat: for defuddle/node to import properly, you’ll need "type": "module" in package.json. Try it, tweak it, and expect improvements: this is an emerging utility, not a finished product.
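The "type": "module" caveat is easy to forget, so a small pre-flight check can save a confusing import error. The sketch below is a hypothetical helper (isEsmPackage is not part of Defuddle) that takes the text of a package.json and reports whether it opts into ES modules.

```typescript
// Hypothetical helper: returns true when the given package.json text
// declares "type": "module". Throws if the text is not valid JSON.
function isEsmPackage(packageJsonText: string): boolean {
  const pkg: unknown = JSON.parse(packageJsonText);
  return (
    typeof pkg === "object" &&
    pkg !== null &&
    (pkg as { type?: unknown }).type === "module"
  );
}
```

In a Node script you might call it as isEsmPackage(readFileSync("package.json", "utf8")), with readFileSync imported from node:fs, and print a reminder when it returns false.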
Sources: github.com/kepano, Lobsters