How It Works
By default (onlyMainContent: true), Reader uses a multi-step algorithm:
1. Find Main Content Container
Reader looks for main content in this order:
- <main> element
- [role="main"] attribute
- Single <article> element
- Common content IDs/classes (#content, .post-content, etc.)
- Largest text block (fallback heuristic)
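The lookup order above can be sketched as a chain of checks that returns the first match. This is a minimal illustration in Python using the standard library's ElementTree on well-formed markup, not Reader's actual implementation. The function name, the ID/class lists, and the text-length fallback are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed lists: the docs name only #content and .post-content as examples.
COMMON_IDS = ("content",)
COMMON_CLASSES = ("post-content",)


def text_len(el):
    """Length of all text inside an element, ignoring outer whitespace."""
    return len("".join(el.itertext()).strip())


def find_main_container(root):
    """Return the first match in the documented priority order, or None."""
    # 1. <main> element
    found = root.find(".//main")
    if found is not None:
        return found
    # 2. [role="main"] attribute
    found = root.find(".//*[@role='main']")
    if found is not None:
        return found
    # 3. <article>, but only when the page has exactly one
    articles = root.findall(".//article")
    if len(articles) == 1:
        return articles[0]
    # 4. common content IDs/classes
    for el in root.iter():
        classes = el.get("class", "").split()
        if el.get("id") in COMMON_IDS or any(c in COMMON_CLASSES for c in classes):
            return el
    # 5. fallback heuristic (assumed): the <div>/<section> with the most text
    blocks = [el for el in root.iter() if el.tag in ("div", "section")]
    return max(blocks, key=text_len, default=None)


page = ET.fromstring(
    "<html><body><nav>Menu</nav>"
    "<div id='content'><p>Body text</p></div></body></html>"
)
print(find_main_container(page).get("id"))  # content
```

A real page would need a forgiving HTML parser and a smarter fallback, since the outermost wrapper usually holds the most text.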
2. Remove Navigation Chrome
If no main content container is found, Reader removes:
- <nav>, <header>, <footer>, <aside>
- Sidebars, menus, breadcrumbs
- Social sharing, comments sections
- Newsletter forms, cookie banners
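As a rough illustration of this fallback, the sketch below prunes chrome from a parsed tree in place. It is written in Python with the standard library and is not Reader's code. The keyword list used to spot sidebars, menus, and similar blocks by class or ID is an assumption for the example.

```python
import xml.etree.ElementTree as ET

CHROME_TAGS = {"nav", "header", "footer", "aside"}
# Assumed class/id keywords standing in for "sidebars, menus, breadcrumbs, ...".
CHROME_HINTS = ("sidebar", "menu", "breadcrumb", "share", "comments",
                "newsletter", "cookie")


def is_chrome(el):
    """True for chrome tags or elements whose class/id contains a hint."""
    if el.tag in CHROME_TAGS:
        return True
    label = (el.get("class", "") + " " + el.get("id", "")).lower()
    return any(hint in label for hint in CHROME_HINTS)


def strip_chrome(root):
    """Remove chrome elements in place and return the root."""
    # ElementTree has no parent pointers, so visit every element
    # and prune its matching children.
    for parent in list(root.iter()):
        for child in list(parent):
            if is_chrome(child):
                parent.remove(child)
    return root
```

Substring matching on class names is crude and will produce false positives (a class like "menu-item-price" inside an article, for instance), so treat it as a starting point only.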
3. Always Remove
Regardless of mode, Reader always removes:
- Scripts, styles, noscript, templates
- Hidden elements
- Overlays, modals, popups
- Cookie consent banners
- Fixed/sticky positioned elements
- Ads and tracking pixels
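A minimal sketch of this unconditional pass, assuming only inline styles and attributes are inspected. A real implementation would likely consult computed styles; the rules below, including the 1x1 image check for tracking pixels, are illustrative assumptions rather than Reader's actual logic.

```python
import re
import xml.etree.ElementTree as ET

ALWAYS_DROP_TAGS = {"script", "style", "noscript", "template"}
# Inline-style patterns for hidden or fixed/sticky elements (assumed rules).
HIDDEN_OR_PINNED = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|position\s*:\s*(fixed|sticky)"
)


def should_drop(el):
    """True if the element is removed regardless of extraction mode."""
    if el.tag in ALWAYS_DROP_TAGS:
        return True
    if el.get("hidden") is not None:
        return True
    if HIDDEN_OR_PINNED.search(el.get("style", "").lower()):
        return True
    # 1x1 images are a common shape for tracking pixels.
    return el.tag == "img" and el.get("width") == "1" and el.get("height") == "1"


def strip_always(root):
    """Remove always-dropped elements in place and return the root."""
    for parent in list(root.iter()):
        for child in list(parent):
            if should_drop(child):
                parent.remove(child)
    return root
```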
Controlling Extraction
Disable Main Content Extraction
For full-page capture (includes nav, header, footer), disable main content extraction (onlyMainContent: false).
Include Specific Elements
Keep only specific elements using CSS selectors.
Exclude Specific Elements
Remove specific elements.
Combine Include and Exclude
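To show how include and exclude filters might interact, here is a small Python sketch. It is not Reader's API. The `extract` function, its `include` and `exclude` parameters, and the order of operations (include first, then exclude inside what was kept) are all assumptions, and the matcher understands only `tag`, `.class`, and `#id` rather than full CSS.

```python
import xml.etree.ElementTree as ET


def matches(el, selector):
    """Tiny matcher for 'tag', '.class', and '#id' only (not full CSS)."""
    if selector.startswith("#"):
        return el.get("id") == selector[1:]
    if selector.startswith("."):
        return selector[1:] in el.get("class", "").split()
    return el.tag == selector


def extract(root, include=(), exclude=()):
    """Keep elements matching `include`, then prune `exclude` inside them.

    With no include selectors, the whole tree is kept.
    """
    if include:
        kept = [el for el in root.iter() if any(matches(el, s) for s in include)]
    else:
        kept = [root]
    for top in kept:
        for parent in list(top.iter()):
            for child in list(parent):
                if any(matches(child, s) for s in exclude):
                    parent.remove(child)
    return kept


doc = ET.fromstring(
    "<html><body><nav>N</nav><div class='post-content'>"
    "<p>Body</p><div class='ads'>Ad</div></div></body></html>"
)
kept = extract(doc, include=[".post-content"], exclude=[".ads"])
print("".join(kept[0].itertext()))  # Body
```

Applying include before exclude lets an exclude selector trim unwanted pieces out of an otherwise wanted container, which is the usual reason to combine the two.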
CLI Options
HTML to Markdown
Reader uses supermarkdown, a high-performance Rust library with full GFM support, for HTML to Markdown conversion.
Supported Elements
| Element | Markdown Output |
|---|---|
| Headings | # H1, ## H2, etc. |
| Paragraphs | Plain text with blank lines |
| Lists | - or 1. |
| Links | [text](url) |
| Images | ![alt](url) |
| Code | `inline` or fenced blocks |
| Tables | GFM table syntax |
| Blockquotes | > quoted text |
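
To make the mapping in the table concrete, the sketch below converts a handful of elements by hand. It is a toy converter in Python for well-formed markup and does not use or represent supermarkdown. Tables and nested lists are left out, and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET

FENCE = "`" * 3  # Markdown code fence


def inline(el):
    """Render an element's inline content: text, links, images, inline code."""
    parts = [el.text or ""]
    for child in el:
        if child.tag == "a":
            parts.append(f"[{inline(child)}]({child.get('href', '')})")
        elif child.tag == "img":
            parts.append(f"![{child.get('alt', '')}]({child.get('src', '')})")
        elif child.tag == "code":
            parts.append(f"`{inline(child)}`")
        else:
            parts.append(inline(child))
        parts.append(child.tail or "")
    return "".join(parts).strip()


def to_markdown(root):
    """Convert the block-level children of `root`, separated by blank lines."""
    blocks = []
    for el in root:
        if el.tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            blocks.append("#" * int(el.tag[1]) + " " + inline(el))
        elif el.tag == "p":
            blocks.append(inline(el))
        elif el.tag in ("ul", "ol"):
            lines = [
                (f"{i}. " if el.tag == "ol" else "- ") + inline(li)
                for i, li in enumerate(el.findall("li"), start=1)
            ]
            blocks.append("\n".join(lines))
        elif el.tag == "blockquote":
            blocks.append("> " + inline(el))
        elif el.tag == "pre":
            code = "".join(el.itertext()).strip("\n")
            blocks.append(f"{FENCE}\n{code}\n{FENCE}")
    return "\n\n".join(blocks)


body = ET.fromstring(
    "<body><h2>Title</h2>"
    "<p>See <a href='https://example.com'>docs</a>.</p></body>"
)
print(to_markdown(body))
```

Unknown block-level tags are silently skipped here; a production converter would need to handle them, along with escaping of Markdown special characters.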

