How It Works
When you call crawl(), Reader:
- Fetches the seed URL and extracts all links
- Filters links by domain, patterns, and robots.txt
- Queues new URLs using breadth-first search
- Continues until depth or page limits are reached
- Optionally scrapes content from discovered pages
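The queueing logic is roughly equivalent to the breadth-first sketch below. This is illustrative only, not Reader's actual internals; fetchLinks and isAllowed are hypothetical stand-ins for Reader's fetching and filtering steps.

```ts
// Illustrative breadth-first crawl loop; not Reader's internal implementation.
type QueueItem = { url: string; depth: number };

async function crawlSketch(
  seed: string,
  maxDepth: number,
  maxPages: number,
  // Hypothetical helpers standing in for Reader's fetching and filtering:
  fetchLinks: (url: string) => Promise<string[]>,
  isAllowed: (url: string) => boolean,
): Promise<string[]> {
  const seen = new Set<string>([seed]);
  const queue: QueueItem[] = [{ url: seed, depth: 0 }];
  const discovered: string[] = [];

  while (queue.length > 0 && discovered.length < maxPages) {
    const { url, depth } = queue.shift()!; // FIFO queue => breadth-first order
    discovered.push(url);

    if (depth >= maxDepth) continue; // do not expand past the depth limit

    for (const link of await fetchLinks(url)) {
      // Domain, pattern, and robots.txt filtering is represented by isAllowed.
      if (!seen.has(link) && isAllowed(link)) {
        seen.add(link);
        queue.push({ url: link, depth: depth + 1 });
      }
    }
  }
  return discovered;
}
```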
Basic Usage
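A minimal call might look like the following. The import path is an assumption; use whatever package name Reader is installed under in your project. The options come from the table further down.

```ts
// Hypothetical import path; adjust to match your installation of Reader.
import { crawl } from "reader";

// Discover pages linked from the seed URL (depth 1, up to 20 pages by default).
const result = await crawl({
  url: "https://example.com/docs",
  depth: 1,
  maxPages: 20,
});

console.log(result);
```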
Crawl with Scraping
To also scrape the content of discovered pages, set scrape to true:
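A minimal sketch, assuming the same hypothetical reader import as above; scrape and delayMs are the options documented in the table below.

```ts
import { crawl } from "reader"; // hypothetical import path

// Discover pages and also scrape their content, pausing between requests.
const result = await crawl({
  url: "https://example.com/docs",
  depth: 2,
  maxPages: 50,
  scrape: true,  // scrape content from each discovered page
  delayMs: 1500, // wait 1.5 s between requests
});
```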
Crawl Options
| Option | Default | Description |
| --- | --- | --- |
| url | required | Seed URL to start crawling |
| depth | 1 | Maximum crawl depth |
| maxPages | 20 | Maximum pages to discover |
| scrape | false | Also scrape content from discovered pages |
| delayMs | 1000 | Delay between requests, in milliseconds |
| includePatterns | [] | URL patterns to include (regex) |
| excludePatterns | [] | URL patterns to exclude (regex) |
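For example, includePatterns and excludePatterns are regex patterns matched against discovered URLs. A sketch using the same hypothetical import, with patterns written as strings (whether Reader also accepts RegExp objects is not specified here):

```ts
import { crawl } from "reader"; // hypothetical import path

// Only follow URLs under /blog/, but skip tag and archive pages.
const posts = await crawl({
  url: "https://example.com/blog",
  depth: 2,
  includePatterns: ["/blog/"],
  excludePatterns: ["/tag/", "/archive/"],
});
```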
Depth Explained
Depth controls how far from the seed URL the crawler will go:
- Depth 0: Only the seed URL
- Depth 1: Seed URL + pages linked from it
- Depth 2: Seed URL + linked pages + pages linked from those
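To illustrate, a depth-0 call is effectively a single-page fetch, while depth 2 can fan out quickly, which is why pairing it with maxPages is useful. The import path is again a hypothetical placeholder.

```ts
import { crawl } from "reader"; // hypothetical import path

// Depth 0: just the seed URL; no links are followed.
const single = await crawl({ url: "https://example.com", depth: 0 });

// Depth 2: seed + linked pages + pages linked from those,
// capped at 100 pages so the crawl cannot fan out indefinitely.
const wide = await crawl({ url: "https://example.com", depth: 2, maxPages: 100 });
```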

