Reader includes a powerful CLI for scraping and crawling from the terminal.

Installation

The CLI is included with the Reader package:
npm install -g @vakra-dev/reader
Or use with npx:
npx @vakra-dev/reader scrape https://example.com

Scrape Command

Basic Usage

# Scrape a single URL
reader scrape https://example.com

# Scrape multiple URLs
reader scrape https://example.com https://example.org

# Short alias
reader s https://example.com

Output Formats

# Markdown only (default)
reader scrape https://example.com

# Multiple formats
reader scrape https://example.com -f markdown,html

# Save to file
reader scrape https://example.com -o output.json

Concurrency

# Scrape multiple URLs concurrently
reader scrape url1 url2 url3 url4 url5 -c 3

Timeouts

# Set per-page timeout
reader scrape https://example.com -t 60000

# Set batch timeout
reader scrape url1 url2 url3 --batch-timeout 300000

Content Extraction

# Disable main content extraction (full page)
reader scrape https://example.com --no-main-content

# Include specific elements
reader scrape https://example.com --include-tags ".article,.content"

# Exclude specific elements
reader scrape https://example.com --exclude-tags ".comments,.sidebar"

Proxy

reader scrape https://example.com --proxy http://user:pass@proxy.example.com:8080

Debugging

# Verbose logging
reader scrape https://example.com -v

# Show browser window
reader scrape https://example.com --show-chrome

All Options

Option             Short   Default    Description
--format           -f      markdown   Output formats (comma-separated)
--output           -o      stdout     Output file path
--concurrency      -c      1          Parallel requests
--timeout          -t      30000      Per-page timeout (ms)
--batch-timeout            300000     Total batch timeout (ms)
--proxy                               Proxy URL
--user-agent                          Custom user agent
--no-main-content                     Include full page
--include-tags                        CSS selectors to include
--exclude-tags                        CSS selectors to exclude
--show-chrome                         Show browser window
--verbose          -v                 Enable logging
--standalone                          Bypass daemon
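
These options can be combined in a single invocation. A sketch using only flags from the table above (the URLs, selectors, and timeout are illustrative values, not defaults):

# Scrape three URLs in parallel, drop comments and sidebars,
# allow 60s per page, and write both formats to a file
reader scrape https://example.com https://example.org https://example.net \
  -c 3 -t 60000 --exclude-tags ".comments,.sidebar" -f markdown,html -o results.json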

Crawl Command

Basic Usage

# Crawl a website
reader crawl https://example.com

# Short alias
reader c https://example.com

Depth and Limits

# Set crawl depth
reader crawl https://example.com -d 3

# Limit pages
reader crawl https://example.com -m 100

# Both
reader crawl https://example.com -d 3 -m 100

Scrape Content

# Crawl and scrape content
reader crawl https://example.com -d 2 --scrape

# With format
reader crawl https://example.com --scrape -f markdown

URL Filtering

# Include patterns
reader crawl https://example.com --include "blog/*,docs/*"

# Exclude patterns
reader crawl https://example.com --exclude "admin/*,api/*"

Rate Limiting

# Set delay between requests
reader crawl https://example.com --delay 2000

All Options

Option             Short   Default    Description
--depth            -d      1          Maximum crawl depth
--max-pages        -m      20         Maximum pages to discover
--scrape           -s                 Scrape content
--format           -f      markdown   Output formats
--output           -o      stdout     Output file path
--delay                    1000       Delay between requests (ms)
--timeout          -t                 Total crawl timeout (ms)
--include                             URL patterns to include
--exclude                             URL patterns to exclude
--proxy                               Proxy URL
--user-agent                          Custom user agent
--show-chrome                         Show browser window
--verbose          -v                 Enable logging
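
These options also combine. A sketch using only flags from the table above (the depth, page limit, delay, and pattern are illustrative choices):

# Crawl the docs section three levels deep, up to 200 pages,
# scraping each page with a 2s delay and saving everything to one file
reader crawl https://example.com -d 3 -m 200 --scrape \
  --include "docs/*" --delay 2000 -o crawl.json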

Daemon Mode

For multiple requests, use daemon mode to keep the browser pool warm:

Start Daemon

# Start with default settings
reader start

# Custom pool size
reader start --pool-size 5

# Custom port
reader start -p 4000
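
The start flags can be combined; a sketch with illustrative values:

# Larger pool on a custom port
reader start --pool-size 5 -p 4000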

Check Status

reader status

Stop Daemon

reader stop

Auto-Connect

When a daemon is running, CLI commands automatically connect to it:
# Start daemon
reader start --pool-size 5

# These commands use the daemon's browser pool
reader scrape https://example.com
reader scrape https://example.org
reader crawl https://example.net

# Bypass daemon (standalone mode)
reader scrape https://example.com --standalone

# Stop daemon when done
reader stop

Output Format

CLI output is always JSON with the following structure:

Scrape Output

{
  "data": [
    {
      "markdown": "# Page Title\n\nContent...",
      "html": "<h1>Page Title</h1>...",
      "metadata": {
        "baseUrl": "https://example.com",
        "scrapedAt": "2024-01-15T10:30:00Z",
        "duration": 1234,
        "website": {
          "title": "Page Title",
          "description": "Page description"
        }
      }
    }
  ],
  "batchMetadata": {
    "totalUrls": 1,
    "successfulUrls": 1,
    "failedUrls": 0,
    "totalDuration": 1234
  }
}

Crawl Output

{
  "urls": [
    { "url": "https://example.com/", "title": "Home" },
    { "url": "https://example.com/about", "title": "About" }
  ],
  "metadata": {
    "totalUrls": 2,
    "maxDepth": 1,
    "totalDuration": 2345,
    "seedUrl": "https://example.com"
  }
}
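
Crawl output pipes to jq the same way as scrape output; a minimal sketch based on the structure above:

# List only the discovered URLs
reader crawl https://example.com -d 2 | jq -r '.urls[].url'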

Examples

Scrape and process with jq

# Extract just the markdown
reader scrape https://example.com | jq -r '.data[0].markdown'

# Get all titles from batch
reader scrape url1 url2 url3 | jq -r '.data[].metadata.website.title'

Save crawl results

reader crawl https://docs.example.com -d 3 --scrape -o docs.json

Batch scrape from file

cat urls.txt | xargs reader scrape -c 5 -o results.json
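
If you want one output file per URL instead of a single batch result, a small shell loop works too. A sketch assuming urls.txt holds one URL per line (the filename derivation is illustrative):

# Scrape each URL into its own JSON file
while read -r url; do
  # Derive a safe filename from the URL (strip scheme, replace separators)
  name=$(printf '%s' "$url" | sed -e 's|^[a-z]*://||' -e 's|[/?&]|_|g')
  reader scrape "$url" -o "${name}.json"
done < urls.txt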
