## Type Definition

```typescript
interface CrawlOptions {
  // Required
  url: string;

  // Crawl limits
  depth?: number;
  maxPages?: number;

  // Scraping
  scrape?: boolean;
  scrapeConcurrency?: number;
  formats?: Array<"markdown" | "html">;

  // Rate limiting
  delayMs?: number;
  timeoutMs?: number;

  // URL filtering
  includePatterns?: string[];
  excludePatterns?: string[];

  // Request configuration
  userAgent?: string;
  proxy?: ProxyConfig;

  // Debugging
  verbose?: boolean;
  showChrome?: boolean;
}
```
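
The `ProxyConfig` shape is not defined on this page. A minimal sketch, with fields inferred from the Full Options example below (treat this as an assumption; check the library's exported types for the authoritative definition):

```typescript
// Hypothetical: fields inferred from the Full Options example on this page.
interface ProxyConfig {
  host: string;
  port: number;
  username?: string; // assumed optional for unauthenticated proxies
  password?: string;
}
```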

## Options Reference

### Required Options

| Option | Type | Description |
| --- | --- | --- |
| `url` | `string` | Seed URL to start crawling from |

### Crawl Limit Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `depth` | `number` | `1` | Maximum crawl depth |
| `maxPages` | `number` | `20` | Maximum number of pages to discover |

### Scraping Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `scrape` | `boolean` | `false` | Also scrape the content of discovered pages |
| `scrapeConcurrency` | `number` | `2` | Number of concurrent scraping threads |
| `formats` | `Array<"markdown" \| "html">` | `["markdown", "html"]` | Output formats when scraping |

### Rate Limiting Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `delayMs` | `number` | `1000` | Delay between requests (ms) |
| `timeoutMs` | `number` | `undefined` | Total timeout for the crawl (ms) |

### URL Filtering Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `includePatterns` | `string[]` | `undefined` | URL patterns to include (regex) |
| `excludePatterns` | `string[]` | `undefined` | URL patterns to exclude (regex) |

### Request Configuration Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `userAgent` | `string` | `undefined` | Custom user agent string |
| `proxy` | `ProxyConfig` | `undefined` | Proxy configuration |
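
A minimal usage sketch; the user agent string and proxy details below are placeholders, mirroring the Full Options example:

```typescript
await reader.crawl({
  url: "https://example.com",
  userAgent: "MyCrawler/1.0 (+https://example.com/bot)", // placeholder UA
  proxy: {
    host: "proxy.example.com", // placeholder proxy details
    port: 8080,
    username: "user",
    password: "pass",
  },
});
```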

### Debugging Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `verbose` | `boolean` | `false` | Enable verbose logging |
| `showChrome` | `boolean` | `false` | Show the browser window |
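
When troubleshooting a crawl locally, the two flags are typically combined; a minimal sketch:

```typescript
await reader.crawl({
  url: "https://example.com",
  depth: 1,
  verbose: true, // enable verbose logging
  showChrome: true, // show the browser window while crawling
});
```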

## Examples

### Basic Crawl

```typescript
await reader.crawl({
  url: "https://example.com",
  depth: 2,
  maxPages: 50,
});
```

### Crawl with Scraping

```typescript
await reader.crawl({
  url: "https://example.com",
  depth: 2,
  maxPages: 50,
  scrape: true,
  scrapeConcurrency: 5,
  formats: ["markdown"],
});
```

### With URL Filtering

```typescript
await reader.crawl({
  url: "https://example.com",
  depth: 3,
  maxPages: 100,
  includePatterns: ["^/docs/", "^/guides/"],
  excludePatterns: ["^/admin/", "^/api/"],
});
```

### With Rate Limiting

```typescript
await reader.crawl({
  url: "https://example.com",
  depth: 2,
  delayMs: 2000, // 2 seconds between requests
  timeoutMs: 300000, // 5 minute total timeout
});
```

### Full Options

```typescript
await reader.crawl({
  url: "https://example.com",
  depth: 3,
  maxPages: 100,
  scrape: true,
  scrapeConcurrency: 5,
  formats: ["markdown"],
  delayMs: 1000,
  timeoutMs: 600000,
  includePatterns: ["^/docs/"],
  excludePatterns: ["^/docs/legacy/"],
  proxy: {
    host: "proxy.example.com",
    port: 8080,
    username: "user",
    password: "pass",
  },
  verbose: true,
});
```