Usage

// Via ReaderClient (recommended)
import { ReaderClient } from "@vakra-dev/reader";

const reader = new ReaderClient();
const result = await reader.crawl(options);

// Standalone function
import { crawl } from "@vakra-dev/reader";
const result = await crawl(options);

Parameters

crawl(options: CrawlOptions): Promise<CrawlResult>
See CrawlOptions for full options reference.
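A partial, non-authoritative sketch of CrawlOptions, assembled only from the options demonstrated on this page (field optionality is an assumption; see CrawlOptions for the authoritative definition):

interface CrawlOptions {
  url: string;                // Seed URL to start crawling from
  depth?: number;             // Link levels to follow (see "How Depth Works" below)
  maxPages?: number;          // Maximum pages to discover
  scrape?: boolean;           // Also scrape content of discovered pages
  scrapeConcurrency?: number; // Parallel scrapes when scrape is true
  formats?: string[];         // Output formats, e.g. ["markdown"]
  includePatterns?: string[]; // URL patterns to include
  excludePatterns?: string[]; // URL patterns to exclude
  delayMs?: number;           // Delay between requests, in milliseconds
  proxy?: {
    host: string;
    port: number;
    username?: string;        // optionality assumed
    password?: string;        // optionality assumed
  };
}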

Quick Example

import { ReaderClient } from "@vakra-dev/reader";

const reader = new ReaderClient();

const result = await reader.crawl({
  url: "https://example.com",
  depth: 2,
  maxPages: 50,
  scrape: true,
});

console.log(`Found ${result.urls.length} pages`);
result.urls.forEach((page) => {
  console.log(`- ${page.title}: ${page.url}`);
});

await reader.close();

Return Value

Returns a Promise<CrawlResult>:
interface CrawlResult {
  urls: CrawlUrl[];
  scraped?: ScrapeResult; // Only if scrape: true
  metadata: CrawlMetadata;
}

interface CrawlUrl {
  url: string;
  title: string;
  description: string | null;
}

interface CrawlMetadata {
  totalUrls: number;
  maxDepth: number;
  totalDuration: number;
  seedUrl: string;
}
See CrawlResult for full result structure.
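Putting the interfaces together: urls carries per-page discovery results and metadata carries crawl statistics. A minimal consumption sketch using only the fields defined above (the millisecond unit for totalDuration is an assumption):

import { ReaderClient } from "@vakra-dev/reader";

const reader = new ReaderClient();
const result = await reader.crawl({ url: "https://example.com", depth: 2 });

// Crawl statistics (CrawlMetadata)
console.log(`Seed: ${result.metadata.seedUrl}`);
console.log(`Discovered ${result.metadata.totalUrls} URLs, max depth ${result.metadata.maxDepth}`);
console.log(`Duration: ${result.metadata.totalDuration}ms`); // unit assumed

// Per-page results (CrawlUrl); description may be null
for (const page of result.urls) {
  console.log(`${page.title}: ${page.url}`);
  if (page.description) console.log(`  ${page.description}`);
}

await reader.close();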

Common Options

Depth and Limits

const result = await reader.crawl({
  url: "https://example.com",
  depth: 3, // How deep to crawl
  maxPages: 100, // Maximum pages to discover
});

With Scraping

const result = await reader.crawl({
  url: "https://example.com",
  depth: 2,
  scrape: true,
  scrapeConcurrency: 5,
  formats: ["markdown"],
});

// Access scraped content
result.scraped?.data.forEach((page) => {
  console.log(page.markdown);
});

URL Filtering

const result = await reader.crawl({
  url: "https://example.com",
  depth: 3,
  includePatterns: ["^/docs/", "^/guides/"],
  excludePatterns: ["^/admin/", "^/api/"],
});
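The patterns above look like regular expressions anchored to the URL path. As a rough mental model only (an assumption about the matching semantics, not the library's actual implementation), the filter behaves something like this:

// Hypothetical sketch: apply include/exclude patterns to a URL's path.
function shouldCrawl(url: string, include: string[], exclude: string[]): boolean {
  const path = new URL(url).pathname;
  const matches = (patterns: string[]) =>
    patterns.some((p) => new RegExp(p).test(path));

  if (exclude.length > 0 && matches(exclude)) return false; // exclusions win
  if (include.length > 0 && !matches(include)) return false; // must match an include
  return true;
}

shouldCrawl("https://example.com/docs/intro", ["^/docs/"], ["^/admin/"]); // true
shouldCrawl("https://example.com/admin/users", ["^/docs/"], ["^/admin/"]); // false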

Rate Limiting

const result = await reader.crawl({
  url: "https://example.com",
  depth: 2,
  delayMs: 2000, // 2 seconds between requests
});

With Proxy

const result = await reader.crawl({
  url: "https://example.com",
  depth: 2,
  proxy: {
    host: "proxy.example.com",
    port: 8080,
    username: "user",
    password: "pass",
  },
});

How Depth Works

Depth controls how far from the seed URL the crawler will go:
Seed URL (depth 0)
├── Page A (depth 1)
│   ├── Page D (depth 2)
│   └── Page E (depth 2)
├── Page B (depth 1)
│   └── Page F (depth 2)
└── Page C (depth 1)
  • depth: 0 - Only the seed URL
  • depth: 1 - Seed + pages directly linked from it
  • depth: 2 - Seed + linked pages + pages linked from those
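Conceptually, this is a breadth-first traversal with a level cap: each iteration expands one depth level, and already-seen URLs are skipped. A sketch of the idea (not the library's code; fetchLinks is a hypothetical helper that returns the links found on a page):

// Hypothetical helper: fetch a page and return the URLs it links to.
declare function fetchLinks(url: string): Promise<string[]>;

async function crawlToDepth(seed: string, maxDepth: number): Promise<string[]> {
  const seen = new Set<string>([seed]);
  let frontier = [seed]; // pages at the current depth

  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      for (const link of await fetchLinks(url)) {
        if (!seen.has(link)) {
          seen.add(link);
          next.push(link); // discovered at depth + 1
        }
      }
    }
    frontier = next;
  }
  return [...seen];
}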

Domain Restrictions

By default, crawling stays within the seed URL's site: subdomains of the same registrable domain are included, but links to external domains are not followed.
// Only crawls example.com and subdomains
const result = await reader.crawl({
  url: "https://docs.example.com",
  depth: 3,
});
// Will crawl docs.example.com, api.example.com, etc.
// Will NOT crawl other-site.com
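Since the seed docs.example.com also pulls in api.example.com, the boundary appears to be the registrable domain rather than the exact hostname. A naive sketch of that check (an assumption about the behavior; a real implementation would consult the Public Suffix List, since e.g. example.co.uk breaks the two-label shortcut):

// Naive check: do two URLs share the same registrable domain?
function sameRegistrableDomain(a: string, b: string): boolean {
  const base = (u: string) => new URL(u).hostname.split(".").slice(-2).join(".");
  return base(a) === base(b);
}

sameRegistrableDomain("https://docs.example.com", "https://api.example.com"); // true
sameRegistrableDomain("https://docs.example.com", "https://other-site.com");  // false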