Usage

// Via ReaderClient (recommended)
import { ReaderClient } from "@vakra-dev/reader";

const reader = new ReaderClient();
const result = await reader.crawl(options);

// Standalone function
import { crawl } from "@vakra-dev/reader";
const result = await crawl(options);

Parameters

crawl(options: CrawlOptions): Promise<CrawlResult>
See CrawlOptions for full options reference.
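A partial, non-authoritative sketch of CrawlOptions, assembled only from the options demonstrated on this page (field optionality is an assumption; see CrawlOptions for the authoritative definition):

interface CrawlOptions {
  url: string;                // Seed URL to start crawling from
  depth?: number;             // Link levels to follow (see "How Depth Works" below)
  maxPages?: number;          // Maximum pages to discover
  scrape?: boolean;           // Also scrape content of discovered pages
  scrapeConcurrency?: number; // Parallel scrapes when scrape is true
  formats?: string[];         // Output formats, e.g. ["markdown"]
  includePatterns?: string[]; // URL patterns to include
  excludePatterns?: string[]; // URL patterns to exclude
  delayMs?: number;           // Delay between requests, in milliseconds
  proxy?: {
    host: string;
    port: number;
    username?: string;        // optionality assumed
    password?: string;        // optionality assumed
  };
}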

Quick Example

import { ReaderClient } from "@vakra-dev/reader";

const reader = new ReaderClient();

const result = await reader.crawl({
  url: "https://example.com",
  depth: 2,
  maxPages: 50,
  scrape: true,
});

console.log(`Found ${result.urls.length} pages`);
result.urls.forEach((page) => {
  console.log(`- ${page.title}: ${page.url}`);
});

await reader.close();

Return Value

Returns a Promise<CrawlResult>:
interface CrawlResult {
  urls: CrawlUrl[];
  scraped?: ScrapeResult; // Only if scrape: true
  metadata: CrawlMetadata;
}

interface CrawlUrl {
  url: string;
  title: string;
  description: string | null;
}

interface CrawlMetadata {
  totalUrls: number;
  maxDepth: number;
  totalDuration: number;
  seedUrl: string;
}
See CrawlResult for full result structure.
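Putting the interfaces together: urls carries per-page discovery results and metadata carries crawl statistics. A minimal consumption sketch using only the fields defined above (the millisecond unit for totalDuration is an assumption):

import { ReaderClient } from "@vakra-dev/reader";

const reader = new ReaderClient();
const result = await reader.crawl({ url: "https://example.com", depth: 2 });

// Crawl statistics (CrawlMetadata)
console.log(`Seed: ${result.metadata.seedUrl}`);
console.log(`Discovered ${result.metadata.totalUrls} URLs, max depth ${result.metadata.maxDepth}`);
console.log(`Duration: ${result.metadata.totalDuration}ms`); // unit assumed

// Per-page results (CrawlUrl); description may be null
for (const page of result.urls) {
  console.log(`${page.title}: ${page.url}`);
  if (page.description) console.log(`  ${page.description}`);
}

await reader.close();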

Common Options

Depth and Limits

const result = await reader.crawl({
  url: "https://example.com",
  depth: 3, // How deep to crawl
  maxPages: 100, // Maximum pages to discover
});

With Scraping

const result = await reader.crawl({
  url: "https://example.com",
  depth: 2,
  scrape: true,
  scrapeConcurrency: 5,
  formats: ["markdown"],
});

// Access scraped content
result.scraped?.data.forEach((page) => {
  console.log(page.markdown);
});

URL Filtering

const result = await reader.crawl({
  url: "https://example.com",
  depth: 3,
  includePatterns: ["^/docs/", "^/guides/"],
  excludePatterns: ["^/admin/", "^/api/"],
});
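The patterns above look like regular expressions anchored to the URL path. As a rough mental model only (an assumption about the matching semantics, not the library's actual implementation), the filter behaves something like this:

// Hypothetical sketch: apply include/exclude patterns to a URL's path.
function shouldCrawl(url: string, include: string[], exclude: string[]): boolean {
  const path = new URL(url).pathname;
  const matches = (patterns: string[]) =>
    patterns.some((p) => new RegExp(p).test(path));

  if (exclude.length > 0 && matches(exclude)) return false; // exclusions win
  if (include.length > 0 && !matches(include)) return false; // must match an include
  return true;
}

shouldCrawl("https://example.com/docs/intro", ["^/docs/"], ["^/admin/"]); // true
shouldCrawl("https://example.com/admin/users", ["^/docs/"], ["^/admin/"]); // false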

Rate Limiting

const result = await reader.crawl({
  url: "https://example.com",
  depth: 2,
  delayMs: 2000, // 2 seconds between requests
});

With Proxy

const result = await reader.crawl({
  url: "https://example.com",
  depth: 2,
  proxy: {
    host: "proxy.example.com",
    port: 8080,
    username: "user",
    password: "pass",
  },
});

How Depth Works

Depth controls how far from the seed URL the crawler will go:
Seed URL (depth 0)
├── Page A (depth 1)
│   ├── Page D (depth 2)
│   └── Page E (depth 2)
├── Page B (depth 1)
│   └── Page F (depth 2)
└── Page C (depth 1)
  • depth: 0 - Only the seed URL
  • depth: 1 - Seed + pages directly linked from it
  • depth: 2 - Seed + linked pages + pages linked from those
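Conceptually, this is a breadth-first traversal with a level cap: each iteration expands one depth level, and already-seen URLs are skipped. A sketch of the idea (not the library's code; fetchLinks is a hypothetical helper that returns the links found on a page):

// Hypothetical helper: fetch a page and return the URLs it links to.
declare function fetchLinks(url: string): Promise<string[]>;

async function crawlToDepth(seed: string, maxDepth: number): Promise<string[]> {
  const seen = new Set<string>([seed]);
  let frontier = [seed]; // pages at the current depth

  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      for (const link of await fetchLinks(url)) {
        if (!seen.has(link)) {
          seen.add(link);
          next.push(link); // discovered at depth + 1
        }
      }
    }
    frontier = next;
  }
  return [...seen];
}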

Domain Restrictions

By default, crawling stays within the seed URL's site: subdomains of the same registrable domain are included, but links to external domains are not followed.
// Only crawls example.com and subdomains
const result = await reader.crawl({
  url: "https://docs.example.com",
  depth: 3,
});
// Will crawl docs.example.com, api.example.com, etc.
// Will NOT crawl other-site.com
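Since the seed docs.example.com also pulls in api.example.com, the boundary appears to be the registrable domain rather than the exact hostname. A naive sketch of that check (an assumption about the behavior; a real implementation would consult the Public Suffix List, since e.g. example.co.uk breaks the two-label shortcut):

// Naive check: do two URLs share the same registrable domain?
function sameRegistrableDomain(a: string, b: string): boolean {
  const base = (u: string) => new URL(u).hostname.split(".").slice(-2).join(".");
  return base(a) === base(b);
}

sameRegistrableDomain("https://docs.example.com", "https://api.example.com"); // true
sameRegistrableDomain("https://docs.example.com", "https://other-site.com");  // false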