Scraping is Reader’s core capability: fetching web pages and extracting clean content.

How It Works

When you call scrape(), Reader:
  1. Loads the page in a real browser (Ulixee Hero)
  2. Executes JavaScript and handles anti-bot challenges
  3. Waits for dynamic content to finish loading
  4. Extracts main content by removing navigation, ads, and other noise
  5. Converts to markdown using supermarkdown

Basic Usage

import { ReaderClient } from "@vakra-dev/reader";

const reader = new ReaderClient();

const result = await reader.scrape({
  urls: ["https://example.com"],
});

console.log(result.data[0].markdown);

await reader.close();
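
Because urls accepts an array, a single scrape() call can fetch several pages, with one entry in result.data per URL. A minimal sketch reusing the reader client from above (the URLs are placeholders, and the one-to-one mapping of inputs to results is assumed from the interfaces shown later on this page):

const batch = await reader.scrape({
  urls: [
    "https://example.com/first",
    "https://example.com/second",
  ],
});

for (const page of batch.data) {
  // metadata.baseUrl identifies which URL a result belongs to
  console.log(page.metadata.baseUrl, page.markdown?.length);
}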

Output Formats

Reader supports two output formats:
  - markdown: Clean markdown, optimized for LLMs
  - html: Cleaned HTML with main content only
const result = await reader.scrape({
  urls: ["https://example.com"],
  formats: ["markdown", "html"],
});

console.log(result.data[0].markdown);
console.log(result.data[0].html);

Scrape Result Structure

interface ScrapeResult {
  data: WebsiteScrapeResult[];
  batchMetadata: BatchMetadata;
}

interface WebsiteScrapeResult {
  markdown?: string;
  html?: string;
  metadata: {
    baseUrl: string;
    totalPages: number;
    scrapedAt: string;
    duration: number;
    website: WebsiteMetadata;
  };
}
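
A quick sketch of reading these fields after a scrape (assuming a single URL, as in Basic Usage):

const result = await reader.scrape({ urls: ["https://example.com"] });
const page = result.data[0];

console.log(page.metadata.baseUrl);   // the scraped URL
console.log(page.metadata.scrapedAt); // when the scrape ran
console.log(page.metadata.duration);  // how long the scrape took
console.log(page.markdown?.slice(0, 200)); // markdown is optional, so guard it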

Website Metadata

Reader extracts rich metadata from each page:
interface WebsiteMetadata {
  title: string | null;
  description: string | null;
  author: string | null;
  language: string | null;
  favicon: string | null;
  image: string | null;
  canonical: string | null;
  openGraph: { ... } | null;
  twitter: { ... } | null;
}
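
Per the ScrapeResult structure above, this object lives at metadata.website on each result. A short sketch; every field can be null, so guard your reads:

const result = await reader.scrape({ urls: ["https://example.com"] });
const { website } = result.data[0].metadata;

if (website.title) console.log(`Title: ${website.title}`);
if (website.description) console.log(`Description: ${website.description}`);
if (website.canonical) console.log(`Canonical URL: ${website.canonical}`);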

Timeouts

Control how long Reader waits for pages:
const result = await reader.scrape({
  urls: ["https://example.com"],
  timeoutMs: 60000, // 60 seconds per page
});
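
This page doesn't specify how a timed-out page surfaces, so treat the following as a defensive sketch that assumes scrape() rejects on failure:

try {
  const result = await reader.scrape({
    urls: ["https://example.com"],
    timeoutMs: 10000, // fail fast: 10 seconds per page
  });
  console.log(result.data[0].markdown);
} catch (err) {
  // Assumption: scrape() rejects when a page cannot be loaded in time.
  console.error("Scrape failed:", err);
}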

Waiting for Selectors

Wait for specific elements before extracting content:
const result = await reader.scrape({
  urls: ["https://example.com"],
  waitForSelector: ".article-content",
});
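
waitForSelector should compose with the other options on this page; for example, pairing it with timeoutMs bounds how long Reader waits for the element to appear (assumed behavior, not confirmed here):

const result = await reader.scrape({
  urls: ["https://example.com"],
  waitForSelector: ".article-content",
  timeoutMs: 30000, // assumed to cap the total wait, selector included
});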

Custom User Agent

Override the browser's default user agent string:
const result = await reader.scrape({
  urls: ["https://example.com"],
  userAgent: "MyBot/1.0",
});

Custom Headers

Send custom HTTP headers with every request:
const result = await reader.scrape({
  urls: ["https://example.com"],
  headers: {
    "Accept-Language": "en-US",
    "Cookie": "session=abc123",
  },
});

Next Steps