Scraping Hub: Four Engines, One Interface — From Static Pages to Bot-Protected SPAs

Thinnest AI Team

Feb 27, 2026 • 5 min read

Not All Websites Are Created Equal

Your agent needs knowledge from the web. Simple enough, until you realize that a company blog, a JavaScript SPA, a Cloudflare-protected documentation site, and a dynamically loaded product catalog each require a different scraping approach.

Use the wrong scraper and you get empty results, blocked requests, or garbled content. Use the right one and you get clean, structured knowledge your agent can actually use.

The Scraping Hub eliminates guesswork by giving you four engines behind one interface.
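
To make the choice concrete, here is a minimal Python sketch of how a site's traits could map to one of the four engines. It is an illustration, not thinnestAI's implementation: `Engine`, `SiteTraits`, and `pick_engine` are hypothetical names, and the priority order is an assumption drawn from the "Best for" lines in the engine descriptions that follow.

```python
from dataclasses import dataclass
from enum import Enum


class Engine(Enum):
    STANDARD = "standard"    # BeautifulSoup-based engine
    CRAWL4AI = "crawl4ai"
    FIRECRAWL = "firecrawl"
    SCRAPLING = "scrapling"


@dataclass(frozen=True)
class SiteTraits:
    """Hypothetical description of a target site (not a thinnestAI type)."""
    bot_protected: bool = False     # e.g. behind Cloudflare or Akamai
    needs_javascript: bool = False  # content only appears after JS runs
    noisy_layout: bool = False      # heavy nav bars, ads, boilerplate


def pick_engine(traits: SiteTraits) -> Engine:
    """Pick the lightest engine that should cope with the site.

    The ordering is an assumption: the most restrictive trait wins.
    """
    if traits.bot_protected:
        return Engine.SCRAPLING
    if traits.needs_javascript:
        return Engine.FIRECRAWL
    if traits.noisy_layout:
        return Engine.CRAWL4AI
    return Engine.STANDARD


if __name__ == "__main__":
    for traits in (SiteTraits(), SiteTraits(needs_javascript=True)):
        print(traits, "->", pick_engine(traits).value)
```

Checking bot protection first reflects the idea that a blocked request returns nothing useful, whatever the page layout looks like.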

The Four Engines

Standard (BeautifulSoup)

The workhorse. Fast, lightweight, and reliable for 80% of the web. If the page's content is already in the initial HTML and does not depend on JavaScript, this is your engine.

Speed: Fastest — sub-second per page
Best for: Blogs, documentation, news sites, static pages
Features: CSS selector targeting, metadata extraction, clean markdown output

Crawl4AI

AI-powered content extraction that intelligently identifies main content, strips boilerplate, and produces structured output.

Speed: Fast
Best for: Complex layouts where you want the "article" without the nav bars and ads
Features: Smart content detection, markdown output, multi-page crawling with link following

Firecrawl

Cloud-based scraping with full JavaScript rendering. When the content only appears after the page's JavaScript has run, Firecrawl handles it.

Speed: Medium — JS rendering takes time
Best for: SPAs, React/Next.js sites, lazy-loaded content
Features: Full JS rendering, structured data extraction, sitemap crawling

Scrapling

Stealth scraping for sites that actively block bots. Three fetcher tiers escalate from basic to full browser emulation.

Speed: Slower — stealth requires patience
Best for: Bot-protected sites, Cloudflare/Akamai-protected pages, sites with rate limiting
Features: Three fetcher tiers (basic/stealth/full browser), proxy support, fingerprint rotation, adaptive parsing
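
The escalation idea can be sketched in a few lines of Python. This is not Scrapling's API: the fetchers below are stand-ins that make no network calls, and the automatic fallback from one tier to the next is an assumption (the Scraping Hub may instead let you choose a tier yourself).

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher takes a URL and returns the page HTML, or None when it was blocked.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_escalation(
    url: str, tiers: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, str]:
    """Try each tier in order; return (tier_name, html) from the first success."""
    for name, fetch in tiers:
        html = fetch(url)
        if html is not None:
            return name, html
    raise RuntimeError(f"all {len(tiers)} tiers were blocked for {url}")


# Stand-in fetchers for illustration only; real tiers would make network calls.
def basic(url: str) -> Optional[str]:
    return None  # pretend the plain HTTP request was blocked


def stealth(url: str) -> Optional[str]:
    return "<html><body>ok</body></html>"


def full_browser(url: str) -> Optional[str]:
    return "<html><body>ok (browser)</body></html>"


# Cheapest tier first, most expensive last.
TIERS = [("basic", basic), ("stealth", stealth), ("full browser", full_browser)]

if __name__ == "__main__":
    tier, html = fetch_with_escalation("https://example.com", TIERS)
    print(f"succeeded with the {tier} tier ({len(html)} bytes)")
```

Ordering the tiers from cheapest to most expensive means the slow full-browser path is only paid for when the lighter tiers fail.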

thinnestAI vs. Competitors: Knowledge Ingestion

Capability	thinnestAI	Voiceflow	Botpress	Relevance AI
Scraping engines	4 engines (BS4, Crawl4AI, Firecrawl, Scrapling)	1 (basic HTTP)	1 (basic HTTP)	1 (basic HTTP)
JavaScript rendering	Yes — Firecrawl + Scrapling	No	No	No
Stealth/anti-detection	Yes — Scrapling with 3 tiers	No	No	No
Content deduplication	Yes — SHA-256 content hashing	No	No	Basic
Visual engine selector	Yes — cards with feature badges	Single scraper, no choice	Single scraper, no choice	Basic URL input

Advanced Features

CSS selectors: Target specific content areas (e.g., article.main-content) — skip headers, footers, and sidebars
Depth control: Set how many link levels to follow (0 = single page, 1 = page + linked pages)
Page limits: Cap the total number of pages scraped to control costs and time
Content deduplication: SHA-256 hashing prevents duplicate chunks when re-scraping
Real-time progress: SSE-powered progress display with extracted page previews
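
Depth control, page limits, and deduplication fit together in one crawl loop. The sketch below is a minimal Python illustration under stated assumptions, not the Scraping Hub's code: `fetch` is a hypothetical callable returning a page's text and its links, whitespace is normalized before hashing, and the page limit counts every fetched page, duplicates included.

```python
import hashlib
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical fetch function: returns (page_text, outgoing_links) for a URL.
FetchFn = Callable[[str], Tuple[str, List[str]]]


def content_hash(text: str) -> str:
    """SHA-256 hex digest of the text after collapsing whitespace (an assumption)."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def crawl(
    start_url: str, fetch: FetchFn, max_depth: int = 1, max_pages: int = 10
) -> Dict[str, str]:
    """Breadth-first crawl returning {url: text} for pages with unseen content."""
    queue = deque([(start_url, 0)])
    visited = {start_url}   # URLs already queued, so no URL is fetched twice
    seen_hashes = set()     # content digests already stored
    pages: Dict[str, str] = {}
    fetched = 0
    while queue and fetched < max_pages:
        url, depth = queue.popleft()
        text, links = fetch(url)
        fetched += 1
        digest = content_hash(text)
        if digest not in seen_hashes:      # skip duplicate content
            seen_hashes.add(digest)
            pages[url] = text
        if depth < max_depth:              # depth 0 follows no links at all
            for link in links:
                if link not in visited:
                    visited.add(link)
                    queue.append((link, depth + 1))
    return pages


# Tiny in-memory "site" for illustration; /b repeats the content of /a.
SITE = {
    "/": ("Home", ["/a", "/b"]),
    "/a": ("Article", ["/c"]),
    "/b": ("Article", []),
    "/c": ("Deep page", []),
}

if __name__ == "__main__":
    print(list(crawl("/", lambda url: SITE[url], max_depth=1)))
```

For re-scraping, the set of seen hashes would need to be persisted between runs so that unchanged content is recognized and skipped.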

Get Started

The Scraping Hub is live on all plans. Add a Web URL knowledge source, select your engine, and start extracting. Standard and Crawl4AI require no API keys. Firecrawl needs a Firecrawl API key. Scrapling is fully self-contained.
