Scraping Hub: Four Engines, One Interface — From Static Pages to Bot-Protected SPAs
Not All Websites Are Created Equal
Your agent needs knowledge from the web. Simple enough, until you realize that a company blog, a JavaScript SPA, a Cloudflare-protected documentation site, and a dynamically loaded product catalog each require a completely different scraping approach.
Use the wrong scraper and you get empty results, blocked requests, or garbled content. Use the right one and you get clean, structured knowledge your agent can actually use.
The Scraping Hub eliminates guesswork by giving you four engines behind one interface.
The Four Engines
Standard (BeautifulSoup)
The workhorse. Fast, lightweight, and reliable for 80% of the web. If the page renders without JavaScript, this is your engine.
- Speed: Fastest — sub-second per page
- Best for: Blogs, documentation, news sites, static pages
- Features: CSS selector targeting, metadata extraction, clean markdown output
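To make "metadata extraction" concrete, here is a minimal sketch of pulling a page title and meta tags out of static HTML. It uses Python's standard-library `html.parser` as a dependency-free stand-in rather than BeautifulSoup, and the names `MetadataParser` and `extract_metadata` are hypothetical, not part of the Scraping Hub.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect the <title> text and <meta name/property ... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Standard tags use name=..., Open Graph tags use property=...
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html: str) -> dict:
    """Return the page title plus any meta tags found in static HTML."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), **parser.meta}
```

This only works because the markup is already present in the HTTP response, which is the condition under which the Standard engine is the right choice.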
Crawl4AI
AI-powered content extraction that intelligently identifies main content, strips boilerplate, and produces structured output.
- Speed: Fast
- Best for: Complex layouts where you want the "article" without the nav bars and ads
- Features: Smart content detection, markdown output, multi-page crawling with link following
Firecrawl
Cloud-based scraping with full JavaScript rendering. When content appears only after JavaScript runs, Firecrawl handles it.
- Speed: Medium — JS rendering takes time
- Best for: SPAs, React/Next.js sites, lazy-loaded content
- Features: Full JS rendering, structured data extraction, sitemap crawling
Scrapling
Stealth scraping for sites that actively block bots. Three fetcher tiers escalate from basic to full browser emulation.
- Speed: Slower — stealth requires patience
- Best for: Bot-protected sites, Cloudflare/Akamai-protected pages, sites with rate limiting
- Features: Three fetcher tiers (basic/stealth/full browser), proxy support, fingerprint rotation, adaptive parsing
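The tier escalation described for Scrapling can be sketched as a simple fallback loop: try the cheapest fetcher first and move up only when a request is blocked. This is an illustration under assumptions, not Scrapling's actual API. The fetcher functions below are stubs, `fetch_with_escalation` is a hypothetical name, and automatic fallback is assumed rather than confirmed by the product.

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher takes a URL and returns HTML, or None if the request was blocked.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_escalation(
    url: str, tiers: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, str]:
    """Try each tier in order; return (tier_name, html) from the first success."""
    for name, fetch in tiers:
        html = fetch(url)
        if html is not None:
            return name, html
    raise RuntimeError(f"all tiers were blocked for {url}")


# Stub fetchers standing in for the three tiers (hypothetical behavior).
def basic(url: str) -> Optional[str]:
    return None  # pretend the plain request was blocked


def stealth(url: str) -> Optional[str]:
    return "<html>ok</html>"


def full_browser(url: str) -> Optional[str]:
    return "<html>ok from browser</html>"


TIERS = [("basic", basic), ("stealth", stealth), ("full browser", full_browser)]
```

Ordering the tiers from cheapest to most expensive means the slower browser emulation runs only for pages that actually need it.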
thinnestAI vs. Competitors: Knowledge Ingestion
| Capability | thinnestAI | Voiceflow | Botpress | Relevance AI |
|---|---|---|---|---|
| Scraping engines | 4 engines (BS4, Crawl4AI, Firecrawl, Scrapling) | 1 (basic HTTP) | 1 (basic HTTP) | 1 (basic HTTP) |
| JavaScript rendering | Yes — Firecrawl + Scrapling | No | No | No |
| Stealth/anti-detection | Yes — Scrapling with 3 tiers | No | No | No |
| Content deduplication | Yes — SHA-256 content hashing | No | No | Basic |
| Visual engine selector | Yes — cards with feature badges | Single scraper, no choice | Single scraper, no choice | Basic URL input |
Advanced Features
- CSS selectors: Target specific content areas (e.g., article.main-content) and skip headers, footers, and sidebars
- Depth control: Set how many link levels to follow (0 = single page, 1 = page + linked pages)
- Page limits: Cap the total number of pages scraped to control costs and time
- Content deduplication: SHA-256 hashing prevents duplicate chunks when re-scraping
- Real-time progress: SSE-powered progress display with extracted page previews
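Depth control, page limits, and SHA-256 deduplication interact, so a small sketch may help. The crawler below is a minimal breadth-first illustration over an in-memory fake site. The `crawl` function, its parameters, and the `fetch` callback are all hypothetical, and it hashes whole pages rather than chunks to keep the example short.

```python
import hashlib
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical fetch callback: URL -> (page text, outgoing links).
FetchFn = Callable[[str], Tuple[str, List[str]]]


def crawl(
    start_url: str, fetch: FetchFn, max_depth: int = 1, max_pages: int = 10
) -> Dict[str, str]:
    """Breadth-first crawl with a depth limit, a page cap, and content dedup."""
    seen_urls: Set[str] = {start_url}
    seen_hashes: Set[str] = set()
    kept: Dict[str, str] = {}
    queue = deque([(start_url, 0)])
    fetched = 0

    while queue and fetched < max_pages:
        url, depth = queue.popleft()
        text, links = fetch(url)
        fetched += 1

        # Identical content produces the same SHA-256 digest, so it is stored once.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            kept[url] = text

        # Depth 0 means "this page only": links are followed only below the limit.
        if depth < max_depth:
            for link in links:
                if link not in seen_urls:
                    seen_urls.add(link)
                    queue.append((link, depth + 1))
    return kept


# A tiny fake site: /a and /b carry identical content.
SITE = {
    "/": ("home", ["/a", "/b"]),
    "/a": ("article", ["/c"]),
    "/b": ("article", []),
    "/c": ("deep page", []),
}


def fake_fetch(url: str) -> Tuple[str, List[str]]:
    return SITE[url]
```

With `max_depth=1`, the crawl fetches `/`, `/a`, and `/b` but stores the duplicated article only once. To cover re-scraping, the set of seen hashes would need to be persisted between runs, which this sketch leaves out.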
Get Started
The Scraping Hub is live on all plans. Add a Web URL knowledge source, select your engine, and start extracting. Standard and Crawl4AI require no API keys. Firecrawl needs a Firecrawl API key. Scrapling is fully self-contained.
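If you configure sources programmatically, the key requirements above could be checked up front with something like the following sketch. The engine identifiers and the `validate_source` function are assumptions for illustration, not the product's actual configuration schema.

```python
from typing import Mapping

# Which engines need an API key, per the paragraph above.
# The lowercase identifiers are hypothetical.
REQUIRES_API_KEY = {
    "standard": False,
    "crawl4ai": False,
    "firecrawl": True,
    "scrapling": False,
}


def validate_source(engine: str, api_keys: Mapping[str, str]) -> None:
    """Raise ValueError if the engine is unknown or is missing a required key."""
    if engine not in REQUIRES_API_KEY:
        raise ValueError(f"unknown engine: {engine!r}")
    if REQUIRES_API_KEY[engine] and not api_keys.get(engine):
        raise ValueError(f"engine {engine!r} needs an API key")
```

Failing before the scrape starts avoids discovering a missing Firecrawl key halfway through a multi-page job.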
No credit card required • 4 engines included • Content deduplication built in