17 Verified Tools

Prepare Web Data for AI Training

Crawl and extract clean data for LLM training, RAG applications, and AI model development.

Pages/run: 100K+
Formats: MD/JSON
Quality: Clean
Training data, RAG pipelines, Knowledge bases, Content extraction
Total Actors: 17
Total Users: 102K
Avg Rating: 4.4/5
Updated: Weekly

ALL AI & LLM SCRAPERS

Sorted by popularity. Each actor is tested weekly. Click any tool to see detailed features, pricing, and user reviews.

AI & LLM
apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

93.6k users
Free
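
The "cleans the HTML" step described above can be illustrated with a minimal sketch. This is not the Actor's implementation: the class name `TextExtractor`, the function `clean_html`, and the set of skipped tags are all assumptions chosen for illustration, using only Python's standard library.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores content inside non-content tags.

    Hypothetical helper for illustration; the skipped tag set is an assumption.
    """

    SKIP = {"script", "style", "noscript", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside any skipped element
        self._chunks = []     # one entry per non-empty text node

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = " ".join(data.split())  # collapse runs of whitespace
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def clean_html(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = "<body><nav>Menu</nav><h1>Docs</h1><p>Install the package.</p></body>"
    print(clean_html(page))
```

A production crawler would also handle boilerplate detection, Markdown conversion, and file downloads; the sketch covers only tag stripping.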
AI & LLM
apify

Web browser for OpenAI Assistants, RAG pipelines, or AI agents, similar to the web browser in ChatGPT. It queries Google Search, scrapes the top N pages, and returns their content as Markdown for further processing by an LLM. It can also scrape individual URLs.

8.4k users
Free
Verified
AI & LLM
parseforge

AI-powered colorization tool for black and white manga panels. Upload manga panel images and get them automatically colorized with vibrant, appropriate colors that match manga art style. Maintains original line art and shading while applying professional colorization.

126 users
Free
Verified
AI & LLM
parseforge

Extract and translate text from manga images using Nano Banana AI. The Actor processes manga panel images, extracts text in multiple source languages, and translates it into several target languages at once. Each extracted entry includes the original text and its translations into all selected languages.

21 users
Free
Verified
AI & LLM
parseforge

Generate eye-catching video thumbnails automatically using Google Gemini AI. Upload a video or provide a URL, and get multiple high-quality thumbnail options in seconds. Perfect for content creators, marketers, and video editors who need professional thumbnails without design skills.

18 users
Free
Verified
AI & LLM
parseforge

Transform your text descriptions into retro-style pixel art with this AI pixel art generator. Suited to games, digital art projects, and creative applications, this Actor creates nostalgic 8-bit and 16-bit style artwork.

18 users
Free
Verified
AI & LLM
parseforge

Convert PDF documents into structured JSON using AI-powered OCR and smart data extraction. The Actor processes every page to ensure complete coverage, then identifies text, fields, tables, and key details, delivering clean, organized JSON ready for automation or analysis.

14 users
Free
Verified
AI & LLM
parseforge

Convert HTML to structured JSON using AI! Uses OpenAI to extract and structure data from HTML into clean JSON format. Perfect for developers and data analysts who need to transform HTML into structured data without manual parsing.

13 users
Free
Verified
AI & LLM
parseforge

Reviews any resume alongside its target job description and serves up clear guidance that hiring teams can trust. Instead of rewriting documents manually, you get verdicts, structure feedback, action items, and ready-to-use rewrites that stay faithful to the original text.

8 users
Free
Verified
AI & LLM
parseforge

Automates audio transcription from multiple sources (files or links). Normalizes input format to ensure optimal processing. Generates word-for-word transcriptions maintaining references to source audio, perfect for datasets requiring traceability and regulatory compliance.

7 users
Free
Verified
AI & LLM
parseforge

Collect models from Hugging Face Hub via public API endpoints. Get metadata including author, downloads, likes, lastModified, task, library, license, tags and filenames.

6 users
Free
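
As an illustration of the metadata fields listed above, here is a minimal sketch that maps one raw model record to those fields. It assumes records shaped like the JSON returned by the public Hugging Face Hub model listing endpoint (`https://huggingface.co/api/models`), where the task appears as `pipeline_tag`, the library as `library_name`, the license as a `license:` tag, and files under `siblings`. The function name `normalize_model` and the sample record are hypothetical, and this is not the Actor's own code.

```python
def normalize_model(record):
    """Map a raw Hub model record (a dict) to the fields named in the listing.

    Field names on the input side are assumptions about the API response shape.
    """
    tags = record.get("tags") or []
    # Assumption: the license is exposed as a tag of the form "license:<id>".
    license_id = next(
        (t.split(":", 1)[1] for t in tags if t.startswith("license:")), None
    )
    model_id = record.get("id") or record.get("modelId") or ""
    # Fall back to the namespace in "author/name" when no author field is present.
    author = record.get("author") or (
        model_id.split("/", 1)[0] if "/" in model_id else None
    )
    return {
        "id": model_id,
        "author": author,
        "downloads": record.get("downloads", 0),
        "likes": record.get("likes", 0),
        "lastModified": record.get("lastModified"),
        "task": record.get("pipeline_tag"),
        "library": record.get("library_name"),
        "license": license_id,
        "tags": tags,
        "filenames": [
            s["rfilename"] for s in (record.get("siblings") or []) if "rfilename" in s
        ],
    }


if __name__ == "__main__":
    # Hypothetical record for demonstration only; values are made up.
    sample = {
        "id": "example-org/example-model",
        "downloads": 10,
        "likes": 2,
        "pipeline_tag": "text-classification",
        "library_name": "transformers",
        "tags": ["license:mit", "en"],
        "siblings": [{"rfilename": "config.json"}],
    }
    print(normalize_model(sample))
```

Keeping the normalization separate from the HTTP request makes it easy to test against saved responses without network access.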
Verified
AI & LLM
parseforge

Turn any photo into a printable coloring book page. Upload images or paste URLs and let our AI create clean black and white line art with simplified backgrounds and bold outlines. Perfect for parents, teachers, activity book creators, and artists who need instant coloring pages for all ages.

6 users
Free
Verified
AI & LLM
parseforge

Upload resumes, cover letters, or job application text to detect potential fraud. The Actor extracts text from PDF/DOCX/TXT files, evaluates authenticity with OpenAI, and returns a verdict, score, and justification.

5 users
Free
Verified
AI & LLM
parseforge

Transform restaurant dish photos into professional food photography using AI. Automatically enhance lighting, colors, and composition while maintaining dish authenticity. Optimized for delivery platforms (Uber Eats, DoorDash, Rappi, DidiFood) with platform-specific image sizes and aspect ratios.

5 users
Free
Verified
AI & LLM
parseforge

AI-powered restoration tool for old and damaged photographs. Restore vintage photos by fixing time degradation, physical damage, or performing full recovery. Optional colorization, upscaling, and AI artifact removal. Perfect for preserving family memories, historical photos, and vintage artwork.

4 users
Free
Verified
AI & LLM
parseforge

Fix and normalize addresses in bulk using AI. Upload a CSV or provide a list of addresses, and the Actor returns fully standardized results with proper capitalization, filled missing fields, corrected formats, and consistent structure for clean, reliable address data.

4 users
Free
Verified
AI & LLM
parseforge

Extract structured JSON data from passports, driver’s licenses, and ID cards using advanced AI vision. Automatically capture personal details, document info, dates, and all relevant fields from ID images, turning them into clean, accurate JSON for fast verification workflows.

2 users
Free

START SCRAPING AI & LLM DATA TODAY

Pick a tool from our AI & LLM collection. Most have free tiers. Get your first data in under 5 minutes.