Ai Web Scraper - Natural language and Vision scraper

Powerful AI Web Scraper using Google's Gemini Vision. Specify data extraction in natural language. Supports infinite scroll, above-the-fold analysis, automatic cookie consent, pay-per-event pricing, and screenshot storage for debugging.

AI Web Scraper – Natural Language & Vision Scraper (Playwright + Pay-Per-Event)

The AI Web Scraper is an advanced and intuitive web scraping tool powered by Google's Gemini Large Language Model (LLM). Define your scraping needs in natural language, and the AI dynamically identifies and extracts relevant data directly from webpage screenshots.


🔥 What's New (Update: March 21, 2025)

  • Structured Output via Dynamic Schema: Automatically generates structured data outputs tailored precisely to your scraping instructions, improving data consistency and usability.
  • Enhanced Instruction Parsing: Improved AI understanding of natural language instructions, extracting clearer, database-ready item lists.
  • Enhanced Cookie Consent Handling: Smarter AI-driven cookie acceptance improves automation and reduces manual intervention.
  • Streamlined Browser Management: Efficiently manages Playwright browser instances, optimizing performance and resource utilization.

How It Works

  1. Define Instructions Clearly
    Use plain language to specify exactly what data you need:

    "Extract the product title, price, and description."
  2. AI-Driven Data Extraction
    Gemini LLM intelligently analyzes webpage screenshots, dynamically locating requested items.

  3. Flexible Scrolling Options

    • Infinite Scrolling: For pages that continuously load new content.
    • No-Overlap Scrolling: Captures comprehensive screenshots of static pages.
    • Above-the-Fold Only: Capture just the initially visible content without scrolling.
  4. Structured JSON Outputs
    Receive data neatly structured for easy analysis and integration:

    {
      "url": "https://example_A.com",
      "items": [
        {"product_name": "Item A", "price": "$29.99", "description": "A great product.", "url": "https://example_A.com"},
        {"product_name": "Item B", "price": "$49.99", "description": "Another great product.", "url": "https://example_B.com"}
      ]
    }
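As a rough illustration of working with this output downstream, the sketch below flattens one result object into per-item rows and serializes them as CSV using only the Python standard library. The sample data mirrors the example above; the helper names `items_to_rows` and `rows_to_csv` and the added `source_url` column are hypothetical and not part of the actor.

```python
import csv
import io
import json

# Sample result in the shape shown above (values are illustrative).
raw = """
{
  "url": "https://example_A.com",
  "items": [
    {"product_name": "Item A", "price": "$29.99", "description": "A great product.", "url": "https://example_A.com"},
    {"product_name": "Item B", "price": "$49.99", "description": "Another great product.", "url": "https://example_B.com"}
  ]
}
"""


def items_to_rows(result):
    """Return one flat dict per item, tagged with the page it came from."""
    rows = []
    for item in result.get("items", []):
        # Strip stray whitespace from keys in case field names arrive padded.
        row = {key.strip(): value for key, value in item.items()}
        # Hypothetical extra column: the scraped page's URL.
        row["source_url"] = result.get("url")
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text, using the union of all keys as the header."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = items_to_rows(json.loads(raw))
csv_text = rows_to_csv(rows)
```

Because the item fields depend on your instructions, the header is built from whatever keys actually appear rather than from a fixed column list.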

Example Input Configuration

{
  "instructions": "Give me the product name and price for each product that isn't blue",
  "start_urls": [
    "https://www.example_A.com/product1",
    "https://www.example_B.com/product2"
  ],
  "has_infinite_scroll": false,
  "save_screenshots": false,
  "above_fold_only": false
}
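If you generate this configuration from a script, a small pre-flight check can catch obvious mistakes before a run is started. The sketch below is a minimal example that uses only the field names from the configuration above. The helpers `build_input` and `validate_input` are hypothetical, and the rule that `has_infinite_scroll` and `above_fold_only` should not both be true is an assumption based on their descriptions, not documented actor behavior.

```python
import json
from urllib.parse import urlparse

# Boolean options taken from the example configuration above.
BOOLEAN_FLAGS = ("has_infinite_scroll", "save_screenshots", "above_fold_only")


def build_input(instructions, start_urls, **flags):
    """Assemble an input dict; unspecified flags default to False."""
    config = {"instructions": instructions, "start_urls": list(start_urls)}
    for name in BOOLEAN_FLAGS:
        config[name] = flags.get(name, False)
    return config


def validate_input(config):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    instructions = config.get("instructions")
    if not isinstance(instructions, str) or not instructions.strip():
        problems.append("instructions must be a non-empty string")
    urls = config.get("start_urls")
    if not isinstance(urls, list) or not urls:
        problems.append("start_urls must be a non-empty list")
    else:
        for url in urls:
            parsed = urlparse(url) if isinstance(url, str) else None
            if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
                problems.append(f"not an absolute http(s) URL: {url!r}")
    for name in BOOLEAN_FLAGS:
        if not isinstance(config.get(name, False), bool):
            problems.append(f"{name} must be true or false")
    # Assumption: scrolling indefinitely and capturing only the first screen conflict.
    if config.get("has_infinite_scroll") is True and config.get("above_fold_only") is True:
        problems.append("has_infinite_scroll and above_fold_only ask for opposite behavior")
    return problems


config = build_input(
    "Give me the product name and price for each product that isn't blue",
    ["https://www.example_A.com/product1", "https://www.example_B.com/product2"],
)
problems = validate_input(config)
input_json = json.dumps(config, indent=2)  # ready to paste into the actor's input editor
```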

Important Notes

  • Pay-per-event: Charges apply each time the Gemini LLM analyzes a screenshot.
  • Optimized Instructions: Clearer instructions produce better AI-driven results.
  • Legal Compliance: Always adhere to website terms of service and relevant privacy regulations.
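Because each analyzed screenshot is billed as an event, it can help to estimate the number of screenshots before a run. The sketch below is a back-of-the-envelope calculation under stated assumptions: no-overlap scrolling captures one viewport-sized screenshot per screen of page height, above-the-fold mode captures exactly one, and infinite scroll is not modeled. The function names and the per-event price are hypothetical; check the actor's pricing page for the real rate.

```python
import math


def screenshots_for_page(page_height_px, viewport_height_px, above_fold_only=False):
    """Estimate how many non-overlapping viewport-sized screenshots cover one page."""
    if viewport_height_px <= 0:
        raise ValueError("viewport_height_px must be positive")
    if above_fold_only or page_height_px <= viewport_height_px:
        return 1
    return math.ceil(page_height_px / viewport_height_px)


def estimate_cost(page_heights_px, viewport_height_px, price_per_event, above_fold_only=False):
    """Return (events, total_cost), assuming one billed event per screenshot."""
    events = sum(
        screenshots_for_page(height, viewport_height_px, above_fold_only)
        for height in page_heights_px
    )
    return events, events * price_per_event


# Hypothetical numbers: two pages, a 1080 px viewport, and a made-up price per event.
events, cost = estimate_cost([3000, 900], viewport_height_px=1080, price_per_event=0.01)
fold_events, fold_cost = estimate_cost([3000, 900], 1080, 0.01, above_fold_only=True)
```

With these made-up inputs, the taller page needs three screenshots and the shorter one needs one, so the full-page run is estimated at four events against two for the above-the-fold run.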

How to Use

  1. Apify Setup: Log in to Apify and select the actor.
  2. Configure Inputs: Specify URLs, instructions, and scrolling behaviors.
  3. Run and Extract: Start the actor, and seamlessly access structured data outputs.

Use Cases

  • E-commerce Analysis: Extract product details, pricing, and reviews.
  • Market Intelligence: Monitor competitor offerings and pricing.
  • Lead Generation: Collect data from directories or listings.
  • Media Monitoring: Capture news headlines, article summaries, or author details.

Integrations

  • Easily integrates into Apify's cloud ecosystem.
  • Automate post-processing via Apify tasks, actors, or APIs.

Feedback & Issues

Your input is valuable! Report any issues or suggest new features via the Issues section on the Apify actor page.

Thanks for choosing AI Web Scraper!

Frequently Asked Questions

Is it legal to scrape job listings or public data?

Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.

Do I need to code to use this scraper?

No. This is a no-code tool: enter your start URLs and plain-language instructions, then run the scraper directly from your dashboard or the Apify actor page.

What data does it extract?

It extracts whichever fields you name in your instructions, for example product names, prices, and descriptions. You can export all of it to Excel or JSON.

Can I scrape multiple pages or filter by location?

Yes. You can list multiple pages in start_urls and state any filters, such as a location or keyword, in your instructions.

How do I get started?

You can use the Try Now button on this page to go to the scraper. There you enter your URLs and instructions and get structured results. No setup needed!