URL Summary Scraper

A powerful Apify actor that extracts essential website information, including title, description, images, and social media links. Perfect for quick data gathering and insights from any URL.

Website Summary Scraper

This project is a web scraping tool designed to extract metadata from websites. It uses Axios for HTTP requests, Cheerio for HTML parsing, and the Apify SDK for actor management. The scraper fetches metadata such as titles, descriptions, social media links, and more from a given webpage.
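
At a high level, the fetch-and-parse step can be pictured as in the sketch below. It is a minimal illustration built on the Axios and Cheerio calls named above; the selectors and exact field handling in the real actor may differ.

    // Minimal sketch of the fetch-and-parse step (illustrative, not the actor's exact code).
    const axios = require('axios');
    const cheerio = require('cheerio');

    async function fetchSummary(url) {
      // Download the raw HTML of the page.
      const { data: html } = await axios.get(url, { headers: { 'Accept-Language': 'en-US' } });
      const $ = cheerio.load(html);

      // Pull the basic metadata fields out of the document <head>.
      return {
        title: $('title').first().text().trim(),
        description: $('meta[name="description"]').attr('content') || null,
        keywords: $('meta[name="keywords"]').attr('content') || null,
        image: $('meta[property="og:image"]').attr('content') || null,
        canonical: $('link[rel="canonical"]').attr('href') || null,
        url_fetched: url,
      };
    }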

Features

  • Fetches metadata from a given URL.
  • Supports custom user-agent strings.
  • Respects robots.txt rules unless explicitly ignored.
  • Extracts social media links and contact information.
  • Extracts external links.
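
The link-related features can be pictured with a sketch like the one below: a hedged illustration of how social profiles, a contact e-mail, and external links might be pulled from the parsed page with Cheerio. The matching rules shown here are assumptions; the actor's own rules are not spelled out in this README.

    // Sketch: deriving social, mail, and external-link fields from anchor tags (assumed logic).
    const cheerio = require('cheerio');

    function extractLinks(html, pageUrl) {
      const $ = cheerio.load(html);
      const result = {
        facebook: null, x: null, linkedin: null, instagram: null,
        youtube: null, trustpilot: null, mail: null, linksExternal: [],
      };
      const pageHost = new URL(pageUrl).hostname.replace(/^www\./, '');

      $('a[href]').each((_, el) => {
        const href = $(el).attr('href');
        if (href.startsWith('mailto:')) {
          // Keep the first contact address found.
          result.mail = result.mail || href.replace('mailto:', '');
          return;
        }
        let parsed;
        try { parsed = new URL(href, pageUrl); } catch { return; } // Skip malformed links.
        const host = parsed.hostname.replace(/^www\./, '');
        if (host.includes('facebook.com')) result.facebook = parsed.href;
        else if (host.includes('twitter.com') || host === 'x.com') result.x = parsed.href;
        else if (host.includes('linkedin.com')) result.linkedin = parsed.href;
        else if (host.includes('instagram.com')) result.instagram = parsed.href;
        else if (host.includes('youtube.com')) result.youtube = parsed.href;
        else if (host.includes('trustpilot.com')) result.trustpilot = parsed.href;
        else if (host !== pageHost) result.linksExternal.push(parsed.href);
      });

      return result;
    }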

Usage

  1. Prepare the input:

It should include at least the url field. A sketch of passing this input to the actor programmatically appears after the example output below.

    {
      "url": "https://example.com",
      "language": "en-US",
      "ignoreRobots": false,
      "ignoreExternalLinks": false
    }
  2. Output:

    {
      "title": "Example Domain",
      "description": "This domain is for use in illustrative examples in documents.",
      "keywords": "example, domain, illustrative, examples, documents",
      "image": "https://example.com/image.png",
      "facebook": "https://facebook.com/example",
      "x": "https://twitter.com/example",
      "linkedin": "https://linkedin.com/company/example",
      "instagram": "https://instagram.com/example",
      "youtube": "https://youtube.com/example",
      "trustpilot": "https://trustpilot.com/review/example.com",
      "canonical": "https://example.com",
      "url_fetched": "https://example.com",
      "url": "https://example.com",
      "mail": "contact@example.com",
      "robotsAllow": true,
      "linksExternal": ["https://example.com/external1", "https://example.com/external2"]
    }
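
As noted above, one way to pass that input to the actor and read back results shaped like the example output is the Apify JavaScript client. This is a hedged sketch: the actor ID below is a placeholder, so substitute the real actor name from its store page.

    // Sketch: running the actor via apify-client and reading its dataset.
    // 'username/url-summary-scraper' is a placeholder actor ID, not the real one.
    const { ApifyClient } = require('apify-client');

    const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

    async function main() {
      const input = {
        url: 'https://example.com',
        language: 'en-US',
        ignoreRobots: false,
        ignoreExternalLinks: false,
      };

      // Start the actor run and wait for it to finish.
      const run = await client.actor('username/url-summary-scraper').call(input);

      // The summary record lands in the run's default dataset.
      const { items } = await client.dataset(run.defaultDatasetId).listItems();
      console.log(items[0]); // title, description, image, mail, linksExternal, ...
    }

    main().catch(console.error);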

Configuration

  • User-Agent: The scraper uses a random user-agent string for each request to mimic a real browser.
  • Language: You can specify the Accept-Language header in the input payload.
  • Robots.txt: By default, the scraper respects robots.txt rules. Set ignoreRobots to true in the input payload to bypass this.
  • External Links: By default, the scraper extracts external links. Set ignoreExternalLinks to true in the input payload to skip this.
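
The first three options map onto the outgoing request and a robots.txt pre-check roughly as sketched below. The user-agent pool and the deliberately simplified robots.txt parsing are illustrative assumptions, not the actor's exact implementation.

    // Sketch of how the configuration could translate into the HTTP request (assumed details).
    const axios = require('axios');

    const USER_AGENTS = [
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
    ];

    async function isAllowedByRobots(url, userAgent) {
      // Very simplified check: only a blanket "Disallow: /" counts as a block here.
      const robotsUrl = new URL('/robots.txt', url).href;
      try {
        const { data } = await axios.get(robotsUrl, { headers: { 'User-Agent': userAgent } });
        return !/^\s*Disallow:\s*\/\s*$/mi.test(data);
      } catch {
        return true; // A missing or unreadable robots.txt is treated as allowed.
      }
    }

    async function fetchPage({ url, language = 'en-US', ignoreRobots = false }) {
      // Pick a random user-agent string for this request.
      const userAgent = USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];

      if (!ignoreRobots && !(await isAllowedByRobots(url, userAgent))) {
        return { url, robotsAllow: false };
      }

      const { data: html } = await axios.get(url, {
        headers: { 'User-Agent': userAgent, 'Accept-Language': language },
      });
      return { url, robotsAllow: true, html };
    }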

Frequently Asked Questions

Is it legal to scrape public website data?

Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.

Do I need to code to use this scraper?

No. This is a no-code tool — just enter a URL and run the scraper directly from your dashboard or the Apify actor page.

What data does it extract?

It extracts page titles, descriptions, keywords, images, social media links, contact emails, and external links. You can export all of it to Excel or JSON.

Can I scrape multiple URLs?

Each run processes a single URL (the url field in the input). To summarize several pages, run the actor once per URL, for example from a script or a scheduled task.

How do I get started?

You can use the Try Now button on this page to go to the scraper. You’ll be guided to input a URL and get structured results back. No setup needed!