Get Urls Pro


This Apify actor crawls websites, extracts their links, and organizes them into a hierarchy so you can visualize a site's structure. The crawler can use either standard HTTP requests with BeautifulSoup (fast HTML parsing) or Selenium (for JavaScript-heavy pages).


Website Crawler


Features

  • Crawl any website starting from a specified URL
  • Control crawl depth and number of links per page
  • Filter out specific file extensions
  • Option to use Selenium for JavaScript-heavy websites
  • Prevent duplicate URLs in the output
  • Proxy support (via Apify Proxy)
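The crawl these features describe can be sketched as a breadth-first traversal. This is a minimal, dependency-free illustration and not the actor's code. It walks a hypothetical in-memory site (FAKE_SITE) instead of fetching pages. The function crawl() and its arguments are made-up names that mirror maxDepth, maxChildrenPerLink, and allowDuplicates. Each record uses the same fields as the actor's output.

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical in-memory "site": page URL -> (link URL, link text) pairs found on it.
# A real crawler would fetch each page over HTTP (or via Selenium) instead.
FAKE_SITE = {
    "https://example.com/": [("https://example.com/a", "A"), ("https://example.com/b", "B")],
    "https://example.com/a": [("https://example.com/", "Home"), ("https://example.com/c?x=1", "C")],
    "https://example.com/b": [("https://example.com/c?x=1", "C")],
}


def crawl(start_url, max_depth, max_children_per_link, allow_duplicates=False):
    """Breadth-first crawl producing records shaped like the actor's output."""
    records = [{"url": start_url, "name": None, "query": "", "depth": 0, "parentUrl": None}]
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # children of this page would exceed maxDepth
        children = 0
        for link, text in FAKE_SITE.get(url, []):
            if children >= max_children_per_link:
                break  # maxChildrenPerLink reached for this parent
            if link in seen and not allow_duplicates:
                continue  # already recorded elsewhere; skip the duplicate
            children += 1
            records.append({
                "url": link,
                "name": text,
                "query": urlsplit(link).query,
                "depth": depth + 1,
                "parentUrl": url,
            })
            # Only queue pages not visited before, so duplicates never cause loops.
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return records
```

Calling crawl("https://example.com/", 2, 5) yields four records: the start URL, /a, /b, and /c?x=1 (with query "x=1", found on /a). Lowering max_children_per_link to 1 drops /b.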

Input Parameters

| Parameter          | Type             | Description |
|--------------------|------------------|-------------|
| startUrl           | String           | The URL where the crawl starts (e.g., https://jamesclear.com/five-step-creative-process) |
| useSelenium        | Boolean          | Use Selenium instead of plain HTTP requests, for JavaScript-heavy pages |
| allowDuplicates    | Boolean          | Allow the same URL to appear more than once in the output |
| maxDepth           | Integer          | Maximum depth of link recursion (1-30) |
| maxChildrenPerLink | Integer          | Maximum number of child links per parent link (1-100) |
| sameDomainOnly     | Boolean          | Only crawl URLs on the same domain as the start URL (default: true) |
| ignoredExtensions  | Array of strings | File extensions to ignore when crawling (e.g., "pdf", "jpg") |
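Here is one way sameDomainOnly and ignoredExtensions might be applied to each candidate link. This sketch rests on several assumptions. should_crawl() is a hypothetical helper. "Same domain" means an exact hostname match, so www.jamesclear.com and jamesclear.com count as different. Non-HTTP links such as mailto: are skipped. The actor's real rules may differ.

```python
import posixpath
from urllib.parse import urlsplit


def should_crawl(url, start_url, same_domain_only=True, ignored_extensions=()):
    """Hypothetical filter mirroring the sameDomainOnly and ignoredExtensions inputs."""
    parts = urlsplit(url)
    # Assumption: only http(s) links are crawlable (skips mailto:, javascript:, etc.).
    if parts.scheme not in ("http", "https"):
        return False
    # Assumption: "same domain" means an exact hostname match with the start URL.
    if same_domain_only and parts.hostname != urlsplit(start_url).hostname:
        return False
    # Compare the path's extension case-insensitively, without the leading dot.
    ext = posixpath.splitext(parts.path)[1].lstrip(".").lower()
    ignored = {e.lstrip(".").lower() for e in ignored_extensions}
    return ext not in ignored
```

The extension check looks only at the URL path, so a query string such as ?g=4 does not affect it.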

Output

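The actor outputs a JSON array of link records. Each record holds the link's url, its name (null when none is available), its query string, its depth (0 for the start URL), and the parentUrl of the page on which it was found: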

[
  {
    "url": "https://jamesclear.com/five-step-creative-process",
    "name": null,
    "query": "",
    "depth": 0,
    "parentUrl": null
  },
  {
    "url": "https://jamesclear.com/",
    "name": null,
    "query": "",
    "depth": 1,
    "parentUrl": "https://jamesclear.com/five-step-creative-process"
  },
  {
    "url": "https://jamesclear.com/books",
    "name": "Books",
    "query": "",
    "depth": 1,
    "parentUrl": "https://jamesclear.com/five-step-creative-process"
  },
  {
    "url": "https://jamesclear.com/articles",
    "name": "Articles",
    "query": "",
    "depth": 1,
    "parentUrl": "https://jamesclear.com/five-step-creative-process"
  },
  {
    "url": "https://jamesclear.com/3-2-1",
    "name": "Newsletter",
    "query": "",
    "depth": 2,
    "parentUrl": "https://jamesclear.com/"
  },
  {
    "url": "https://jamesclear.com/events?g=4",
    "name": "Speaking",
    "query": "g=4",
    "depth": 2,
    "parentUrl": "https://jamesclear.com/"
  }
]
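Because every record carries its parentUrl, the flat output can be turned back into a tree for visualization. The sketch assumes each URL should appear once in the tree. The first record wins, which matters when allowDuplicates is enabled. build_tree() and print_tree() are hypothetical helpers, not part of the actor. SAMPLE holds a subset of the records shown above.

```python
def build_tree(records):
    """Nest flat link records under their parentUrl; the first record for each URL wins."""
    nodes, roots = {}, []
    for rec in records:
        if rec["url"] in nodes:
            continue  # repeated URL (possible with allowDuplicates): keep the first
        node = {"url": rec["url"], "name": rec["name"], "children": []}
        nodes[rec["url"]] = node
        parent = nodes.get(rec["parentUrl"])
        # Records whose parent is null (or not seen yet) become top-level nodes.
        (parent["children"] if parent else roots).append(node)
    return roots


def print_tree(nodes, indent=0):
    """Print one URL per line, indented by depth, with the link name when present."""
    for node in nodes:
        label = f" ({node['name']})" if node["name"] else ""
        print("  " * indent + node["url"] + label)
        print_tree(node["children"], indent + 1)


SAMPLE = [
    {"url": "https://jamesclear.com/five-step-creative-process", "name": None, "query": "", "depth": 0, "parentUrl": None},
    {"url": "https://jamesclear.com/", "name": None, "query": "", "depth": 1, "parentUrl": "https://jamesclear.com/five-step-creative-process"},
    {"url": "https://jamesclear.com/books", "name": "Books", "query": "", "depth": 1, "parentUrl": "https://jamesclear.com/five-step-creative-process"},
    {"url": "https://jamesclear.com/3-2-1", "name": "Newsletter", "query": "", "depth": 2, "parentUrl": "https://jamesclear.com/"},
]

if __name__ == "__main__":
    print_tree(build_tree(SAMPLE))
    # https://jamesclear.com/five-step-creative-process
    #   https://jamesclear.com/
    #     https://jamesclear.com/3-2-1 (Newsletter)
    #   https://jamesclear.com/books (Books)
```

A single pass is enough here because, in breadth-first output, a parent's record always appears before its children's.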

Example Usage

Basic Crawl

To create a basic map of a website using standard HTTP requests:

{
  "startUrl": "https://google.com",
  "useSelenium": false,
  "maxDepth": 2,
  "maxChildrenPerLink": 5
}
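Before starting a run, you can check an input like the one above against the ranges in the Input Parameters table. validate_input() is a hypothetical client-side helper, not part of the actor or the Apify SDK. The actor performs its own validation, which may be stricter or looser. JSON does not allow trailing commas, so json.loads() rejects an input that contains one.

```python
import json

# Ranges copied from the Input Parameters table; the actor's own checks may differ.
RANGES = {"maxDepth": (1, 30), "maxChildrenPerLink": (1, 100)}
BOOLEAN_KEYS = ("useSelenium", "allowDuplicates", "sameDomainOnly")


def validate_input(raw):
    """Parse a JSON input string and return a list of problems (empty if none)."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    problems = []
    if not isinstance(data.get("startUrl"), str) or not data["startUrl"]:
        problems.append("startUrl must be a non-empty string")
    for key, (lo, hi) in RANGES.items():
        if key in data:
            value = data[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            if not isinstance(value, int) or isinstance(value, bool) or not lo <= value <= hi:
                problems.append(f"{key} must be an integer in {lo}-{hi}")
    for key in BOOLEAN_KEYS:
        if key in data and not isinstance(data[key], bool):
            problems.append(f"{key} must be true or false")
    return problems
```

An empty list means the input passed these local checks. It does not guarantee that the actor will accept it.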

Crawl with Selenium

To crawl a JavaScript-heavy website with Selenium while skipping common image, stylesheet, and document files:

{
  "startUrl": "https://jamesclear.com/five-step-creative-process",
  "useSelenium": true,
  "maxDepth": 2,
  "maxChildrenPerLink": 5,
  "allowDuplicates": false,
  "ignoredExtensions": ["gif", "jpg", "png", "css", "jpeg", "pdf", "doc", "docx"]
}

Implementation Details

This actor is built with:

  • Apify Python SDK
  • BeautifulSoup for standard HTML parsing
  • Selenium with Chrome WebDriver for JavaScript-heavy pages
  • Asynchronous processing for better performance
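The "asynchronous processing" item can be illustrated with asyncio. Pages at the same depth are fetched concurrently, and a semaphore caps how many requests run at once. This is an assumption about how such a crawler might be structured, not the actor's actual code. fetch_links() is a hypothetical stand-in that sleeps instead of making a real request, and concurrency=5 is an arbitrary example value.

```python
import asyncio


async def fetch_links(url):
    """Hypothetical stand-in for fetching a page and extracting its links."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [f"{url}/child{i}" for i in range(2)]


async def crawl_level(urls, concurrency=5):
    """Fetch every URL of one crawl depth concurrently, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(url):
        async with semaphore:  # wait for a free slot before "requesting" the page
            return await fetch_links(url)

    # gather() returns results in the same order as the input URLs.
    results = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(zip(urls, results))


if __name__ == "__main__":
    print(asyncio.run(crawl_level(["https://example.com/a", "https://example.com/b"])))
```

The semaphore keeps the waits overlapping while limiting the load placed on the target site.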

Notes

  • JavaScript-heavy pages may require enabling the useSelenium option
  • For very large websites, use lower maxDepth and maxChildrenPerLink values to avoid hitting memory limits or very long run times
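Large values get expensive quickly. Suppose every page yields maxChildrenPerLink (C) new links and the start URL sits at depth 0. A crawl to maxDepth (D) can then visit up to 1 + C + C^2 + ... + C^D pages. The sketch assumes maxDepth is inclusive, as the depth values in the sample output suggest. max_pages() is a hypothetical helper.

```python
def max_pages(max_depth, max_children_per_link):
    """Upper bound on distinct pages when every page yields maxChildrenPerLink new links."""
    # Depth 0 is the start URL; depth d can hold at most C**d pages.
    return sum(max_children_per_link ** d for d in range(max_depth + 1))


if __name__ == "__main__":
    print(max_pages(2, 5))   # 31
    print(max_pages(5, 20))  # 3368421
```

For the example inputs in this README (maxDepth 2, maxChildrenPerLink 5), that is at most 31 pages. At maxDepth 5 with 20 children per link, the bound is already 3,368,421.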

Frequently Asked Questions

Is it legal to crawl public websites?

Yes, if you're crawling publicly available pages for personal or internal use. Always review the target website's Terms of Service before large-scale use or redistribution.

Do I need to code to use this crawler?

No. This is a no-code tool: just enter a start URL, adjust the crawl settings if needed, and run it directly from your dashboard or the Apify actor page.

What data does it extract?

For each link it finds, it records the URL, the link name (when available), the query string, the crawl depth, and the parent URL. You can export all of it to Excel or JSON.

Can I control which pages are crawled?

Yes. Use maxDepth and maxChildrenPerLink to limit how deep and how wide the crawl goes, sameDomainOnly to stay on the start URL's domain, and ignoredExtensions to skip file types such as images and PDFs.

How do I get started?

Use the Try Now button on this page to open the actor. You'll be asked for a start URL, and you'll get structured results. No setup needed!