Get Urls Pro

This Apify actor crawls a website, extracts its links, and arranges them in a hierarchy so you can visualize the site's structure. The crawler can be configured to use either standard HTTP requests with BeautifulSoup (fast HTML parsing) or Selenium (for JavaScript-heavy pages).


Try Now →

Website Crawler

Features

Crawl any website starting from a specified URL
Control crawl depth and number of links per page
Filter out specific file extensions
Option to use Selenium for JavaScript-heavy websites
Prevent duplicate URLs in the output
Proxy support (via Apify Proxy)

Input Parameters

Parameter	Type	Description
`startUrl`	String	The starting URL to crawl (e.g., https://jamesclear.com/five-step-creative-process)
`useSelenium`	Boolean	Use Selenium for JavaScript-heavy pages
`allowDuplicates`	Boolean	Allow duplicate URLs in the output
`maxDepth`	Integer	Maximum depth of link recursion (1-30)
`maxChildrenPerLink`	Integer	Maximum number of children per parent link (1-100)
`sameDomainOnly`	Boolean	Only crawl URLs on the same domain as the start URL (default: true)
`ignoredExtensions`	Array	File extensions to ignore when crawling
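
The ranges in the table can be checked before a run starts. The following is a minimal sketch, not the actor's own validation code. The `CrawlInput` class and the `validate_input` helper are hypothetical, and every default except `sameDomainOnly` is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class CrawlInput:
    """Hypothetical container mirroring the documented input parameters."""
    startUrl: str
    useSelenium: bool = False        # assumed default
    allowDuplicates: bool = False    # assumed default
    maxDepth: int = 2                # assumed default; documented range is 1-30
    maxChildrenPerLink: int = 5      # assumed default; documented range is 1-100
    sameDomainOnly: bool = True      # documented default
    ignoredExtensions: list = field(default_factory=list)


def validate_input(raw: dict) -> CrawlInput:
    """Build a CrawlInput from a raw dict and enforce the documented ranges."""
    cfg = CrawlInput(**raw)
    parts = urlsplit(cfg.startUrl)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("startUrl must be an absolute http(s) URL")
    if not 1 <= cfg.maxDepth <= 30:
        raise ValueError("maxDepth must be between 1 and 30")
    if not 1 <= cfg.maxChildrenPerLink <= 100:
        raise ValueError("maxChildrenPerLink must be between 1 and 100")
    return cfg
```

Calling `validate_input({"startUrl": "https://example.com", "maxDepth": 31})` raises a `ValueError` under these assumptions, because 31 falls outside the documented 1-30 range.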

Output

The actor outputs a JSON array of link records with the following structure:

[
  {
    "url": "https://jamesclear.com/five-step-creative-process",
    "name": null,
    "query": "",
    "depth": 0,
    "parentUrl": null
  },
  {
    "url": "https://jamesclear.com/",
    "name": null,
    "query": "",
    "depth": 1,
    "parentUrl": "https://jamesclear.com/five-step-creative-process"
  },
  {
    "url": "https://jamesclear.com/books",
    "name": "Books",
    "query": "",
    "depth": 1,
    "parentUrl": "https://jamesclear.com/five-step-creative-process"
  },
  {
    "url": "https://jamesclear.com/articles",
    "name": "Articles",
    "query": "",
    "depth": 1,
    "parentUrl": "https://jamesclear.com/five-step-creative-process"
  },
  {
    "url": "https://jamesclear.com/3-2-1",
    "name": "Newsletter",
    "query": "",
    "depth": 2,
    "parentUrl": "https://jamesclear.com/"
  },
  {
    "url": "https://jamesclear.com/events?g=4",
    "name": "Speaking",
    "query": "g=4",
    "depth": 2,
    "parentUrl": "https://jamesclear.com/"
  }
]
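
Because every record carries a `parentUrl` and a `depth`, the flat output can be turned back into a hierarchy on the consumer side. The sketch below is one possible way to do that, not part of the actor. The `build_tree` and `render` helpers are hypothetical names, and the code assumes the records have been loaded into a list of dictionaries.

```python
from collections import defaultdict


def build_tree(records):
    """Group records by parentUrl and return (roots, children)."""
    children = defaultdict(list)
    roots = []
    for rec in records:
        if rec["parentUrl"] is None:
            roots.append(rec)
        else:
            children[rec["parentUrl"]].append(rec)
    return roots, children


def render(records):
    """Return an indented text outline of the link hierarchy."""
    roots, children = build_tree(records)
    lines = []

    def walk(rec, ancestors):
        text = rec["url"] if not rec["name"] else f"{rec['name']}: {rec['url']}"
        lines.append("  " * rec["depth"] + text)
        if rec["url"] in ancestors:  # do not descend into a page that links back to an ancestor
            return
        for child in children.get(rec["url"], []):
            walk(child, ancestors | {rec["url"]})

    for root in roots:
        walk(root, frozenset())
    return "\n".join(lines)


records = [
    {"url": "https://jamesclear.com/five-step-creative-process", "name": None,
     "depth": 0, "parentUrl": None},
    {"url": "https://jamesclear.com/", "name": None,
     "depth": 1, "parentUrl": "https://jamesclear.com/five-step-creative-process"},
    {"url": "https://jamesclear.com/books", "name": "Books",
     "depth": 1, "parentUrl": "https://jamesclear.com/five-step-creative-process"},
    {"url": "https://jamesclear.com/3-2-1", "name": "Newsletter",
     "depth": 2, "parentUrl": "https://jamesclear.com/"},
]
print(render(records))
```

The indentation follows the `depth` field, so the Newsletter link appears nested under the home page, which in turn sits under the start URL.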

Example Usage

Basic Crawl

To create a basic map of a website with default settings:

{
  "startUrl": "https://google.com",
  "useSelenium": false,
  "maxDepth": 2,
  "maxChildrenPerLink": 5
}

Deep Crawl with Selenium

For a deeper crawl of a JavaScript-heavy website:

{
  "startUrl": "https://jamesclear.com/five-step-creative-process",
  "useSelenium": true,
  "maxDepth": 2,
  "maxChildrenPerLink": 5,
  "allowDuplicates": false,
  "ignoredExtensions": ["gif", "jpg", "png", "css", "jpeg", "pdf", "doc", "docx"]
}
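
To make the effect of `ignoredExtensions` and `sameDomainOnly` concrete, here is a rough sketch of how such filters could be written with the Python standard library. It is an illustration only: the helper names are hypothetical, and how the actor itself treats subdomains, a leading `www.`, or letter case is not documented here, so those choices are assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def has_ignored_extension(url, ignored_extensions):
    """Return True if the URL path ends in one of the ignored extensions."""
    # Only the path is inspected, so a query string such as ?g=4 is not mistaken for an extension.
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower().lstrip(".")
    return suffix in {ext.lower().lstrip(".") for ext in ignored_extensions}


def same_domain(url, start_url):
    """Return True if both URLs share a hostname (a leading 'www.' is ignored)."""
    def host(u):
        name = (urlsplit(u).hostname or "").lower()
        return name[4:] if name.startswith("www.") else name
    return host(url) == host(start_url)


ignored = ["gif", "jpg", "png", "css", "jpeg", "pdf", "doc", "docx"]
links = [
    "https://jamesclear.com/books",
    "https://jamesclear.com/logo.png",
    "https://example.org/articles",
]
kept = [
    u for u in links
    if not has_ignored_extension(u, ignored) and same_domain(u, "https://jamesclear.com/")
]
print(kept)
```

With these assumptions only `https://jamesclear.com/books` survives: the PNG is dropped by the extension filter and the `example.org` link by the domain check.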

Implementation Details

This actor is built with:

Apify Python SDK
BeautifulSoup for standard HTML parsing
Selenium with Chrome WebDriver for JavaScript-heavy pages
Asynchronous processing for better performance
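
As a rough illustration of how `maxDepth`, `maxChildrenPerLink`, and `allowDuplicates` could interact, the sketch below runs a breadth-first traversal over a caller-supplied link source. It is a simplified, synchronous stand-in and not the actor's source code: the `crawl` function, its `get_links` callback, and the in-memory `graph` are all hypothetical, and no HTTP requests, BeautifulSoup, or Selenium calls are involved.

```python
from collections import deque


def crawl(start_url, get_links, max_depth=2, max_children=5, allow_duplicates=False):
    """Breadth-first traversal returning records shaped like the actor's output.

    get_links is a caller-supplied function: url -> list of (url, name) pairs.
    """
    records = [{"url": start_url, "name": None, "depth": 0, "parentUrl": None}]
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        parent, depth = queue.popleft()
        if depth >= max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        kept = 0
        for url, name in get_links(parent):
            if kept >= max_children:
                break
            if url in seen and not allow_duplicates:
                continue
            records.append({"url": url, "name": name,
                            "depth": depth + 1, "parentUrl": parent})
            kept += 1
            if url not in seen:  # expand each distinct URL only once
                seen.add(url)
                queue.append((url, depth + 1))
    return records


# Tiny in-memory site used in place of real page fetches.
graph = {
    "a": [("b", "B"), ("c", "C"), ("a", "A")],
    "b": [("c", "C"), ("d", "D")],
    "d": [("e", "E")],
}
result = crawl("a", lambda u: graph.get(u, []), max_depth=2)
print([r["url"] for r in result])  # ['a', 'b', 'c', 'd']
```

Page `e` is never reached because its parent `d` already sits at depth 2. Keeping a `seen` set even when duplicates are allowed means repeated links are recorded but never expanded twice, so the traversal always terminates.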

Notes

JavaScript-heavy pages may require the `useSelenium` option to be enabled
Very large websites should use lower `maxDepth` and `maxChildrenPerLink` values to avoid hitting memory limits or taking a very long time
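
A quick calculation shows why these two values matter so much. Assuming every page contributes at most `maxChildrenPerLink` links and nothing is expanded beyond `maxDepth`, the number of output records is bounded by a geometric sum. The `max_urls` helper below is a hypothetical name used only for this estimate.

```python
def max_urls(max_depth, max_children_per_link):
    """Upper bound on records: 1 start URL plus c + c**2 + ... + c**max_depth."""
    return sum(max_children_per_link ** d for d in range(max_depth + 1))


print(max_urls(2, 5))    # 31
print(max_urls(4, 20))   # 168421
```

Real crawls usually stay well below this bound, since duplicate and filtered links are dropped, but the growth is exponential in `maxDepth`, so raising the depth by one can multiply the workload by up to `maxChildrenPerLink`.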

Frequently Asked Questions

Is it legal to scrape public data?

Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.

Do I need to code to use this scraper?

No. This is a no-code tool: just enter a start URL, adjust the crawl settings if needed, and run the actor directly from your dashboard or the Apify actor page.

What data does it extract?

It extracts each discovered link's URL, name (when one is found), query string, crawl depth, and parent URL. You can export all of it to Excel or JSON.

Can I crawl multiple pages or filter the results?

Yes, the crawler follows links across multiple pages up to `maxDepth`, and you can narrow the results with `maxChildrenPerLink`, `sameDomainOnly`, and `ignoredExtensions`.

How do I get started?

You can use the Try Now button on this page to go to the actor. You'll be guided to enter a start URL and get structured results. No setup needed!