This Apify actor crawls a website, extracts its links, and arranges them into a hierarchy so you can visualize the site's structure. The crawler can be configured to use either standard HTTP requests with BeautifulSoup (fast HTML parsing) or Selenium (for JavaScript-heavy pages).
Parameter | Type | Description |
---|---|---|
startUrl | String | The starting URL to crawl (e.g., https://jamesclear.com/five-step-creative-process) |
useSelenium | Boolean | Use Selenium for JavaScript-heavy pages |
allowDuplicates | Boolean | Allow duplicate URLs in the output |
maxDepth | Integer | Maximum depth of link recursion (1-30) |
maxChildrenPerLink | Integer | Maximum number of children per parent link (1-100) |
sameDomainOnly | Boolean | Only crawl URLs on the same domain as the start URL (default: true) |
ignoredExtensions | Array | File extensions to ignore when crawling |
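
How these parameters might interact can be sketched as a bounded breadth-first crawl. This is a minimal sketch, not the actor's actual implementation: the function `crawl`, the injected `fetch_links` callable (standing in for the BeautifulSoup or Selenium page fetch), the default values, and the order of the filters are all assumptions made for illustration.

```python
from collections import deque
from urllib.parse import urlsplit


def crawl(start_url, fetch_links, max_depth=2, max_children_per_link=5,
          same_domain_only=True, allow_duplicates=False,
          ignored_extensions=()):
    """Breadth-first crawl that returns flat link records.

    fetch_links(url) is a hypothetical callable returning (href, text) pairs
    for a page; the real actor would fetch and parse the page here.
    """
    start_host = urlsplit(start_url).netloc
    # Normalize "png" / ".PNG" to ".png" so endswith() can match the path.
    ignored = tuple("." + ext.lower().lstrip(".") for ext in ignored_extensions)

    records = [{"url": start_url, "name": None,
                "query": urlsplit(start_url).query,
                "depth": 0, "parentUrl": None}]
    seen = {start_url}   # URLs already emitted as records
    expanded = set()     # URLs already fetched; prevents endless loops
    queue = deque([(start_url, 0)])

    while queue:
        parent, depth = queue.popleft()
        if depth >= max_depth or parent in expanded:
            continue
        expanded.add(parent)
        children = 0
        for href, text in fetch_links(parent):
            if children >= max_children_per_link:
                break
            parts = urlsplit(href)
            if same_domain_only and parts.netloc != start_host:
                continue
            if ignored and parts.path.lower().endswith(ignored):
                continue
            if not allow_duplicates and href in seen:
                continue
            seen.add(href)
            children += 1
            records.append({"url": href, "name": text or None,
                            "query": parts.query,
                            "depth": depth + 1, "parentUrl": parent})
            queue.append((href, depth + 1))
    return records


# Tiny in-memory "site" standing in for real pages (hypothetical data).
SITE = {
    "https://example.com/": [
        ("https://example.com/a", "A"),
        ("https://example.com/logo.png", "Logo"),
        ("https://other.org/", "Other"),
        ("https://example.com/b?x=1", "B"),
    ],
    "https://example.com/a": [
        ("https://example.com/", "Home"),
        ("https://example.com/c", "C"),
    ],
}

for rec in crawl("https://example.com/", lambda url: SITE.get(url, []),
                 ignored_extensions=["png"]):
    print(rec["depth"], rec["url"])
```

Injecting `fetch_links` keeps the traversal logic independent of whether pages are fetched over plain HTTP or through a browser, which mirrors the `useSelenium` switch in spirit.
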
The actor outputs a JSON array of link objects with the following structure:
```json
[
  {
    "url": "https://jamesclear.com/five-step-creative-process",
    "name": null,
    "query": "",
    "depth": 0,
    "parentUrl": null
  },
  {
    "url": "https://jamesclear.com/",
    "name": null,
    "query": "",
    "depth": 1,
    "parentUrl": "https://jamesclear.com/five-step-creative-process"
  },
  {
    "url": "https://jamesclear.com/books",
    "name": "Books",
    "query": "",
    "depth": 1,
    "parentUrl": "https://jamesclear.com/five-step-creative-process"
  },
  {
    "url": "https://jamesclear.com/articles",
    "name": "Articles",
    "query": "",
    "depth": 1,
    "parentUrl": "https://jamesclear.com/five-step-creative-process"
  },
  {
    "url": "https://jamesclear.com/3-2-1",
    "name": "Newsletter",
    "query": "",
    "depth": 2,
    "parentUrl": "https://jamesclear.com/"
  },
  {
    "url": "https://jamesclear.com/events?g=4",
    "name": "Speaking",
    "query": "g=4",
    "depth": 2,
    "parentUrl": "https://jamesclear.com/"
  }
]
```
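
Because each record carries its `parentUrl`, the flat output can be turned back into an indented outline for viewing. The following is a minimal sketch under that assumption; `render_tree` is a hypothetical helper, not part of the actor, and the sample records are a trimmed copy of the output above.

```python
from collections import defaultdict


def render_tree(records):
    """Return an indented text outline built from flat link records."""
    # Group records by parent; the start URL is filed under the key None.
    children = defaultdict(list)
    for rec in records:
        children[rec["parentUrl"]].append(rec)

    lines = []

    def walk(rec, path):
        label = rec["name"] or rec["url"]
        lines.append("  " * len(path) + label)
        if rec["url"] in path:
            return  # URL already on this branch; do not loop forever
        for child in children.get(rec["url"], []):
            walk(child, path + (rec["url"],))

    for root in children.get(None, []):
        walk(root, ())
    return "\n".join(lines)


sample = [
    {"url": "https://jamesclear.com/five-step-creative-process",
     "name": None, "depth": 0, "parentUrl": None},
    {"url": "https://jamesclear.com/", "name": None, "depth": 1,
     "parentUrl": "https://jamesclear.com/five-step-creative-process"},
    {"url": "https://jamesclear.com/books", "name": "Books", "depth": 1,
     "parentUrl": "https://jamesclear.com/five-step-creative-process"},
    {"url": "https://jamesclear.com/3-2-1", "name": "Newsletter", "depth": 2,
     "parentUrl": "https://jamesclear.com/"},
]

print(render_tree(sample))
```

Records with a `null` name fall back to their URL as the label. The branch check matters mainly when `allowDuplicates` is enabled, since the same URL may then appear under more than one parent.
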
To create a basic map of a website with default settings:
```json
{
  "startUrl": "https://google.com",
  "useSelenium": false,
  "maxDepth": 2,
  "maxChildrenPerLink": 5
}
```
For a deeper crawl of a JavaScript-heavy website:
```json
{
  "startUrl": "https://jamesclear.com/five-step-creative-process",
  "useSelenium": true,
  "maxDepth": 2,
  "maxChildrenPerLink": 5,
  "allowDuplicates": false,
  "ignoredExtensions": ["gif", "jpg", "png", "css", "jpeg", "pdf", "doc", "docx"]
}
```
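
If you generate the input programmatically, it can help to check the documented ranges before starting a run. The helper below is a hypothetical sketch: `build_input` is not part of the actor, and every default except `sameDomainOnly` (documented as true) is an assumption.

```python
import json


def build_input(start_url, use_selenium=False, max_depth=2,
                max_children_per_link=5, allow_duplicates=False,
                same_domain_only=True, ignored_extensions=None):
    """Validate values against the documented ranges and return the input dict."""
    if not start_url.startswith(("http://", "https://")):
        raise ValueError("startUrl must be an absolute http(s) URL")
    if not 1 <= max_depth <= 30:
        raise ValueError("maxDepth must be between 1 and 30")
    if not 1 <= max_children_per_link <= 100:
        raise ValueError("maxChildrenPerLink must be between 1 and 100")
    return {
        "startUrl": start_url,
        "useSelenium": use_selenium,
        "allowDuplicates": allow_duplicates,
        "maxDepth": max_depth,
        "maxChildrenPerLink": max_children_per_link,
        "sameDomainOnly": same_domain_only,
        "ignoredExtensions": list(ignored_extensions or []),
    }


# json.dumps never emits a trailing comma, so the result is valid JSON input.
print(json.dumps(build_input("https://jamesclear.com/five-step-creative-process",
                             use_selenium=True,
                             ignored_extensions=["gif", "jpg", "png"]),
                 indent=2))
```
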
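This actor is built with BeautifulSoup (for plain HTTP crawling) and Selenium (for JavaScript-heavy pages).

For JavaScript-heavy pages, run the crawl with the useSelenium option enabled. Use moderate maxDepth and maxChildrenPerLink values to avoid hitting memory limits or taking a very long time.

Scraping is acceptable if you're only collecting publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.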