Automated website screenshot crawler built with Pyppeteer and Apify. This open-source actor captures screenshots of the websites at the URLs you specify, uploads them to the Apify Key-Value Store, and records the screenshot URLs in a dataset, making it ideal for monitoring website changes, archiving web content, or capturing visuals for reports. The actor uses Pyppeteer for browser automation and screenshot generation.
You can find the source code for this actor in my GitHub account: https://github.com/DZ-ABDLHAKIM
The input for this actor should be JSON containing the necessary configuration. The only required field is `link_urls`, which must be an array of website URLs. All other fields are optional. Here's a detailed description of the input fields:
| Field | Type | Description | Allowed Values |
|---|---|---|---|
| `link_urls` | Array | An array of website URLs to capture screenshots of. | Any valid URL |
| `Sleep` | Number | Duration (in seconds) to wait after the page has loaded before taking a screenshot. | Minimum: 0, Maximum: 3600 |
| `waitUntil` | String | Event to wait for before taking the screenshot. | One of: `"load"`, `"domcontentloaded"`, `"networkidle2"`, `"networkidle0"` |
| `cookies` | Array | Cookies to set for the browser session. | Array of cookie objects |
| `fullPage` | Boolean | Whether to capture the full page or just the viewport. | `true` or `false` |
| `window_Width` | Number | Width of the browser viewport (in pixels). | Minimum: 100, Maximum: 3840 |
| `window_Height` | Number | Height of the browser viewport (in pixels). | Minimum: 100, Maximum: 2160 |
| `scrollToBottom` | Boolean | Whether to scroll to the bottom of the page before taking the screenshot. | `true` or `false` |
| `distance` | Number | Distance (in pixels) to scroll down on each scroll action. | Minimum: 0 |
| `delay` | Number | Delay (in milliseconds) between scroll actions. | Minimum: 0, Maximum: 3600000 |
| `delayAfterScrolling` | Number | Delay (in milliseconds) after scrolling to the bottom of the page before taking the screenshot. | Minimum: 0, Maximum: 3600000 |
| `waitUntilNetworkIdleAfterScroll` | Boolean | Whether to wait for the network to become idle after scrolling to the bottom of the page. | `true` or `false` |
| `waitUntilNetworkIdleAfterScrollTimeout` | Number | Maximum wait time (in milliseconds) for the network to become idle after scrolling. | Minimum: 1000, Maximum: 3600000 |
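For example, an input combining several of these fields might look like the following (the URLs and the cookie object are illustrative; the cookie follows Puppeteer's `{name, value, domain}` convention):

```json
{
    "link_urls": ["https://example.com", "https://apify.com"],
    "Sleep": 2,
    "waitUntil": "networkidle2",
    "cookies": [
        { "name": "session", "value": "abc123", "domain": "example.com" }
    ],
    "fullPage": true,
    "window_Width": 1920,
    "window_Height": 1080,
    "scrollToBottom": true,
    "distance": 100,
    "delay": 50
}
```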
For more information about the `waitUntil` parameter, refer to the documentation for Puppeteer's `page.goto()` function. In short, `"load"` waits for the load event, `"domcontentloaded"` for the DOMContentLoaded event, `"networkidle0"` until there are no network connections for at least 500 ms, and `"networkidle2"` until there are no more than two.
Once the actor finishes, it stores a screenshot of each website as a file in the Key-Value Store associated with the run. The screenshot URLs are also stored in a dataset for easy access.
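As a sketch, assuming you use the official `apify-client` Python package, fetching the results programmatically could look like this (the token and actor ID are placeholders):

```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Start the actor and wait for it to finish.
# "<ActorId>" stands for this actor's ID or "username/actor-name".
run = client.actor("<ActorId>").call(run_input={
    "link_urls": ["https://example.com"],
    "fullPage": True,
})

# Each dataset item is expected to contain a screenshot URL.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```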
For each URL, the actor proceeds as follows:

1. Cookies are set using `page.setCookie()`, and the viewport is configured with the specified width and height.
2. The page is loaded using `page.goto()`, waiting for the specified `waitUntil` event.
3. If the `scrollToBottom` option is enabled, the actor executes a scrolling script that scrolls down the page by the defined `distance` in pixels.
4. The actor waits for the `Sleep` duration before capturing the screenshot, then saves it under a random filename.
5. Screenshots are uploaded to the Key-Value Store using `Actor.set_value()`, with their URLs stored in the dataset.

This open-source actor automates the process of capturing and storing screenshots of multiple web pages, making it a valuable tool for monitoring website changes, archiving content, or generating visual reports.
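The following is a minimal sketch of that flow with Pyppeteer and the Apify Python SDK. It is illustrative rather than the actor's actual source; the default values and the scrolling script are assumptions:

```python
import asyncio
import uuid

from apify import Actor
from pyppeteer import launch


async def capture(url: str, config: dict) -> None:
    # Assumes this runs inside an initialized Actor context
    # (e.g. `async with Actor:`), so Actor.set_value() works.
    browser = await launch()
    page = await browser.newPage()

    # Step 1: set cookies and configure the viewport.
    for cookie in config.get("cookies", []):
        await page.setCookie(cookie)
    await page.setViewport({
        "width": config.get("window_Width", 1280),
        "height": config.get("window_Height", 720),
    })

    # Step 2: load the page, waiting for the configured event.
    await page.goto(url, {"waitUntil": config.get("waitUntil", "load")})

    # Step 3: optionally scroll to the bottom in `distance`-pixel steps.
    if config.get("scrollToBottom"):
        await page.evaluate(
            """async (distance, delay) => {
                while (window.scrollY + window.innerHeight <
                       document.body.scrollHeight) {
                    window.scrollBy(0, distance);
                    await new Promise(r => setTimeout(r, delay));
                }
            }""",
            config.get("distance", 100),
            config.get("delay", 50),
        )

    # Steps 4-5: wait, capture, and store under a random filename.
    await asyncio.sleep(config.get("Sleep", 0))
    image = await page.screenshot({"fullPage": config.get("fullPage", False)})
    key = f"screenshot_{uuid.uuid4().hex}"
    await Actor.set_value(key, image, content_type="image/png")

    await browser.close()
```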
To get started, you can develop this actor locally by following these steps:
1. Install `apify-cli`:

   Using Homebrew:

   ```bash
   brew install apify-cli
   ```

   Using NPM:

   ```bash
   npm install -g apify-cli
   ```

2. Pull the Actor using its unique `<ActorId>`:

   ```bash
   apify pull <ActorId>
   ```
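You can then run it locally with the standard `apify-cli` run command (not specific to this actor):

```bash
apify run
```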
For any inquiries, you can reach me at:
- Email: fridaytechnolog@gmail.com
- GitHub: https://github.com/DZ-ABDLHAKIM
- Twitter: https://x.com/DZ_45Omar