Scrape all articles from any news website. Extract full text, metadata, keywords, and summaries. Ideal for content analysis, research, and news aggregation.
Crawl an entire news website and extract clean, structured data from all its articles. Get article text, metadata, keywords, summaries, and more for content analysis, market research, news aggregation, and SEO monitoring. No coding required!
This Actor is designed to efficiently extract data from entire news websites. It crawls all linked articles from a starting URL, making it ideal for content analysis, market research, news aggregation, and SEO monitoring.
The Actor pushes data to the dataset as it scrapes, so results are available in real time. Each item represents a single article (or an error) and contains the following fields:
articleURL: The URL of the scraped article
sourceURL: The base URL of the news source
articleLanguage: The language of the article (e.g., "en", "es")
articleTitle: The title of the article
articleAuthors: A comma-separated list of the article's authors
articlePublishDate: The publication date (ISO 8601 format), if available
articleText: The full text content of the article
articleTopImage: The URL of the main image
articleAllImages: A comma-separated list of URLs for all images
articleVideos: A comma-separated list of URLs for embedded videos
articleKeywords: A comma-separated list of extracted keywords
articleSummary: A concise summary of the article
scrapedAt: The timestamp of when the article was scraped (ISO 8601)
scrapeSuccess: true if scraped successfully, false otherwise
articleMetaDescription: The meta description of the article
articleMetaKeywords: A comma-separated list of the meta keywords
scrapeErrorMessage: An error message if scrapeSuccess is false
[
  {
    "articleURL": "https://www.example.com/news/article1",
    "sourceURL": "https://www.example.com",
    "articleLanguage": "en",
    "articleTitle": "Example News Article",
    "articleAuthors": "John Doe, Jane Smith",
    "articlePublishDate": "2024-07-27T10:00:00Z",
    "articleText": "This is the full text of the example article...",
    "articleTopImage": "https://www.example.com/images/article1.jpg",
    "articleAllImages": "https://www.example.com/images/article1.jpg,https://www.example.com/images/article2.png",
    "articleVideos": "",
    "articleKeywords": "news, example, article",
    "articleSummary": "A brief summary of the example article.",
    "scrapedAt": "2024-07-27T12:34:56Z",
    "scrapeSuccess": true,
    "articleMetaDescription": "Meta description of the example news article.",
    "articleMetaKeywords": "example, article, news"
  }
]
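Several output fields arrive as comma-separated strings, so downstream code usually has to split them and set aside error items. The Python sketch below does that for items shaped like the example above. It is not part of the Actor: the helper names split_list_field and normalize_items are hypothetical, the error item's exact shape is assumed (only scrapeSuccess and scrapeErrorMessage are documented), and splitting on every comma assumes no single URL or keyword contains a comma.

```python
import json
from typing import Any, Optional

# Output fields documented as comma-separated strings.
LIST_FIELDS = (
    "articleAuthors",
    "articleAllImages",
    "articleVideos",
    "articleKeywords",
    "articleMetaKeywords",
)


def split_list_field(value: Optional[str]) -> list[str]:
    """Split a comma-separated field into trimmed, non-empty parts."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def normalize_items(
    items: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate successful articles from error items and expand list fields."""
    articles: list[dict[str, Any]] = []
    errors: list[dict[str, Any]] = []
    for item in items:
        # Items without scrapeSuccess == true are treated as errors.
        if not item.get("scrapeSuccess", False):
            errors.append(item)
            continue
        article = dict(item)  # shallow copy; the original item is left unchanged
        for field in LIST_FIELDS:
            article[field] = split_list_field(item.get(field))
        articles.append(article)
    return articles, errors


# Trimmed-down items; the error item's shape is a hypothetical illustration.
sample = json.loads("""[
  {"articleURL": "https://www.example.com/news/article1",
   "articleAuthors": "John Doe, Jane Smith",
   "articleAllImages": "https://www.example.com/images/article1.jpg,https://www.example.com/images/article2.png",
   "articleVideos": "",
   "scrapeSuccess": true},
  {"articleURL": "https://www.example.com/news/article2",
   "scrapeSuccess": false,
   "scrapeErrorMessage": "example error"}
]""")

articles, errors = normalize_items(sample)
print(articles[0]["articleAuthors"])  # ['John Doe', 'Jane Smith']
print(len(errors))  # 1
```

Copying each item rather than mutating it keeps the raw dataset rows available, for example to re-export them unchanged.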
Find the "News Source Crawler" in the Apify Store
Configure the input:
  url: (Required) The URL of the news website to crawl
  language: (Optional) The expected language (default: "en")
  maxArticles: (Optional) The maximum number of articles to scrape
  proxyConfiguration: (Optional) Select an Apify Proxy configuration or provide custom proxies
Run the Actor
Access results in JSON, CSV, Excel, or other formats, directly from the dataset as the Actor runs
Optional: Schedule the Actor, set up webhooks, or integrate with other Actors
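The steps above can also be scripted. The Python sketch below builds an input object from the documented input fields and the Apify API URL for downloading a run's dataset in a chosen format. The helper names build_run_input and dataset_items_url are hypothetical, the {"useApifyProxy": True} value assumes this Actor accepts Apify's standard proxy input format, and the format list is taken from the Apify API documentation for the dataset items endpoint rather than from this Actor.

```python
from typing import Any, Optional
from urllib.parse import quote, urlencode, urlparse

# Formats the Apify API documents for GET /v2/datasets/{datasetId}/items.
EXPORT_FORMATS = {"json", "jsonl", "csv", "xlsx", "xml", "html", "rss"}


def build_run_input(
    url: str,
    language: str = "en",
    max_articles: Optional[int] = None,
    proxy_configuration: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build an input object using the Actor's documented field names."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL, got {url!r}")
    run_input: dict[str, Any] = {"url": url, "language": language}
    if max_articles is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(max_articles, bool) or not isinstance(max_articles, int) or max_articles < 1:
            raise ValueError("maxArticles must be a positive integer")
        run_input["maxArticles"] = max_articles
    if proxy_configuration is not None:
        run_input["proxyConfiguration"] = proxy_configuration
    return run_input


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Return the Apify API URL that downloads a dataset's items in the given format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}")
    return (
        f"https://api.apify.com/v2/datasets/{quote(dataset_id, safe='')}/items?"
        + urlencode({"format": fmt})
    )


run_input = build_run_input(
    "https://www.example.com",
    max_articles=50,
    proxy_configuration={"useApifyProxy": True},  # assumed standard Apify proxy input
)
print(run_input)
# {'url': 'https://www.example.com', 'language': 'en', 'maxArticles': 50, 'proxyConfiguration': {'useApifyProxy': True}}
print(dataset_items_url("DATASET_ID", "csv"))
# https://api.apify.com/v2/datasets/DATASET_ID/items?format=csv
```

The run_input dict could then be passed to ApifyClient(token).actor(actor_id).call(run_input=run_input) from the apify-client package, where actor_id is the ID shown on this Actor's Store page. The returned run includes a defaultDatasetId to pass to dataset_items_url. Datasets that are not public also need your API token, for example as a token query parameter.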
Start crawling news sources today! ➡️
Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.
No. This is a no-code tool: just enter the URL of a news website and run the Actor directly from your dashboard or the Apify Actor page.
It extracts article titles, authors, publication dates, full text, images, videos, keywords, summaries, and meta descriptions and keywords. You can export all of it to JSON, CSV, or Excel.
Yes. The Actor crawls all linked articles from the starting URL, and you can adjust the run with the language, maxArticles, and proxyConfiguration inputs.
You can use the Try Now button on this page to go to the Actor. You'll be guided to enter a news website URL and get structured results. No setup needed!