💫 Extract PDF Document Contents including Metadata, Images, Pages, Tables, Attachments, etc.
Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.[2][3] Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images and other information needed to display it. PDF has its roots in "The Camelot Project", initiated by Adobe co-founder John Warnock in 1991.[4] PDF was standardized as ISO 32000 in 2008.[5] The latest edition, ISO 32000-2:2020, was published in December 2020.
💫 Extract contents from PDF documents
Name | Type | Description |
---|---|---|
url | Array [String] | List of PDF document URLs |
content | String | Output format for page contents (`text`, `svg`, `png`, or `jpg`) |
images | Boolean (true/false) | Extract embedded images |
attachments | Boolean (true/false) | Extract embedded files |
tables | Boolean (true/false) | Extract tables |
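The fields in the table above map naturally onto a JSON input object. The sketch below is a minimal, hypothetical helper (`build_input` is not part of the actor) that assembles and checks such an object in Python, assuming the field names in the table are used verbatim as JSON keys. The default values it picks are assumptions for illustration, not the actor's documented defaults.

```python
import json

# Allowed values for the `content` field, as listed in the input table.
CONTENT_FORMATS = {"text", "svg", "png", "jpg"}


def build_input(urls, content="text", images=False, attachments=False, tables=False):
    """Hypothetical helper: build an input object from the documented fields.

    The defaults are illustrative assumptions; the actor's own defaults
    are not stated in this README.
    """
    if not urls or not all(isinstance(u, str) and u for u in urls):
        raise ValueError("url must be a non-empty list of URL strings")
    if content not in CONTENT_FORMATS:
        raise ValueError(f"content must be one of {sorted(CONTENT_FORMATS)}")
    # The three extraction switches are documented as true/false values.
    for name, value in (("images", images), ("attachments", attachments), ("tables", tables)):
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be true or false")
    return {
        "url": list(urls),
        "content": content,
        "images": images,
        "attachments": attachments,
        "tables": tables,
    }


if __name__ == "__main__":
    run_input = build_input(
        ["https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf"],
        content="text",
        tables=True,
    )
    print(json.dumps(run_input, indent=2))
```

Rejecting unknown `content` values and non-boolean switches up front surfaces typos such as `"pdf"` or `"true"` (a string) before a run is started.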
Note: All extracted resources other than text are saved to the default key-value store.
```
[
  # URL-1: Metadata
  { "metadata": { "headers": { ... }, "url": "...", "mime": "..." } },
  # URL-1: Page Contents
  { "index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...] },
  { "index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...] },
  ...
  # URL-2: Metadata
  { "metadata": { "headers": { ... }, "url": "...", "mime": "..." } },
  # URL-2: Page Contents
  { "index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...] },
  { "index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...] },
  ...
]
```
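Because the output is one flat list that interleaves metadata records and page records for every URL, consumers usually need to regroup it per document. A minimal sketch in Python, assuming each URL's `metadata` record comes immediately before that URL's page records, as in the layout above. `group_by_document` and the sample records are hypothetical illustrations, not part of the actor.

```python
def group_by_document(items):
    """Hypothetical helper: regroup the flat output list per source URL.

    Assumes each "metadata" record is immediately followed by the page
    records (those with an "index" key) for the same URL.
    """
    documents = []
    for item in items:
        if "metadata" in item:
            # A metadata record starts a new document.
            documents.append({"metadata": item["metadata"], "pages": []})
        elif "index" in item:
            if not documents:
                raise ValueError("page record found before any metadata record")
            documents[-1]["pages"].append(item)
        # Records with neither key are ignored.
    # Order each document's pages by their index.
    for doc in documents:
        doc["pages"].sort(key=lambda page: page["index"])
    return documents


if __name__ == "__main__":
    # Hypothetical sample records shaped like the layout above.
    sample = [
        {"metadata": {"url": "https://example.com/a.pdf", "mime": "application/pdf"}},
        {"index": 0, "content": "page 0 text"},
        {"index": 1, "content": "page 1 text"},
        {"metadata": {"url": "https://example.com/b.pdf", "mime": "application/pdf"}},
        {"index": 0, "content": "page 0 text"},
    ]
    for doc in group_by_document(sample):
        print(doc["metadata"]["url"], len(doc["pages"]))
    # prints:
    # https://example.com/a.pdf 2
    # https://example.com/b.pdf 1
```

Grouping on the `metadata` record rather than on the URL string means the helper does not need the page records to repeat their source URL.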
Example URL: https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf
Example URL: https://apify.com/img/web-scraping/beginners-guide-to-web-scraping.pdf
⚡️ Feel free to reach out to the developer for any issues or suggestions for improvement.
Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.