arXiv Search Scraper 📚

Extract comprehensive research paper data from arXiv search results. Get detailed metadata including titles, authors, abstracts, categories and more. Perfect for academic research monitoring, trend analysis and building paper databases. 🎓📚

Scrape research papers, authors, and metadata from arXiv search results. Get detailed information about academic papers including titles, authors, abstracts, categories, submission dates and more.

Features ✨

  • πŸ” Scrape papers from any arXiv search URL
  • πŸ“„ Extract comprehensive paper metadata including:
    • Paper ID and PDF links
    • Title and abstract
    • Author names and profile URLs
    • Research categories and classifications
    • Submission dates and comments
  • ⚡ Fast and efficient pagination handling
  • 🔄 Support for multiple search URLs
  • ⚙️ Configurable maximum items limit
  • 🌐 Proxy support for reliable scraping

Use Cases 💡

  • Research trend analysis
  • Academic paper monitoring
  • Building paper databases
  • Author tracking
  • Category-based paper collection
  • Literature review automation

Input Parameters 🎛️

The actor accepts the following input parameters:

Field               Type     Description
searchUrls          Array    List of arXiv search URLs to scrape
maxItems            Integer  Maximum number of items to scrape (optional)
proxyConfiguration  Object   Proxy settings (optional)
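
As a rough sketch, the input described above could be assembled and sanity-checked in Python before starting a run. The helper name `build_input` and its validation rules are assumptions made for illustration; only the three field names (`searchUrls`, `maxItems`, `proxyConfiguration`) come from this page.

```python
from typing import Any, Optional
from urllib.parse import urlparse


def build_input(
    search_urls: list[str],
    max_items: Optional[int] = None,
    proxy_configuration: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble an actor input dict (hypothetical helper, not part of the actor)."""
    if not search_urls:
        raise ValueError("searchUrls must contain at least one URL")
    for url in search_urls:
        parsed = urlparse(url)
        # Assumption: only arxiv.org search pages are meaningful inputs.
        if parsed.hostname not in ("arxiv.org", "www.arxiv.org"):
            raise ValueError(f"not an arXiv URL: {url}")
        if not parsed.path.startswith("/search"):
            raise ValueError(f"not an arXiv search URL: {url}")

    run_input: dict[str, Any] = {"searchUrls": list(search_urls)}
    # Optional fields are only included when the caller supplies them.
    if max_items is not None:
        if max_items < 1:
            raise ValueError("maxItems must be a positive integer")
        run_input["maxItems"] = max_items
    if proxy_configuration is not None:
        run_input["proxyConfiguration"] = proxy_configuration
    return run_input
```

The returned dictionary can be serialized with `json.dumps` and pasted into the actor's JSON input editor.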

Output 📊

The actor stores results in a dataset with the following fields for each paper:

  • searchUrl: Source search URL
  • arxivId: Unique arXiv paper ID
  • pdfUrl: Direct link to PDF
  • categories: Research categories with codes and names
  • title: Paper title
  • authors: Author details including names and profile URLs
  • abstract: Full paper abstract
  • submissionDate: Paper submission date
  • comments: Additional paper comments
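
Two of these fields usually need light normalization before analysis. The sketch below assumes the string formats seen in this page's output sample (an `arXiv:` prefix on `arxivId` and dates such as "28 February, 2025"); both function names are hypothetical, and other records may use formats this code does not handle.

```python
from datetime import date, datetime


def parse_submission_date(text: str) -> date:
    """Parse a date written like "28 February, 2025".

    Assumption: month names are in English, so this relies on an
    English (or C) locale for the %B directive.
    """
    return datetime.strptime(text.strip(), "%d %B, %Y").date()


def bare_arxiv_id(arxiv_id: str) -> str:
    """Drop the "arXiv:" prefix if present, leaving e.g. "2502.21286"."""
    prefix = "arXiv:"
    if arxiv_id.startswith(prefix):
        return arxiv_id[len(prefix):]
    return arxiv_id
```

Converting `submissionDate` to a `date` object makes it possible to sort or filter papers chronologically.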

Example Usage 💻

Input Example

The following JSON is a complete input example.

{
    "searchUrls": [
        "https://arxiv.org/search/?query=ai&searchtype=all&source=header"
    ],
    "maxItems": 60
}
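
Search URLs like the one above can also be generated programmatically. This is a small sketch using only the query parameters visible in the example URL (`query`, `searchtype`, `source`); the function name `search_url` is hypothetical, and arXiv's search form supports further parameters that are not covered here.

```python
from urllib.parse import urlencode

ARXIV_SEARCH = "https://arxiv.org/search/"


def search_url(query: str, searchtype: str = "all") -> str:
    """Build an arXiv search URL shaped like the example input.

    Assumption: "source=header" is kept only because the example URL
    includes it.
    """
    params = {"query": query, "searchtype": searchtype, "source": "header"}
    # urlencode escapes spaces and punctuation in the query string.
    return f"{ARXIV_SEARCH}?{urlencode(params)}"


# One URL per topic, ready to be used as the "searchUrls" list.
urls = [search_url(topic) for topic in ("ai", "graph neural networks")]
```

Because `searchUrls` accepts a list, several generated URLs can be passed to a single run.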

Output sample

The results are stored in a dataset, which you can always find in the Storage tab. Here's an excerpt of the data you'd get with the input parameters above, in JSON. You can choose the format in which to download your data: JSON, JSONL, Excel spreadsheet, HTML table, CSV, or XML.

[
    {
        "searchUrl": "https://arxiv.org/search/?query=ai&searchtype=all&source=header",
        "arxivId": "arXiv:2502.21286",
        "pdfUrl": "https://arxiv.org/pdf/2502.21286",
        "categories": [
            {
                "code": "cs.CR",
                "name": "Cryptography and Security"
            },
            {
                "code": "cs.LG",
                "name": "Machine Learning"
            },
            {
                "code": "cs.NI",
                "name": "Networking and Internet Architecture"
            },
            {
                "code": "doi"
            },
            {
                "code": "10.1109/TNSM.2024.3376631"
            }
        ],
        "title": "Enabling AutoML for Zero-Touch Network Security: Use-Case Driven Analysis",
        "authors": [
            {
                "name": "Li Yang",
                "url": "https://arxiv.org/search/?searchtype=author&query=Yang%2C+L"
            },
            {
                "name": "Mirna El Rajab",
                "url": "https://arxiv.org/search/?searchtype=author&query=Rajab%2C+M+E"
            },
            {
                "name": "Abdallah Shami",
                "url": "https://arxiv.org/search/?searchtype=author&query=Shami%2C+A"
            },
            {
                "name": "Sami Muhaidat",
                "url": "https://arxiv.org/search/?searchtype=author&query=Muhaidat%2C+S"
            }
        ],
        "abstract": "Zero-Touch Networks (ZTNs) represent a state-of-the-art paradigm shift towards fully automated and intelligent network management, enabling the automation and intelligence required to manage the complexity, scale, and dynamic nature of next-generation (6G) networks. ZTNs leverage Artificial Intelligence (AI) and Machine Learning (ML) to enhance operational efficiency, support intelligent decision-making, and ensure effective resource allocation. However, the implementation of ZTNs is subject to security challenges that need to be resolved to achieve their full potential. In particular, two critical challenges arise: the need for human expertise in developing AI/ML-based security mechanisms, and the threat of adversarial attacks targeting AI/ML models. In this survey paper, we provide a comprehensive review of current security issues in ZTNs, emphasizing the need for advanced AI/ML-based security mechanisms that require minimal human intervention and protect AI/ML models themselves. Furthermore, we explore the potential of Automated ML (AutoML) technologies in developing robust security solutions for ZTNs. Through case studies, we illustrate practical approaches to securing ZTNs against both conventional and AI/ML-specific threats, including the development of autonomous intrusion detection systems and strategies to combat Adversarial ML (AML) attacks. The paper concludes with a discussion of the future research directions for the development of ZTN security approaches.",
        "submissionDate": "28 February, 2025",
        "comments": "Published in IEEE Transactions on Network and Service Management (TNSM); Code is available at Github link: https://github.com/Western-OC2-Lab/AutoML-and-Adversarial-Attack-Defense-for-Zero-Touch-Network-Security"
    },
    ...
]
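
In the sample above, the `categories` array also carries two entries without a `name`: the literal `doi` followed by the DOI value itself. A possible post-processing step, assuming this pattern holds for other records (which this page does not confirm), separates real categories from those entries and recovers the DOI. Both function names are hypothetical.

```python
from typing import Any, Optional


def split_categories(record: dict[str, Any]) -> tuple[list[dict[str, str]], list[str]]:
    """Return (named categories, codes of entries that have no name)."""
    categories = record.get("categories", [])
    named = [c for c in categories if "name" in c]
    unnamed = [c["code"] for c in categories if "name" not in c]
    return named, unnamed


def extract_doi(unnamed_codes: list[str]) -> Optional[str]:
    """Return the code that directly follows a "doi" marker, if any.

    Assumption: the DOI always appears right after the marker entry,
    as in the sample record.
    """
    for marker, value in zip(unnamed_codes, unnamed_codes[1:]):
        if marker.lower() == "doi":
            return value
    return None
```

Applied to the sample record, `split_categories` keeps the three `cs.*` categories, and `extract_doi` picks out the DOI from the remaining codes.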

Frequently Asked Questions

Is it legal to scrape arXiv search results or other public data?

Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.

Do I need to code to use this scraper?

No. This is a no-code tool: just enter one or more arXiv search URLs and run the scraper directly from your dashboard or the Apify actor page.

What data does it extract?

It extracts arXiv IDs, PDF links, titles, authors, abstracts, categories, submission dates, and comments. You can export all of it to Excel or JSON.

Can I scrape multiple pages or multiple searches?

Yes. The scraper handles pagination, accepts multiple search URLs, and lets you cap the number of results with maxItems. To narrow the results, refine the query in the arXiv search URL itself.

How do I get started?

You can use the Try Now button on this page to go to the scraper. You'll be guided to enter an arXiv search URL and get structured results. No setup needed!