Github Repo Markdown Scraper

Transform GitHub repositories into a single, comprehensive markdown document. The tool streamlines analysis and processing with configurable file size limits, pattern filtering, and batch processing, and it handles large repositories with ease, making it well suited for preparing LLM prompts.

GitHub Repository Markdown Scraper

This actor scrapes GitHub repositories and converts their contents into markdown format using the gitingest.com service. It's useful for documentation, analysis, or creating searchable content from GitHub repositories.

Features

  • Process multiple GitHub repository URLs
  • Configurable file inclusion/exclusion patterns
  • Adjustable maximum file size limit
  • Converts repository content to markdown format

Input Parameters

  • githubRepoUrls (required): Array of GitHub repository URLs to process
  • patternType (optional): Whether to "include" or "exclude" files matching the pattern (default: "exclude")
  • pattern (optional): Glob pattern for files to include/exclude (e.g., "*.md", "src/")
  • maxFileSizeKb (optional): Maximum file size in kilobytes to include in output (default: 50)
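The interplay of patternType, pattern, and maxFileSizeKb can be sketched in Python. Note that the actual filtering happens inside the gitingest.com service, so the function below is only an illustrative model of the documented semantics, not the actor's real implementation:

```python
from fnmatch import fnmatch

def keep_file(path: str, pattern: str, pattern_type: str,
              size_kb: float, max_file_size_kb: int = 50) -> bool:
    """Decide whether a file would enter the markdown output.

    Mirrors the actor's patternType / pattern / maxFileSizeKb inputs.
    Files over the size limit are always dropped; otherwise the glob
    pattern either selects ("include") or rejects ("exclude") matches.
    """
    if size_kb > max_file_size_kb:
        return False
    matched = fnmatch(path, pattern)
    return matched if pattern_type == "include" else not matched

# With patternType "include" and pattern "*.md", only markdown files pass:
keep_file("README.md", "*.md", "include", size_kb=4)    # True
keep_file("src/app.py", "*.md", "include", size_kb=4)   # False
```

With the default patternType of "exclude", the same pattern would instead drop markdown files and keep everything else under the size limit.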

Output

The actor outputs JSON objects with the following structure for each processed repository:

{
    "url": "https://github.com/user/repo",
    "markdownContent": "# Repository Content..."
}
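Because each dataset item pairs the repository URL with the flattened markdown, post-processing the output is straightforward. A minimal sketch (the record below is the example structure from above, not live output):

```python
import json

# One dataset item as emitted by the actor (example structure).
record = json.loads("""
{
    "url": "https://github.com/user/repo",
    "markdownContent": "# Repository Content..."
}
""")

# Derive a per-repository filename from the last URL segment,
# then the markdown content can be written straight to disk.
filename = record["url"].rstrip("/").split("/")[-1] + ".md"
# filename == "repo.md"
```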

Example Usage

{
    "githubRepoUrls": [
        "https://github.com/username/repository",
        "https://github.com/username/another-repo/tree/main"
    ],
    "patternType": "include",
    "pattern": "*.md",
    "maxFileSizeKb": 100
}
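The same input can be supplied programmatically. A hedged sketch using the apify-client Python package; the actor ID string is a placeholder, so substitute the real one from this actor's page:

```python
# Build the run input exactly as in the example above.
run_input = {
    "githubRepoUrls": [
        "https://github.com/username/repository",
        "https://github.com/username/another-repo/tree/main",
    ],
    "patternType": "include",
    "pattern": "*.md",
    "maxFileSizeKb": 100,
}

# Starting a run requires an Apify API token and the actor's real ID
# (placeholder shown), so the call is left commented out:
# from apify_client import ApifyClient
# client = ApifyClient("<APIFY_TOKEN>")
# run = client.actor("<username>/<actor-id>").call(run_input=run_input)
# for item in client.dataset(run["defaultDatasetId"]).iterate_items():
#     print(item["url"])
```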

Frequently Asked Questions

Is it legal to scrape public repository data?

Yes, if you're scraping publicly available data for personal or internal use. Always review GitHub's Terms of Service before large-scale use or redistribution.

Do I need to code to use this scraper?

No. This is a no-code tool — just enter one or more GitHub repository URLs and run the scraper directly from your dashboard or Apify actor page.

What data does it extract?

It converts each repository's contents into a single markdown document, subject to your pattern and file size filters. Each result pairs the repository URL with its markdown content, and you can export everything as JSON.

Can I scrape multiple repositories or filter which files are included?

Yes, you can process multiple repositories in one run by passing an array of URLs, and refine the output using the patternType, pattern, and maxFileSizeKb input settings.

How do I get started?

You can use the Try Now button on this page to go to the scraper. You'll be guided to input repository URLs and get structured results. No setup needed!