How to scrape developer and API data from GitLab using Minexa.ai

GitLab hosts millions of public projects, and the explore page at gitlab.com/explore/projects surfaces all of them in a structured listing updated in real time. For anyone tracking open-source activity, monitoring API project ecosystems, or building developer intelligence datasets, that listing is a rich and largely untapped data source. The challenge is that browsing it manually does not scale.

This guide shows how to extract structured project data from GitLab using the Minexa.ai Chrome extension, with no code required.

What you get from the GitLab explore page

Each project card on the GitLab explore listing contains more structured information than it appears at first glance. Beyond the project name and description, the page exposes avatar image URLs, project path slugs, topic badge links, open issue counts, open merge request counts, star and fork counts, ISO 8601 creation timestamps, human-readable time-since-posted labels, and unique numeric project identifiers. Minexa.ai surfaces all of these automatically when you point it at the page container.

Video tutorial

Watch the full extraction walkthrough before going through the steps below.

Step-by-step extraction walkthrough

Start by installing the Minexa.ai Chrome extension if you have not already. Then open the GitLab explore projects page in your browser.

Click the Minexa.ai extension icon in your browser toolbar. The extension popup opens and asks you to confirm you are on the right page. Click "I'm on the right page" to proceed.

Minexa.ai detects the pagination structure on the GitLab listing automatically. You will see the detected pagination logic displayed in the popup. Click "Continue" to confirm it and move to the next step.

After confirming pagination, the extension presents your scraping mode options. For a full listing extraction, the default list mode is the right choice here.

Select your preferred mode and proceed. The extension then highlights the full data container on the GitLab page automatically, showing you exactly which section it will extract from.

Once the container is confirmed, Minexa.ai identifies all data points within it and displays them. Click "Create Scraper" and wait a moment while the scraper is generated. All columns are discovered and named automatically.

After the scraper is created, you can review the job summary. From here you can connect Google Sheets for live output or set up a recurring schedule to capture the GitLab listing at regular intervals.

Run the job and the extracted data appears in a table view. Once complete, you can export to Excel or JSON directly from the results screen.

What the extracted data looks like

Below is a sample of two records extracted from the GitLab explore projects listing, with internal prefixes removed for readability.

[
 {
 "project_name": "Ultralytics / yolo-flutter-app",
 "project_url": "/ultralytics/yolo-flutter-app",
 "description": "Flutter plugin for Ultralytics YOLO",
 "topics": ["yolo", "ultralytics"],
 "open_issues": "1 open issue",
 "merge_requests": "1 open merge request",
 "stars_and_forks": "2 stars, 0 forks",
 "timestamp_created": "2026-06-12T18:02:43Z",
 "time_since_posted": "10 seconds ago",
 "project_item_id": "projects-list-item-74599003"
 },
 {
 "project_name": "Filters Heroes / KAD",
 "project_url": "/FiltersHeroes/KAD",
 "description": "Filtry do uBlocka Origin i AdGuarda...",
 "topics": [],
 "open_issues": "0 open issues",
 "merge_requests": "0 open merge requests",
 "stars_and_forks": "0 stars, 0 forks",
 "timestamp_created": "2026-06-12T18:02:35Z",
 "time_since_posted": "18 seconds ago",
 "project_item_id": "projects-list-item-25806645"
 }
]
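
Several fields in the sample arrive as display strings rather than numbers, and project_url is a relative path. The sketch below shows one way to normalise a record after export. It assumes the simplified field names shown in the sample and assumes the records come from gitlab.com; the helper names count_before and normalise are illustrative and not part of Minexa.ai.

```python
import re
from typing import Optional
from urllib.parse import urljoin

GITLAB_BASE = "https://gitlab.com"  # assumption: records were scraped from gitlab.com


def count_before(word: str, label: str) -> Optional[int]:
    """Return the integer that directly precedes `word` in `label`, or None."""
    match = re.search(rf"(\d[\d,]*)\s+{re.escape(word)}", label)
    return int(match.group(1).replace(",", "")) if match else None


def normalise(record: dict) -> dict:
    """Turn label strings into integers and the relative path into a full URL."""
    return {
        "project_name": record["project_name"],
        "url": urljoin(GITLAB_BASE, record["project_url"]),
        "open_issues": count_before("open issue", record["open_issues"]),
        "open_merge_requests": count_before("open merge request", record["merge_requests"]),
        "stars": count_before("star", record["stars_and_forks"]),
        "forks": count_before("fork", record["stars_and_forks"]),
    }


# First record from the sample above, trimmed to the fields used here.
sample = {
    "project_name": "Ultralytics / yolo-flutter-app",
    "project_url": "/ultralytics/yolo-flutter-app",
    "open_issues": "1 open issue",
    "merge_requests": "1 open merge request",
    "stars_and_forks": "2 stars, 0 forks",
}

print(normalise(sample))
```

A label that does not match the expected wording yields None instead of raising, which makes unexpected label formats easy to spot in the output.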

Key fields worth noting

The timestamp_created field returns an ISO 8601 datetime string per project. This makes it straightforward to filter projects by creation date or build a time-series view of new repository activity on GitLab. The project_item_id field exposes a unique numeric identifier embedded in the DOM element ID for each project card (the trailing digits of a value such as projects-list-item-74599003), which can serve as a stable key for deduplication across repeated runs. The stars_and_forks field encodes both engagement signals in a single aria-label string, giving you community traction data without needing to visit each project page. Topic badge links in the project_details object (simplified to plain topic names in the sample above) include the full explore URL per topic, so you can pivot directly to topic-filtered listings for deeper category analysis.
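
As a rough illustration of the deduplication and date filtering described above, the sketch below merges records from repeated runs and keeps only projects created after a cutoff. It assumes each run is a list of dicts with the simplified field names from the sample; numeric_id, merge_runs, and created_since are hypothetical helper names, not Minexa.ai functions.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def numeric_id(project_item_id: str) -> Optional[int]:
    """Extract the trailing digits from an ID like 'projects-list-item-74599003'."""
    match = re.search(r"(\d+)$", project_item_id)
    return int(match.group(1)) if match else None


def parse_created(value: str) -> datetime:
    """Parse an ISO 8601 string; a trailing 'Z' is rewritten for older Python versions."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def merge_runs(*runs: list) -> list:
    """Combine runs, keeping the first record seen for each numeric project ID."""
    seen = {}
    for run in runs:
        for record in run:
            key = numeric_id(record["project_item_id"])
            if key is not None and key not in seen:
                seen[key] = record
    return list(seen.values())


def created_since(records: list, cutoff: datetime) -> list:
    """Return records whose timestamp_created is at or after the cutoff."""
    return [r for r in records if parse_created(r["timestamp_created"]) >= cutoff]


# Two records from the sample above, trimmed to the fields used here.
rec_a = {"project_item_id": "projects-list-item-74599003",
         "timestamp_created": "2026-06-12T18:02:43Z"}
rec_b = {"project_item_id": "projects-list-item-25806645",
         "timestamp_created": "2026-06-12T18:02:35Z"}

merged = merge_runs([rec_a, rec_b], [rec_b])  # rec_b appears in both runs
cutoff = datetime(2026, 6, 12, 18, 2, 40, tzinfo=timezone.utc)
recent = created_since(merged, cutoff)
print(len(merged), len(recent))
```

Keeping the first record seen is one possible policy; keeping the latest instead would preserve updated star or issue counts from later runs.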

The time_since_posted label is the relative time shown on the page at the moment of extraction. Combined with the absolute timestamp_created value, it shows how recently a project was created when the page was captured.
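
One way to combine the two values is to add the relative label to the absolute timestamp, which gives a rough estimate of when the page was captured. This is a sketch under two assumptions: that the label is measured from timestamp_created, and that it follows the "N units ago" wording seen in the sample. Other wordings are returned as None instead of being guessed. The function names are illustrative.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Only fixed-length units are handled; months and years are left out on purpose.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400, "week": 604800}


def relative_to_delta(label: str) -> Optional[timedelta]:
    """Convert a label such as '10 seconds ago' to a timedelta, or None if unrecognised."""
    match = re.fullmatch(r"(\d+) (second|minute|hour|day|week)s? ago", label.strip())
    if not match:
        return None
    return timedelta(seconds=int(match.group(1)) * UNIT_SECONDS[match.group(2)])


def estimated_extraction_time(record: dict) -> Optional[datetime]:
    """Estimate capture time as timestamp_created plus the relative label."""
    delta = relative_to_delta(record["time_since_posted"])
    if delta is None:
        return None
    created = datetime.fromisoformat(record["timestamp_created"].replace("Z", "+00:00"))
    return created + delta


first = {"timestamp_created": "2026-06-12T18:02:43Z", "time_since_posted": "10 seconds ago"}
second = {"timestamp_created": "2026-06-12T18:02:35Z", "time_since_posted": "18 seconds ago"}

print(estimated_extraction_time(first).isoformat())   # 2026-06-12T18:02:53+00:00
print(estimated_extraction_time(second).isoformat())  # 2026-06-12T18:02:53+00:00
```

Both sample records resolve to the same instant, which is consistent with their having been captured in a single page load.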

Export and scheduling

Once a scraping job finishes, results can be exported as Excel or JSON from the results view. For ongoing monitoring of the GitLab explore listing, the scheduling feature lets you set a recurring cadence so new project data is captured automatically without reopening the extension each time.
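
If you work from the JSON export and want a flat file for other tools, a short script can convert it. The sketch below assumes the export is a list of records using the simplified field names from the sample above; the actual export may use different or prefixed names, so adjust FIELDS accordingly. records_to_csv is a hypothetical helper, not a Minexa.ai feature.

```python
import csv
import io
import json

# Assumed column names, taken from the simplified sample in this guide.
FIELDS = ["project_item_id", "project_name", "project_url", "topics",
          "stars_and_forks", "timestamp_created"]


def records_to_csv(records: list, out) -> None:
    """Write records to a file-like object as CSV, joining topics with ';'."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        row = dict(record)
        row["topics"] = ";".join(record.get("topics", []))
        writer.writerow(row)


# Inline JSON stands in for an exported file here.
exported = json.loads("""[
  {"project_name": "Ultralytics / yolo-flutter-app",
   "project_url": "/ultralytics/yolo-flutter-app",
   "topics": ["yolo", "ultralytics"],
   "stars_and_forks": "2 stars, 0 forks",
   "timestamp_created": "2026-06-12T18:02:43Z",
   "project_item_id": "projects-list-item-74599003"}
]""")

buffer = io.StringIO()
records_to_csv(exported, buffer)
print(buffer.getvalue())
```

To read a real export, replace the inline string with json.load on the opened file, and pass an opened output file (with newline="") in place of the StringIO buffer.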

The scraper trained on this page can be reused across any structurally similar GitLab listing URL, including topic-filtered or language-filtered explore pages, without any additional setup.

Install the Minexa.ai Chrome extension and run your first GitLab extraction in under ten minutes.
