How to scrape documents and filings data from OpenCorporates using the Minexa API

OpenCorporates publishes one of the most comprehensive indexes of company registers available publicly. The registers page at opencorporates.com/registers lists official corporate filing sources from jurisdictions worldwide, each with its own classification, location, and direct link. For developers building compliance tools, due diligence pipelines, or corporate intelligence feeds, getting that data into a structured format programmatically is the core challenge.

The Minexa API solves this with a two-phase approach: train a scraper once using the Minexa Chrome extension, then call the API to extract data at scale without repeating any setup.

Phase 1: training the scraper

Navigate to opencorporates.com/registers and confirm the page has fully loaded.

Then open the Minexa Chrome extension on that page. The extension detects the repeating list structure automatically.

Click "I'm on the right page" in the extension popup to confirm detection and proceed.

The extension surfaces the pagination structure it detected. For the API workflow, you will define pagination logic via a JS code scenario rather than relying on automatic handling.

Select the list extraction mode to capture all register rows, then confirm.

Highlight the full data container. Minexa locks onto the repeating row structure and identifies all data points within it automatically.

Once the scraper is created, you can review all extracted data points and copy your scraper ID. This ID is what you pass to the API in every subsequent call.

Click "API Request" to view the ready-to-run JSON and Python code samples generated for your scraper.

Phase 2: calling the Minexa API

With your scraper ID in hand, call the https://api.minexa.ai/data endpoint. Pass the scraper ID, the target URLs, and a top_20 columns parameter to retrieve the most relevant fields per row.
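As a rough illustration of that call, here is a minimal Python sketch using only the standard library. Only the endpoint URL and the top_20 value come from this article. The JSON key names (scraper_id, urls, columns), the Bearer authorization header, and the helper names build_payload and fetch_rows are assumptions made for illustration, so treat the code sample generated under "API Request" in the extension as the authoritative shape.

```python
import json
import urllib.request

API_URL = "https://api.minexa.ai/data"  # endpoint named in this article


def build_payload(scraper_id, urls, columns="top_20"):
    """Assemble the request body. All key names here are assumptions."""
    if not urls:
        raise ValueError("at least one target URL is required")
    return {
        "scraper_id": scraper_id,  # hypothetical key name
        "urls": list(urls),        # hypothetical key name
        "columns": columns,        # hypothetical key name
    }


def fetch_rows(api_key, payload, timeout=60):
    """POST the payload as JSON and return the decoded response body."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check your account's API docs.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the payload makes no network call, so it is safe to inspect first.
payload = build_payload(
    "YOUR_SCRAPER_ID",
    ["https://opencorporates.com/registers"],
)
print(json.dumps(payload, indent=2))
```

Keeping payload construction separate from the network call makes it easy to compare the body against the generated sample before sending anything; once the shapes match, fetch_rows("YOUR_API_KEY", payload) would perform the request.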

Sample extracted data

Here is what a structured extract from the OpenCorporates registers page looks like:

[
  {
    "institution_name": "A T Still University of Health Sciences",
    "institutional_classification": "Special Focus: Medical Schools and Centers",
    "location": "MO",
    "student_access_and_earnings": "Not Classified",
    "website_url": "https://opencorporates.com/registers/detail/1"
  },
  {
    "institution_name": "Aaniiih Nakoda College",
    "institutional_classification": "Professions-focused Associate Small",
    "location": "MT",
    "student_access_and_earnings": "Higher Access, Higher Earnings",
    "website_url": "https://opencorporates.com/registers/detail/2"
  }
]

The website_url field links directly to each register detail page, making it straightforward to chain a second extraction pass for deeper filing data. The institutional_classification and location fields are clean strings ready for filtering or grouping in any downstream pipeline.
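The grouping and chaining described here could look something like the following sketch. It works on rows shaped like the sample above (trimmed to three fields). The helper names group_by and detail_urls are hypothetical and not part of Minexa.

```python
from collections import defaultdict

# Rows shaped like the sample extract, trimmed to the fields used here.
rows = [
    {
        "institution_name": "A T Still University of Health Sciences",
        "location": "MO",
        "website_url": "https://opencorporates.com/registers/detail/1",
    },
    {
        "institution_name": "Aaniiih Nakoda College",
        "location": "MT",
        "website_url": "https://opencorporates.com/registers/detail/2",
    },
]


def group_by(rows, field):
    """Bucket rows by the value of one field (missing values go under '')."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.get(field, "")].append(row)
    return dict(groups)


def detail_urls(rows, field="website_url"):
    """Collect unique, non-empty detail URLs in first-seen order."""
    seen = set()
    urls = []
    for row in rows:
        url = row.get(field)
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


by_location = group_by(rows, "location")
second_pass_targets = detail_urls(rows)
print(sorted(by_location))
print(second_pass_targets)
```

The list returned by detail_urls is what you would hand to a second scraper as its target URLs, assuming you have trained one on a register detail page.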

Export to Excel or JSON directly from the interface, or consume the API response in your own pipeline without any intermediate step.
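If you consume the API response in your own pipeline and still want a spreadsheet-friendly file, one option is to write the rows out as CSV, which Excel opens directly. The sketch below assumes the response has already been decoded into a list of dictionaries like the sample above. The rows_to_csv helper is hypothetical.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text, tolerating missing keys."""
    # Build the header from every key seen, in first-seen order.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample_rows = [
    {"institution_name": "A T Still University of Health Sciences", "location": "MO"},
    {"institution_name": "Aaniiih Nakoda College", "location": "MT"},
]

csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

To save the result, write csv_text to a file opened with newline="" so the csv module's line endings are preserved as written.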

For a related tutorial covering a similar filings data source, see: How to scrape documents and filings data from Comcast using Minexa.ai.
