How to scrape environmental data from OpenEI using the Minexa.ai extension

Minexa.ai
3 days ago
4 min read

OpenEI hosts one of the most comprehensive collections of publicly available energy and environmental datasets. The data is there, clearly listed, but getting it into a spreadsheet or a structured file requires clicking through pages and copying entries one by one. This walkthrough shows how to extract that data automatically using the Minexa.ai Chrome extension, with no code required.

Watch the extraction in action

Before going through the steps, watch the full tutorial below. It covers the entire workflow from installing the extension to exporting the final dataset.

https://www.youtube.com/watch?v=Hpixz0mpI9I

What OpenEI contains

OpenEI (Open Energy Information) is a platform maintained to support energy research and policy work. Its data search page at data.openei.org/search lists hundreds of datasets covering topics like solar irradiance, wind resources, electricity rates, and building energy usage. Each entry includes a title, category tags, and a direct link to the dataset detail page.

For researchers, analysts, or anyone building energy-related tools, having this index in a structured format saves significant time compared to browsing page by page.

Step 1: Open Minexa.ai and navigate to OpenEI

Start by opening the Minexa.ai home page. If you have not installed the Chrome extension yet, you can get it directly from the Chrome Web Store.

Install the Minexa.ai Chrome extension

Once the extension is active, navigate to data.openei.org/search. The page will load the full dataset listing, which is what Minexa.ai will detect and extract.

Step 2: Confirm the page and review pagination

Open the Minexa.ai extension popup. It will show a prompt asking you to confirm you are on the right page. Click the confirmation button to proceed.

Minexa.ai will then detect the pagination structure on the OpenEI search page automatically and show you the pagination method it identified. Review it and click Continue to move forward.

Step 3: Choose scraping mode

After confirming pagination, Minexa.ai asks whether you want to scrape the list page only, or also follow each result link and extract detail page data. For a full dataset index like OpenEI, scraping the list is typically enough to get titles, tags, and links.

You will then be prompted to choose between simple and advanced scraping modes. Simple mode works well for most standard list pages and is the recommended starting point.

Step 4: Highlight the data container and create the scraper

Minexa.ai automatically highlights the container holding the full list of dataset entries. You do not need to click individual fields. The extension recognises the repeating structure and identifies all relevant data points within it.

Click the create scraper button. Within a few seconds, all detected data points will appear in a structured preview.

What the extracted data looks like

Below is a sample of the structured output from the OpenEI extraction. Each row represents one dataset entry from the search results page.

[
  {
    "title": "Commercial and Residential Hourly Load Profiles",
    "category": "Buildings",
    "tags": "energy, load, hourly, commercial",
    "link": "https://data.openei.org/submissions/153"
  },
  {
    "title": "U.S. Solar Resource Data",
    "category": "Solar",
    "tags": "solar, irradiance, GHI, DNI",
    "link": "https://data.openei.org/submissions/40"
  },
  {
    "title": "Wind Integration National Dataset Toolkit",
    "category": "Wind",
    "tags": "wind, WIND toolkit, meteorological",
    "link": "https://data.openei.org/submissions/54"
  }
]

The title field gives the full dataset name as listed on the page. The category field captures the primary topic area assigned to each entry. The tags field surfaces the keyword labels attached to each dataset, which are useful for filtering or grouping records downstream. The link field provides the direct URL to each dataset detail page, so you can follow up on any entry without going back to the search results manually.
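If you want to do that filtering or grouping in code, the following is a minimal Python sketch. It assumes the export has the same shape as the sample above, with tags stored as a single comma-separated string. The function names split_tags and group_by_tag are illustrative and not part of Minexa.ai.

```python
import json
from collections import defaultdict

# Sample records copied from the output shown above.
SAMPLE = """
[
  {"title": "Commercial and Residential Hourly Load Profiles",
   "category": "Buildings",
   "tags": "energy, load, hourly, commercial",
   "link": "https://data.openei.org/submissions/153"},
  {"title": "U.S. Solar Resource Data",
   "category": "Solar",
   "tags": "solar, irradiance, GHI, DNI",
   "link": "https://data.openei.org/submissions/40"},
  {"title": "Wind Integration National Dataset Toolkit",
   "category": "Wind",
   "tags": "wind, WIND toolkit, meteorological",
   "link": "https://data.openei.org/submissions/54"}
]
"""


def split_tags(raw):
    """Turn a comma-separated tag string into a list of trimmed tags."""
    return [t.strip() for t in raw.split(",") if t.strip()]


def group_by_tag(records):
    """Map each lowercased tag to the titles of the records that carry it."""
    groups = defaultdict(list)
    for record in records:
        for tag in split_tags(record.get("tags", "")):
            groups[tag.lower()].append(record["title"])
    return dict(groups)


records = json.loads(SAMPLE)
by_tag = group_by_tag(records)
print(by_tag["solar"])
```

Lowercasing the tags is a design choice in this sketch so that labels such as "WIND toolkit" and "wind toolkit" end up in the same group.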

Step 5: Complete configuration and run the job

After reviewing the extracted fields, complete the scraper configuration. The summary screen gives you the option to connect a Google Sheet directly or set up a recurring schedule so the extraction runs automatically at a defined interval.

Once saved, the scraper appears in your jobs list with a run button. Click it to start the extraction across all pages of the OpenEI search results.

Step 6: Review and export your data

While the job runs, results populate in a live table. Once complete, the full dataset is ready to export.

Export options include Excel, JSON, and Google Sheets. Each dataset entry appears as its own row, with fields in separate columns, ready to use in any analysis tool or pipeline.
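As one example of feeding the export into a pipeline, here is a small Python sketch that filters a JSON export by category and converts the result to CSV. It assumes the JSON export uses the four fields shown in the sample output; the helper names load_records and to_csv are hypothetical, and the inline export_text stands in for the contents of your own exported file.

```python
import csv
import io
import json

# Column order assumed from the sample output in this article.
FIELDS = ["title", "category", "tags", "link"]


def load_records(text):
    """Parse exported JSON text and drop rows that have no link."""
    return [r for r in json.loads(text) if r.get("link")]


def to_csv(records):
    """Serialise records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells; unexpected fields are skipped.
        writer.writerow({name: record.get(name, "") for name in FIELDS})
    return buffer.getvalue()


# Stand-in for the text of an exported JSON file.
export_text = json.dumps([
    {"title": "U.S. Solar Resource Data", "category": "Solar",
     "tags": "solar, irradiance, GHI, DNI",
     "link": "https://data.openei.org/submissions/40"},
    {"title": "Wind Integration National Dataset Toolkit", "category": "Wind",
     "tags": "wind, WIND toolkit, meteorological",
     "link": "https://data.openei.org/submissions/54"},
])

solar = [r for r in load_records(export_text) if r["category"] == "Solar"]
csv_text = to_csv(solar)
print(csv_text)
```

Because the tags value contains commas, the csv module quotes that cell automatically, so the columns stay aligned when the file is opened in a spreadsheet.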

Scheduling for ongoing monitoring

OpenEI adds new datasets regularly. If you want to track what gets published over time, the scheduling feature lets you run the same extraction automatically on a daily or weekly basis. Each run captures the current state of the search results, so you can build a running record of new additions without any manual effort after the initial setup.
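To turn two scheduled runs into a list of new additions, you can compare them by link. The sketch below is one way to do that in Python, assuming each run is exported as a list of records with a link field as in the sample output. The function name new_entries and the placeholder record in the second snapshot are made up for illustration and do not describe a real OpenEI dataset.

```python
def new_entries(previous, current):
    """Return records in `current` whose link did not appear in `previous`."""
    seen = {record["link"] for record in previous}
    return [record for record in current if record["link"] not in seen]


# Snapshot from an earlier run (entries taken from the sample output).
last_week = [
    {"title": "U.S. Solar Resource Data",
     "link": "https://data.openei.org/submissions/40"},
    {"title": "Wind Integration National Dataset Toolkit",
     "link": "https://data.openei.org/submissions/54"},
]

# Snapshot from a later run, with one placeholder entry added.
this_week = last_week + [
    {"title": "Placeholder new dataset",
     "link": "https://example.org/placeholder-new-dataset"},
]

added = new_entries(last_week, this_week)
for record in added:
    print(record["title"], record["link"])
```

The link is used as the key here because titles can be edited over time, while the detail page URL is more likely to stay stable. That is an assumption about the site, not something the extension guarantees.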

If you work with other public data sources in the energy or environmental space, the same workflow applies to any structured listing page. For a related example, see how the same approach works for grant data: how to scrape non-profits and NGOs data from Grants.gov using the Minexa API.

To get started with your own OpenEI extraction, install the Minexa.ai extension and follow the steps above. The scraper only needs to be set up once and can then be reused or scheduled.

Get the Minexa.ai Chrome extension
