How to scrape environmental data from OpenEI using the Minexa.ai extension
- Minexa.ai

- 3 days ago
OpenEI hosts one of the most comprehensive collections of publicly available energy and environmental datasets. The data is there, clearly listed, but getting it into a spreadsheet or a structured file requires clicking through pages and copying entries one by one. This walkthrough shows how to extract that data automatically using the Minexa.ai Chrome extension, with no code required.
Watch the extraction in action
Before going through the steps, watch the full tutorial below. It covers the entire workflow from installing the extension to exporting the final dataset.
What OpenEI contains
OpenEI (Open Energy Information) is a platform that supports energy research and policy work. Its data search page at data.openei.org/search lists hundreds of datasets covering topics like solar irradiance, wind resources, electricity rates, and building energy usage. Each entry includes a title, category tags, and a direct link to the dataset detail page.
For researchers, analysts, or anyone building energy-related tools, having this index in a structured format saves significant time compared to browsing page by page.
Step 1: Open Minexa.ai and navigate to OpenEI
Start by opening the Minexa.ai home page. If you have not installed the Chrome extension yet, you can get it directly from the Chrome Web Store.
Once the extension is active, navigate to data.openei.org/search. The page will load the full dataset listing, which is what Minexa.ai will detect and extract.
Step 2: Confirm the page and review pagination
Open the Minexa.ai extension popup. It will show a prompt asking you to confirm you are on the right page. Click the confirmation button to proceed.
Minexa.ai will then automatically detect the pagination structure on the OpenEI search page and show you the pagination method it identified. Review it and click Continue to move forward.
Step 3: Choose scraping mode
After confirming pagination, Minexa.ai asks whether you want to scrape the list page only, or also follow each result link and extract detail page data. For a full dataset index like OpenEI, scraping the list is typically enough to get titles, tags, and links.
You will then be prompted to choose between simple and advanced scraping modes. Simple mode works well for most standard list pages and is the recommended starting point.
Step 4: Highlight the data container and create the scraper
Minexa.ai automatically highlights the container holding the full list of dataset entries. You do not need to click individual fields. The extension recognises the repeating structure and identifies all relevant data points within it.
Click the create scraper button. Within a few seconds, all detected data points will appear in a structured preview.
What the extracted data looks like
Below is a sample of the structured output from the OpenEI extraction. Each row represents one dataset entry from the search results page.
[
  {
    "title": "Commercial and Residential Hourly Load Profiles",
    "category": "Buildings",
    "tags": "energy, load, hourly, commercial",
    "link": "https://data.openei.org/submissions/153"
  },
  {
    "title": "U.S. Solar Resource Data",
    "category": "Solar",
    "tags": "solar, irradiance, GHI, DNI",
    "link": "https://data.openei.org/submissions/40"
  },
  {
    "title": "Wind Integration National Dataset Toolkit",
    "category": "Wind",
    "tags": "wind, WIND toolkit, meteorological",
    "link": "https://data.openei.org/submissions/54"
  }
]
The title field gives the full dataset name as listed on the page. The category field captures the primary topic area assigned to each entry. The tags field surfaces the keyword labels attached to each dataset, which are useful for filtering or grouping records downstream. The link field provides the direct URL to each dataset detail page, so you can follow up on any entry without going back to the search results manually.
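If you do want to work with the export in code later, the filtering and grouping mentioned above might look like the following minimal Python sketch. It assumes the export has the same shape as the sample, with tags stored as one comma-separated string; the helper names split_tags and group_by_tag are made up for this illustration and are not part of Minexa.ai.

```python
import json
from collections import defaultdict

# Records copied from the sample output above. For a real export you would
# instead call json.load() on the exported file (its name is up to you).
SAMPLE_EXPORT = """
[
  {"title": "Commercial and Residential Hourly Load Profiles",
   "category": "Buildings",
   "tags": "energy, load, hourly, commercial",
   "link": "https://data.openei.org/submissions/153"},
  {"title": "U.S. Solar Resource Data",
   "category": "Solar",
   "tags": "solar, irradiance, GHI, DNI",
   "link": "https://data.openei.org/submissions/40"},
  {"title": "Wind Integration National Dataset Toolkit",
   "category": "Wind",
   "tags": "wind, WIND toolkit, meteorological",
   "link": "https://data.openei.org/submissions/54"}
]
"""


def split_tags(record):
    """Turn the comma-separated tags string into a list of clean tags."""
    return [t.strip() for t in record.get("tags", "").split(",") if t.strip()]


def group_by_tag(records):
    """Map each lowercased tag to the titles of the datasets that carry it."""
    index = defaultdict(list)
    for record in records:
        for tag in split_tags(record):
            index[tag.lower()].append(record["title"])
    return dict(index)


records = json.loads(SAMPLE_EXPORT)
tag_index = group_by_tag(records)

# Titles of every dataset tagged "solar" in the sample.
print(tag_index["solar"])
```

Lowercasing the tags is a design choice for this sketch, so that labels such as "WIND toolkit" and "wind toolkit" end up under the same key.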
Step 5: Complete configuration and run the job
After reviewing the extracted fields, complete the scraper configuration. The summary screen gives you the option to connect a Google Sheet directly or set up a recurring schedule so the extraction runs automatically at a defined interval.
Once saved, the scraper appears in your jobs list with a run button. Click it to start the extraction across all pages of the OpenEI search results.
Step 6: Review and export your data
While the job runs, results populate in a live table. Once complete, the full dataset is ready to export.
Export options include Excel, JSON, and Google Sheets. Each dataset entry appears as its own row, with fields in separate columns, ready to use in any analysis tool or pipeline.
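Before feeding a JSON export into a pipeline, a quick sanity check can catch rows with missing fields or malformed links. The sketch below is one possible way to do that in Python, assuming the four field names shown in the sample output; find_problems is a hypothetical helper written for this example, not a Minexa.ai feature.

```python
from urllib.parse import urlparse

# Field names taken from the sample output; adjust if your export differs.
EXPECTED_FIELDS = ("title", "category", "tags", "link")


def find_problems(records):
    """Return (row_index, message) pairs for rows that look incomplete."""
    problems = []
    for i, record in enumerate(records):
        # Every expected field should be a non-empty string.
        for field in EXPECTED_FIELDS:
            value = record.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append((i, f"missing or empty {field}"))
        # A usable link needs an http(s) scheme and a host name.
        link = record.get("link")
        if isinstance(link, str) and link.strip():
            parsed = urlparse(link)
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                problems.append((i, "link is not an absolute http(s) URL"))
    return problems


rows = [
    {
        "title": "U.S. Solar Resource Data",
        "category": "Solar",
        "tags": "solar, irradiance, GHI, DNI",
        "link": "https://data.openei.org/submissions/40",
    },
    # A deliberately broken row to show what the check reports.
    {"title": "", "category": "Wind", "tags": "wind", "link": "data.openei.org/x"},
]

for row_index, message in find_problems(rows):
    print(f"row {row_index}: {message}")
```

An empty result list means every row has all four fields and an absolute link.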
Scheduling for ongoing monitoring
OpenEI adds new datasets regularly. If you want to track what gets published over time, the scheduling feature lets you run the same extraction automatically on a daily or weekly basis. Each run captures the current state of the search results, so you can build a running record of new additions without any manual effort after the initial setup.
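If you keep the JSON export from each scheduled run, spotting new additions comes down to comparing two runs. The Python sketch below shows one way to do it, assuming the link field is a stable identifier for each dataset; the function name new_entries and the inline sample data are illustrative only.

```python
def new_entries(previous_run, current_run):
    """Return records from current_run whose link was absent in previous_run."""
    seen_links = {record["link"] for record in previous_run}
    return [record for record in current_run if record["link"] not in seen_links]


# Two small stand-ins for consecutive exports. In practice you would load
# each run from its exported JSON file with json.load().
last_week = [
    {"title": "U.S. Solar Resource Data",
     "link": "https://data.openei.org/submissions/40"},
]
this_week = [
    {"title": "U.S. Solar Resource Data",
     "link": "https://data.openei.org/submissions/40"},
    {"title": "Wind Integration National Dataset Toolkit",
     "link": "https://data.openei.org/submissions/54"},
]

for record in new_entries(last_week, this_week):
    print(record["title"], record["link"])
```

Comparing by link rather than by title means a dataset that is renamed between runs is not reported as new.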
If you work with other public data sources in the energy or environmental space, the same workflow applies to any structured listing page. For a related example, see how the same approach works for grant data: how to scrape non-profits and NGOs data from Grants.gov using the Minexa API.
To get started with your own OpenEI extraction, install the Minexa.ai extension and follow the steps above. The scraper trains once and can be reused or scheduled from that point forward.
