How to scrape tax and accounting data from ATO using the Minexa API
- Minexa.ai

- 10 hours ago
Here is what the Minexa API returns when you point a trained scraper at the Australian Taxation Office tax rates and codes page. Start with the data, then work backward through how it was built.
What the extracted data looks like
Below are three rows from a live extraction of ato.gov.au/tax-rates-and-codes. Fields prefixed with meta__ are removed for clarity.
[
  {
    "announcement_dates": "24 June 2026",
    "element_class": "",
    "important_dates": [{"value": "24 June 2026", "tag": "div"}],
    "page_titles": "Tax tables for 2024-25",
    "tax_rates_description": "See tax tables for the 2024-25 financial year.",
    "tax_rates_details": [
      {"value": "Tax tables for 2024-25", "tag": "h2"},
      {"value": "See tax tables for the 2024-25 financial year.", "tag": "p"}
    ],
    "tax_rates_links": "/tax-rates-and-codes/previous-years-tax-tables/tax-tables-for-2024-25"
  },
  {
    "announcement_dates": "19 March 2026",
    "element_class": "",
    "page_titles": "Company tax rates",
    "tax_rates_description": "Company tax rates from 2001-02 to 2025-26.",
    "tax_rates_links": "/tax-rates-and-codes/company-tax-rates"
  },
  {
    "announcement_dates": "",
    "element_class": "RclSearchResultsList_folded-result-item__t8ZBo",
    "page_titles": "Tax rates 2025-26",
    "tax_rates_description": "The following rates of tax apply to companies for the 2025-26 income year.",
    "tax_rates_links": "/tax-rates-and-codes/company-tax-rates/tax-rates-2025-26"
  }
]

The element_class field is the key signal here. Rows where it contains folded-result-item are nested sub-pages under a parent topic. Rows where it is empty are top-level results. This lets you filter the dataset to only primary tax topics or drill into sub-pages depending on your use case.
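The filtering rule described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the Minexa API: the helper name split_rows is hypothetical, and the rows are trimmed copies of the sample output shown earlier.

```python
# Substring that marks a nested sub-page in the sample output above.
FOLDED_MARKER = "folded-result-item"


def split_rows(rows):
    """Split extracted rows into (top_level, nested) by element_class.

    Hypothetical helper: a row counts as nested when its element_class
    contains FOLDED_MARKER, and as top-level otherwise.
    """
    top_level, nested = [], []
    for row in rows:
        css_class = row.get("element_class") or ""
        if FOLDED_MARKER in css_class:
            nested.append(row)
        else:
            top_level.append(row)
    return top_level, nested


# Trimmed copies of the three sample rows.
rows = [
    {"page_titles": "Tax tables for 2024-25", "element_class": ""},
    {"page_titles": "Company tax rates", "element_class": ""},
    {
        "page_titles": "Tax rates 2025-26",
        "element_class": "RclSearchResultsList_folded-result-item__t8ZBo",
    },
]

top_level, nested = split_rows(rows)
print([r["page_titles"] for r in top_level])
print([r["page_titles"] for r in nested])
```

Matching on the substring rather than the full class name is a deliberate choice here: the trailing __t8ZBo looks like a build-generated suffix and may change when the site is redeployed.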
The tax_rates_links field returns relative paths. Prepend https://www.ato.gov.au to build fully qualified URLs for any follow-up extraction.
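One way to build the absolute URLs is the standard library's urljoin. The sketch below assumes rows shaped like the sample output; the helper name absolute_links is hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.ato.gov.au"


def absolute_links(rows, base_url=BASE_URL):
    """Return fully qualified URLs for every row that has a tax_rates_links value."""
    return [
        urljoin(base_url, row["tax_rates_links"])
        for row in rows
        if row.get("tax_rates_links")
    ]


# Relative paths copied from the sample output.
rows = [
    {"tax_rates_links": "/tax-rates-and-codes/company-tax-rates"},
    {"tax_rates_links": "/tax-rates-and-codes/company-tax-rates/tax-rates-2025-26"},
]

links = absolute_links(rows)
for link in links:
    print(link)
```

The resulting list can be passed as the urls value of a follow-up API request.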
Minexa is a template-trained extraction platform. You train a scraper once using the Chrome extension, then call the API repeatedly with any list of URLs that share the same page structure.
Training the scraper: step by step
Open the Minexa Chrome extension on the ATO tax rates page. The extension detects the page and prompts you to confirm.
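The extension then shows the pagination controls it detected. The ATO results page uses hash-based pagination: the page offset lives in the URL fragment (the part after #), which the browser does not send to the server, so the page's own JavaScript has to apply it after loading. When using the API, you will need to pass each paginated URL explicitly or write a JS code scenario to trigger page navigation.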
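If you pass paginated URLs explicitly, they can be generated rather than typed by hand. The sketch below copies the fragment format from the second URL in this article's API request (firstResult=10). It assumes 10 results per page and offsets that grow by the page size; both are inferred from that single example, so check them against the live page. The helper name paginated_urls is hypothetical.

```python
BASE = "https://www.ato.gov.au/tax-rates-and-codes"
# Fragment copied from the article's example URL; {offset} is the only variable part.
FRAGMENT = "#sortCriteria=%40dateupdated%20descending&firstResult={offset}"


def paginated_urls(pages, page_size=10):
    """Build URLs for the first `pages` result pages.

    Assumption: page N (counting from 0) starts at offset N * page_size.
    The first page is the bare URL without a fragment.
    """
    urls = [BASE]
    for page in range(1, pages):
        urls.append(BASE + FRAGMENT.format(offset=page * page_size))
    return urls


urls = paginated_urls(3)
for u in urls:
    print(u)
```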
Choose your scraping mode. For this pipeline, selecting the single list option is sufficient since all relevant data is on the listing page itself.
The extension highlights the data container automatically. Confirm it and click create scraper.
Once the scraper is created, all data columns are identified and displayed. Click through to the API request view to get your pre-generated Python code.
API request structure
Once your scraper is trained, call the Minexa API with your target URLs. Replace the scraper_id with the one generated during training.
import requests

url = "https://api.minexa.ai/data/"
api_key = "YOUR_API_KEY"

data = {
    "batches": [{
        # ID of the scraper created during training; replace with your own.
        "scraper_id": 6241,
        "columns": ["top_30"],
        # Every URL must share the page structure the scraper was trained on.
        "urls": [
            "https://www.ato.gov.au/tax-rates-and-codes",
            "https://www.ato.gov.au/tax-rates-and-codes#sortCriteria=%40dateupdated%20descending&firstResult=10"
        ],
        "scraping": {
            "js_render": True,
            "timeout": 30,
            "js_code": [{"wait_time": 2}, {"page_init": True}, {"wait_time": 4}],
            "proxy": "verified",
            "retry": 3
        }
    }],
    # Number of URLs processed in parallel.
    "threads": 4
}

headers = {"Content-Type": "application/json", "api-key": api_key}
response = requests.post(url, json=data, headers=headers)
print(response.json())

The threads parameter controls how many URLs are processed in parallel. Raise it, up to your plan's limit, to reduce total run time across large URL sets.
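To reuse the same request for different URL lists, the payload can be wrapped in a small builder. This is a sketch: build_payload is a hypothetical helper, and every field name and value is copied from the request in this article rather than from a documented schema.

```python
def build_payload(scraper_id, urls, threads=4):
    """Return a request body shaped like the example request in this article."""
    return {
        "batches": [{
            "scraper_id": scraper_id,
            "columns": ["top_30"],
            "urls": list(urls),
            # Scraping settings copied unchanged from the article's request.
            "scraping": {
                "js_render": True,
                "timeout": 30,
                "js_code": [{"wait_time": 2}, {"page_init": True}, {"wait_time": 4}],
                "proxy": "verified",
                "retry": 3,
            },
        }],
        "threads": threads,
    }


payload = build_payload(
    scraper_id=6241,
    urls=["https://www.ato.gov.au/tax-rates-and-codes"],
    threads=2,
)
print(payload["threads"], len(payload["batches"][0]["urls"]))
```

The returned dictionary can be sent with requests.post(url, json=payload, headers=headers), using the same url and headers as the request in this section.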
Ready to build this pipeline? Install the Minexa Chrome extension to train your scraper, then use the API for all subsequent extractions.
For more on scraping government tax and financial data, see: How to scrape tax and accounting data from Avalara using the Minexa.ai extension.