How to extract jobs data from Jora using the Minexa API

Collecting job market data from Jora by hand is slow and inconsistent, and it stops being practical once you need more than a handful of listings. The Minexa API gives developers a repeatable, structured extraction pipeline that works across thousands of pages without writing a single selector.

This guide covers the full workflow: training a scraper on Jora, retrieving your scraper_id, and calling the API to extract job listings at scale.

What data Jora exposes

Each listing on a Jora search results page contains a consistent set of fields: job title, company name, location, salary, employment type, job description snippet, requirements, and posting date. These fields are what the Minexa scraper captures and returns as structured JSON on every extraction run.

Step 1: Train the scraper via the Chrome extension

Before making any API call, you need a scraper_id. This is generated once through the Minexa Chrome extension by pointing it at a Jora listing page and selecting the HTML container that holds the full job list.

Navigate to a Jora entry-level jobs page such as https://au.jora.com/Entry-Level-jobs-in-Sydney-NSW, open the extension, and follow the steps below.

Once you click Create Scraper, Minexa analyzes the container and automatically identifies all data points within it. No XPath or CSS selectors needed.

Click API Request in the top right to get your pre-generated Python code including your scraper_id. Copy it directly.

Step 2: Call the API with your scraper_id

Once your scraper is trained, send a POST request to https://api.minexa.ai/data/ containing your scraper_id and a list of Jora URLs. The request body follows this structure:

{
  "batches": [
    {
      "scraper_id": 4712,
      "columns": ["top_30"],
      "urls": [
        "https://au.jora.com/Entry-Level-jobs-in-Sydney-NSW",
        "https://au.jora.com/Entry-Level-jobs-in-Sydney-NSW?p=2"
      ],
      "scraping": {
        "js_render": true,
        "timeout": 30,
        "js_code": [
          { "wait_time": 2 },
          { "page_init": true },
          { "wait_time": 4 }
        ],
        "proxy": "verified",
        "retry": 3
      }
    }
  ],
  "threads": 5
}

The columns parameter accepts top_n notation. Using top_30 returns the 30 highest-ranked fields Minexa identified during training. The ranking is deterministic, so the same value always maps to the same set of columns. You can also pass explicit column names if you only need specific fields.
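As a rough illustration, the request body above could be assembled and sent from Python along these lines. This is a sketch under stated assumptions, not the code the extension generates: the helper names build_payload and post_batch are hypothetical, the authentication header is not covered in this guide and must be copied from your own API Request snippet, and the response is assumed to be JSON.

```python
import json
import urllib.request

API_URL = "https://api.minexa.ai/data/"  # endpoint named in this guide


def build_payload(scraper_id, urls, columns=("top_30",), threads=5):
    """Assemble a request body with the same structure as the example above."""
    return {
        "batches": [
            {
                "scraper_id": scraper_id,
                "columns": list(columns),
                "urls": list(urls),
                "scraping": {
                    "js_render": True,
                    "timeout": 30,
                    "js_code": [
                        {"wait_time": 2},
                        {"page_init": True},
                        {"wait_time": 4},
                    ],
                    "proxy": "verified",
                    "retry": 3,
                },
            }
        ],
        "threads": threads,
    }


def post_batch(payload, headers):
    """POST the payload as JSON and return the decoded response.

    `headers` must carry whatever authentication your account uses. The
    header name is an unknown here, so take it from the generated code.
    The response is assumed (not confirmed) to be a JSON document.
    """
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))


# Build the body only; nothing is sent until post_batch() is called.
payload = build_payload(
    4712,
    [
        "https://au.jora.com/Entry-Level-jobs-in-Sydney-NSW",
        "https://au.jora.com/Entry-Level-jobs-in-Sydney-NSW?p=2",
    ],
)
```

Keeping payload construction separate from the network call makes it easy to inspect or log the body before spending credits on a run.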

Note: pagination on Jora is not handled automatically via the API. You need to build your own URL list covering each page you want to extract, then pass all URLs in the urls array or manage them via a cron job calling the API in batches.
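One way to build that URL list is sketched below. It assumes that page 1 is the bare listing URL and that later pages follow the ?p=N pattern seen in the example request; the function name build_page_urls is hypothetical, and the base URL is assumed to carry no query string of its own.

```python
def build_page_urls(base_url, last_page):
    """Return the listing URL for pages 1..last_page.

    Assumes page 1 is the bare URL and later pages append ?p=N,
    as in the example request in this guide.
    """
    urls = [base_url]
    urls.extend(f"{base_url}?p={page}" for page in range(2, last_page + 1))
    return urls


page_urls = build_page_urls(
    "https://au.jora.com/Entry-Level-jobs-in-Sydney-NSW", 3
)
```

The resulting list can be passed straight into the urls array of a batch.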

Sample output

Here is what the API returns for two Jora listings:

[
  {
    "job_title": "Entry Level Fire Systems Tester",
    "company_name": "Adept Group",
    "job_location": "Sydney CBD NSW",
    "posted_date": "Posted 3d ago",
    "job_description": "Team player with strong focus on safety and good communication",
    "job_requirements": "Experience in fire systems testing and minor repair works required"
  },
  {
    "job_title": "Settlements Officer (Entry Level)",
    "company_name": "SG Fleet",
    "job_location": "Sydney NSW",
    "posted_date": "Posted 2d ago",
    "job_description": "Maintain paperwork, privacy requirements, and meet month-end deadlines",
    "job_requirements": "Liaise with stakeholders and Financier for lease processing"
  }
]

Some fields, such as job_descriptions, are returned as nested arrays of objects rather than plain strings. Each object includes a value, tag, type, and attribute. To extract just the text values in Python: values = [item['value'] for item in row['job_descriptions']].
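A slightly more defensive version of that one-liner is sketched below. The helper name flatten_values is hypothetical, and the tag, type, and attribute values in the sample row are made up purely for illustration; only the four key names come from the description above.

```python
def flatten_values(row, field):
    """Return the 'value' of every object in a nested field.

    Missing fields and malformed items are skipped instead of raising.
    """
    items = row.get(field) or []
    return [
        item["value"]
        for item in items
        if isinstance(item, dict) and "value" in item
    ]


# Illustrative row: the tag/type/attribute values are invented.
row = {
    "job_descriptions": [
        {
            "value": "Team player with strong focus on safety and good communication",
            "tag": "div",
            "type": "text",
            "attribute": None,
        },
        {
            "value": "Maintain paperwork, privacy requirements, and meet month-end deadlines",
            "tag": "div",
            "type": "text",
            "attribute": None,
        },
    ]
}

values = flatten_values(row, "job_descriptions")
```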

Reusing the scraper across all Jora pages

A scraper trained on one Jora listing page works on any structurally identical page. Pass as many URLs as needed in the urls array. Up to 50,000 URLs can be submitted in a single batch request. The same scraper_id handles all of them without modification.
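If your URL list grows past that limit, it has to be split across several requests. A minimal sketch of the split, using the 50,000 figure stated above and a hypothetical helper named chunk_urls:

```python
MAX_URLS_PER_REQUEST = 50_000  # limit stated in this guide


def chunk_urls(urls, size=MAX_URLS_PER_REQUEST):
    """Yield consecutive slices of `urls`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Placeholder URLs purely to demonstrate the split.
all_urls = [f"https://example.com/page/{n}" for n in range(120_001)]
chunk_sizes = [len(chunk) for chunk in chunk_urls(all_urls)]
```

Each chunk can then be sent as its own request with the same scraper_id.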

If a URL does not match the structure the scraper was trained on, the API returns an explicit error rather than silently returning wrong data. This makes the pipeline straightforward to validate.
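The format of that error response is not shown in this guide, so the sketch below does not try to parse it. Instead it illustrates an independent client-side check you could run on returned rows: flag any row that lacks a field you rely on. The function name find_incomplete_rows and the choice of required fields are assumptions, and the second sample row has had its company name blanked deliberately to trigger the check.

```python
REQUIRED_FIELDS = ("job_title", "company_name", "job_location")


def find_incomplete_rows(rows, required=REQUIRED_FIELDS):
    """Return (index, missing_fields) for rows lacking a required field.

    A field counts as missing when the key is absent or its value is empty.
    """
    problems = []
    for index, row in enumerate(rows):
        missing = [name for name in required if not row.get(name)]
        if missing:
            problems.append((index, missing))
    return problems


sample_rows = [
    {
        "job_title": "Entry Level Fire Systems Tester",
        "company_name": "Adept Group",
        "job_location": "Sydney CBD NSW",
    },
    {
        "job_title": "Settlements Officer (Entry Level)",
        "company_name": "",  # blanked on purpose for this illustration
        "job_location": "Sydney NSW",
    },
]

problems = find_incomplete_rows(sample_rows)
```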

The full API documentation is available at minexa.stoplight.io/docs/minexa. To explore plans and credit limits, visit minexa.ai.
