How to scrape real estate data from OLX India using the Minexa API
- Minexa.ai

- 3 days ago
OLX India publishes thousands of active property listings daily, covering rentals, sales, PG accommodations, and commercial spaces across every major Indian city. For a developer building a property market dataset, that volume is exactly the challenge: the data is all there and publicly visible, but extracting it manually, page by page, does not scale.
This walkthrough follows a concrete scenario: setting up a repeatable pipeline that pulls structured property data from OLX India using the Minexa API. The two-phase approach means you train the scraper once using the Minexa Chrome extension, then call the API programmatically for every subsequent run.
Watch the full tutorial first
The video below covers the complete workflow from opening the Minexa extension to exporting structured results.
Phase 1: training the scraper
Open the Minexa home page and install the Chrome extension if you have not done so already.
Navigate to olx.in/properties_c3 and open the extension. Click "I'm on the right page" to confirm the target URL.
The extension detects pagination automatically. Review the detected next-page logic and click "Continue".
Choose whether to scrape the listing page only, or follow each listing link to extract detail page data as well. For most property market datasets, the listing layer alone covers price, location, and configuration.
Select simple scraping mode, then let Minexa highlight the data container automatically.
After confirming the container, Minexa generates all extracted data points. Use the next/prev navigation to review every column before finalising.
Click "API request" to view the generated Python snippet and your scraper ID, then complete the configuration.
Phase 2: calling the API
Once the scraper is trained, every subsequent extraction is a single POST request. Note that pagination on OLX India must be handled by passing each page URL explicitly or via a JS scenario you define. The API does not auto-paginate.
Here is the request structure:
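import requests

# Send one extraction request for the trained scraper.
response = requests.post(
    "https://api.minexa.ai/data",
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # replace with your own API key
    json={
        "scraper_id": 5817,  # ID shown in the extension's API request view
        "urls": ["https://www.olx.in/properties_c3"],  # page(s) to extract
        "columns": "top_40",  # return the highest-ranked fields
        "threads": 3,
    },
)
print(response.json())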
The scraper_id ties every call to the trained page structure. The columns parameter set to top_40 returns the highest-ranked fields without requiring you to name them upfront.
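Because the API does not auto-paginate, a multi-page run needs every page URL supplied explicitly. The sketch below shows one way to build such a request body. It rests on two assumptions that are not confirmed here: that OLX result pages can be addressed with a `page` query parameter, and that the `urls` list accepts more than one entry per call. The helper names `build_page_urls` and `build_payload` are hypothetical; check the URLs your browser shows when you click through the results and adjust the pattern accordingly.

```python
from urllib.parse import urlencode

BASE_LISTING_URL = "https://www.olx.in/properties_c3"


def build_page_urls(base_url, first_page, last_page):
    """Return one URL per results page, inclusive of both ends.

    ASSUMPTION: pages are addressed with a `page` query parameter.
    Verify this against the real site before relying on it.
    """
    return [
        f"{base_url}?{urlencode({'page': number})}"
        for number in range(first_page, last_page + 1)
    ]


def build_payload(scraper_id, urls, columns="top_40", threads=3):
    """Assemble the JSON body using the same fields as the request above."""
    return {
        "scraper_id": scraper_id,
        "urls": list(urls),
        "columns": columns,
        "threads": threads,
    }


# Build a body covering the first three result pages.
payload = build_payload(5817, build_page_urls(BASE_LISTING_URL, 1, 3))
```

The resulting `payload` can be passed as the `json=` argument of the `requests.post` call shown earlier. Keeping URL generation separate from the request makes it easy to swap in a different pagination pattern.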
What the extracted data looks like
Each row in the response represents one property listing. Below are two labelled examples from a live extraction run:
[
  {
    "rental_price": "Rs 14,500",
    "deposit_amount": "Rs 14,500",
    "property_details": "3 BHK - 3 Bathroom - 1265 sqft",
    "room_rent_description": "Fully furnished accommodation for bachelors",
    "house_apartment_description": "/item/for-rent-houses-apartments-c1723...",
    "featured_price": "Featured"
  },
  {
    "rental_price": "Rs 9,200",
    "deposit_amount": "Rs 9,200",
    "property_details": "1 BHK - 1 Bathroom - 300 sqft",
    "room_rent_description": "1 rk, 1 room kitchen in prime location",
    "house_apartment_description": "/item/for-rent-houses-apartments-c1723...",
    "featured_price": "Rs 9,200"
  }
]
The deposit_amount field captures the security deposit separately from the monthly rent. Despite its name, the house_apartment_description field holds the relative path for each listing, which you can prefix with the OLX base domain to build direct listing URLs for downstream enrichment. The response also includes a location_and_price array, not shown in the examples above, that bundles the featured label, price, configuration, neighbourhood, and posting date as individual span elements.
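For downstream analysis, the string fields usually need converting into numbers and full URLs. The following is a minimal sketch of that step, assuming rows shaped like the two examples above. The function names and output keys (`rent_inr`, `deposit_inr`, `listing_url`, and so on) are hypothetical, and the listing path in the sample row is a placeholder rather than a real OLX path.

```python
import re
from urllib.parse import urljoin

OLX_BASE = "https://www.olx.in"

# Patterns for the "3 BHK - 3 Bathroom - 1265 sqft" style string.
DETAIL_PATTERNS = {
    "bhk": r"(\d+)\s*BHK",
    "bathrooms": r"(\d+)\s*Bathroom",
    "area_sqft": r"(\d+)\s*sqft",
}


def parse_rupees(text):
    """Convert 'Rs 14,500' to 14500; return None when no digits are present."""
    match = re.search(r"\d[\d,]*", text or "")
    return int(match.group().replace(",", "")) if match else None


def parse_details(text):
    """Extract bedroom, bathroom, and area counts; missing parts stay None."""
    fields = {}
    for key, pattern in DETAIL_PATTERNS.items():
        match = re.search(pattern, text or "", flags=re.IGNORECASE)
        fields[key] = int(match.group(1)) if match else None
    return fields


def normalize_row(row):
    """Turn one raw listing row into typed fields plus an absolute URL."""
    path = row.get("house_apartment_description")
    return {
        "rent_inr": parse_rupees(row.get("rental_price")),
        "deposit_inr": parse_rupees(row.get("deposit_amount")),
        **parse_details(row.get("property_details")),
        # Prefix the relative listing path with the OLX base domain.
        "listing_url": urljoin(OLX_BASE, path) if path else None,
    }


# Sample row modelled on the first example; the path is a placeholder.
sample_row = {
    "rental_price": "Rs 14,500",
    "deposit_amount": "Rs 14,500",
    "property_details": "3 BHK - 3 Bathroom - 1265 sqft",
    "house_apartment_description": "/item/example-listing",
}
normalized = normalize_row(sample_row)
```

Returning None for values that cannot be parsed, such as a featured_price of "Featured", keeps label-only cells from being mistaken for prices.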
For more on building real estate data pipelines, see: Scraping real estate data: what actually works and where most pipelines break.
