How to scrape marketplace listing data and why price monitoring tools need it
- Minexa.ai

- 1 day ago
Marketplace listing pages contain exactly the data that price monitoring tools are built around: current price, original price, discount percentage, seller name, availability status, shipping cost, and condition. The problem is not that the data is hidden. It is visible on every page. The problem is extracting it consistently across thousands of listings, across multiple sources, without the extraction layer introducing errors or requiring constant maintenance.
This post walks through how to do that using the Minexa API, from training a scraper on a single listing page to running batch extraction across a large URL set programmatically.
What data is available on a marketplace listing page
A typical marketplace item detail page exposes a predictable set of fields. Depending on the platform, you can generally extract:
Current listed price
Original or crossed-out price
Discount label or percentage
Seller or merchant name
Product title and category
Availability or stock status
Shipping cost and estimated delivery
Condition (new, used, refurbished)
Rating and review count
SKU or item identifier
For price monitoring specifically, the fields that matter most are current price, original price, availability, and seller. These four, tracked consistently over time across many listings, form the foundation of any competitive pricing dataset.
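As a rough illustration, those four fields can be carried through a pipeline as one small record per listing per run. The sketch below is only one way to model it: the class name, the helper, and the input keys (`price`, `original_price`, `availability`, `seller_name`) are assumptions for the example, and the values are assumed to be parsed into numbers already.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ListingSnapshot:
    """One observation of one listing at one point in time."""
    url: str
    current_price: Optional[float]
    original_price: Optional[float]
    availability: Optional[str]
    seller: Optional[str]
    observed_at: datetime


def make_snapshot(url, row, observed_at=None):
    """Build a snapshot from a dict of already-parsed values.

    The keys read from `row` are hypothetical column names; use whatever
    names your own scraper assigned during training.
    """
    return ListingSnapshot(
        url=url,
        current_price=row.get("price"),
        original_price=row.get("original_price"),
        availability=row.get("availability"),
        seller=row.get("seller_name"),
        # Timestamp in UTC so runs from different machines compare cleanly.
        observed_at=observed_at or datetime.now(timezone.utc),
    )
```

Storing a timestamp with every record is what turns repeated runs into a time series rather than a set of overwritten values.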
Why extraction accuracy matters more than extraction speed here
Marketplace pages often display multiple price values on the same page: a sale price, a list price, a bulk discount, a per-unit price. If your extraction layer cannot distinguish between these at the DOM level, you end up with swapped values in your dataset. A sale price recorded under the list price column, or vice versa, corrupts every downstream calculation that depends on it.
LLM-based extraction approaches are particularly prone to this. When two numeric values appear in similar visual positions on a page, a model parsing the text may assign them to the wrong fields based on proximity rather than structure. Minexa avoids this by binding each column to a specific DOM element identified during the training step. The same field always maps to the same structural position, regardless of how similar the surrounding values look.
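Whichever extraction layer you use, a cheap downstream check can flag rows where the two prices look swapped. The following is a minimal sketch, not part of Minexa: it assumes each row is a dict with numeric `price` and `original_price` keys (hypothetical names), and it treats an original price below the current price as suspicious.

```python
def find_price_anomalies(rows):
    """Return (index, reason) pairs for rows whose price fields look inconsistent."""
    problems = []
    for i, row in enumerate(rows):
        price = row.get("price")
        original = row.get("original_price")
        if price is None:
            problems.append((i, "missing current price"))
        elif price <= 0:
            problems.append((i, "non-positive current price"))
        elif original is not None and original < price:
            # A crossed-out price lower than the sale price usually means
            # the two values ended up in each other's column.
            problems.append((i, "original price below current price"))
    return problems
```

Rows flagged this way are worth inspecting by hand before they reach any pricing calculation.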
Step 1: Train a scraper on a listing page
Install the Minexa Chrome extension, then open any item detail page on the marketplace you want to monitor. These pages typically follow a URL pattern like marketplace.com/item/123.
In the extension, click Create Custom Scenario and select Detail Mode. This tells Minexa you are working with individual listing pages rather than a category or search results list.
Hover over the main content block on the page, which is the HTML container wrapping the product title, price section, seller info, and availability. Click to select it. You are selecting the parent wrapper, not individual fields. Minexa identifies all the data points inside that container automatically.
Click Create Scraper and wait a couple of minutes. Once the scraper is created, all relevant columns are discovered and named. Click through until the job is saved, then click API Request in the top right. This generates ready-to-run Python code with your scraper ID already included. Copy it.
The entire training process takes roughly two to five minutes. The scraper you just created can now be reused across every structurally similar listing page on that marketplace without any additional setup.
Step 2: Call the API with your listing URLs
Once you have a scraper ID, batch extraction is a single POST request. Here is the base structure:
POST https://api.minexa.ai/data/
{
  "batches": [
    {
      "scraper_id": 4817,
      "columns": ["top_30"],
      "urls": [
        "https://marketplace.com/item/1001",
        "https://marketplace.com/item/1002",
        "https://marketplace.com/item/1003"
      ],
      "scraping": {
        "js_render": true,
        "timeout": 30,
        "js_code": [
          { "wait_time": 2 },
          { "page_init": true },
          { "wait_time": 4 }
        ],
        "proxy": "verified",
        "retry": 3
      }
    }
  ],
  "threads": 5
}
A few things worth noting about this request:
scraper_id is the stable identifier generated when you trained the scraper. Every API call that processes listing pages from that marketplace references this same ID. If you pass a URL from a different page type by mistake, the API returns an explicit error rather than silently extracting wrong data.
columns set to top_30 returns the thirty highest-ranked fields Minexa identified on the page. For price monitoring, you can also pass an explicit list like ["price", "original_price", "availability", "seller_name"] once you know which column names were assigned during training. Both approaches cost the same.
js_render: true is necessary for most marketplace pages since prices and availability are often loaded dynamically. If the page has strong anti-bot protection, you may need to switch the provider setting to service2 or use residential proxies. The extension shows pre-built scraping configurations you can copy directly for common cases.
threads controls how many pages are processed in parallel. Higher values shorten total run time, roughly in proportion until page load times or the target site's rate limits become the bottleneck.
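If you assemble the request body in code rather than by hand, a small builder keeps the parameters described in this step in one place. This is a sketch under assumptions: the function names and the chunk size are illustrative, the scraping settings simply mirror the example request in this step, and splitting a URL list into several requests is a design choice here, not a documented API requirement.

```python
def build_request_body(scraper_id, urls, columns=("top_30",), threads=5):
    """Return a request body shaped like the example request in this step."""
    return {
        "batches": [
            {
                "scraper_id": scraper_id,
                "columns": list(columns),
                "urls": list(urls),
                "scraping": {
                    "js_render": True,
                    "timeout": 30,
                    "js_code": [
                        {"wait_time": 2},
                        {"page_init": True},
                        {"wait_time": 4},
                    ],
                    "proxy": "verified",
                    "retry": 3,
                },
            }
        ],
        "threads": threads,
    }


def chunked(urls, size):
    """Yield consecutive slices of `urls` with at most `size` items each."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]
```

For example, `[build_request_body(4817, part) for part in chunked(all_urls, 500)]` would produce one body per 500 URLs, where `all_urls` and the figure 500 are placeholders.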
Step 3: Handle the response and checkpoint your data
The API returns results in pages. Each response includes a next token in the meta field when more results are available. The Python script generated by the extension already handles this loop and writes checkpoint files to JSON, CSV, and Excel after each iteration, so no data is lost if the run is interrupted.
The core loop looks like this:
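import requests

# url, data, and headers are assumed to be defined earlier in the script
# (endpoint, request body, and auth headers).
iterated_data = []
started, next_set = False, None
while not started or next_set:
    started = True  # guarantees the first request, then only "next" drives the loop
    if next_set:
        data["next"] = next_set  # request the next page of results
    response = requests.post(url, json=data, headers=headers)
    json_content = response.json()
    for extraction in json_content["response"]:
        for rows in extraction["results"]:
            if rows.get("error"):
                continue  # skip URLs that failed extraction
            # Keep plain strings as-is; unwrap lists of {"value": ...} items.
            iterated_data.append({
                k: v if isinstance(v, str) else [x["value"] for x in v]
                for k, v in rows.items()
            })
    next_set = json_content.get("meta", {}).get("next")
Each row in the output corresponds to one listing URL. Columns are consistent across every row because the same DOM selectors are applied to every page processed with that scraper ID.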
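The checkpointing mentioned at the start of this step can be approximated with the standard library alone. The sketch below is not the code the extension generates; it is an assumed, simplified version that rewrites a JSON and a CSV file after each iteration (Excel output is left out), with list values joined into a single cell. The function names and the `" | "` separator are illustrative.

```python
import csv
import json
import os


def _flatten(value):
    """Join list values so each CSV cell holds a single string."""
    if isinstance(value, list):
        return " | ".join(str(v) for v in value)
    return "" if value is None else str(value)


def write_checkpoint(rows, stem):
    """Rewrite <stem>.json and <stem>.csv with every row collected so far."""
    # Union of keys in first-seen order, so the CSV header stays stable.
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))

    json_tmp = stem + ".json.tmp"
    with open(json_tmp, "w", encoding="utf-8") as fh:
        json.dump(rows, fh, ensure_ascii=False, indent=2)
    # Write to a temp file, then swap it in, so an interrupted run
    # never leaves a half-written checkpoint behind.
    os.replace(json_tmp, stem + ".json")

    csv_tmp = stem + ".csv.tmp"
    with open(csv_tmp, "w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            writer.writerow({k: _flatten(row.get(k)) for k in fieldnames})
    os.replace(csv_tmp, stem + ".csv")
```

Calling `write_checkpoint(iterated_data, "checkpoint")` at the end of each loop iteration would keep both files current.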
If you want to run extraction on a recurring basis using your own URL list, set up a cron job that calls this endpoint with updated URLs on whatever cadence your monitoring requires. The scraper ID stays the same across every run unless the marketplace redesigns its listing page structure.
View the full API documentation to explore all available scraping parameters and response formats.
What price monitoring tools can do with this data
Structured marketplace data extracted this way feeds directly into the core workflows that price monitoring products are built around.
Price change detection. Run extraction across the same set of listing URLs daily or more frequently. Compare current price against the previous run. When a value changes, trigger an alert. Because Minexa extraction is deterministic, the same field always maps to the same DOM element, so you are comparing like for like on every run.
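A minimal sketch of that comparison, assuming each run has been reduced to a mapping of listing URL to numeric price. The function name, the output keys, and the `min_delta` threshold are illustrative choices, not anything defined by Minexa.

```python
def detect_price_changes(previous, current, min_delta=0.01):
    """Compare two {url: price} mappings and return one dict per changed listing."""
    changes = []
    for url, new_price in current.items():
        old_price = previous.get(url)
        if not old_price or new_price is None:
            # New listing, failed extraction, or zero baseline: nothing to compare.
            continue
        delta = new_price - old_price
        if abs(delta) >= min_delta:
            changes.append({
                "url": url,
                "old": old_price,
                "new": new_price,
                "pct_change": round(100 * delta / old_price, 2),
            })
    return changes
```

Each returned entry can be passed to whatever alerting channel the monitoring tool already uses.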
Discount tracking. Extract both the current price and the original price as separate columns. Calculate the actual discount independently rather than relying on the label shown on the page, which may be rounded or formatted inconsistently across sellers.
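The calculation itself is short. The sketch below assumes prices arrive as strings such as "$1,299.00", with "." as the decimal separator and "," grouping thousands; listings in other locales would need a different parser. Both function names are illustrative.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Parse a price string like "$1,299.00" into a Decimal, or None if no number is found."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    if not match:
        return None
    return Decimal(match.group().replace(",", ""))


def discount_percent(current_text, original_text):
    """Return the discount as a percentage with two decimals, or None if it cannot be computed."""
    current = parse_price(current_text)
    original = parse_price(original_text)
    if current is None or original is None or original <= 0:
        return None
    return ((original - current) / original * 100).quantize(Decimal("0.01"))
```

Using `Decimal` rather than floats avoids small binary rounding differences when the computed discount is compared against the label shown on the page.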
Availability monitoring. Track stock status fields alongside price. A listing that drops in price while showing low stock signals different behavior than one that drops in price with full availability. Having both fields in the same structured row makes this analysis straightforward.
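One rough way to separate those two cases is to label each price drop by the stock text seen in the same row. This is only a sketch: the marker phrases are hypothetical and would need to match the wording the marketplace actually uses.

```python
# Hypothetical phrases; adjust to the availability text your marketplace shows.
LOW_STOCK_MARKERS = ("only", "low stock", "few left")


def classify_price_drop(old_price, new_price, availability):
    """Label a price movement using the availability text from the same row."""
    if new_price >= old_price:
        return "no_drop"
    text = (availability or "").lower()
    if any(marker in text for marker in LOW_STOCK_MARKERS):
        return "drop_with_low_stock"
    return "drop_with_normal_stock"
```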
Seller benchmarking. When multiple sellers list the same item, extract seller name alongside price and condition. Over time, this builds a dataset showing which sellers consistently undercut the market and by how much.
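As an illustration, undercutting can be measured as each seller's average distance below the median price for the same item. The sketch assumes rows with `item_id`, `seller_name`, and a numeric `price` (hypothetical key names), and that rows have already been filtered to a single condition so new and used offers are not mixed.

```python
from collections import defaultdict
from statistics import median


def seller_undercut_report(rows):
    """Return {seller: average percent below the per-item median price}.

    Positive values mean the seller tends to price below the median.
    """
    by_item = defaultdict(list)
    for row in rows:
        by_item[row["item_id"]].append(row)

    gaps = defaultdict(list)
    for offers in by_item.values():
        if len(offers) < 2:
            continue  # a single offer has no market to compare against
        mid = median(o["price"] for o in offers)
        for o in offers:
            gaps[o["seller_name"]].append(100 * (mid - o["price"]) / mid)

    return {seller: round(sum(v) / len(v), 2) for seller, v in gaps.items()}
```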
Cross-marketplace comparison. Train a separate scraper for each marketplace. Each scraper produces output with consistent column names. Normalize the price and availability columns across scrapers to build a unified view of the same product across different platforms.
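Normalization can be as simple as a per-source column map. In this sketch the marketplace keys and every column name are made up for the example; the real names are whatever each scraper assigned during training.

```python
# Hypothetical mapping: scraper-specific column name -> unified column name.
COLUMN_MAPS = {
    "marketplace_a": {"price": "price", "stock": "availability", "merchant": "seller"},
    "marketplace_b": {"current_price": "price", "availability": "availability", "seller_name": "seller"},
}


def normalize_row(source, row):
    """Rename one scraper's columns to the unified schema and tag the row with its source."""
    mapping = COLUMN_MAPS[source]
    unified = {target: row.get(column) for column, target in mapping.items()}
    unified["source"] = source
    return unified
```

Keeping the map as data rather than code means adding a marketplace only requires one new dictionary entry.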
What happens when a marketplace updates its page layout
When a marketplace changes its listing page structure, the existing scraper will begin returning null values or explicit errors on the affected fields. This is intentional. Minexa fails loudly rather than silently returning wrong data, which means your monitoring pipeline gets a clear signal that retraining is needed rather than quietly accumulating incorrect price records.
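A pipeline can act on that signal by measuring how often required fields come back empty in a run. The sketch below assumes rows are dicts of extracted values; the function names and the 50% threshold are arbitrary choices for illustration.

```python
def null_rate(rows, field):
    """Fraction of rows where `field` is missing or empty (1.0 for an empty run)."""
    if not rows:
        return 1.0
    missing = sum(1 for row in rows if row.get(field) in (None, "", []))
    return missing / len(rows)


def fields_needing_retrain(rows, required, threshold=0.5):
    """Return the required fields whose null rate meets or exceeds the threshold."""
    return [field for field in required if null_rate(rows, field) >= threshold]
```

A non-empty result is a reasonable trigger for pausing price alerts on that marketplace until the scraper is retrained.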
Retraining takes the same two to five minutes as the original setup. Open an affected listing page in the extension, select the updated container, and create a new scraper. The new scraper gets a new ID. Update the scraper_id in your API request body and verify that the column names you rely on have not changed. That is the full maintenance cycle.
For price monitoring tools where data accuracy directly affects product decisions, this explicit failure behavior is more useful than a system that continues running and returns plausible but incorrect values without any error signal.
The Minexa API documentation covers all available parameters, scraping configurations, and response structures in detail. Start with the Chrome extension to train your first scraper on a listing page, then move the extraction into your pipeline using the generated Python code as a base. You can explore the full setup at minexa.ai.
For more on scraping structured data from detail pages using the Minexa API, see: How to scrape real estate data from OLX India using the Minexa API.
