
How to extract clinical trials data from UCSD Clinical Trials using the Minexa API

Clinical research data published on institutional websites is some of the most structured, high-value information available publicly. Extracting it at scale, however, has traditionally required either manual copy-paste or fragile custom scrapers that break the moment a page layout changes.

This guide shows how to extract clinical trial listings from UCSD Clinical Trials (the official trial browser at clinicaltrials.ucsd.edu/browse/) using the Minexa API — a data extraction platform that replaces selector-writing and LLM-based parsing with a single visual training step followed by programmatic extraction at any scale.

Why UCSD Clinical Trials data matters

The UCSD Clinical Trials browse page lists active and recruiting studies across therapeutic areas, each with fields like trial title, condition, phase, status, eligibility criteria, and study locations. For researchers, CROs, or data teams building trial registries or competitive intelligence tools, this is a reliable and regularly updated source worth monitoring continuously.

Step 1: Train the scraper using the Chrome extension

Before any API call is possible, a scraper must be trained once via the Minexa Chrome extension. This takes roughly 2 to 5 minutes and does not require writing any selectors.

Navigate to the UCSD Clinical Trials browse page, open the Minexa extension, and confirm you are on the right page.

The extension detects the page structure and presents pagination options. Confirm the pagination logic and proceed.

Select the scraping mode, then click on the HTML container that holds the full list of trial entries. Minexa automatically highlights it and discovers all data points within it.

Once the scraper is created, click API Request in the top right. This generates a pre-built Python snippet with your scraper_id already filled in.


Step 2: Call the Minexa API

With the scraper_id in hand, extraction becomes a single POST request. Pass your list of UCSD Clinical Trials URLs and the API handles fetching, rendering, and structured extraction.

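```python
import requests

url = "https://api.minexa.ai/data/"
api_key = "YOUR_API_KEY"  # replace with your own Minexa API key

data = {
  "batches": [{
    "scraper_id": 6241,      # replace with the scraper_id from your API Request snippet
    "columns": ["top_30"],   # top 30 ranked fields from the trained scraper
    "urls": [                # pages to extract; up to 50,000 per batch
      "https://clinicaltrials.ucsd.edu/browse/"
    ],
    "scraping": {
      "js_render": True,     # render the page's JavaScript before extracting
      "timeout": 30,
      "js_code": [
        {"wait_time": 2},
        {"page_init": True},
        {"wait_time": 4}
      ],
      "proxy": "verified",
      "retry": 3
    }
  }],
  "threads": 4
}

headers = {"Content-Type": "application/json", "api-key": api_key}
response = requests.post(url, json=data, headers=headers)
response.raise_for_status()  # stop on HTTP errors instead of printing an error body
print(response.json())
```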

The columns: ["top_30"] parameter tells Minexa to return the top 30 ranked fields from the trained scraper. Because this ranking is deterministic, it maps to the same fields on every run, which makes it suitable for production pipelines.
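If more than one script calls the API, it can help to build the request body in a single place. The following is a minimal sketch that assembles the same payload as the Step 2 snippet from a few parameters and reads the key from an environment variable. The helper names build_payload and build_headers and the variable name MINEXA_API_KEY are hypothetical and do not come from Minexa's documentation. The field names and option values are copied unchanged from the Step 2 example, and the sketch makes no assumptions about what each scraping option does.

```python
from __future__ import annotations

import copy
import os
from typing import Any

# Scraping options copied from the example request in this guide.
DEFAULT_SCRAPING: dict[str, Any] = {
    "js_render": True,
    "timeout": 30,
    "js_code": [{"wait_time": 2}, {"page_init": True}, {"wait_time": 4}],
    "proxy": "verified",
    "retry": 3,
}


def build_payload(
    scraper_id: int,
    urls: list[str],
    columns: list[str] | None = None,
    threads: int = 4,
) -> dict[str, Any]:
    """Assemble a single-batch request body shaped like the Step 2 example (hypothetical helper)."""
    return {
        "batches": [{
            "scraper_id": scraper_id,
            "columns": list(columns) if columns is not None else ["top_30"],
            "urls": list(urls),
            # Deep copy so editing one payload never mutates the shared defaults.
            "scraping": copy.deepcopy(DEFAULT_SCRAPING),
        }],
        "threads": threads,
    }


def build_headers(env_var: str = "MINEXA_API_KEY") -> dict[str, str]:
    """Build request headers, reading the API key from a (hypothetically named) environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Minexa API key")
    return {"Content-Type": "application/json", "api-key": api_key}
```

These helpers drop into the same call as before: requests.post(url, json=build_payload(6241, urls), headers=build_headers()). Keeping the key out of source code also makes the script safer to commit or schedule.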

Ready to set up your API key and start extracting? Get started on minexa.ai.

Sample extracted data

Here is what a structured response looks like for two clinical trial entries from UCSD Clinical Trials:

```json
[
  {
    "trial_title": "A Study of Oral Medication in Adults With Type 2 Diabetes",
    "condition": "Type 2 Diabetes Mellitus",
    "phase": "Phase 3",
    "status": "Recruiting"
  },
  {
    "trial_title": "Cognitive Behavioral Therapy for Insomnia in Cancer Survivors",
    "condition": "Insomnia, Cancer",
    "phase": "Phase 2",
    "status": "Active, not recruiting"
  }
]
```
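Each record is a flat dictionary, so plain Python is enough for downstream filtering. The following is a minimal sketch that assumes the response has already been parsed into a list of records shaped like the sample above. The actual envelope around the records in the API response may differ, so check it against your own output first. The function name recruiting_trials is hypothetical.

```python
import json
from collections import Counter

# The two sample records from this guide, as a JSON string.
SAMPLE = """
[
  {"trial_title": "A Study of Oral Medication in Adults With Type 2 Diabetes",
   "condition": "Type 2 Diabetes Mellitus", "phase": "Phase 3", "status": "Recruiting"},
  {"trial_title": "Cognitive Behavioral Therapy for Insomnia in Cancer Survivors",
   "condition": "Insomnia, Cancer", "phase": "Phase 2", "status": "Active, not recruiting"}
]
"""


def recruiting_trials(records: list) -> list:
    """Keep only trials whose status is exactly 'Recruiting' (hypothetical helper)."""
    return [r for r in records if r.get("status") == "Recruiting"]


records = json.loads(SAMPLE)
# Titles of trials that are currently recruiting.
print([r["trial_title"] for r in recruiting_trials(records)])
# Number of trials per phase; records without a phase are counted under None.
print(Counter(r.get("phase") for r in records))
```

The status comparison is an exact string match, so values such as "Active, not recruiting" are excluded rather than being caught by a substring test on "recruiting".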

Handling nested fields

Some fields on the UCSD Clinical Trials page, such as study locations or eligibility tags, hold multiple values. Minexa returns these as nested lists: a list of objects, each containing a value key. To flatten them in Python:

```python
# row is one extracted record (a dict) from the parsed API response
locations = [item["value"] for item in row["study_locations"]]
```

This is a deliberate design choice: Minexa preserves the original HTML structure rather than merging or inferring values. For clinical data where field boundaries matter — distinguishing inclusion criteria from exclusion criteria, for example — this prevents the silent field-merging errors that LLM-based extraction pipelines can introduce.
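When a record has several nested fields, a small helper saves repeating the one-line comprehension and handles records where a field is absent. The following is a minimal sketch that assumes the list-of-objects shape with a value key described above. The function name flatten_nested, the field name eligibility_tags, and all values in the example record are hypothetical placeholders, not real UCSD data. The helper only unwraps values and never joins separate fields, so distinct fields such as inclusion and exclusion criteria stay distinct.

```python
from typing import Any


def flatten_nested(row: dict, fields: list) -> dict:
    """Return a copy of row with each listed nested field unwrapped to a plain list.

    Each nested field is assumed to be a list of {"value": ...} objects.
    A missing or empty field becomes an empty list; an item without a
    "value" key still raises KeyError, so malformed data is not silently dropped.
    """
    flat: dict = dict(row)  # shallow copy; the input record is left unchanged
    for field in fields:
        items: Any = row.get(field) or []
        flat[field] = [item["value"] for item in items]
    return flat


# Hypothetical record: names other than study_locations and all values are placeholders.
row = {
    "trial_title": "Example trial",
    "study_locations": [{"value": "Location A"}, {"value": "Location B"}],
    "eligibility_tags": [{"value": "Adults"}],
}
print(flatten_nested(row, ["study_locations", "eligibility_tags", "missing_field"]))
```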

Reusing the scraper across all trial pages

Once trained, the scraper identified by its scraper_id works across every structurally similar page on UCSD Clinical Trials without modification. To process multiple pages, pass all target URLs in the urls array. Up to 50,000 URLs can be submitted in a single batch request.
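If a URL list can grow past the 50,000-URL limit, splitting it in advance keeps each request within bounds. The following is a minimal sketch that chunks a URL list and builds one request body per chunk, following the payload structure from Step 2. This guide does not say whether several entries in the batches array can share one request, so the sketch conservatively produces separate request bodies. The helper names chunked and payloads_for are hypothetical, and the example.com URLs are placeholders used only to exercise the chunking.

```python
from __future__ import annotations

from typing import Any, Iterator

MAX_URLS_PER_BATCH = 50_000  # limit stated in this guide


def chunked(urls: list[str], size: int = MAX_URLS_PER_BATCH) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` URLs, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def payloads_for(urls: list[str], scraper_id: int, scraping: dict[str, Any],
                 size: int = MAX_URLS_PER_BATCH) -> list[dict[str, Any]]:
    """Build one request body per chunk, each shaped like the Step 2 example."""
    return [
        {
            "batches": [{
                "scraper_id": scraper_id,
                "columns": ["top_30"],
                "urls": chunk,
                "scraping": scraping,  # shared dict; copy it if you mutate per request
            }],
            "threads": 4,
        }
        for chunk in chunked(urls, size)
    ]


# Placeholder URLs, only to show how 120,001 URLs split into request bodies.
urls = [f"https://example.com/page/{i}" for i in range(120_001)]
bodies = payloads_for(urls, 6241, {"js_render": True, "timeout": 30})
print([len(b["batches"][0]["urls"]) for b in bodies])
```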

For ongoing monitoring, set up a cron job on your own infrastructure that calls the Minexa API on a schedule with fresh URLs. The extraction logic itself never needs to change unless the page layout is redesigned.
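A scheduled job is most useful when it reports what changed between runs. The following is a minimal sketch that compares the records from two runs. It assumes each run has been parsed into a list of dicts like the sample data and that trial_title is unique enough to act as a key. The sample shows no dedicated trial ID, so if your trained scraper extracts one, key on that instead. The function name diff_snapshots is hypothetical, and the "new" run below is made-up data used only to exercise the function.

```python
from __future__ import annotations


def diff_snapshots(old: list[dict], new: list[dict], key: str = "trial_title") -> dict[str, list]:
    """Compare two extraction runs keyed by `key`.

    If a key repeats within a run, the later record overwrites the earlier one.
    """
    old_by_key = {r[key]: r for r in old}
    new_by_key = {r[key]: r for r in new}
    added = [k for k in new_by_key if k not in old_by_key]
    removed = [k for k in old_by_key if k not in new_by_key]
    # (key, old status, new status) for trials present in both runs whose status differs.
    status_changes = [
        (k, old_by_key[k].get("status"), new_by_key[k].get("status"))
        for k in new_by_key
        if k in old_by_key and old_by_key[k].get("status") != new_by_key[k].get("status")
    ]
    return {"added": added, "removed": removed, "status_changes": status_changes}


# Previous run: the two sample records from this guide (status field only).
old = [
    {"trial_title": "A Study of Oral Medication in Adults With Type 2 Diabetes",
     "status": "Recruiting"},
    {"trial_title": "Cognitive Behavioral Therapy for Insomnia in Cancer Survivors",
     "status": "Active, not recruiting"},
]
# Current run: hypothetical changes, invented only to exercise the function.
new = [
    {"trial_title": "A Study of Oral Medication in Adults With Type 2 Diabetes",
     "status": "Active, not recruiting"},
    {"trial_title": "Hypothetical New Trial", "status": "Recruiting"},
]
print(diff_snapshots(old, new))
```

In a cron setup, the script would store each run's parsed records (for example as a JSON file) and load the previous file as old on the next run.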

The full API reference and additional scraping scenario configurations are available in the Minexa API docs. Install the Minexa Chrome extension to train your first scraper and generate the ready-to-run code in under ten minutes.
