How to scrape documents and filings data from Eli Lilly using the Minexa API

SEC filings are public, structured, and updated regularly. But manually pulling them from Eli Lilly's investor portal into a usable dataset is slow work. This walkthrough shows how to build a repeatable extraction pipeline using the Minexa API.

The scenario: monitoring Eli Lilly SEC filings at scale

Imagine you need to track every new filing that appears on the Eli Lilly SEC filings page and feed that data into a downstream system automatically. You need filing types, dates, document titles, and links, delivered as clean JSON on demand. That is exactly what this pipeline covers.

The Minexa API workflow splits into two phases: train a scraper once using the Chrome extension, then call the API repeatedly with that scraper ID. You never repeat the setup.

Phase 1: training the scraper

Open the Minexa Chrome extension and navigate to the Eli Lilly SEC filings page. The extension detects the page structure automatically.

Once the extension opens, confirm you are on the correct page. Minexa will then detect pagination and show you what it found before proceeding.

After confirming pagination, choose between scraping the list view or following linked detail pages. For a filings index, the list view captures all the key fields you need.

Select simple or advanced mode, then let Minexa highlight the data container automatically. Click 'Create scraper' and review the extracted data points.

Click 'API request' to get your scraper ID and ready-to-run code samples in JSON and Python.

Phase 2: calling the Minexa API

With your scraper ID in hand, you can now call the Minexa API from any environment. Note that pagination on Eli Lilly's filings page must be handled via a JS code scenario you define in the request. The API does not auto-click pagination the way the extension does.

Here is a minimal Python example using the https://api.minexa.ai/data endpoint:

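import requests

# Replace YOUR_API_KEY with your own key, and use the scraper ID shown under 'API request'.
headers = {"Authorization": "Bearer YOUR_API_KEY"}
payload = {
  "scraper_id": 4817,  # ID of the scraper trained in Phase 1
  "columns": "top_40",
  "urls": ["https://investor.lilly.com/financial-information/sec-filings"],
  "threads": 3
}
# A timeout keeps a scheduled job from hanging indefinitely on a stalled connection.
response = requests.post("https://api.minexa.ai/data", json=payload, headers=headers, timeout=60)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
print(response.json())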

Set up your own cron job to call this endpoint on whatever schedule fits your monitoring needs. Each run returns the current state of the filings index as structured JSON.
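Because each run returns the whole index, a scheduled job needs some way to tell which filings are new since the last run. One possible approach, sketched below, is to keep a small state file of filing links already seen and report only the records whose link is not in it. This assumes each record has a `filing_link` field, as in the sample output in this article; the helper names `load_seen`, `find_new_filings`, and `save_seen` and the state file are hypothetical and not part of the Minexa API.

```python
import json
from pathlib import Path


def load_seen(state_path):
    """Return the set of filing links recorded by earlier runs (empty on the first run)."""
    path = Path(state_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def find_new_filings(records, seen_links):
    """Return records whose filing_link has not been seen before, in page order."""
    observed = set(seen_links)
    new_records = []
    for record in records:
        link = record.get("filing_link")
        # Skip records without a link and duplicates within the same run.
        if link and link not in observed:
            new_records.append(record)
            observed.add(link)
    return new_records


def save_seen(state_path, seen_links):
    """Persist the seen links so the next scheduled run can compare against them."""
    Path(state_path).write_text(json.dumps(sorted(seen_links)), encoding="utf-8")
```

In a cron-driven script, you would load the seen set, pass the records from the API response to `find_new_filings`, forward the result downstream, then save the updated set.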

Read the full Minexa API documentation to explore all available request parameters.

What the extracted data looks like

The scraper surfaces filing metadata as clean, structured records. Each row corresponds to one filing entry on the page:

[
  {
    "filing_type": "10-Q",
    "filing_date": "2024-08-06",
    "filing_title": "Quarterly Report",
    "filing_link": "https://investor.lilly.com/..."
  },
  {
    "filing_type": "8-K",
    "filing_date": "2024-07-30",
    "filing_title": "Current Report",
    "filing_link": "https://investor.lilly.com/..."
  }
]
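Downstream code is often easier to write against typed records than raw dictionaries. The sketch below converts records shaped like the sample above into a small dataclass and sorts them newest first. It assumes the four field names shown in the sample and ISO-formatted dates (YYYY-MM-DD); the actual column names depend on how your scraper was trained, and `Filing` and `parse_filings` are illustrative names only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Filing:
    filing_type: str
    filing_date: date
    filing_title: str
    filing_link: str


def parse_filings(records):
    """Convert raw dicts into Filing objects, sorted newest first.

    Assumes filing_date is an ISO 8601 string (YYYY-MM-DD).
    A missing key or malformed date raises an exception rather than being skipped.
    """
    filings = [
        Filing(
            filing_type=r["filing_type"],
            filing_date=date.fromisoformat(r["filing_date"]),
            filing_title=r["filing_title"],
            filing_link=r["filing_link"],
        )
        for r in records
    ]
    return sorted(filings, key=lambda f: f.filing_date, reverse=True)
```

Failing loudly on a missing field is a deliberate choice here: if the page layout changes and a column disappears, a monitoring job should surface that rather than silently emit partial rows.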

Once the job runs, your data is available in table form and ready to export as Excel or JSON.
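If you are working from the API response rather than the Minexa interface, you can also write the records to CSV yourself. The following is a minimal sketch using only the Python standard library. It assumes the field names from the sample output above, and `filings_to_csv` is a hypothetical helper, not something Minexa provides.

```python
import csv
import io

# Column order for the CSV; assumed to match the fields your scraper extracts.
FIELDS = ["filing_type", "filing_date", "filing_title", "filing_link"]


def filings_to_csv(records):
    """Serialize a list of filing dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        # Keep only the known fields; fill any that are missing with an empty string.
        writer.writerow({field: record.get(field, "") for field in FIELDS})
    return buffer.getvalue()
```

The returned string can be written to a file or handed to any tool that reads CSV, including Excel.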

For a similar filings pipeline built on a different source, see how to scrape documents and filings data from OpenCorporates using the Minexa API.
