
How to extract course data from LeCEGEP using the Minexa API

LeCEGEP (lecegep.ca) is the central directory for continuing education and professional development courses offered across Quebec's CEGEP network. Colleges across the network list their programs there, covering everything from 3D design and database management to workplace safety and HR certification. The data is public and well structured, but there is no export button and no official API.

If you need this data in a usable format, whether to map the continuing education landscape, track course availability by institution, or build a program comparison tool, you need a way to extract it reliably. This guide shows how to do that using the Minexa API.

What data is available on LeCEGEP

Each course listing on the formations page exposes the following fields:

  • Course title (e.g. '3ds Max', 'Access - niveau avancé')

  • Institution name (e.g. 'Cégep Limoilou', 'Cégep Marie-Victorin')

  • Course description (truncated preview text)

  • Language options (Français, Anglais, or both)

  • Course type (e.g. 'Formation de perfectionnement professionnel', 'Certification collégiale')

  • Detail page link (relative URL to the full course page)
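If you plan to work with these fields in Python, a small typed record keeps downstream code readable. The sketch below assumes the JSON key names from the sample output later in this guide (course_title, institution_name, and so on). The CourseListing class and BASE_URL constant are hypothetical names. The relative detail link is resolved against https://www.lecegep.ca so that it points at the full course page.

```python
from dataclasses import dataclass
from urllib.parse import urljoin

# Hypothetical constant: the site root used to resolve relative course links.
BASE_URL = "https://www.lecegep.ca"


@dataclass
class CourseListing:
    title: str
    institution: str
    description: str
    url: str              # absolute URL built from the relative course_link
    languages: list[str]  # e.g. ["Français", "Anglais"]
    course_type: str

    @classmethod
    def from_record(cls, record: dict) -> "CourseListing":
        # Key names follow the sample output; a scraper trained differently
        # may produce other column names.
        raw_langs = record.get("language_options", "")
        return cls(
            title=record["course_title"],
            institution=record["institution_name"],
            description=record.get("course_description", ""),
            url=urljoin(BASE_URL, record["course_link"]),
            # "Français, Anglais" becomes ["Français", "Anglais"]
            languages=[p.strip() for p in raw_langs.split(",") if p.strip()],
            course_type=record.get("professional_development_course", ""),
        )
```

Splitting language_options into a list makes it easy to filter for bilingual courses later, instead of matching on the raw comma-separated string.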

Step 1: Train the scraper using the Minexa Chrome extension

Before making any API call, you train a scraper once through the browser. Install the Minexa Chrome extension, then navigate to https://www.lecegep.ca/fr/formations.

Open the extension and click 'I'm on the right page'. Minexa scans the page and presents the pagination options it detected.

Confirm pagination and choose whether to scrape the list only or follow each course detail link. For most use cases, the list data alone is sufficient. Then select the simple scraping scenario and let Minexa highlight the data container automatically.

Once you confirm the container, Minexa maps all extracted columns. Click 'API request' to reveal the generated code samples.

Note the scraper_id shown there. You will use it in every subsequent API call.
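Because the scraper_id and your API key appear in every request, one option is to keep them out of your scripts and read them from environment variables. This is a minimal sketch; the variable names MINEXA_API_KEY and MINEXA_SCRAPER_ID are hypothetical choices, not something Minexa prescribes.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
API_KEY_VAR = "MINEXA_API_KEY"
SCRAPER_ID_VAR = "MINEXA_SCRAPER_ID"


def load_minexa_config() -> tuple[str, int]:
    """Return (api_key, scraper_id) read from the environment."""
    api_key = os.environ.get(API_KEY_VAR)
    scraper_id = os.environ.get(SCRAPER_ID_VAR)
    if not api_key or not scraper_id:
        raise RuntimeError(
            f"Set {API_KEY_VAR} and {SCRAPER_ID_VAR} before calling the API"
        )
    # The example request passes scraper_id as an integer (e.g. 6214).
    return api_key, int(scraper_id)
```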

Step 2: Call the Minexa API

With the scraper trained, you can now extract data from any structurally similar LeCEGEP page by calling the API endpoint. Here is a ready-to-run Python example:

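import requests

url = "https://api.minexa.ai/data"
headers = {
  "Content-Type": "application/json",
  "x-api-key": "YOUR_API_KEY"  # replace with your Minexa API key
}
payload = {
  "scraper_id": 6214,        # the scraper_id noted in Step 1
  "columns": "top_40",       # the 40 most relevant fields from training
  "urls": ["https://www.lecegep.ca/fr/formations"],
  "scraping_params": {"js_render": True}  # enable JavaScript rendering
}
# Rendering pages can take a while; adjust the timeout (seconds) as needed.
response = requests.post(url, json=payload, headers=headers, timeout=120)
response.raise_for_status()  # stop on HTTP errors such as a bad API key
print(response.json())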

Setting the columns parameter to top_40 tells the API to return the 40 most relevant fields detected during training; you can also pass specific column names if you want a narrower output. The urls array accepts several pages, so a single request can process multiple listing pages, subject to the batch request limits described in the Minexa API documentation.
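When you call the API from several places, a small helper that assembles the request body avoids copy-pasting the payload. This sketch mirrors the payload shape from the example above. The build_payload name is hypothetical, and passing a list of column names instead of "top_40" is an assumption about the format; check the Minexa API documentation for the exact form it expects.

```python
from typing import Union

DEFAULT_COLUMNS = "top_40"


def build_payload(scraper_id: int, urls: list[str],
                  columns: Union[str, list[str]] = DEFAULT_COLUMNS,
                  js_render: bool = True) -> dict:
    """Assemble a request body shaped like the Step 2 example."""
    if not urls:
        raise ValueError("urls must contain at least one page")
    return {
        "scraper_id": scraper_id,
        # "top_40" returns the 40 most relevant fields; a list of column
        # names is an ASSUMED format for requesting a narrower output.
        "columns": columns,
        "urls": list(urls),
        "scraping_params": {"js_render": js_render},
    }
```

The result can be passed directly as the json argument of requests.post, exactly like the payload dictionary in the example.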

Sample output

Here is what two records look like in the structured JSON response:

[
  {
    "course_title": "3ds Max",
    "institution_name": "Cégep Limoilou",
    "course_description": "À la hauteur des grandes productions comme Avatar...",
    "course_link": "/fr/formations/3d-studio-max",
    "language_options": "Français",
    "professional_development_course": "Formation de perfectionnement professionnel"
  },
  {
    "course_title": "Accréditation de professionnel de la paie au Québec (PPQ)",
    "institution_name": "Cégep Marie-Victorin (Centre de services aux entreprises)",
    "course_description": "Obtenez l'accréditation de Professionnel de la paie...",
    "course_link": "/fr/formations/accreditation-professionnel-paie-quebec-ppq",
    "language_options": "Français, Anglais",
    "professional_development_course": "Formation de perfectionnement professionnel"
  }
]
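Once you have records like these, the use cases from the introduction, such as tracking course availability by institution, come down to a few lines of standard-library Python. The sketch below assumes the response body is a JSON array shaped like the sample above (descriptions and other fields trimmed here for brevity). The helper names are hypothetical.

```python
import json
from collections import Counter

# The two sample records above, trimmed to the fields used below.
SAMPLE = """[
  {"course_title": "3ds Max",
   "institution_name": "Cégep Limoilou",
   "language_options": "Français"},
  {"course_title": "Accréditation de professionnel de la paie au Québec (PPQ)",
   "institution_name": "Cégep Marie-Victorin (Centre de services aux entreprises)",
   "language_options": "Français, Anglais"}
]"""


def courses_per_institution(records: list[dict]) -> Counter:
    """Count course listings per institution name."""
    return Counter(r["institution_name"] for r in records)


def offered_in(records: list[dict], language: str) -> list[str]:
    """Return titles whose language_options include the given language."""
    return [
        r["course_title"]
        for r in records
        if language in {p.strip() for p in r.get("language_options", "").split(",")}
    ]


records = json.loads(SAMPLE)
print(courses_per_institution(records).most_common())
print(offered_in(records, "Anglais"))
```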


Reusing the scraper at scale

Once trained, the scraper works on any page that shares the structure of the page used during setup. For LeCEGEP, that means every paginated formations page. Build a list of URLs covering the pages you need, pass them through the API in batches, and process the results in your own pipeline. Because each field is tied to a fixed DOM position, the output schema stays consistent from run to run with no field mapping to maintain, as long as LeCEGEP keeps the same page layout.
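The batching step can be sketched as follows. The page_urls and batches names are hypothetical, and so is the URL pattern: it assumes the first formations page is the bare URL and later pages use a ?page=N query parameter. Confirm the real pattern from the site's pagination links, and set the batch size from the limits in the Minexa API documentation.

```python
from typing import Iterator

FORMATIONS_URL = "https://www.lecegep.ca/fr/formations"


def page_urls(page_count: int) -> list[str]:
    """Build one listing URL per results page.

    HYPOTHETICAL URL pattern: page 1 is the bare URL, later pages use
    ?page=N with N starting at 1. Verify against the real pagination links.
    """
    if page_count < 1:
        return []
    return [FORMATIONS_URL] + [
        f"{FORMATIONS_URL}?page={n}" for n in range(1, page_count)
    ]


def batches(urls: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


if __name__ == "__main__":
    # Placeholder numbers: use the real page count and the documented batch limit.
    for batch in batches(page_urls(5), size=2):
        print(batch)  # each batch would fill the "urls" field of one request
```

Each batch then goes into the urls array of one API request, and the returned records can be appended to a single dataset because the schema does not change between pages.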

The Minexa API documentation covers pagination handling, batch request limits, and authentication in full detail.
