How to scrape maritime data from GISIS using the Minexa API
- Minexa.ai

- 8 hours ago
The IMO's GISIS portal is one of the most comprehensive public sources of maritime regulatory data on the internet. Ship particulars, maritime security records, casualty reports, treaty statuses, port reception facilities, ballast water management data, and more than twenty other module categories are all publicly accessible from a single index page. The challenge is not finding the data. It is getting it out in a form you can actually use programmatically.
This guide covers how to build a GISIS data extraction pipeline using the Minexa API, a developer-oriented extraction tool that separates the one-time visual training step from the repeatable programmatic extraction. You train a scraper once using the Minexa Chrome extension, get a stable scraper ID, and then call the API as many times as needed from your own code.
Watch the full tutorial first
The video below walks through the complete workflow from opening the Minexa extension to running the API call and reviewing extracted data.
What GISIS exposes and why it matters for developers
The GISIS public index at gisis.imo.org/public/default.aspx lists every available data module in a structured grid. Each module entry carries a short code, a human-readable topic label, a keyword string used for internal search indexing, a relative path to the module's own data page, and an image source path. That combination of fields makes the index page a useful starting point for any pipeline that needs to discover or monitor which GISIS modules are available and where they link.
Modules cover domains including ship and company particulars (SHIPS), maritime security (ISPS), marine casualties and incidents (MCI), port reception facilities (PRF), status of treaties (ST), piracy and armed robbery (PAR), ballast water management (BWM), MARPOL Annex VI, STCW-related information, and more than fifteen additional categories. Each module links to its own sub-portal with deeper data.
Training the scraper: step-by-step reference
The training phase happens once in the browser. After that, everything runs through the API.
Step 1. Open the Minexa home page and install the Chrome extension if you have not already.
Step 2. Navigate to gisis.imo.org/public/default.aspx and open the Minexa extension from your Chrome toolbar. The extension will detect the page automatically.
Step 3. Confirm you are on the correct page by clicking the 'I'm on the right page' button in the extension popup.
Step 4. Review the pagination detection results and click Continue. The extension identifies how the page handles multi-page navigation so extraction covers all available records.
Step 5. Choose whether to scrape the module listing only, or to follow each module link and extract detail page data as well. For building a module index pipeline, scraping the list is sufficient.
Step 6. Select your scraping mode and confirm the job start settings.
Step 7. Highlight the data container. Minexa will identify the repeating module grid and extract all data points from each entry automatically.
Step 8. Review the extracted data points across all pages. At this stage you can verify that all module fields have been captured correctly before finalising the scraper configuration.
Step 9. Click 'API request' to view the generated JSON and Python code samples. Then click 'Complete Configuration' to finalise the scraper and receive your scraper ID.
Extracted fields: reference definitions
The following fields are returned per module entry from the GISIS index page. Field names below have the internal prefix removed for clarity.
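module_code: The short uppercase identifier for each GISIS module, such as SHIPS, ISPS, MCI, PRF, ST, PAR, BWM, or MARPOL6. This value usually matches the path segment used in the module URL, but not always: in the sample data below, the MCI module links to '/Public/MCIR/Default.aspx'. Use module_code as the join key when combining GISIS data across sources, and take the path from link_href rather than deriving it from the code.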
topic_description: The human-readable label for each module, such as 'Ship and Company Particulars' or 'Marine Casualties and Incidents'. Useful as a display label in dashboards or as a category field in downstream datasets.
keyword_description: A flat space-separated string containing all search-relevant terms associated with the module. For the SHIPS module this includes terms like 'ihs', 'lloyds register', 'tonnage', 'flag', 'imo number', 'owner', 'manager', and 'beneficial operator'. For the CP module it includes references to PSC, casualty investigation, SOLAS, FAL, London Convention, and piracy. This field is particularly useful for building keyword indexes or search classification layers on top of the GISIS module catalog.
link_href: The relative path to each module's public data page, for example '/Public/SHIPS/Default.aspx'. Prepend the GISIS base domain to construct a fully qualified URL for downstream crawling of individual module pages.
image_source: The relative path to the module tile image, for example 'Shared/Images/Home-SHIPS.png'. Useful if you are building a visual interface or need to match module icons to their entries.
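The URL construction described for link_href and image_source can be sketched with Python's standard urljoin. This is a minimal illustration, not part of the Minexa tooling: the absolutize helper is a hypothetical name, the field names are the prefix-free ones used in this guide, and the image path is resolved by ordinary relative-URL rules against the index page, which has not been checked against what the GISIS server actually serves.

```python
from urllib.parse import urljoin

# URL of the index page the rows were extracted from (taken from this guide).
INDEX_URL = "https://gisis.imo.org/public/default.aspx"


def absolutize(row: dict) -> dict:
    """Return a copy of the row with link_href and image_source made absolute.

    Hypothetical helper: resolves each path against INDEX_URL using
    standard relative-URL resolution.
    """
    resolved = dict(row)
    for field in ("link_href", "image_source"):
        if row.get(field):
            resolved[field] = urljoin(INDEX_URL, row[field])
    return resolved


row = {
    "module_code": "SHIPS",
    "link_href": "/Public/SHIPS/Default.aspx",
    "image_source": "Shared/Images/Home-SHIPS.png",
}
resolved_row = absolutize(row)
print(resolved_row["link_href"])
print(resolved_row["image_source"])
```

Because link_href starts with a slash it resolves against the domain root, while image_source has no leading slash and resolves against the directory of the index page.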
Sample extracted data
Below are three representative rows from a GISIS index extraction run, showing the fields described above.
[
  {
    "module_code": "SHIPS",
    "topic_description": "Ship and Company Particulars",
    "keyword_description": "ships ship and company particulars vessels ihs lloyds register tonnage flag imo number name owner manager group beneficial operator fleet",
    "link_href": "/Public/SHIPS/Default.aspx",
    "image_source": "Shared/Images/Home-SHIPS.png"
  },
  {
    "module_code": "MCI",
    "topic_description": "Marine Casualties and Incidents",
    "keyword_description": "mci marine casualties and incidents casualty msc mepc accident safety collision explosion sinking capsize fire engine analysis",
    "link_href": "/Public/MCIR/Default.aspx",
    "image_source": "Shared/Images/Home-MCIR.png"
  },
  {
    "module_code": "BWM",
    "topic_description": "Ballast Water Management",
    "keyword_description": "bwm ballast water management exemptions ballast water exchange areas additional measures warnings concerning ballast water uptakes",
    "link_href": "/Public/BWM/Default.aspx",
    "image_source": "Shared/Images/Home-BWM.png"
  }
]
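As one way to use the keyword_description field for the keyword indexes mentioned earlier, the sketch below builds a simple inverted index from term to module codes. The function names are hypothetical, and the keyword strings are shortened excerpts of the sample rows above. Because the field is a flat space-separated string, multi-word phrases such as 'lloyds register' are indexed as separate single-word terms.

```python
from collections import defaultdict

# Shortened excerpts of the sample rows above, for illustration only.
rows = [
    {"module_code": "SHIPS",
     "keyword_description": "ships vessels tonnage flag imo number owner manager"},
    {"module_code": "MCI",
     "keyword_description": "mci marine casualties incidents collision fire sinking"},
    {"module_code": "BWM",
     "keyword_description": "bwm ballast water management exemptions"},
]


def build_keyword_index(rows):
    """Map each lowercase keyword to the set of module codes that list it."""
    index = defaultdict(set)
    for row in rows:
        for term in row.get("keyword_description", "").lower().split():
            index[term].add(row["module_code"])
    return index


def search(index, query):
    """Return module codes whose keywords contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    matches = [index.get(term, set()) for term in terms]
    return set.intersection(*matches)


index = build_keyword_index(rows)
print(search(index, "ballast water"))
print(search(index, "imo number"))
```

Requiring every query term to match keeps results narrow; switching the intersection to a union would give a broader "any term" search.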
API call reference
Once the scraper is trained and you have your scraper ID, all subsequent extractions run through a single POST request to the Minexa API. No browser interaction is required after this point.
The Python snippet below shows the complete request structure. Replace the scraper ID with your own value from the training step.
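import requests

# Minexa extraction endpoint
url = "https://api.minexa.ai/data"

payload = {
    "scraper_id": 6371,   # replace with the scraper ID from your training step
    "columns": "top_30",  # top-N shorthand for the highest-ranked fields
    "urls": [
        "https://gisis.imo.org/public/default.aspx"
    ],
}
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY",  # replace with your API key
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # stop here on an HTTP error instead of parsing an error body
print(response.json())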
A few things are worth noting for production use. Pagination on GISIS must be handled explicitly: if the module index ever spans multiple pages, you will need to define a JS code scenario in your scraper configuration that tells the API what to click to advance pages. The API does not follow pagination automatically the way the Chrome extension does. For recurring extraction from a fixed URL like the GISIS index, set up your own cron job to trigger the API call on whatever schedule your pipeline requires.
The columns parameter accepts either a named list of fields or a top-N shorthand. Using 'top_30' returns the thirty highest-ranked fields Minexa detected during training, which covers all primary GISIS module fields without needing to enumerate them manually.
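When the call runs on a schedule, a common follow-up is to compare each run with the previous one to see which modules appeared, disappeared, or changed. The sketch below is one possible approach, not a Minexa feature. The function name and TRACKED_FIELDS are hypothetical, the field names are the prefix-free ones used in this guide, and the two snapshots are invented purely for illustration. Only the tracked fields are compared, so per-run metadata does not register as a change.

```python
# Fields compared between runs; per-run metadata is deliberately left out.
TRACKED_FIELDS = ("topic_description", "keyword_description",
                  "link_href", "image_source")


def diff_snapshots(previous, current, key="module_code"):
    """Compare two lists of row dicts and report added, removed, changed keys."""
    def tracked(row):
        return tuple(row.get(field) for field in TRACKED_FIELDS)

    prev_by_key = {row[key]: tracked(row) for row in previous}
    curr_by_key = {row[key]: tracked(row) for row in current}

    added = sorted(curr_by_key.keys() - prev_by_key.keys())
    removed = sorted(prev_by_key.keys() - curr_by_key.keys())
    changed = sorted(
        k for k in curr_by_key.keys() & prev_by_key.keys()
        if curr_by_key[k] != prev_by_key[k]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Invented snapshots for illustration; they do not describe real GISIS changes.
previous = [
    {"module_code": "SHIPS", "link_href": "/Public/SHIPS/Default.aspx"},
    {"module_code": "BWM", "link_href": "/Public/BWM/Default.aspx"},
]
current = [
    {"module_code": "SHIPS", "link_href": "/Public/SHIPS/Index.aspx"},
    {"module_code": "MCI", "link_href": "/Public/MCIR/Default.aspx"},
]

result = diff_snapshots(previous, current)
print(result)
```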
After the job runs
Results are returned as structured JSON with one object per module row. Each object contains the fields described in the reference section above, plus metadata fields covering execution time, page number, and row identifiers. The data is ready to insert into a database, pass to a downstream transformation step, or export directly.
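As a minimal sketch of the database step, the example below stores module rows in SQLite, keyed on module_code so that repeated runs overwrite rather than duplicate. It assumes the rows have already been pulled out of the API response as a list of dicts with the prefix-free field names used in this guide; the exact response envelope is not shown here, so that unpacking step is left out. The table name and store_rows function are hypothetical.

```python
import sqlite3

FIELDS = ("module_code", "topic_description", "keyword_description",
          "link_href", "image_source")


def store_rows(conn, rows):
    """Create the table if needed and upsert one record per module_code."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS gisis_modules (
               module_code TEXT PRIMARY KEY,
               topic_description TEXT,
               keyword_description TEXT,
               link_href TEXT,
               image_source TEXT
           )"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO gisis_modules VALUES (?, ?, ?, ?, ?)",
        [tuple(row.get(field) for field in FIELDS) for row in rows],
    )
    conn.commit()


# Two rows taken from the sample data in this guide (keywords omitted).
rows = [
    {"module_code": "SHIPS",
     "topic_description": "Ship and Company Particulars",
     "link_href": "/Public/SHIPS/Default.aspx",
     "image_source": "Shared/Images/Home-SHIPS.png"},
    {"module_code": "BWM",
     "topic_description": "Ballast Water Management",
     "link_href": "/Public/BWM/Default.aspx",
     "image_source": "Shared/Images/Home-BWM.png"},
]

conn = sqlite3.connect(":memory:")  # use a file path for a persistent store
store_rows(conn, rows)
store_rows(conn, rows)  # a second run replaces rows instead of duplicating them

count = conn.execute("SELECT COUNT(*) FROM gisis_modules").fetchone()[0]
print(count)
```

Extra metadata keys in a row are ignored, and fields missing from a row are stored as NULL.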
If you are also working with maritime regulatory filings data from other sources, the post on scraping documents and filings data using the Minexa API covers the same two-phase developer workflow applied to a different document-heavy source and is worth reading alongside this one.
