To establish an operational image handling service at the iMagine platform that ingests, stores, and processes images of marine water samples taken by the Zooscan instrument and uploads the resulting regions of interest to the EcoTaxa platform for later taxonomic identification.
Zooscan – EcoTaxa pipeline
Development actions during iMagine
Objective and challenge
The objective of this project is to create an image handling service on the iMagine platform for processing zooplankton images captured with the Zooscan. The service will ingest, store, and process images of marine water samples, and upload the resulting regions of interest to the EcoTaxa platform for taxonomic identification. The service processes 356-megapixel grayscale images using classical image segmentation and measurement methods, enhanced by neural network algorithms, specifically instance segmentation. EcoTaxa uses a combination of deep and classical machine learning techniques to predict likely identifications for the uploaded images, which can then be validated through a dedicated user interface.
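The classical segmentation and measurement step mentioned above can be illustrated with a minimal sketch. It is not the Zooprocess implementation: it assumes dark organisms on a light background, a single fixed threshold, and 4-connected pixels, and the function name `segment_objects` and its parameters are hypothetical. It labels connected groups of dark pixels and reports the area and bounding box of each region of interest.

```python
from collections import deque


def segment_objects(image, threshold, min_area=1):
    """Find dark objects on a light background and measure each one.

    image: list of rows of grayscale values (0 = black, 255 = white).
    Returns one dict per region with its pixel area and bounding box
    (row_min, col_min, row_max, col_max).
    """
    height, width = len(image), len(image[0])
    visited = [[False] * width for _ in range(height)]
    regions = []
    for r0 in range(height):
        for c0 in range(width):
            if visited[r0][c0] or image[r0][c0] >= threshold:
                continue
            # Breadth-first flood fill over 4-connected foreground pixels.
            queue = deque([(r0, c0)])
            visited[r0][c0] = True
            area = 0
            rmin = rmax = r0
            cmin = cmax = c0
            while queue:
                r, c = queue.popleft()
                area += 1
                rmin, rmax = min(rmin, r), max(rmax, r)
                cmin, cmax = min(cmin, c), max(cmax, c)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < height and 0 <= nc < width
                            and not visited[nr][nc]
                            and image[nr][nc] < threshold):
                        visited[nr][nc] = True
                        queue.append((nr, nc))
            if area >= min_area:
                regions.append({"area": area, "bbox": (rmin, cmin, rmax, cmax)})
    return regions


# Toy 4 x 5 image containing two separate dark objects.
toy = [
    [255, 255, 255, 255, 255],
    [255,  40,  50, 255, 255],
    [255,  60, 255, 255,  30],
    [255, 255, 255, 255,  20],
]
regions = segment_objects(toy, threshold=128)
```

With this kind of thresholding, two organisms that touch each other end up in a single region, which is the case the instance segmentation models are meant to handle.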
Currently, a technician responsible for digitizing plankton samples spends several hours manually handling and processing the images. They use custom software to correct processing errors and to manually separate organisms that touch each other in the images, so that the resulting data are accurate. Importing the images into EcoTaxa and sorting them taxonomically is also done by hand, which makes the process tedious.
To publish and analyze the dataset effectively, metadata such as observed volume, imaging instrument, and imaging settings need to be documented using controlled BODC vocabularies and included in a DarwinCore Archive (DwCA) file. However, researchers such as plankton ecologists often have limited time to look up and incorporate the necessary metadata, so better automation is needed in data processing, management, and distribution.
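One way such automation could start is a completeness check run at acquisition time, before any export is attempted. The sketch below is only an illustration under stated assumptions: the field names, the sample values, and the function `missing_metadata` are hypothetical, and the mapping to BODC vocabulary terms and the DwCA export itself are not shown.

```python
# Hypothetical field names for the metadata named in the text above.
REQUIRED_FIELDS = ("observed_volume", "imaging_instrument", "imaging_settings")


def missing_metadata(record):
    """Return the required fields that are absent or empty in a sample record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or value == "" or value == {}:
            missing.append(field)
    return missing


# Illustrative record: the imaging settings were never filled in.
sample = {
    "observed_volume": 2.5,
    "imaging_instrument": "Zooscan",
    "imaging_settings": "",
}
problems = missing_metadata(sample)
```

Flagging the gap when the sample is scanned, rather than at publication time, means the person who knows the answer is still at the instrument.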
The project will start by curating the training datasets and evaluating, on the iMagine platform, various instance segmentation models for separating touching organisms. In parallel, specifications will be developed for Zooprocess v2, a web application with features and image processing similar to those of the current Zooprocess software. To address the metadata challenges and improve compliance with DwCA conventions, relevant metadata will be collected during data acquisition and integrated into the processing pipeline, following the BODC mapping. The trained model will then be integrated into Zooprocess v2. Finally, EcoTaxa will be able to generate a DwCA file with accurate identifications and comprehensive metadata. Once the service is ready, it will be deployed, and users will be trained in its use.
Plankton is an integral and vital component of pelagic food webs and provides many ecosystem services, such as oxygen production and carbon storage. Plankton indicators are used within several descriptors of the Marine Strategy Framework Directive (MSFD) and the Water Framework Directive (WFD). Datasets showing spatial and long-term trends in the concentration of phyto- and zooplankton are essential for understanding the dynamics of food availability for commercially exploited species and the effects of climate change. Described at the global scale, these trends indicate the health of marine ecosystems and their response to anthropogenic stressors.