Image analysis services for aquatic sciences

Discover the services the eight iMagine use cases are working on.

Mature use case

Marine litter assessment

Objective and challenge

This use case aims to create a functional service on the iMagine platform that can ingest, store, analyze, and process drone images to identify and classify floating litter in bodies of water and on beaches. The goal is to provide standardized data sets on litter for environmental management purposes. The technology behind this service involves using UAV surveys at different altitudes and employing two convolutional neural networks (CNNs) to quantify and characterize the observed litter. This approach has been successfully applied in various countries through collaborations with the World Bank Group and NGOs, supporting local stakeholders and clean-up operations. A subset of the model's training data has been made available on Zenodo.
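
To make the two-network idea concrete, the sketch below chains a generic pretrained detector (to quantify litter items) with a generic classifier (to characterize them). The torchvision models, file name, and confidence threshold are illustrative placeholders, not the use case's actual networks, which are trained on litter imagery:

```python
# A minimal sketch of a two-stage pipeline: one CNN localizes items in a
# drone frame, a second CNN classifies each detected item. Both models here
# are generic pretrained stand-ins; in practice they would be fine-tuned on
# litter categories.
import torch
from torchvision.models import resnet18
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor, resized_crop
from PIL import Image

detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()   # quantify
classifier = resnet18(weights="DEFAULT").eval()                # characterize

image = Image.open("uav_frame.jpg").convert("RGB")             # hypothetical frame
tensor = to_tensor(image)

with torch.no_grad():
    detections = detector([tensor])[0]
    for box, score in zip(detections["boxes"], detections["scores"]):
        if score < 0.5:                     # illustrative cutoff
            continue
        top, left = int(box[1]), int(box[0])
        height, width = int(box[3] - box[1]), int(box[2] - box[0])
        crop = resized_crop(tensor, top, left, height, width, [224, 224])
        logits = classifier(crop.unsqueeze(0))
        print(f"litter item at {box.tolist()} -> class {logits.argmax().item()}")
```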

However, the current service lacks user-friendliness and still requires several manual steps. To address these issues, the project aims to incorporate the following features into the service using the iMagine platform:

  1. Easy storage and access to custom data
  2. User-friendly API
  3. Ready-to-use environment (e.g., Docker)
  4. Information on image processing requirements
  5. Simplified use of the provided test data for retraining
  6. Documentation and step-by-step guides

Development timeline

The development timeline begins with integrating the existing AI model with the iMagine platform and configuring the API for basic inference. Data storage capabilities within the platform will be leveraged. The next step involves developing the service to enable authentication and retraining. More data will be added, and a wider range of models will be made available. The model output will be enhanced with additional metadata. Documentation and step-by-step guides will be created to assist users and promote the service. Once the updated service is accessible to customers, such as through the EOSC Marketplace, user feedback will be monitored and incorporated into future improvements.
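
As an illustration of what basic inference through a user-friendly API could look like from the client side, the sketch below posts an image to a deployed model over HTTP. The endpoint URL, authentication scheme, and response fields are assumptions for illustration, not the platform's documented interface:

```python
# A minimal sketch of calling a deployed inference API. All names below
# (URL, form field, response schema) are hypothetical.
import requests

API_URL = "https://example.org/v2/models/litter-detector/predict"  # hypothetical

with open("uav_frame.jpg", "rb") as f:
    response = requests.post(
        API_URL,
        files={"image": f},
        headers={"Authorization": "Bearer <token>"},  # assumed auth scheme
        timeout=60,
    )
response.raise_for_status()

for item in response.json().get("detections", []):  # assumed response schema
    print(item)
```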

Mature use case

Zooscan - EcoTaxa pipeline

Objective and challenge

The objective of this project is to create a service on the iMagine platform for handling zooplankton images captured with the Zooscan. The service will ingest, store, and process images of marine water samples and upload the resulting regions of interest to the EcoTaxa platform for taxonomic identification. The technology used in this project involves processing 356-megapixel grayscale images with classical image segmentation and measurement methods, enhanced by neural network algorithms, specifically instance segmentation. EcoTaxa utilizes a combination of deep and classical machine learning techniques to predict likely identifications for the uploaded images, which can then be validated through a dedicated user interface.
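
The sketch below shows the core instance segmentation step with a generic pretrained Mask R-CNN from torchvision. It is a minimal illustration, not the project's actual model, which would be fine-tuned on Zooscan imagery; unlike simple thresholding, each predicted mask isolates one organism even when specimens touch:

```python
# A minimal sketch of instance segmentation on a scan tile. Model weights
# and the input file are stand-ins for the use case's trained setup.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

image = Image.open("zooscan_tile.png").convert("RGB")  # hypothetical tile
with torch.no_grad():
    prediction = model([to_tensor(image)])[0]

keep = prediction["scores"] > 0.5        # illustrative confidence cutoff
masks = prediction["masks"][keep]        # (N, 1, H, W) soft masks, one per organism
boxes = prediction["boxes"][keep]        # candidate regions of interest for EcoTaxa
print(f"{masks.shape[0]} organisms detected in tile")
```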

Currently, a technician responsible for digitizing plankton samples spends several hours manually handling and processing the images. They use custom software to correct processing errors and manually separate organisms that touch each other in the images to ensure accurate data. Importing and sorting the images taxonomically on EcoTaxa is also done manually, making the process tedious and poorly automated.

To publish and analyze the dataset effectively, metadata such as the observed volume, imaging instrument, and imaging settings need to be documented using controlled BODC vocabularies and included in a DarwinCore Archive (DwCA) file. However, researchers such as plankton ecologists often have limited time to look up and incorporate the necessary metadata, so better automation is needed in data processing, management, and distribution.

Development timeline

The project will start by curating the training datasets and evaluating various instance segmentation models using the iMagine platform to separate organisms. Simultaneously, specifications for Zooprocess v2, a web application with features and image processing similar to the current Zooprocess software, will be developed. To address metadata challenges and improve compliance with DwCA conventions, relevant metadata will be collected during data acquisition and integrated into the processing pipeline, following the BODC mapping. The developed and trained model will be integrated into Zooprocess v2. Finally, EcoTaxa will be able to generate a DwCA file with accurate identifications and comprehensive metadata. Once the service is ready, it will be deployed, and users will receive training on its usage.
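
For illustration, the sketch below writes the kind of tab-separated occurrence record a DwCA export contains, using standard Darwin Core terms. All field values are invented, and a real archive additionally bundles a meta.xml descriptor and EML metadata:

```python
# A minimal sketch of one Darwin Core occurrence row. Identifiers and
# values are hypothetical examples, not real observations.
import csv

DWC_FIELDS = [
    "occurrenceID", "scientificName", "eventDate",
    "decimalLatitude", "decimalLongitude",
    "individualCount", "basisOfRecord",
]

record = {
    "occurrenceID": "zooscan-2024-000001",      # hypothetical identifier
    "scientificName": "Calanus finmarchicus",   # example taxon, as validated on EcoTaxa
    "eventDate": "2024-05-14",
    "decimalLatitude": 43.68,
    "decimalLongitude": 7.31,
    "individualCount": 12,
    "basisOfRecord": "MachineObservation",
}

with open("occurrence.txt", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=DWC_FIELDS, delimiter="\t")
    writer.writeheader()
    writer.writerow(record)
```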

Mature use case

Marine ecosystem monitoring at EMSO sites

Objective and challenge

The use case involves underwater video monitoring at three EMSO sites: EMSO-Obsea, EMSO-Azores, and EMSO-SmartBay. The aim is to establish an integrated service on the iMagine platform for the automatic processing of video imagery, enabling the identification and analysis of relevant images for ecosystem monitoring.

At the EMSO-Obsea site, there is a significant amount of unexploited image data collected from an underwater camera observing various fish species. Applying AI tools to these images would allow the extraction of valuable biological content, creating derived datasets that marine scientists can use to draw ecological conclusions. Manual analysis of the extensive dataset is time-consuming, and analyzing only a subset of the data would result in losing important information. Utilizing the iMagine platform, a Deep Learning service will be trained and deployed to obtain species abundance data from existing and future images. These derived datasets will be crucial for studying species presence/absence over time and understanding changes in abundance in relation to environmental parameters, providing insights into the impact of climate change on the local fish community.

For the EMSO-Azores site, the imagery data collected by the observatory is being analyzed through the Deep Sea Spy platform, involving citizen participation. However, the data validation by experts is currently done manually and is time-consuming. Expanding the dataset with annotated and validated submarine images is essential for advancing marine science research. The iMagine AI platform will be used to develop and deploy AI models that can automatically annotate and validate images, improving the efficiency and accuracy of the process.

At the EMSO-SmartBay site, it is crucial to identify poor-quality video footage in real time and within the Observatory Archive. Factors such as complete darkness, algal growth, reduced visibility from suspended particulate matter, and equipment failure can impact the utility of the observatory footage. Manual inspection of the video archive is time-consuming, and detecting interesting observations or occurrences, such as “Novelty” events or counting prawn burrows in the field of vision, is also laborious. The iMagine AI platform can aid in developing and deploying a service that enables quick detection of issues, efficient flagging and referencing of interesting footage, and the detection and enumeration of prawns and prawn burrows.

By leveraging the capabilities of the iMagine platform and implementing AI models, the project aims to automate and enhance the analysis of underwater video imagery at the EMSO sites, facilitating scientific research and improving ecological understanding.

Development timeline

The development roadmap for EMSO-Obsea involves creating a workflow to automatically process underwater images in real time, extracting fish abundance and taxa information. The workflow consists of two steps: segmentation and classification. Segmentation selects the regions of interest where fish specimens are present, and classification determines the taxa. The resulting abundance and taxa data will be compiled into time series datasets for easier analysis by scientists. From months 6 to 20, state-of-the-art segmentation and classification algorithms will be benchmarked and trained using the available dataset. Adjustments to the dataset may be required for optimal model performance. To improve prediction accuracy, dataset shifts caused by ambient variability will be investigated. Once a final model is developed and deployed, legacy data will be ingested, and the workflow will be put into production for real-time analysis of underwater images. The predictions will be used to extract information on the long-term biological rhythms of the fish community.
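
A minimal sketch of the final aggregation step is shown below: per-frame predictions, assumed here to be a table of (timestamp, taxon) detections with invented example taxa, are compiled into a daily abundance time series of the kind scientists would correlate with environmental parameters:

```python
# A minimal sketch of compiling detections into a time series dataset.
# The detection table and taxa are illustrative examples.
import pandas as pd

detections = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-03-01 10:00", "2024-03-01 10:30", "2024-03-02 09:15",
    ]),
    "taxon": ["Diplodus vulgaris", "Diplodus vulgaris", "Coris julis"],
})

# Daily counts per taxon: one row per day, one column per species.
abundance = (
    detections
    .groupby([pd.Grouper(key="timestamp", freq="D"), "taxon"])
    .size()
    .unstack(fill_value=0)
)
print(abundance)
```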

For the EMSO-Azores site, the roadmap focuses on creating a pipeline for images annotated on deepseaspy.com. This involves developing software to transform annotations into a suitable format for image segmentation using AI models. Existing labeling tools will be tested and utilized. Software tools for analyzing and validating training and test datasets will be developed or implemented. Multiple segmentation models will be trained using augmentation techniques. Motion detection and video segmentation algorithms will be investigated for species identification.
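
As a sketch of the annotation-transformation step, the code below rasterizes polygon annotations into a per-pixel label mask suitable for training a segmentation model. The annotation format and coordinates are assumptions about what a citizen-science export might contain:

```python
# A minimal sketch of converting polygon annotations to a label mask.
# The export format (class id + polygon vertices) is hypothetical.
import numpy as np
from PIL import Image, ImageDraw

annotations = [  # hypothetical export: one polygon per annotated animal
    {"class_id": 1, "polygon": [(120, 80), (180, 85), (170, 150), (115, 140)]},
    {"class_id": 2, "polygon": [(300, 200), (360, 210), (340, 280)]},
]

mask = Image.new("L", (640, 480), 0)        # background = class 0
draw = ImageDraw.Draw(mask)
for ann in annotations:
    draw.polygon(ann["polygon"], fill=ann["class_id"])

np.save("label_mask.npy", np.array(mask))   # per-pixel class ids for training
```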

At the EMSO-SmartBay site, the roadmap begins with integrating the data into the platform and updating the workflows. Video data labeling will be performed to expand the training dataset. Various segmentation, object detection, and classification algorithms will be explored to identify poor-quality video footage. The concept of “Dataset shift,” caused by deteriorating video image quality over time due to factors like algae and dirt, will be investigated. Once a suitable AI solution is found, it will be integrated into the SmartBay service and made available around months 20 to 22. Feedback will be collected, and the new functionality will be put to use.
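
Before a learned model is in place, simple heuristics illustrate what poor-quality footage detection means in practice. The sketch below flags frames that are too dark or too blurred using mean luminance and the variance of the Laplacian; the thresholds and file name are illustrative, and the use case's actual solution relies on the trained models described above:

```python
# A minimal sketch of two frame-quality heuristics: mean gray level for
# darkness, Laplacian variance as a focus/turbidity proxy. Thresholds are
# illustrative, not tuned values.
import cv2

def frame_quality(frame_bgr, dark_thresh=15.0, blur_thresh=50.0):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    if gray.mean() < dark_thresh:
        return "dark"       # e.g. complete darkness at night
    if cv2.Laplacian(gray, cv2.CV_64F).var() < blur_thresh:
        return "degraded"   # e.g. algal growth or suspended particles
    return "ok"

cap = cv2.VideoCapture("smartbay_clip.mp4")  # hypothetical archive clip
ok, frame = cap.read()
if ok:
    print(frame_quality(frame))
cap.release()
```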

Overall, the development roadmaps for each site involve training and deploying AI models, implementing data processing pipelines, and utilizing the iMagine platform to automate image analysis, improve data quality, and provide valuable insights for ecosystem monitoring and scientific research.

Mature use case

Oil spill detection

Objective and challenge

This use case aims to enhance the existing oil spill monitoring and forecasting system, OKEANOS, by establishing an operational service on the iMagine platform. This service will utilize satellite imagery to automatically detect oil spills and provide more accurate and localized oil spill forecasts. The technology behind OKEANOS relies on open and quality-controlled inputs such as meteo-oceanographic fields, bathymetry, and coastline geometry, as well as satellite imagery from the Sentinel-1, -2, and -3 constellations.

One of the challenges in oil spill forecasting is the lack of quantification of uncertainties, which stems from the absence of quality-controlled observations and well-established validation methods. The accuracy of oil spill forecasts is also affected by the limitations of ocean and atmospheric models in reproducing small-scale features. Implementing high-resolution meteo-oceanographic models to improve forecast accuracy is expensive and time-consuming. Therefore, this use case aims to leverage the AI platform to address these challenges by improving the algorithms for automatic oil spill detection and classification using satellite imagery and enhancing the accuracy of numerical oil spill forecasts.
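
As background for the detection task, the sketch below shows the classical starting point: oil damps capillary waves, so slicks appear as dark patches in Sentinel-1 radar backscatter. The input array and the 6 dB offset are illustrative; the use case replaces such hand-tuned heuristics with learned classifiers:

```python
# A minimal sketch of dark-patch screening in a calibrated SAR scene.
# The file and threshold are placeholders for illustration.
import numpy as np

sigma0_db = np.load("s1_backscatter_db.npy")    # hypothetical scene, in dB

# Flag pixels much darker than the typical sea clutter.
scene_median = np.median(sigma0_db)
dark_patch = sigma0_db < (scene_median - 6.0)   # 6 dB offset is illustrative

coverage = dark_patch.mean() * 100
print(f"candidate slick pixels: {coverage:.2f}% of scene")
```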

Development timeline

The development timeline for this use case involves improving AI/ML algorithms, optimizing workflows, organizing existing data into downloadable collections, and working on high-resolution inputs and accurate forecasts. The accuracy of the forecasts will be quantified, and the data repository of marine products will be prepared in accordance with FAIR principles for easy search and discovery. The use case plans to release two versions of the services before reaching the milestone at 24 months.

Mature use case

FlowCAM plankton identification

Objective and challenge

The use case focuses on establishing an operational service on the iMagine platform for analyzing and processing FlowCAM images to determine the taxonomic composition of phytoplankton samples. Phytoplankton plays a crucial role in the aquatic food web, so accurately identifying and classifying these organisms is important. The technology used for this purpose involves a deep learning image recognition algorithm based on a Convolutional Neural Network (CNN) and a NoSQL MongoDB database. The existing workflow will be enhanced using the iMagine AI platform.
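
As a small illustration of the storage side, the sketch below persists one per-particle classification result in MongoDB with pymongo. The collection name and document fields are assumptions, not the use case's actual schema:

```python
# A minimal sketch of storing a CNN classification result in MongoDB.
# Connection string, collection, and fields are hypothetical.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical instance
collection = client["flowcam"]["classifications"]

collection.insert_one({
    "particle_id": "sample42-000317",        # hypothetical identifier
    "predicted_taxon": "Chaetoceros",        # example phytoplankton genus
    "confidence": 0.91,
    "model_version": "cnn-v1",
    "classified_at": datetime.now(timezone.utc),
})
```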

Several challenges have been identified within the project, and the iMagine AI platform will be utilized to address them. These challenges include optimizing the data ingestion pipeline; improving metadata and data output formats to comply with community-based standards; enhancing the service to incorporate context input and increase classification accuracy; expanding the training dataset by identifying additional particles; and preparing the data and processing components for seamless integration with the iMagine platform. Additionally, the training set used in the EcoTaxa comparison will be made available, and similar models will be trained.

Development timeline

The development roadmap involves working on all three user stories throughout the project. The initial focus will be on data integration and model development. Once the model meets the minimum performance requirements, it will be provided to end users through the iMagine AI platform. Feedback from users will be collected to guide future iterations of the service and model development, ensuring continuous improvement.

Prototype

Underwater noise identification

Objective and challenge

Underwater sound is crucial for aquatic life and survival. It comes from various sources, including living organisms, natural phenomena, and human-made noise. The European Union’s Marine Strategy Framework Directive (MSFD) recognizes underwater noise as a pollutant. In this context, the iMagine platform is being utilized to create a prototype service for analyzing acoustic underwater recordings. The goal is to identify and recognize marine species as well as other sound types, such as shipping noise.

Currently, there is a collection of 1.5 years’ worth of underwater sound data, with ongoing data collection efforts. However, processing this data and identifying the sound sources is time-consuming and requires significant individual effort. Furthermore, the existing process lacks automation. To address these challenges, the use case will leverage the iMagine AI platform and the project’s expertise to enhance data labeling and explore different AI techniques for sound recognition and identification.

Development timeline

To develop the solution, the use case begins with importing raw sound data into its database (MongoDB). The team is enhancing the labeling and validation interface to streamline data labeling, ensuring greater efficiency in preparing the training dataset. Once a sufficient amount of labeled data is available, they will develop, train, and validate multiple AI models. When a high-performing model is identified, they will focus on automating the sound identification process.
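
A typical preprocessing step for AI-based sound recognition is converting raw recordings into spectrograms that models can consume. The sketch below does this with SciPy; the file name and parameters are illustrative:

```python
# A minimal sketch of turning a hydrophone recording into a log-scaled
# spectrogram, the usual input representation for sound classifiers.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, samples = wavfile.read("hydrophone_recording.wav")  # hypothetical file
if samples.ndim > 1:
    samples = samples.mean(axis=1)  # collapse stereo to mono

freqs, times, sxx = spectrogram(samples, fs=rate, nperseg=1024)
log_sxx = 10 * np.log10(sxx + 1e-12)  # dB scale, avoiding log(0)

print(f"spectrogram: {log_sxx.shape[0]} freq bins x {log_sxx.shape[1]} frames")
```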

Prototype

Beach monitoring

Objective and challenge

Since 2011, SOCIB has been conducting systematic and continuous monitoring of beaches using cameras, generating valuable long-term time-series data. This data is accessible to scientists, coastal management authorities, and citizens, and is currently utilized for shoreline tracking. In the iMagine project, we aim to develop a prototype service that processes video images from beach cameras to monitor the formation and dismantling of seagrass beach berms (Posidonia oceanica) and detect rip currents.

Analyzing the existing system, we identified gaps and bottlenecks. Currently, the shoreline position is manually extracted from the SIRENA video-monitoring system at infrequent intervals (~15 days), even though images are captured multiple times per day. No other beach features are extracted, despite the potential to obtain valuable information about biogeophysical and socioeconomic processes, such as identifying Posidonia berms and rip currents and determining beach width, the swash zone, and run-up. By utilizing Deep Learning (DL) techniques for image segmentation, we can automatically extract information on important features like sand, water, white scum, Posidonia berms, humans, and vessels. This automation would allow for shoreline extraction at almost all available timestamps and characterization of Posidonia berms, among other essential aspects of beach monitoring and management. Additionally, DL applied to object detection could aid in the identification of rip currents, which is crucial for emergency services, forecasting models, and early-warning systems.
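
As one concrete example of the post-processing this enables, the sketch below reads the shoreline off a per-pixel segmentation mask as the sand/water boundary. The class ids, mask file, and the assumption that water appears above sand in the frame are all illustrative:

```python
# A minimal sketch of shoreline extraction from a segmentation mask.
# Mask file and class ids are hypothetical placeholders.
import numpy as np

mask = np.load("beach_segmentation.npy")   # hypothetical (H, W) class mask
WATER, SAND = 0, 1                         # assumed class ids

# Scanning each column from the top of the frame (water) downward, the
# first sand pixel marks the shoreline row; orientation depends on camera.
shoreline_rows = np.full(mask.shape[1], -1)
for col in range(mask.shape[1]):
    sand_rows = np.where(mask[:, col] == SAND)[0]
    if sand_rows.size > 0:
        shoreline_rows[col] = sand_rows.min()

print("shoreline located in", int((shoreline_rows >= 0).sum()), "columns")
```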

Development timeline

Our development roadmap begins with enhancing metadata and selecting appropriate deep learning and labeling methods for data preparation and labeling. We then transfer the training dataset to the iMagine AI platform to commence model development, training, and validation. Once the model is ready, we integrate it into the prototype service.

Prototype

Freshwater diatom identification

Objective and challenge

Diatoms are unicellular microalgae found in various aquatic environments. They are commonly used as bioindicators to assess the ecological health of freshwater bodies like rivers and lakes, as mandated by the EU Water Framework Directive (WFD). The identification of diatom species relies on examining their silica exoskeletons under a microscope using classical light microscopy. However, quantifying important morphological features, such as size and deformations, which are crucial for bioindication, is currently a laborious and time-consuming task.

To address this challenge, the use case aims to develop a prototype diatom-based bioindication service that not only identifies diatom species but also quantifies key morphological features using automatic pattern recognition algorithms on microscope images. The iMagine AI platform will be leveraged for this development.

Currently, the identification of diatom species involves subjective and time-consuming manual processes, susceptible to biases related to operator experience and image quality. By standardizing and automating this process using AI, the accuracy and efficiency can be significantly improved.

Development timeline

A proof of concept has already been developed using a synthetic dataset with a limited number of diatom images. The objectives of the use case include building an end-to-end pipeline for detection, classification, and quantification of diatom traits, utilizing performance metrics relevant to diatom experts. This involves creating a comprehensive and quality-controlled dataset for fine-tuning the convolutional neural networks (CNNs) and deploying the service on the iMagine AI platform.

The development roadmap encompasses establishing an annotation workflow for labeling real microscope images acquired during the project. This annotation process will expand the training sets for diatom classification (currently around 150 species) and create training sets for segmentation, which is necessary for quantifying traits like size and deformations. Concurrently, model development will focus on refining the existing end-to-end pipeline for diatom classification (using a probabilistic approach) and exploring alternative AI approaches for quantifying diatom morphological traits. Once the models are validated, they will be transferred to the iMagine platform, and the prototype service will be implemented.
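
For illustration, the sketch below quantifies the kind of morphological traits mentioned above (size and elongation) from a binary diatom mask using scikit-image region properties. The mask file is a placeholder for the output of the segmentation models the use case will train:

```python
# A minimal sketch of measuring per-diatom traits from a binary mask.
# The input file is hypothetical; measurements are in pixels.
import numpy as np
from skimage.measure import label, regionprops

mask = np.load("diatom_mask.npy") > 0    # hypothetical binary mask
labeled = label(mask)                    # one integer label per diatom

for region in regionprops(labeled):
    # Size and elongation are the kind of traits used for bioindication.
    print(
        f"diatom {region.label}: area={region.area} px, "
        f"length={region.major_axis_length:.1f} px, "
        f"width={region.minor_axis_length:.1f} px"
    )
```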