Ensuring open data from landmark biodiversity projects and smaller genome sequencing initiatives are readily available to the global scientific community

Genomic data for biodiversity research is produced and shared worldwide. Credit: Karen Arnott/EMBL-EBI

As major biodiversity projects increasingly use genome sequencing to catalogue and understand species, EMBL-EBI is ensuring that the data generated are available to the global scientific community in a Findable, Accessible, Interoperable, and Reusable (FAIR) way. In 2022, EMBL-EBI's collaborations with these projects went from strength to strength.

The open data approach enables scientists everywhere to easily access information about the growing number of species whose genomes are being sequenced. The aim is to better understand these species, how they evolved, and how best to protect them for future generations. The home for this deluge of data is EMBL-EBI: nucleotide data are made available through the European Nucleotide Archive (ENA), genome assemblies are accessible through the Ensembl genome browser, and additional data are available through dedicated project portals.

European biodiversity projects reach first major milestones

One of the first projects of this kind, the Darwin Tree of Life (DToL) initiative aims to sequence, assemble, and annotate the genomes of all 70,000 eukaryotic species in the UK and Ireland. In 2022, the project released its first 500 genome assemblies, with 296 of these annotated and accessible through Ensembl Rapid Release and the DToL Data Portal, both developed by EMBL-EBI.

The DToL Data Portal is an open access platform that brings together all DToL data. Newly implemented features allow users to track the sequencing progress of their species of interest and to explore species through an interactive phylogeny and a map showing where samples were collected.

DToL is part of the Earth BioGenome Project, a global initiative that uses advances in genome sequencing, informatics, automation, and AI to understand and conserve biodiversity. EMBL-EBI is also supporting the European arm of the Earth BioGenome Project, the European Reference Genome Atlas (ERGA).

Safeguarding African biodiversity through genomics

The African BioGenome Project (AfricaBP) is an Africa-led effort to sequence the genomes of plants, animals, fungi, and protists native to the continent. It brings together more than 100 scientists and 20 research institutes in Africa. EMBL supports the project through a Memorandum of Understanding with AfricaBP that aims to help increase bioinformatics capacity across the continent. Under this agreement, Ensembl and EMBL-EBI help with the laborious task of genome annotation and co-create workshops and training opportunities for African scientists.

Data sharing policy for the future

The UN Biodiversity Conference and the 15th Conference of the Parties to the Convention on Biological Diversity (COP15), which took place in 2022, saw the adoption of the post-2020 global biodiversity framework, a roadmap to conserve and restore biodiversity over the next decade. In the run-up to these events, EMBL-EBI provided input and advice for the discussions. EMBL-EBI researchers also joined 41 scientists from 17 countries to explain why a policy solution for data derived from genetic resources is urgently needed. Together, they proposed a mechanism that would support biodiversity conservation while ensuring that the benefits of these data are shared more fairly.

As discussions continue and more genomes are sequenced, EMBL-EBI is scaling up its data resources to support the growing volume of data that is deepening our understanding of biodiversity and powering new applications in biotechnology, agriculture, and climate change.

This article was first published on EMBL News.