This guide cannot capture the full breadth and depth of available data; it is intended as a starting point. The resources below are searchable collections of databases indexed by topic, and they will be the most up-to-date sources to check.
The Dataset Catalog is a catalog of biomedical datasets from various repositories for users to search, discover, retrieve, and connect with datasets to accelerate scientific research. This beta version aims to collect user feedback to inform future product development.
List of NIH-supported data repositories and resources that aggregate information about biomedical data. Each entry has a brief description of the repository and links to data submission and access policies.
The 2024 Nucleic Acids Research database issue contains 180 papers from across biology and neighbouring disciplines. There are 90 papers reporting on new databases and 83 updates from resources previously published in the Issue. Updates from databases most recently published elsewhere account for a further seven.
Global registry of research data repositories.
Partnership of people, institutions and government agencies that supports the conservation of birds and their habitats by improving access to and use of data and tools. Data are available on bird monitoring, bird banding and citizen-based bird surveillance.
Single integrated species checklist and taxonomic hierarchy. The Catalogue holds essential information on the names, relationships and distributions of over 1.6 million species.
Provides free access to biological, physical and socioeconomic geospatial data and maps, along with tools to create custom visualizations, drawings and analyses.
GBIF facilitates free and open access to biodiversity data, enabling anyone to discover, use or publish data about all types of life on Earth.
Authoritative taxonomic information on plants, animals, fungi and microbes of North America and the world. Full database or specific taxonomic group data available for download.
International repository for ecological and environmental data. Data originate from field stations, laboratories, research sites and individual researchers around the world.
The Long Term Ecological Research (LTER) Network is a collaborative of researchers and graduate students who focus on long-term ecological processes at 26 LTER sites around the United States, Antarctica, and islands in the Caribbean and Pacific. The LTER Data Portal contains ecological data packages contributed by past and present LTER sites.
Provides collaborative tools for researchers to upload images and morphological data, and use that information to produce, edit, illustrate and annotate phylogenetic matrices. Also a repository for data associated with peer-reviewed publications.
National Center for Biotechnology Information (NCBI) database of names and classification for all organisms represented in NCBI sequence databases.
Part of the European Molecular Biology Laboratory (EMBL), the EBI provides a wide range of databases, software tools, and resources.
ExPASy, the SIB Swiss Institute of Bioinformatics Resource Portal, provides access to databases and software tools in different areas of the life sciences, including proteomics, genomics and phylogeny.
NCBI provides public access to biomedical and genomic information through its multiple databases and tools.
Collection of sequences from multiple sources, including GenBank, RefSeq, and Protein Data Bank (PDB). Searching Nucleotide will yield results from each of its component databases, which can also be searched separately. [NCBI database]
Repository for raw sequencing data from next-generation sequencing technologies. [NCBI database]
Universal Protein Resource (UniProt), a collaboration between the European Bioinformatics Institute, the SIB Swiss Institute of Bioinformatics and Protein Information Resource, provides high-quality, freely accessible protein sequence and functional information.
Collection of genomics, functional genomics, and genetic studies with links to their datasets. [NCBI database]
Archive and distribution center for results of studies that investigate the interaction of genotype and phenotype, including GWAS and molecular diagnostic assays. [NCBI database]
Database of genes from a wide range of species, with a focus on genomes that have been completely sequenced. [NCBI database]
"AlphaFold is an AI system developed by Google DeepMind that predicts a protein’s 3D structure from its amino acid sequence."
BMRB collects, annotates, archives, and disseminates spectral and quantitative data derived from NMR spectroscopic investigations of biological macromolecules and metabolites.
Archive of structural information about nucleic acids.
Worldwide repository for information about 3D structures of biological macromolecules. Provides tools for structure visualization.
Database of Drosophila genes and genomes.
Database of laboratory mouse genetic, genomic and biological data.
Collected by the Plant Metabolic Network.
"WormBase is an international consortium of biologists and computer scientists providing the research community with accurate, current, accessible information concerning the genetics, genomics and biology of C. elegans and related nematodes."
Database of zebrafish genetic, genomic and developmental data.
Database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana.
Comparative genomics of grasses (rice, maize, sorghum, barley, oats, wheat, rye).
Plant Comparative Genomics portal of the Department of Energy's Joint Genome Institute. Families of related genes representing the modern descendants of ancestral genes are constructed at key phylogenetic nodes. These families allow easy access to clade-specific orthology/paralogy relationships as well as insights into clade-specific novelties and expansions. As of release v11, Phytozome provides access to sixty-five sequenced and annotated green plant genomes.
Provides a broad network of plant metabolic pathway databases that contain curated information from the literature and computational analyses about the genes, enzymes, compounds, reactions, and pathways involved in primary and secondary metabolism in plants.
IPNI provides nomenclatural data (spelling, author, types and first place/date of publication) for the scientific names of vascular plants from family to infraspecific ranks.