This page details the main updates to GBIF.org and related infrastructure. Further details are found in the GitHub repositories.
10 January 2024
Reference databases updated in the Sequence ID tool
- 16S: (Bacteria and Archaea) Genome Taxonomy Database r214
- COI: (Animalia) database updated to International Barcode of Life v2024-01-06
- 12S: MitoFish - Mitochondrial Genome Database of Fish V3.97
- ITS: Unite v9.0 (2023-07-25)
26 October 2023
- The rewrite of the workflows that generate the precalculated maps on GBIF.org was deployed.
20 September 2023
- GBIF.org now processes `scientificNameID`, `taxonID` and `taxonConceptID` for configured identifier schemes, initially WoRMS LSID. The details of this are discussed in this issue.
28 August 2023
- New GBIF backbone taxonomy, with three new sources and other updates. Name matching to species aggregates has been improved. Refer to the backbone build log for additional details.
- The field `http://purl.org/dc/terms/identifier` (`identifier`) has been removed from interpreted data and downloads, as it had been introduced unintentionally and contained only internal identifiers.
- 43 unused Dublin Core terms, which have never been used in Darwin Core and were always empty, have also been removed from new Darwin Core archive downloads.
- Searching using `gbif_id` is now supported in the API, website and downloads.
24 August 2023
Two new reference databases added to the Sequence ID tool
- 12S: MitoFish - Mitochondrial Genome Database of Fish
- 18S: PR2 18S rRNA database
12 January 2023
Dataset filters
- Continent interpretation now considers occurrence coordinates. All georeferenced, terrestrial records now have a continent value, and issues are applied where the publisher's value is unexpected.
- A new field `distanceFromCentroidInMeters` is present for occurrences within 5000m of a known country centroid. Particularly for preserved specimens, this can highlight imprecise georeferencing. See the data blog for the motivation for this field.
- Coordinate uncertainty, where provided by the publisher, is now taken into account when verifying the country/countryCode values.
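As a rough illustration of the centroid check described above, the sketch below computes a great-circle (haversine) distance and reports it only when the point lies within 5000 m of a centroid. This is not GBIF's implementation: the function names, the spherical Earth model and the example centroid coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_from_centroid(lat, lon, centroids, threshold_m=5000):
    """Distance to the nearest centroid if it is within threshold_m, otherwise None."""
    nearest = min(haversine_m(lat, lon, c_lat, c_lon) for c_lat, c_lon in centroids)
    return nearest if nearest <= threshold_m else None


# Hypothetical centroid list; real centroid coordinates would come from a gazetteer.
EXAMPLE_CENTROIDS = [(56.0, 10.0)]
```

A record at `(56.01, 10.0)` would get a value of roughly 1.1 km, while a record one degree away would get no value at all.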
Registry
- A new check prevents derived datasets from being created without a related dataset.
Data ingestion
- The Camera Trap Data Package (Camtrap DP) format is now supported.
26 August 2022
- Mechanisms deployed to detect large-scale changes in occurrence record IDs before indexing. When such changes are detected, data managers can intervene and confirm or correct mistakes to better ensure GBIF ID stability.
23 June 2022
- Clustering rules relaxed for iBOL and EMBL (INSDC) datasets to accommodate sparser data. When a record comes from one of these sources, an overlap in accepted scientific name and identifier is now sufficient to connect the records.
22 June 2022
- Occurrence search results can now be shown in 4 map projections
19 May 2022
- GBIF data now on Google BigQuery as a public table, updated monthly
2 May 2022
Sequence ID tool
- 16S (Bacteria and Archaea) Genome Taxonomy Database r207
- COI (Animalia) database updated to International Barcode of Life v2022-02-22
28 April 2022
- The following fields, which can contain multiple values, can now be searched using individual values: `datasetID`, `datasetName`, `otherCatalogNumbers`, `typeStatus`, `recordedBy`, `identifiedBy`, `preparations` and `samplingProtocol`. For example, searching for records collected by "L. Richardson" now returns records where they were part of a group of people making the observation. (pipelines/665 and pipelines/283)
- Occurrences are now searchable using the Darwin Core `datasetName` and `datasetID` fields. This search respects the value stated on the record, allowing different values within a dataset registered on GBIF to support search within aggregated datasets. (pipelines/662)
- The `datasetName` is now included in the occurrence download (pipelines/270)
- Occurrence records are now searchable using the Darwin Core term `otherCatalogNumbers` in the API (pipelines/664)
- The `preparations` field is now correctly populated (pipelines/667)
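To make the multi-value behaviour concrete, here is a minimal sketch of matching a single name against a field that holds several values. It assumes values are separated by a pipe, as Darwin Core recommends for list-valued terms; the helper names are hypothetical and the actual pipelines code may split and normalise values differently.

```python
def split_values(raw, delimiter="|"):
    """Split a multi-valued field into trimmed, non-empty individual values."""
    if not raw:
        return []
    return [part.strip() for part in raw.split(delimiter) if part.strip()]


def matches(record, field, wanted):
    """True if any individual value of the field equals `wanted` (case-insensitive)."""
    values = split_values(record.get(field))
    return wanted.casefold() in (value.casefold() for value in values)


# Illustrative record, not real GBIF data.
record = {"recordedBy": "A. Smith | L. Richardson"}
```

With this approach `matches(record, "recordedBy", "L. Richardson")` succeeds even though the stored string names two collectors, while a partial name such as "Richardson" does not match.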
11 March 2022
- Rules for clustering tightened, to avoid over-eagerly clustering records that have the same species and catalogue number but nothing else to support the link
1 March 2022
- The IPT 2.5.7 released, addressing 3 bugs and 2 minor improvements
28 February 2022
- Datasets can now declare which country should be attributed as publishing the data on a record by record basis. Previously, only eBird had this capability, now others can, such as iNaturalist
7 February 2022
- Changes to the map server have been deployed, to provide higher resolution maps to all hosted portals, such as on GBIF.us
17 February 2022
- Changes to the content model deployed allowing the communications team more control of the GBIF.org homepage. Deployed with first changes to the styling.
3 February 2022
- The data validator has been updated to be consistent with GBIF.org indexing. Within the tool, logged in users can now find their historical validation reports
31 January 2022
- The occurrence index has been updated to support the latest version of Darwin Core, released last year.
- A new basis of record, Material Citation, replaces the obsolete Literature basis of record. Additionally, the "Unknown" basis of record will no longer be used; instead, records will be shown with an "Occurrence" basis of record.
- The existing term Establishment Means now has a vocabulary in Darwin Core. This replaces the GBIF enumeration used for this term.
- New terms Degree of Establishment and Pathway are now available, and have their own vocabularies.
- The new terms Vertical Datum, Verbatim Identification, Subfamily, Infrageneric Epithet and Cultivar Epithet may be used on occurrences, although the taxonomic terms are not yet supported by the GBIF Backbone Taxonomy.
29 January 2022
- Changes applied to data ingestion that abort the process if >5% of record IDs are seen to change, allowing data managers to verify before proceeding.
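The threshold check can be pictured with the short sketch below. How GBIF actually counts a "changed" ID is not stated here, so the sketch assumes a change means an ID that was indexed previously and is absent from the incoming crawl; the function names and the sample IDs are hypothetical.

```python
def changed_id_fraction(previous_ids, incoming_ids):
    """Fraction of previously indexed IDs that are missing from the incoming data."""
    previous = set(previous_ids)
    if not previous:
        return 0.0  # nothing indexed before, so nothing can have changed
    missing = previous - set(incoming_ids)
    return len(missing) / len(previous)


def should_abort(previous_ids, incoming_ids, threshold=0.05):
    """True when more than `threshold` of the old IDs have disappeared."""
    return changed_id_fraction(previous_ids, incoming_ids) > threshold


# Illustrative data: 100 old IDs, of which the first 6 are replaced by new ones.
old_ids = [f"occ-{i}" for i in range(100)]
new_ids = [f"new-{i}" for i in range(6)] + old_ids[6:]
```

In this example 6% of the old IDs are gone, so `should_abort(old_ids, new_ids)` is true and ingestion would pause for a data manager to review.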
13 January 2022
- GRSciColl now supports the ability to select a GBIF dataset or publishing organisation as the "master" source of information for a GRSciColl collection or institution record. Changes made in the organisation's registration or dataset metadata will automatically be reflected on the GRSciColl entity.
2021
3 December 2021
- A new IPT release (2.5.2) addresses 26 issues. Most improvements are for the new user interface (including bugfixes), and to deployment / server administration.
- A new backbone is live, with a new WCVP Fabaceae source and additional Plazi publications. For further details, see the build log
28 October 2021
- New filters and facets for the literature service (gbifTaxonKey, gbifOccurrenceKey, gbifHigherTaxonKey, citationType)
- New geo distance filter/predicate for occurrence search and download (geoDistance)
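For readers building download requests, a `geoDistance` predicate might be assembled as in the sketch below. The field names (`latitude`, `longitude`, `distance`) and the distance string format such as `5km` are written from memory of the download predicate documentation and should be treated as assumptions to verify against the current API reference; the helper function itself is hypothetical.

```python
import json


def geo_distance_predicate(latitude, longitude, distance):
    """Build a predicate selecting records within `distance` of a point.

    Assumed shape: coordinates as strings in decimal degrees, distance as a
    number with a unit suffix (for example "5km" or "500m").
    """
    return {
        "type": "geoDistance",
        "latitude": str(latitude),
        "longitude": str(longitude),
        "distance": distance,
    }


predicate = geo_distance_predicate(40.4, -3.7, "5km")
predicate_json = json.dumps(predicate, indent=2)  # ready to embed in a request body
```

The resulting JSON would sit under the `predicate` key of a download request, alongside any other filters.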
GRSciColl
- New model for GRSciColl contacts that replaces the current staff members (#379)
- Number of specimens in institutions made optional (#389)
- Taxonomic coverage added to the collections search (#390)
- Lookup now accepts alternative codes + ID matches as exact (#381)
17 September 2021
New backbone live
- Improved parsing and matching of operational taxonomic units (OTUs) from Genome Taxonomy Database (GTDB)
- Update of Systema Dipterorum
- First update of the PaleoBiology Database (PaleoBioDB) and Index Fungorum (via Species Fungorum for CoL+) in several years
- Addition of United Kingdom Species Inventory (UKSI)
- Addition of three national checklists (plants, birds, legal) from Colombia
- Updated Fabaceae taxonomy via RBG Kew's World Checklist of Vascular Plants
- Even more updates than usual from Plazi
- Resolution of issues from user feedback in the GitHub project's "Done" column
Refer to the backbone build log for additional details.
31 August 2021
Integrated Publishing Toolkit (IPT)
A new version of the IPT has been released (2.5.0), addressing 81 issues. New/improved features include:
- A fresher-looking user interface, which should still be familiar to existing users
- The user manual has been converted from the GitHub Wiki to AsciiDoctor/Antora
- Source data files can now be downloaded by a resource manager
- Auto-publishing can now be set to specific, future dates
- Archive mode can be limited to a set number of old archives to retain
- A new health/troubleshooting page reports common system problems, like running out of disk space or incorrect filesystem permissions
- The administration contact (for forgotten passwords) is now configurable
- Database (JDBC) drivers have been updated
- A URL can now be used as a data source
2 July 2021
Occurrence images
- Occurrence records with an IIIF manifest given in the Audubon Core extension or Dynamic Properties now display a draggable IIIF icon with a link to a viewer (example)
11 June 2021
GRSciColl
- New service in the GRSciColl API to suggest changes such as creating new entities or modifying the existing ones. Available in the registry UI
- The IH synchronization now uses machine tags instead of identifiers. This allows an entity to be disconnected from IH while keeping the IH identifier.
- New audit log in GRSciColl to track all the changes done in the catalogue: https://api.gbif.org/v1/grscicoll/auditLog
- New permissions model for GRSciColl that includes country scopes, namespace rights and a new Mediator role #310
- The filter by code in the GRSciColl institutions and collections is now case insensitive: https://api.gbif.org/v1/grscicoll/collection?code=naic
- Now possible to filter GRSciColl staff by identifiers and machine tags, e.g.: https://api.gbif.org/v1/grscicoll/person?identifierType=IH_IRN
- New service to download GRSciColl institutions and collections in CSV or TSV format, e.g.: https://api.gbif.org/v1/grscicoll/institution/export?active=true
- More fields to filter searches of GRSciColl institutions and collections #357 #269
31 May 2021
Dataset filters
- Dataset search API supports filters and facets by networkKey, hostingCountry and endorsingNodeKey
Dataset export services
- Search export service: accepts the same parameters as the search service, but exports the result to a TSV or CSV file; facet and paging parameters are ignored. E.g. https://api.gbif.org/v1/dataset/search/export?q=inaturalist (also available in the UI)
- Dataset occurrence download usages: exports the datasets used in a download in TSV or CSV format, e.g. https://api.gbif.org/v1/occurrence/download/0220580-200613084148143/datasets/export?format=TSV (also available in the UI)
Download statistics
- New download statistics api, accepts the same filters as the downloadsByDataset service, e.g. https://api.gbif.org/v1/occurrence/download/statistics?datasetKey=0001480b-76ca-4f30-86bc-f4292481554b
- Export service for download statistics: accepts the same filters as download/statistics and exports the results in CSV or TSV format, e.g. http://api.gbif.org/v1/occurrence/download/statistics/export?datasetKey=0001480b-76ca-4f30-86bc-f4292481554b
Miscellaneous
- The latest release of the ChronometricAge Extension is now supported, and datasets using it can now be filtered using the occurrence DWCA_EXTENSION search filter.
- New occurrence lookup service, occurrence records can now be looked up by using: datasetKey/occurrenceId, e.g. https://api.gbif.org/v1/occurrence/0001480b-76ca-4f30-86bc-f4292481554b/651D49B2-FF77-7F3F-E053-2614A8C050DE and in UI https://www.gbif.org/occurrence/0001480b-76ca-4f30-86bc-f4292481554b/651D49B2-FF77-7F3F-E053-2614A8C050DE
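A small helper for composing the lookup URL shown above could look like the following sketch. The URL pattern is taken from the example in this entry; percent-encoding the occurrence ID is an assumption on my part (publisher IDs often contain characters such as `:`), and the function name is hypothetical.

```python
from urllib.parse import quote

API_BASE = "https://api.gbif.org/v1"


def occurrence_lookup_url(dataset_key, occurrence_id, base=API_BASE):
    """Compose the datasetKey/occurrenceId lookup URL, percent-encoding both parts."""
    return f"{base}/occurrence/{quote(dataset_key, safe='')}/{quote(occurrence_id, safe='')}"


url = occurrence_lookup_url(
    "0001480b-76ca-4f30-86bc-f4292481554b",
    "651D49B2-FF77-7F3F-E053-2614A8C050DE",
)
```

The helper only builds the string; fetching it is left to whichever HTTP client is already in use.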
21 May 2021
Features
- Search occurrences using modification date stated by publisher #219
- Download filters support search “field has a value” using the `isNull` predicate #244
- Registry console supports user filtering by roles and editor scopes #330
- The API response for dataset citation now includes authors as objects if they are also contacts, and indicates whether the citation was provided or generated #351
- Dataset search API supports filters and facets by installationKey and endpointType #148
Bug fixes
- Creating a network constituent for a non existing network no longer throws error #349
- Network suggest no longer includes deleted entities #308
- Consistent behaviour on GBIF.org and Registry management console for publisher search #198
17 May 2021
- First GBIF Parquet export added to the Amazon Public Data Catalog, with data available on 5 continents
5 May 2021
API and processing
- ability to search for datasets by the network they belong to (e.g. OBIS)
- network facets added in the dataset search API (e.g. http://api.gbif.org/v1/dataset/search?facet=networkKey&limit=0)
- events added triggering occurrence dataset reprocessing for changes in dataset network membership
Derived datasets
- Service and user interface to allow authenticated users to create DOIs for derived datasets
20 April 2021
- New Parquet download format added to the API
- First GBIF Parquet export added to the Microsoft Planetary Computer data catalogue.
22 March 2021
Sequence ID tool
- Classification of Bacteria and Archaea by 16S sequences matched against the Genome Taxonomy Database r95
- ITS (Fungi) database updated to UNITE v8.2
- COI (Animalia) database updated to International Barcode of Life v2021-02-08
11 March 2021
New backbone live
- Data source replacements, primarily for Fabaceae family and the prokaryotic kingdoms Bacteria and Archaea
- Improvement for stable identifiers, esp relating to OTUs
- Algorithm improvements (misplaced taxa)
- Removal of names / terms on a denylist
- Please refer to the backbone build log for additional details
23 February 2021
- Support for registering dataset endpoints in Catalogue of Life Data Package format
- Flagging of potential duplicates added to assist editors in deduplicating entries in the GRSciColl catalogue. E.g. Reuse of the code PCU
- Ability to restrict permissions for GRSciColl editors to institution or collection, allowing more people to participate
- Schema.org metadata tags revised on the dataset and taxon pages to improve search engine discoverability
11 February 2021
- Quarterly trends now include summaries by GBIF Region (e.g., Latin America and the Caribbean)
26 January 2021
- Improvements to the handling of networks (groupings of datasets) including
  - Listing in search e.g. searching for Arctos
  - Listing the publishers, and the datasets in the summary e.g. OBIS network
  - Ability to control if they are visible on a dataset page
  - Ability to assign editorial control to trusted users in the registry
- Support for DOIs for ad hoc data exports by GBIFS staff (example https://doi.org/10.15468/dd.jskxae)
  - This service is a precursor for GBIF to offer public datasets on cloud environments
- Bug fix for BioCASe protocol metadata synchronisation
- Added the literature vocabularies type, topic and relevance to the API to support analyses by external data scientists
- Added an experimental API categorisation of the griddedness of datasets (e.g. this example)
  - Based on exploratory work documented in this blog post
- Added capability to associate ROR and GRID ids to organisations in the GBIF registry
2020
17 December 2020
- Search capability to find records that participate in a cluster, e.g. 9M specimen-related occurrences that cluster
- Search for records that have content in any Darwin Core Archive extension. For example, records with the OBIS Extended Measurements and Facts
- A dashboard (metrics) is added to the institution (e.g. Kew Gardens) and collection (e.g. SAIAB Algae) pages summarizing the digitized occurrence records. Note that records may come from multiple datasets
- Improvements to date interpretation, including the ability to disambiguate date formats (dd/mm/yyyy vs MM/dd/yyyy) using the GBIF Registry and machine tags
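The date disambiguation idea can be sketched as follows: try both day-first and month-first readings, and fall back on a dataset-level hint only when the two readings disagree. The hint values `"DMY"` and `"MDY"` and the function name are hypothetical stand-ins for whatever the registry machine tags actually carry.

```python
from datetime import date, datetime

# Candidate readings for a nn/nn/yyyy string.
FORMATS = {"DMY": "%d/%m/%Y", "MDY": "%m/%d/%Y"}


def parse_ambiguous_date(text, hint=None):
    """Return a date, or None when the string is invalid or ambiguous without a hint."""
    candidates = {}
    for name, fmt in FORMATS.items():
        try:
            candidates[name] = datetime.strptime(text, fmt).date()
        except ValueError:
            pass  # this reading is impossible, e.g. month 13
    if hint in candidates:
        return candidates[hint]  # the dataset-level hint settles it
    unique = set(candidates.values())
    # Unambiguous only if every valid reading gives the same date.
    return unique.pop() if len(unique) == 1 else None
```

Under this scheme `13/02/2020` needs no hint because only one reading is valid, whereas `02/03/2020` stays unresolved until the dataset declares its format.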
15 December 2020
- Search for occurrence records by hosting organization e.g. map of records hosted by GBIF France or through the API
- Search for records by life stage added, such as images of records in nymph stage. Interpretation of this content is backed by the vocabulary server that is part of the registry. GBIF intend to open up vocabularies for collaborative editing when ready, and are working with the TDWG Data Quality Group on this topic.
14 December 2020
- API deployed to support Literature search by DOI. This API is documented in GitHub but documentation will be moved to the GBIF API documentation shortly
8 December 2020
- The new Catalogue of Life website is live. This is the first deployment that is powered by GBIF and hosted on GBIF infrastructure. In addition to the public website are the common repository known as the checklistbank, and a new API which is supported in the rOpenSci client.
2 December 2020
- Extension data now shown on all occurrence pages e.g. measurements example
- Specimen-related occurrence records now link to the collection catalogue entries in addition to the dataset they originate from e.g. this record from SAIAB. Matching uses a variety of fields including `collectionCode`, `institutionCode`, `collectionID` and `institutionID`. See the FAQ on how to improve matching
- New API to improve searching against the Collection Catalogue, e.g. searching for "K"
- Elasticsearch updated to version 7.10.0
9 November 2020
- iDigBio collection catalogue imported to the GBIF Registry. This is visible on the GRSciColl pages; the API now powers the iDigBio Collections Catalogue portal. Data management is now shared across teams.