There is increasing demand from the scientific community for strong linkage between papers published in the scientific literature and the data upon which they are based, and for a mechanism to reward data collection through citation. Data are valuable research objects when they are well-described and preserved; publishing data helps clarify who should receive credit when others use that data.
NGDC can now issue a digital object identifier (DOI) for any dataset it holds that meets certain rigorous management criteria. This is the result of a collaboration between the NERC/UKRI data centres, the British Library and DataCite. Under this arrangement, NGDC uses the DataCite Metadata Schema to issue DOIs for datasets.
NGDC cited data catalogue
A list of all the formally cited datasets held by the NGDC showing the title, author(s) and DOI can be accessed on the NGDC cited data catalogue. The DOI links to the landing page with metadata links and direct access to the data, where appropriate.
Benefits of generating a data DOI
A DOI allows scientists to cite datasets in the same manner as a scientific journal article, enabling credit to be assigned to the dataset creators and ensuring the discoverability, permanence and stability of the dataset. This recognises the value of the data and the effort that has gone into its creation, capture and effective management. DOIs also allow formal publication of the dataset in data journals.
Citing your datasets allows you to:
- receive credit for your datasets
- track the impact of your data via the NERC EDS citation API
- publish in data journals
- comply with open journal requirements
- enhance the findability of your data for reuse in the future
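To illustrate what citing a dataset "in the same manner as a journal article" can look like, the minimal sketch below assembles a citation string from a few metadata fields and builds the resolver link for a DOI. The field names, the citation layout (creators, year, title, publisher, DOI link) and all example values are assumptions made for illustration and are not taken from NGDC or NERC EDS guidance. The DOI shown is a made-up placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical minimal subset of the metadata needed to cite a dataset."""
    creators: tuple[str, ...]
    year: int
    title: str
    publisher: str
    doi: str  # bare DOI; the value used below is a placeholder, not a real DOI


def doi_url(doi: str) -> str:
    """Return the resolver link for a bare DOI via the doi.org proxy."""
    return f"https://doi.org/{doi.strip()}"


def format_citation(record: DatasetRecord) -> str:
    """Assemble 'Creators (Year): Title. Publisher. DOI link' (assumed layout)."""
    creators = "; ".join(record.creators)
    return (
        f"{creators} ({record.year}): {record.title}. "
        f"{record.publisher}. {doi_url(record.doi)}"
    )


# Illustrative values only: authors, title, publisher and DOI are invented.
example = DatasetRecord(
    creators=("Smith, J.", "Jones, A."),
    year=2020,
    title="Example borehole temperature dataset",
    publisher="Example Data Centre",
    doi="10.1234/example-dataset",
)
print(format_citation(example))
```

The citation style required by a particular journal may differ; the point is only that a DOI gives the dataset a stable, linkable identifier that fits into an ordinary reference.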
NGDC data citation process
Data submitter requirements
Datasets must be fully ingested into a data centre before a DOI can be minted. Legacy datasets that have already been ingested into a data centre may also be assigned DOIs.
For a dataset to be assigned a DOI, it must:
- be provided to the data centre in good condition
- be stable and complete (for long-term datasets, it is common practice to divide the dataset into appropriate temporal or categorical divisions and assign a DOI to each)
- have appropriate and complete metadata
- be of a suitable level of technical quality and submitted in an appropriate format (NGDC can advise)
The dataset submitter is responsible for ensuring that the data meets the required level of quality. The minimum requirements for data are set out in the NERC EDS DOI guidance; the relevant sector-specific data centre provides further information, including the dataset standards section below.
NGDC requirements
When NGDC assigns a DOI to a dataset, it is providing certain assurances to the subsequent data user. These assurances include that the dataset cited is:
- stable: it is not going to be modified
- complete: it is not going to be updated or added to
- permanent: NGDC has committed to making the dataset available for the foreseeable future
- of good technical quality: NGDC is giving its stamp of approval, saying that the dataset is complete and that all the necessary metadata is available
Therefore, when a dataset is assigned a DOI, NGDC confirms that:
- the dataset will be available for the foreseeable future
- there will be bitwise fixity of the dataset
- there will be no additions or deletions of files or records
- there will be no changes to the directory structure in the dataset ‘bundle’
- upgrades to versions of data formats will result in new editions of the dataset
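One way to picture the fixity guarantees above is a checksum manifest: a record of every file in the dataset bundle together with a digest of its bytes. The sketch below is an assumption about how such a check could be done, not a description of NGDC's actual tooling; the choice of SHA-256 and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Digest a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bundle_dir: Path) -> dict[str, str]:
    """Map each file's path, relative to the bundle root, to its SHA-256 digest."""
    manifest = {}
    for path in sorted(bundle_dir.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(bundle_dir).as_posix()] = sha256_of(path)
    return manifest


def compare_manifests(original: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report files added, removed or altered relative to the original manifest."""
    shared = set(original) & set(current)
    return {
        "added": sorted(set(current) - set(original)),
        "removed": sorted(set(original) - set(current)),
        "modified": sorted(name for name in shared if original[name] != current[name]),
    }
```

A manifest taken when the DOI is minted can be compared with one taken later: any changed byte shows up under "modified", and a file moved to another directory shows up as one removal plus one addition, so changes to the directory structure are detected as well. Empty directories are not tracked by this sketch.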
NGDC will provide a full catalogue page (landing or splash page), which will appear when any user clicks on the DOI hyperlink.
Modifications and versioning
Once a dataset has been deposited with NGDC and a DOI has been issued, the dataset cannot be modified. If there are updates or changes to the dataset, a new version of the dataset will need to be deposited and NGDC will:
- assign a new version number (a simple integer sequence only)
- assign a new DOI
- create a new landing page for the new version of the dataset that includes its full version history
- modify the landing page of the previous version of the dataset to provide a link to the new version
- store the new dataset in addition to previous versions
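The versioning steps above can be sketched as a small data model. This is a hedged illustration only: the class names, the `superseded_by` field and the placeholder DOIs are hypothetical and do not describe NGDC's internal systems.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetVersion:
    version: int                         # simple integer sequence: 1, 2, 3, ...
    doi: str                             # every version has its own DOI
    superseded_by: Optional[str] = None  # link shown on the older landing page


@dataclass
class DatasetSeries:
    title: str
    versions: list[DatasetVersion] = field(default_factory=list)

    def deposit_new_version(self, new_doi: str) -> DatasetVersion:
        """Register a new version; earlier versions are kept, never overwritten."""
        if any(v.doi == new_doi for v in self.versions):
            raise ValueError("each version needs its own DOI")
        new = DatasetVersion(version=len(self.versions) + 1, doi=new_doi)
        if self.versions:
            # Only the previous landing page gains a link; its data stay untouched.
            self.versions[-1].superseded_by = new_doi
        self.versions.append(new)
        return new

    def version_history(self) -> list[tuple[int, str]]:
        """Full history, as it might be listed on the newest landing page."""
        return [(v.version, v.doi) for v in self.versions]
```

For example, depositing two versions in turn (with placeholder DOIs such as "10.1234/example-v1" and "10.1234/example-v2") leaves both stored, numbers them 1 and 2, and records on version 1 that it has been superseded by version 2.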
Ingestion procedures
NGDC will accept data according to NERC data policy and either the NERC/UKRI or the NGDC data value checklist, whichever is more appropriate. It will also ensure that the data meets the NGDC collection policy.
Dataset standards
An objective of good data management is to ensure that data can be re-used with confidence decades after collection and without the need for any kind of communication with the scientists who collected that data.
The following good-practice requirements, adopted across all the NERC/UKRI environmental data centres, must be met for a dataset to be accepted.
- The format must be well documented and conform to widely accepted standards
- The format must be readable by tools that are freely available now and are likely to remain freely available in the future
- Data files should be named in a clear and consistent manner throughout the dataset, with file names (rather than path names) that reflect the contents and uniquely identify the file
- File name extensions should conform to appropriate extensions for the file type
- File names should be constructed from lower case letters, numbers, dashes and underscores, and be no longer than 64 characters
- Parameters in data files should either be labelled using an internationally recognised standard or by local labels that are accompanied by clear, unambiguous, plain-text descriptions
- Units of measure must be included for all parameters and clearly labelled
- Data must be accompanied by sufficient usage metadata to enable its reliable re-use. Some of this may be embedded within the data files. If not, it should be included as additional documents
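The file-naming rules in the list above lend themselves to an automated pre-submission check. The sketch below is one possible reading of those rules, not an official NGDC validator. It assumes that a single dot separating the extension is permitted (the rules require an extension but do not mention the dot), that the 64-character limit applies to the whole name including the extension, and that the extension itself is lower case; the function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

MAX_LENGTH = 64  # assumed to cover the whole name, extension included

# Stem of lower-case letters, digits, dashes and underscores, followed by
# exactly one dot and an extension (the dot and extension form are assumptions).
NAME_PATTERN = re.compile(r"[a-z0-9_-]+\.[a-z0-9]+")


def file_name_problems(name: str) -> list[str]:
    """Return a list of rule violations for one file name (empty if none)."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if not NAME_PATTERN.fullmatch(name):
        problems.append(
            "must use only lower-case letters, digits, '-' and '_', "
            "followed by a single extension"
        )
    return problems


def duplicate_file_names(paths: list[str]) -> list[str]:
    """Find names that recur in different directories; the file name alone,
    not the path, is expected to identify a file uniquely."""
    counts = Counter(PurePosixPath(p).name for p in paths)
    return sorted(name for name, count in counts.items() if count > 1)
```

Under these assumptions a name such as "borehole_logs_2019-v1.csv" passes, while "Borehole Logs.CSV" is rejected for its capitals and space, and two files both called "x.csv" in different folders are flagged as duplicates.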
The technical experts in NGDC are responsible for ensuring that the dataset meets the required level of technical quality before a DOI can be issued for it.
For further information on the citation process or deposit of research datasets please contact NGDC directly (ngdc@bgs.ac.uk).