File Format
The choice of file format plays an essential role in long-term data storage and archiving, data sharing, searchability, and accessibility, and has a significant impact on data reusability.
Where possible, prefer open file formats, which allow the data to be imported and accessed by different tools and avoid vendor lock-in if a tool is no longer supported. Consider the following:
- What are the standard file formats in your field?
- Convert data to a standard format where one exists.
- Which format is required for data deposition (e.g. repository requirements, archival compression)?
- Consider exporting or converting from the original format to a more open or preferred format, but keep in mind that some data might be lost or altered in the process, e.g. text formatting in documents, decimal point formatting, and date and time values.
- When archiving data, combine the whole project (i.e. raw data, analysis, documentation, code and software) in one package.
- For software, consider using containers to enable interoperability and long-term reuse.
- Keep in mind that there is no universally preferred file format, and none is perfect, but consider choosing the open formats that are most applicable to your use and field.
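The warning above about values being altered during conversion can be checked with a quick round trip. The following is a minimal sketch, assuming the target format is CSV and the data fits in memory; the function names `roundtrip_csv` and `find_changes` and the sample record are hypothetical and only serve the illustration.

```python
import csv
import io
from datetime import date


def roundtrip_csv(rows):
    """Write a list of dicts to CSV text, then read the text back."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    buffer.seek(0)
    return list(csv.DictReader(buffer))


def find_changes(original, restored):
    """Return (row index, field, before, after) for every value that changed."""
    changes = []
    for index, (before, after) in enumerate(zip(original, restored)):
        for field, value in before.items():
            if after[field] != value:
                changes.append((index, field, value, after[field]))
    return changes


# Hypothetical sample record: a text identifier, a decimal number and a date.
records = [
    {"sample_id": "007", "mass_g": 1.50, "collected": date(2021, 3, 4)},
]

restored = roundtrip_csv(records)
changes = find_changes(records, restored)
for row, field, before, after in changes:
    print(f"row {row}, {field}: {before!r} -> {after!r}")
```

In this sketch the text identifier survives unchanged, while the number and the date come back as plain strings, because CSV stores no type information. A check of this kind, run on a sample of the real data, can show which fields need documenting or special handling before the original format is discarded.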
Examples:
- Recommended file formats by the UK Data Archive: https://www.ukdataservice.ac.uk/manage-data/format/recommended-formats.aspx
- Data types and formats comparative table by Oregon State University: https://guides.library.oregonstate.edu/research-data-services/data-management-types-formats
- Oregon State University library guide: https://guides.library.oregonstate.edu/ld.php?content_id=26410655
- King's College London guide: https://www.kcl.ac.uk/researchsupport/managing/preserve
- Dryad guide: https://datadryad.org/stash/best_practices#accessible
- Australian National Data Service (ANDS) guide: https://www.ands.org.au/__data/assets/pdf_file/0003/731775/File-Formats.pdf
- JISC guide: https://rdmtoolkit.jisc.ac.uk/collect-and-capture/file-management-and-formats/
Tools:
- Docker: https://www.docker.com/resources/what-container
- Jupyter: https://jupyter.org/index.html
- Fido: https://github.com/openpreserve/fido
- Singularity: https://github.com/hpcng/singularity
- Vagrant: https://www.vagrantup.com/intro/vs/docker.html
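Format identification tools such as Fido inspect a file's content against a registry of known signatures instead of trusting its extension. The sketch below illustrates that general idea only. It is not Fido's API, and the `SIGNATURES` table, `identify_bytes` and `identify_file` are hypothetical names covering a handful of hand-picked formats.

```python
# Small hand-picked table of leading byte signatures ("magic numbers").
# Illustrative assumption: real registries cover far more formats and
# use richer patterns than a simple prefix match.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\x89HDF\r\n\x1a\n", "HDF5 container"),
    (b"PK\x03\x04", "ZIP-based container (also used by .docx, .xlsx, .odt)"),
]


def identify_bytes(header: bytes) -> str:
    """Return a label for the first signature that the header starts with."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path) -> str:
    """Identify a file from its first bytes, ignoring its extension."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(16))


print(identify_bytes(b"%PDF-1.7\n"))
print(identify_bytes(b"plain text with no known signature"))
```

A prefix check like this can flag files whose extension does not match their content before deposit, but for archival work a maintained tool and signature registry should be preferred over a hand-written table.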