Research Data Management

File Naming Systems & Organization

  • Decide on a naming convention before data collection starts
  • Use consistent, descriptive file names. Make it easy to predict what a file contains.
  • Develop a file naming scheme that makes sense to you.
  • Consider including:
    • Project name or project number
    • Name of file creator
    • Sequence ID
    • Accession Number
    • Location or spatial coordinates
    • Date or date range of project
    • Version number of file
  • Consider how files sort when deciding what element of the file name will go first.
  • Establish a folder hierarchy that aligns with the project. Example: [Project] / [Experiment] / [Instrument or Type of File]
  • Include an explanation of your naming convention along with any abbreviations or codes in your readme.txt file. 
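
One way to apply the points above is to build every file name and folder path from the same small set of elements. The sketch below is only an illustration: the element order (project, site, date, sequence, version), the sample values, and the function names `build_file_name` and `build_path` are assumptions, not part of this guide.

```python
from datetime import date
from pathlib import PurePosixPath


def build_file_name(project, site, collected, sequence, version, extension):
    """Join the name elements with underscores, in a fixed order."""
    parts = [
        project,
        site,
        collected.strftime("%y%m%d"),  # YYMMDD date
        f"{sequence:03d}",             # zero-padded sequence ID: 7 -> 007
        f"v{version:02d}",             # zero-padded version: 2 -> v02
    ]
    return "_".join(parts) + "." + extension


def build_path(project, experiment, file_type, file_name):
    """Place the file in a [Project] / [Experiment] / [Type of File] hierarchy."""
    return PurePosixPath(project) / experiment / file_type / file_name


# Hypothetical example values.
name = build_file_name("kelp", "A1", date(2023, 5, 14), 7, 2, "csv")
print(name)                                     # kelp_A1_230514_007_v02.csv
print(build_path("kelp", "exp01", "ctd", name))
```

Because the project name comes first here, files group by project and then by site when sorted. Putting the date first instead would sort them chronologically.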

Storage & Backups

  • Keep multiple copies of your data: Here, Near & Far
  • Automatic backup is better than manual
  • Periodically test your backup restore
  • Contact UCSC campus ITS for optimal data storage & backup options.
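
Testing a restore can be as simple as restoring a backup to a scratch folder and comparing it with the working copy. The following is a minimal sketch of that idea using SHA-256 checksums. The function names `sha256_of` and `compare_trees` are hypothetical, and the sketch assumes both copies are ordinary folders on disk. It is not a description of any campus backup service.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(original_dir, restored_dir):
    """List files that are missing from, or differ in, the restored copy."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file() or sha256_of(source) != sha256_of(restored):
            problems.append(relative.as_posix())
    return problems


if __name__ == "__main__":
    # Small demonstration with throwaway folders standing in for real data.
    with tempfile.TemporaryDirectory() as work, tempfile.TemporaryDirectory() as restore:
        (Path(work) / "data.csv").write_text("depth,temp\n5,12.1\n")
        (Path(restore) / "data.csv").write_text("depth,temp\n5,12.1\n")
        print(compare_trees(work, restore))  # an empty list means the copies match
```

Files that have changed since the backup was taken will also be reported, so this check is most useful right after a backup runs.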

Confidentiality and Privacy

Get Credit

  • Cite Your Data
    • Get a persistent identifier such as a DOI or ARK using the EZID service for your data.
    • Contact the library to obtain an EZID.
  • Disambiguate yourself 
    • ORCID provides a persistent identifier that distinguishes you from other researchers. Register for a free account.

Documentation

Include a brief descriptive document (often called a Readme.txt file) to help others understand your additional files and data.

File Naming Conventions

  • Keep the filename short (aim for less than 25 characters)
  • Use underscores instead of spaces
  • Avoid special characters such as: " / \ : * ? < > [ ] & $ .
  • Use a consistent date format: YYYY-MM-DD or YYMMDD
  • Use the 3-letter file extension to indicate the file format, such as .txt, .pdf, or .csv.
  • When using numbers, use leading zeros so that files sort in sequential order. Use 001, 002, ... 020, 021 ... instead of 1, 2 ... 20, 21 ...
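
Some of these conventions can be checked automatically before files are shared. The sketch below is a rough illustration rather than a complete validator. The function names are hypothetical, and it assumes the 25-character guideline applies to the name without its extension. It also leaves the period out of the special-character check, since the period separates the name from the extension.

```python
import re

# Special characters to avoid (period excluded, see the note above).
FORBIDDEN = set('"/\\:*?<>[]&$')


def check_file_name(file_name, max_length=25):
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    stem, dot, _extension = file_name.rpartition(".")
    if not dot:
        stem = file_name
        problems.append("no file extension")
    if len(stem) >= max_length:
        problems.append(f"name is {max_length} characters or longer")
    if " " in stem:
        problems.append("contains spaces; use underscores")
    bad = sorted(set(stem) & FORBIDDEN)
    if bad:
        problems.append("contains special characters: " + " ".join(bad))
    return problems


def replace_spaces(file_name):
    """Swap runs of whitespace for a single underscore."""
    return re.sub(r"\s+", "_", file_name.strip())


def pad_number(number, width=3):
    """Zero-pad a number so that names sort in sequential order."""
    return f"{number:0{width}d}"


print(check_file_name("my data?.csv"))
print(replace_spaces("site A depth 10.csv"))  # site_A_depth_10.csv
print(sorted(f"run_{pad_number(n)}.csv" for n in (21, 2, 1, 20)))
```

Without the zero padding, a plain text sort would place run_20.csv before run_2.csv.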

Case Study: File Naming Done Well - examples of methods for naming files. File names can include study site, water depth, date, and more.

Some Recommended File Formats

Whenever possible, use uncompressed, non-proprietary (open) formats.

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV
  • Geospatial: SHP, DBF, GeoJSON, KML, NetCDF, GeoTIFF/TIFF, HDF-EOS
  • Moving images: MOV, AVI, MXF
    • eScholarship requires MP4 to embed
  • Presentations: PDF
  • Sounds: WAV, AIFF, MXF
    • eScholarship requires MP3 to embed
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still images: TIFF, JPEG 2000, JPEG, PDF
  • Tabular data: CSV
  • Text: XML, PDF/A, HTML, ASCII, UTF-8
  • Web archive: WARC

For more guidance on appropriate formats, see the Library of Congress’ Recommended Formats Statement and Archivematica’s Format Policies page for access and preservation.

Copyright and Intellectual Property

  • Data is not copyrightable. However, a presentation of data (such as a chart or table) may be.
  • Data can be licensed. Some data providers apply licenses that limit how the data can be used to protect the privacy of study participants or to guide downstream uses of the data (e.g., requiring attribution or forbidding for-profit use). Check license terms of use before republishing.
  • Most databases to which the UC Libraries subscribe are licensed and prohibit redistribution of data outside of UC. For more information on terms of use for databases licensed by the Libraries, contact us.
  • Publish your data under a Creative Commons license to make your wishes explicit.