BIOE 183W Research in EEB

File Naming Conventions

  • Keep the filename short (aim for less than 25 characters)
  • Use underscores instead of spaces
  • Avoid special characters such as: " / \ : * ? < > [ ] & $ . (the period is fine only before the file extension)
  • Use the dating convention: YYYY-MM-DD or YYMMDD
  • Use the 3-letter file extension to indicate the file format, such as .txt, .pdf, or .csv.
  • When using numbers, use leading zeros so that files sort in sequential order: 001, 002, … 020, 021 instead of 1, 2, … 20, 21.
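
As an illustration, the conventions listed above could be applied programmatically. The following is a minimal Python sketch, not a prescribed tool: the function name make_filename, its parameters, and the sample site and depth values are all hypothetical.

```python
import re
from datetime import date

# Special characters the list above says to avoid.
# The period is added back only once, in front of the extension.
FORBIDDEN = re.compile(r'["/\\:*?<>\[\]&$.]')


def make_filename(parts, when, sequence, extension, width=3, compact=False):
    """Build a file name from descriptive parts, a date, and a counter.

    All names here are hypothetical; adapt them to your own project.
    """
    cleaned = []
    for part in parts:
        part = re.sub(r"\s+", "_", part.strip())  # underscores instead of spaces
        part = FORBIDDEN.sub("", part)            # drop special characters
        cleaned.append(part)
    # YYMMDD when compact, otherwise YYYY-MM-DD
    stamp = when.strftime("%y%m%d") if compact else when.isoformat()
    counter = f"{sequence:0{width}d}"             # leading zeros: 7 -> 007
    stem = "_".join(cleaned + [stamp, counter])
    return f"{stem}.{extension.lstrip('.').lower()}"


# Hypothetical study site and water depth
name = make_filename(["siteA", "10m"], date(2024, 3, 5), 7, "csv", compact=True)
print(name, len(name))
```

The compact YYMMDD date keeps this sample name under 25 characters; with the longer YYYY-MM-DD form, fewer or shorter descriptive parts would be needed to stay within that target.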

Case Study: File Naming Done Well - examples of methods for naming files. File names can include study site, water depth, date, and more.

Some Recommended File Formats

Whenever possible use uncompressed, non-proprietary (open) formats.

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV
  • Geospatial: SHP, DBF, GeoJSON, KML, NetCDF, GeoTIFF/TIFF, HDF-EOS
  • Moving images: MOV, AVI, MXF
    • eScholarship requires MP4 to embed
  • Presentations: PDF
  • Sounds: WAV, AIFF, MXF
    • eScholarship requires MP3 to embed
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still images: TIFF, JPEG 2000, JPEG, PDF
  • Tabular data: CSV
  • Text: XML, PDF/A, HTML, ASCII, UTF-8
  • Web archive: WARC

For more guidance on appropriate formats, see the Library of Congress’ Recommended Formats Statement and Archivematica’s Format Policies page for access and preservation.

Documentation

Include a brief descriptive document (often called a Readme.txt file) to help others understand your additional files and data.

Spreadsheets

Data can be more efficiently analyzed and better understood in the future if initially set up for a machine to read:

Do:

  • Provide CSV files (easily converted by most spreadsheet software) when possible
  • Make the top row a header with variable names
  • Put a value in each cell so that each row is associated with its headings and can stand alone as a record of your observation, count, etc.
  • Consider how to convey null values so they are not mistaken for zero results
  • Express dates using an accepted standard, e.g., YYYYMMDD

Don’t:

  • Include notes in cells (add notes to a separate readme file)
  • Use formatting, comments, or color coding to convey information; they don’t translate well to other software
  • Use blank spaces or symbols in column names
  • Leave cells blank (avoid misinterpretation as zeros)
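
The do's and don'ts above can be sketched in a few lines of Python using the standard csv module. This is only one possible approach under stated assumptions: the "NA" null marker, the helper names clean_header and to_csv, and the sample observations are hypothetical, and whichever null marker you choose should be explained in your readme file.

```python
import csv
import io
import re
from datetime import date

NULL_MARKER = "NA"  # assumed convention for missing values; document it in the readme


def clean_header(name):
    """Replace spaces and symbols in a column name with underscores."""
    return re.sub(r"\W+", "_", name.strip()).strip("_").lower()


def to_csv(headers, rows):
    """Return CSV text with a header row and no empty cells."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([clean_header(h) for h in headers])
    for row in rows:
        cells = []
        for value in row:
            if value is None or value == "":
                cells.append(NULL_MARKER)               # explicit null, not a blank cell
            elif isinstance(value, date):
                cells.append(value.strftime("%Y%m%d"))  # dates as YYYYMMDD
            else:
                cells.append(value)                     # a true zero stays 0
        writer.writerow(cells)
    return out.getvalue()


# Hypothetical observations: the second count was not recorded
headers = ["Site", "Water Depth (m)", "Date", "Count"]
rows = [
    ["A", 10, date(2024, 3, 5), 0],
    ["B", 5, date(2024, 3, 6), None],
]
print(to_csv(headers, rows))
```

In the resulting text, the zero count for site A and the missing count for site B remain distinguishable, which is the point of choosing an explicit null marker.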

Why Best Practices?

Using best practices makes it easier for you to find, use, and machine-analyze your data, and ultimately to upload it to an online archive. It also makes it easier for collaborators and for researchers not involved with the project to understand and use your data in the future.