University
Library

Electronic Theses & Dissertations (ETDs)

Supplementary Files

Supplementary files are separate attachments that enhance or support your thesis. They may include computer code, research data, audio or video files, and images or maps that are not part of the primary PDF.

File Naming Systems & Organization

  • Decide on a naming convention before data collection starts
  • Use consistent, descriptive file names. Make it easy to predict what a file contains.
  • Develop a file naming scheme that makes sense to you.
  • Consider including:
    • Project name or project number
    • Name of file creator
    • Sequence ID
    • Accession Number
    • Location or spatial coordinates
    • Date or date range of project
    • Version number of file
  • Consider how files sort when deciding what element of the file name will go first.
  • Establish a folder hierarchy that aligns with the project. Example: [Project] / [Experiment] / [Instrument or Type of File]
  • Include an explanation of your naming convention along with any abbreviations or codes in your readme.txt file. 
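As an illustration of the points above, here is a minimal Python sketch that assembles a file name from a few of the suggested elements and places it in a [Project] / [Experiment] / [Type of File] hierarchy. The function names, the element order, the compact YYYYMMDD date, and the sample values (kelp, s1, transects) are assumptions made for this example, not a required scheme.

```python
from datetime import date
from pathlib import PurePosixPath


def build_file_name(project: str, site: str, collected: date,
                    sequence: int, extension: str) -> str:
    """Join name elements with underscores, most general element first."""
    parts = [
        project,                       # project name or number
        site,                          # location code (hypothetical)
        collected.strftime("%Y%m%d"),  # year-first date sorts chronologically
        f"{sequence:03d}",             # zero-padded sequence ID: 001, 002, ...
    ]
    return "_".join(parts) + "." + extension


def build_path(project: str, experiment: str, file_type: str,
               file_name: str) -> PurePosixPath:
    """Place a file in a [Project]/[Experiment]/[Type of File] hierarchy."""
    return PurePosixPath(project, experiment, file_type, file_name)


name = build_file_name("kelp", "s1", date(2024, 3, 15), 7, "csv")
print(build_path("kelp", "transects", "csv", name))
```

Because the project and site come first, all files from one site sort together, and within a site they sort by date and then by sequence number. Putting the date first instead would group files by collection day.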

Some Recommended File Formats

Whenever possible, use uncompressed, non-proprietary (open) formats.

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV
  • Geospatial: SHP, DBF, GeoJSON, KML, NetCDF, GeoTIFF/TIFF, HDF-EOS
  • Moving images: MOV, AVI, MXF
    • eScholarship requires MP4 to embed
  • Presentations: PDF
  • Sounds: WAV, AIFF, MXF
    • eScholarship requires MP3 to embed
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still images: TIFF, JPEG 2000, JPEG, PDF
  • Tabular data: CSV
  • Text: XML, PDF/A, HTML, ASCII, UTF-8
  • Web archive: WARC

For more guidance on appropriate formats, see the Library of Congress’ Recommended Formats Statement and Archivematica’s Format Policies page for access and preservation.

Where to Deposit Supplementary Files

There are a variety of places to deposit your files (data, code, images, videos, etc.) including eScholarship.

If you’d like to consult with someone about where to deposit, contact: research@library.ucsc.edu.

Documentation

Include a brief descriptive document (often called a readme.txt file) to help others understand your additional files and data.

File Naming Conventions

  • Keep the file name short (aim for fewer than 25 characters)
  • Use underscores instead of spaces
  • Avoid special characters such as: " / \ : * ? < > [ ] & $ .
  • Use the date convention YYYY-MM-DD or YYMMDD
  • Use the 3-letter file extension to indicate the file format, such as .txt, .pdf, or .csv.
  • When using numbers, use leading zeros so files sort in sequential order. Use 001, 002, ... 020, 021 ... instead of 1, 2, ... 20, 21 ...

Case Study: File Naming Done Well. An excellent example of a method for naming thousands of image files. File names include study site, water depth, date, and more.
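To apply these conventions to many files at once, a small checker can help. The following is a rough Python sketch, not an official tool: the function name check_file_name and the wording of its messages are made up for this example. It also assumes that the period in the list of special characters means "no periods other than the one before the extension".

```python
# Characters the list above says to avoid. The period is handled separately
# because one period is still needed before the extension.
SPECIAL_CHARACTERS = set('"/\\:*?<>[]&$')
MAX_LENGTH = 25  # a target to aim for, not a hard limit


def check_file_name(name: str) -> list[str]:
    """Return a list of convention problems; an empty list means none found."""
    problems = []
    stem, dot, extension = name.rpartition(".")
    if not dot:
        stem, extension = name, ""
        problems.append("has no file extension")
    if len(name) >= MAX_LENGTH:
        problems.append(
            f"is {len(name)} characters long; aim for fewer than {MAX_LENGTH}"
        )
    if " " in name:
        problems.append("contains spaces; use underscores instead")
    found = sorted(SPECIAL_CHARACTERS.intersection(name))
    if found:
        problems.append("contains special characters: " + " ".join(found))
    if "." in stem:
        # Assumption: only the period before the extension is acceptable.
        problems.append("contains a period outside the extension")
    return problems


for candidate in ["kelp_s1_20240315_007.csv", "site 1: notes.v2.txt"]:
    print(candidate, check_file_name(candidate))
```

The sketch does not check dates or leading zeros, since those depend on which parts of the name are meant to be dates or sequence numbers in your own scheme.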

Spreadsheets

Data can be analyzed more efficiently and understood better in the future if it is set up from the start for a machine to read:

Do:

  • Provide CSV files (easily converted by almost all spreadsheet software) when possible
  • Make the top row a header with variable names
  • Put a value in each cell so rows are associated with headings and can stand alone as a recording of your observation, count, etc.  
  • Consider how to convey null values so they are not mistaken for zero results
  • Express dates using accepted standards e.g. YYYYMMDD

Don’t:

  • Include notes in cells (add notes to a separate readme file)
  • Use formatting, comments, or color coding to convey information; they don’t translate well to other software
  • Use blank spaces or symbols in column names
  • Leave cells blank (avoid misinterpretation as zeros)
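The do's and don'ts above can be sketched in a few lines of Python using the standard csv module. This is only one possible approach: the column names, the sample values, and the choice of "NA" as the null marker are assumptions for the example. Whatever marker you choose should be explained in your readme file.

```python
import csv
import io
from datetime import date

NULL_MARKER = "NA"  # assumed marker for missing values; document it in the readme
FIELDNAMES = ["site_id", "obs_date", "depth_m", "urchin_count"]  # no spaces or symbols


def write_observations(rows, handle):
    """Write rows as CSV with a header row, YYYYMMDD dates, and no blank cells."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        clean = {}
        for key in FIELDNAMES:
            value = row.get(key)
            if value is None or value == "":
                clean[key] = NULL_MARKER  # missing is not the same as zero
            elif isinstance(value, date):
                clean[key] = value.strftime("%Y%m%d")
            else:
                clean[key] = value
        writer.writerow(clean)


observations = [
    {"site_id": "s1", "obs_date": date(2024, 3, 15), "depth_m": 4.5, "urchin_count": 0},
    {"site_id": "s1", "obs_date": date(2024, 3, 16), "depth_m": None, "urchin_count": None},
]
buffer = io.StringIO()
write_observations(observations, buffer)
print(buffer.getvalue())
```

In the first row the count of 0 is a real observation. In the second row nothing was recorded, so the cells hold the null marker rather than a blank or a zero.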

Why Best Practices?

Using best practices makes it easier for you to find, use, analyze by machine, and ultimately upload your data to an online archive. It will also make it easier for your collaborators, or other researchers not involved with the project, to understand and use your data in the future.