Naming Requirements

Naming Convention

SPARC standards were designed to promote the re-use of public data for further research, and as such, it is imperative to use a consistent and predictable naming scheme for all files. This makes it easier not only for computers to process the data, but for other investigators to understand it.

For the SDS, it is absolutely critical that the naming used within metadata files is consistent with the naming used for all folders (i.e., subject or sample names).

You can be flexible with your subject names, but you must use that same EXACT name when labeling your folders, so that we, and other investigators, can easily relate the metadata contained in the descriptive file to the contents of the folder. This is also how we computationally map metadata records to individual files. In short, consistency is critical.



Top Level Folders and Files

  • These unique folders will have a standardized name that corresponds to the exact names or IDs as referenced in the metadata file.
  • Folders named after SDS metadata identifiers (i.e., subject labels, sample labels, performance labels, etc.) can include only letters, numbers, and the dash character (-).
  • There are standardized prefixes for each type of data folder. Refer to the Subjects, Samples, and Performances Folders Naming Constraints section below.
  • For the folder and file names within those SDS-defined folders, there are currently no restrictions, but we strongly suggest avoiding special characters such as !@#$%^&*()+=/|"'~ in these names. These suggestions are expected to be enforced in future versions of SDS to enable interoperability across operating systems.

Folder and File Names

  • When naming the dataset sub-folders (i.e., folders that are NOT mapped to SDS IDs), it is imperative to keep a consistent naming scheme.
  • All file names and folder names not mapped to an SDS metadata entity should include only alphanumeric characters (0-9, A-Z, a-z) and the dash character (-). Avoid using the characters !@#$%^&*()+=/\|"'~;:<>{}[]? in these names. See the Primary Folder section for more details.
  • There is no limit on the number of characters.
  • Each data file must be listed in the main manifest with an adequate description.
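
As a rough illustration of the character guidance above, the following Python sketch reports which of the listed characters appear in a given file or folder name. The function name discouraged_chars_in and the example names are hypothetical and are not part of any SDS tool; the character set is copied from the list above, and other characters (spaces, dots, underscores) are deliberately not checked here.

```python
# Characters the guidance above recommends avoiding in file and folder names.
# This set is copied from the list in this section; it is not an official SDS API.
DISCOURAGED_CHARS = set("!@#$%^&*()+=/\\|\"'~;:<>{}[]?")


def discouraged_chars_in(name):
    """Return the discouraged characters that appear in `name`, sorted."""
    return sorted(set(name) & DISCOURAGED_CHARS)


# Hypothetical names used only for illustration.
print(discouraged_chars_in("histology-images"))   # []
print(discouraged_chars_in("run#2 (final).csv"))  # ['#', '(', ')']
```

An empty list means none of the listed characters were found; it does not by itself mean the name follows every recommendation in this section.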

Subjects, Samples, and Performances Folders Naming Constraints

  • Must have prefixes: 'sub-' for subjects, 'sam-' for samples, 'perf-' for performances.
  • Folder names must reflect EXACT subject, sample, and performance IDs. Failure to comply with this requirement is the largest source of errors in submitted datasets.
  • There is no limit on the number of characters.
  • Can include only alphanumeric characters (0-9, A-Z, a-z) and the dash character (-).
  • Special characters and spaces are NOT ALLOWED.
  • Sample and performance folders should be placed inside the corresponding subject folders.
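
The folder naming constraints above can be sketched as a single regular expression. This is a minimal illustration, not an official SDS validator: the names SDS_FOLDER_PATTERN and is_valid_entity_folder are hypothetical, and the sketch assumes that at least one character must follow the prefix, which the list above does not state explicitly.

```python
import re

# Required prefix ('sub-', 'sam-', or 'perf-') followed by letters, digits, or dashes.
# Assumption: at least one character must follow the prefix.
SDS_FOLDER_PATTERN = re.compile(r"(sub|sam|perf)-[0-9A-Za-z-]+")


def is_valid_entity_folder(name):
    """Return True if `name` satisfies the prefix and character constraints."""
    return SDS_FOLDER_PATTERN.fullmatch(name) is not None


# Hypothetical folder names used only for illustration.
print(is_valid_entity_folder("sub-rat01"))    # True
print(is_valid_entity_folder("sam-heart-2"))  # True
print(is_valid_entity_folder("sub_rat01"))    # False: underscore instead of dash
print(is_valid_entity_folder("sub-rat 01"))   # False: contains a space
```

Using fullmatch rather than match ensures that a name with a valid start but an invalid character later on is still rejected.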

Subject, Sample, and Performance Identifiers (IDs)

  • Must be unique for the dataset.
  • Must have prefixes: 'sub-' for subjects, 'sam-' for samples, 'perf-' for performances.
  • The corresponding data folder names must use the exact subject, sample, and performance IDs. Failure to comply with this requirement is the largest source of errors in submitted datasets.
  • Can include only alphanumeric characters (0-9, A-Z, a-z) and the dash character (-).
  • Special characters and spaces are not allowed.
  • There is no limit on the number of characters.
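
Because mismatches between IDs and folder names are described above as the largest source of errors, a quick consistency check before submission may help. The sketch below assumes the IDs have already been read from the metadata file into a plain list and the folder names into another list; the function name check_ids_against_folders and the example values are hypothetical. An ID with no folder is reported for review only, since this section does not say that every ID must have a folder.

```python
from collections import Counter


def check_ids_against_folders(ids, folder_names):
    """Report duplicate IDs and exact-name mismatches between IDs and folders."""
    counts = Counter(ids)
    return {
        # IDs must be unique within the dataset.
        "duplicate_ids": sorted(i for i, n in counts.items() if n > 1),
        # IDs with no folder of exactly the same name (flagged for review).
        "ids_without_folder": sorted(set(ids) - set(folder_names)),
        # Folders whose name matches no ID exactly.
        "folders_without_id": sorted(set(folder_names) - set(ids)),
    }


# Hypothetical values used only for illustration.
report = check_ids_against_folders(
    ids=["sub-1", "sub-2", "sub-2"],
    folder_names=["sub-1", "sub-02"],
)
print(report["duplicate_ids"])       # ['sub-2']
print(report["ids_without_folder"])  # ['sub-2']
print(report["folders_without_id"])  # ['sub-02']
```

The comparison is an exact, case-sensitive string match, so 'sub-2' and 'sub-02' are treated as different names, which mirrors the EXACT-name requirement stated above.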