Prepare Your Metadata Files

These files are how you make your dataset findable and understandable on the SPARC Portal.

This document is part of a series on the Data Submission to SPARC process.

The next step in the process is to prep your metadata files. These required assets are descriptive files for your data, subjects, samples, etc., and will be submitted with your dataset.

This page will guide you through exactly what those files are. Please read through it fully, as it has requirements/guidelines you’ll need to know. Then, you’ll find two links to choose from for the preparation – one for SODA, and one for manual file formatting.

Checklist


But first, let’s check to make sure you’ve done everything up to this point:

  • Talked to the curation team
  • Requested access to the appropriate Pennsieve workspace
  • Experimental protocol has been created on Protocols.io
  • All required metadata files have been completed
    • Temporary link to unpublished protocol has been added to dataset_description file
  • All folders/metadata files are named as set forth in the SDS file system
  • All subject & sample names are CONSISTENT across all references in the SDS
    • All human subjects have been de-identified
  • All data, metadata and associated files/info have been organized into the SDS file system
  • All experimental data has been organized by subject and sample in the Primary Folder
  • All required top-level folders include required manifest files
  • Dataset has been uploaded onto Pennsieve
  • Completeness of the upload has been verified
  • Dataset has been submitted for review
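The "CONSISTENT subject & sample names" item lends itself to a quick automated check before upload. Below is a minimal Python sketch, assuming your subjects and samples files have been exported to CSV and that both use a column headed "subject id" (confirm the exact heading in your templates); the function names are hypothetical. It reports any subject ID referenced in the samples file that is missing from the subjects file.

```python
import csv
import io


def read_column(csv_text: str, column: str) -> list[str]:
    """Return the non-empty, whitespace-stripped values of one CSV column."""
    values = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Short rows yield None for missing fields, so fall back to "".
        value = (row.get(column) or "").strip()
        if value:
            values.append(value)
    return values


def find_unknown_subjects(subjects_csv: str, samples_csv: str) -> set[str]:
    """Subject IDs referenced in the samples file but absent from subjects."""
    known = set(read_column(subjects_csv, "subject id"))  # assumed heading
    referenced = set(read_column(samples_csv, "subject id"))
    return referenced - known


subjects = "subject id,species\nsub-1,Rattus norvegicus\nsub-2,Rattus norvegicus\n"
samples = "sample id,subject id\nsam-1,sub-1\nsam-2,sub-3\n"
print(find_unknown_subjects(subjects, samples))  # {'sub-3'}
```

The comparison is exact and case-sensitive on purpose: "sub-1" and "Sub-1" are reported as different names, which is the kind of inconsistency the checklist asks you to eliminate.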

Completed all the checked boxes? Continue onward!


SPARC Dataset Structure (SDS) Template

To start, let’s talk about the SPARC Dataset Structure (SDS).

The SDS is the organizational file system required for all dataset submissions. This promotes consistency and ensures other investigators can easily understand your data. Part of that ease comes from the organization itself, and part of it comes from the metadata files incorporated in it.

If you use SODA guided mode for this step, it will guide you through the preparation of your metadata files and automatically organize them in the SDS format. If you choose to do it manually, you will fill out spreadsheet templates, which you can find on our SDS page.

Either way, you will see multiple kinds of metadata files. So let’s talk about which ones are required.


Required Metadata Files

Submissions to the SPARC Portal must include certain metadata files. Some are universally required, while others are only required if your data calls for it. An overview of these files is below, but you can get more details on our SDS page here.

Required or Conditionally Required Files

  • dataset_description - mandatory file containing study metadata describing the dataset.
  • submission - mandatory file containing information on the submission and related milestones.
  • subjects - conditionally required file, needed only for datasets derived from subjects. It must provide all necessary information on the subjects involved in the data collection.
  • samples - conditionally required file, needed only if data were obtained from samples. Contains information on the samples involved in the data collection.
  • performances - conditionally required file, needed only if data were gathered from performances, i.e., multiple visits, runs, sessions, or executions of one type of experimental protocol.
  • README - file that can be used to provide additional information to new users of the dataset, for example, essential guidelines on how to use the dataset, how to open files, what special software/equipment is necessary to read the data, recommendations, and limitations or constraints related to data usage.
  • resources - suggested file containing information on resources used in the data collection, such as reagents, equipment, nucleotide sequences, software, etc.
  • manifest* - required for every top-level folder; includes a brief description of every file in the folder. It is also acceptable to have a single manifest file located at the root of the folder tree.
  • code_description - required only if code was used in the generation of the data. Computational datasets require additional metadata, which is outlined here.
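If you prepare these files manually, a short script can confirm that the mandatory and conditionally required files from the list above are present at the dataset root. This is a minimal Python sketch, assuming metadata files may be saved as .xlsx, .csv, or .json (an assumption; use whatever formats your templates specify). README and resources are left out because the list does not mark them mandatory, and manifests are handled in a later step. The function name and flags are hypothetical.

```python
from pathlib import PurePath

# Assumption: accepted extensions for metadata files.
METADATA_EXTENSIONS = {".xlsx", ".csv", ".json"}


def missing_metadata_files(root_files, *, has_subjects, has_samples,
                           has_performances, used_code):
    """Return required metadata file names not found among root_files."""
    required = ["dataset_description", "submission"]
    if has_subjects:
        required.append("subjects")
    if has_samples:
        required.append("samples")
    if has_performances:
        required.append("performances")
    if used_code:
        required.append("code_description")

    # Keep only files with a metadata extension, compared by name without it.
    present = {
        PurePath(name).stem
        for name in root_files
        if PurePath(name).suffix.lower() in METADATA_EXTENSIONS
    }
    return [name for name in required if name not in present]


root = ["dataset_description.xlsx", "submission.xlsx", "subjects.xlsx", "README.txt"]
print(missing_metadata_files(root, has_subjects=True, has_samples=True,
                             has_performances=False, used_code=False))  # ['samples']
```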

*Manifest Files

Please note, Manifest Files are best created AFTER your top-level folders are organized, as they describe the contents of the folder. So don’t worry about completing those in this step – we will cover that in the “Organize Your Files” step.

Of course, it’s worth mentioning that SODA will automatically make the required manifest files and prompt you to add a description of each file in a systematic way if (when!) you upload through that platform. When you move files around using SODA, the manifest will be updated automatically for you.
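If you are not using SODA, you can still save time by generating a draft manifest once a top-level folder is organized, then filling in the descriptions by hand. The Python sketch below is hypothetical: it assumes a minimal two-column layout ("filename" and "description"), while the actual SDS manifest template may define additional columns, so copy the headings from the template you download. It lists every file under the folder with a blank description.

```python
import csv
import tempfile
from pathlib import Path


def draft_manifest(folder: Path) -> Path:
    """Write manifest.csv listing every file under folder, descriptions blank."""
    manifest_path = folder / "manifest.csv"
    files = sorted(
        p.relative_to(folder).as_posix()
        for p in folder.rglob("*")
        if p.is_file() and p != manifest_path  # skip a manifest from a prior run
    )
    with manifest_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["filename", "description"])  # assumed minimal columns
        for name in files:
            writer.writerow([name, ""])  # description left for you to fill in
    return manifest_path


# Demo on a throwaway folder tree.
with tempfile.TemporaryDirectory() as tmp:
    primary = Path(tmp) / "primary"
    (primary / "sub-1").mkdir(parents=True)
    (primary / "sub-1" / "recording.dat").write_text("")
    with draft_manifest(primary).open(newline="", encoding="utf-8") as fh:
        rows = list(csv.reader(fh))

print(rows)  # [['filename', 'description'], ['sub-1/recording.dat', '']]
```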

Data-Dependent Required Files

The following files are only required under certain conditions:

  • code_parameters – only required if code was used in the generation of the data.
  • CHANGES – only required if you need to document the history of the dataset or any changes made to it since an initial publication.

Requirements of Preparing Metadata Files

Whether you’re preparing your metadata files using SODA Guided Mode or manually, you will need to follow these requirements:

  • Do not rename the key folders and files provided in the template folder.
  • The dataset_description template should not be changed at all.
  • Do not add, edit, or delete required columns/rows.
    • Blue and green fields are mandatory and should not be altered.
    • Yellow fields are optional and can be altered.
    • Do not change any column headings in any Excel file.
  • To add additional metadata in the subjects and samples files, please append new columns to the right side.
  • Leave fields empty when there is no information available at the time of submission.
  • Refer to the “Naming Conventions” section of Organize Your Files for naming requirements.
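Several of these rules (no renamed or deleted template columns, new columns appended on the right) can be checked mechanically. Below is a minimal Python sketch, assuming the file has been exported to CSV; for .xlsx templates you would read the first row with a spreadsheet library such as openpyxl instead. TEMPLATE_HEADINGS is a hypothetical placeholder, not the official column list, so copy the real headings from the template you downloaded.

```python
import csv
import io

# Hypothetical headings; replace with the real ones from your SDS template.
TEMPLATE_HEADINGS = ["subject id", "pool id", "species", "sex", "age"]


def check_headings(csv_text: str, template: list[str]) -> list[str]:
    """Return problems with a metadata file's header row (empty if none)."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    problems = []
    # Template columns must appear first, unchanged and in their original order.
    if header[: len(template)] != template:
        problems.append("template columns were renamed, removed, or reordered")
    # Anything after the template columns is user-added and needs a heading.
    extra = header[len(template):]
    if any(not name.strip() for name in extra):
        problems.append("an appended column has an empty heading")
    return problems


# "weight" is appended on the right; unknown values are simply left empty.
good = "subject id,pool id,species,sex,age,weight\nsub-1,,Rattus norvegicus,female,,\n"
bad = "subject ID,pool id,species,sex,age\n"
print(check_headings(good, TEMPLATE_HEADINGS))  # []
print(check_headings(bad, TEMPLATE_HEADINGS))
```

Comparing the header as an ordered prefix enforces both rules at once: any edit to a template heading fails the check, while extra columns are only accepted after the template columns.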

Prepare Your Metadata Files with SODA

And with that, you’re ready to start preparing your metadata! Which means the time is finally nigh…

For you to use SODA! The moment we’ve all been waiting for. At the following link, you will find true joy in the form of documentation that walks you through preparing your metadata on SODA.

And once you’re done doing that, you can actually skip the next step of Organizing Your Files, because SODA Guided Mode will walk you through that step as well! So once you’ve prepped your metadata AND organized it and your experimental data into the appropriate folders, you can move on to the final step: Upload Your Data.

Be sure to check out that page for all last-minute checks before Disseminating Your Data on SODA. See you soon, friend.


Image data: Convert to fit SPARC Standards

If you have image data, see Microfile+: Convert Microscopy Image Data and Metadata to SPARC Standard Formats for details on converting your microscopy data so that it is displayed on the SPARC Portal with relevant context and meets the imaging metadata standards.


Additional Resources

A list of additional resources you may find useful while preparing your metadata:

  • Should you have any questions or issues with the process:
  • SPARC Optical Microscopy Imaging Data and Imaging Metadata Standard (1.0). Zenodo. https://doi.org/10.5281/zenodo.5347993.
  • To read more about SDS, click here.
  • Technical users can view the schema used to validate SDS here.
  • More info about each metadata file can be found here.
  • For Tools and General Resources on SPARC, click here.

Next Step

Finished prepping your metadata? The next step then is to BRUSH YOUR SHOULDERS OFF! Because it’s always good to celebrate a win.

And now, it’s time to Organize Your Files, which you can start by clicking right here.

