Navigating Datasets on SPARC

Learn SPARC's file structure to simplify browsing and use of SPARC datasets.

Anatomy of a SPARC Dataset

The SPARC Portal hosts a growing collection of diverse datasets spanning anatomy, physiology, modeling, and simulation.

The Dataset Landing Page

The Dataset Landing Page (Figure 1.1) provides basic information about the dataset, such as its title and ORCID-linked contributors (1.1), its DOI and size (1.2), version (1.3), how to get access to the dataset (1.4), related publications (1.5), usage rights (1.6) and metrics (1.7), informational details tabs (1.8), and Curator's Notes (1.9).

Figure 1: The Dataset Landing Page

Dataset Details

The informational tabs (Figure 1.8) provide you with summary details about the dataset:

  • Abstract includes a curated summary of the experimental design, subject species, published protocol link, sample types, and data overview.
  • About lists dataset publication details, such as corresponding author, funding program and award, and associated projects.
  • Cite provides formatted citations to help you easily cite the dataset. Learn more about how to cite SPARC data and resources.
  • Files displays the dataset files and folders. SPARC implemented a file structure and naming convention, the SPARC Dataset Structure (SDS), to provide consistency across all datasets. Learn more about the SDS Standard and how it helps you browse files, below.
  • Gallery offers a web preview of the microscopy image data along with anatomical scaffolds, flatmaps, segmentation, and videos, if available. Learn more about how to use the Gallery Viewer.
  • References lists the primary publication associated with the dataset and any other related publications.
  • Versions lists the available versions of the dataset. A dataset version is a DOI-specific, version-controlled iteration of a dataset. Learn more about SPARC Publication and Versioning.

Browsing Dataset Files

Dataset Details Files Tab

The Files tab lets you browse the available data files and metadata through the portal and download individual data files or the complete dataset. Learn more about downloading and using SPARC data.

File browser. If a viewer is available for a data type (e.g., mp4; horizontal arrow), it is indicated by the icon in the Action column (slanted arrow).

The Files tab provides a browser that exposes the top-level hierarchy of the SDS.

SPARC Dataset Structure (SDS) Organization

Data files are organized into folders by the investigators and curated according to the SPARC Dataset Structure (SDS). Learn more about how the SDS is used in the data submission and curation process in the Submission Overview.

Generalized diagram of the SDS

Example of data files at the end of the path.

Understanding SDS Folders and Metadata Files

A primary strength of the SDS is that it is versatile enough to accommodate the variability in the multi-modal, multi-species data published on the SPARC Portal. Dataset folders contain data, code, and supporting materials. These folders are accompanied by documents and spreadsheets containing critical metadata needed to understand the content of the folders. The dataset's experimental approach and the nature of the data determine which folders and structured metadata files are required by the standard. Therefore, not all datasets have all of the folders and files.

source folder not pictured

Data Folders

  • primary: The primary folder is required for all experimental datasets and contains minimally processed (i.e., de-identified) versions of the main data products, e.g., images, spreadsheets, and physiological traces.
  • source: The source folder is only required if unaltered, raw files from an experiment are included in the dataset.
  • derivative: This folder contains products derived from the original data, e.g., measurements from images, 3D reconstructions, or converted files.

Example: Imaging data

  • primary: reconstructed DICOM or NIfTI files
  • source: “truly” raw k-space data for a Magnetic Resonance (MR) image that has not yet been reconstructed
  • derivative: analyses of the DICOM or NIfTI files

Metadata files:

  • README: provides instructions on using the dataset, descriptions of the file directories, challenges and limitations in obtaining data, information on missing data points or dropped subjects, etc.
  • dataset_description: provides general information about the dataset
  • subjects: lists subjects by their identifiers along with key details such as age, weight, and experimental groups
  • samples (if applicable): lists specimens used in the study by their identifiers and key details
  • performances (if applicable): describes data that were gathered from multiple distinct performances of one type of experimental protocol on the same subject or sample (i.e., multiple visits, runs, sessions, or executions)
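
If you have downloaded a dataset, a quick way to see which of the data folders and metadata files listed above it actually contains is to inspect its top level. The following is a minimal sketch, not an official SPARC tool: the function name summarize_dataset is hypothetical, and because this page does not state the file formats of the metadata files, the sketch matches them by name only and ignores the extension.

```python
from pathlib import Path

# Folder and metadata names as listed on this page. File extensions are
# ignored on purpose (assumption: the page does not specify them).
DATA_FOLDERS = ("primary", "source", "derivative")
METADATA_STEMS = ("README", "dataset_description", "subjects", "samples", "performances")


def summarize_dataset(root):
    """Report which data folders and metadata files sit directly under root."""
    entries = list(Path(root).iterdir())
    folder_names = {p.name for p in entries if p.is_dir()}
    file_stems = {p.stem for p in entries if p.is_file()}
    return {
        "folders": {name: name in folder_names for name in DATA_FOLDERS},
        "metadata": {stem: stem in file_stems for stem in METADATA_STEMS},
    }
```

Calling summarize_dataset on the top-level folder of a downloaded dataset returns two dictionaries of True/False flags. A False entry is not necessarily an error, since not all datasets have all of the folders and files.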

Inside the Data Folders

Within the data folders, data files are organized by subjects and/or samples. Each subject or sample has its own folder, named according to the identifiers found in the subject and/or sample metadata files. The subject (sub-) folders contain data collected directly from the subject, e.g., recordings from the brainstem in vivo. Data collected at multiple time points can be found in the performances (perf-) folders and are accompanied by the performances metadata file. Data derived from specimens, e.g., microscopy images, are available in the sample (sam-) folders.

Investigators may further organize the data within the subject and/or sample folders, depending on the type of data.
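
Because the folder prefixes carry meaning, you can recover the subject, sample, and performance a file belongs to from its path alone. The sketch below illustrates the idea under stated assumptions: the function name identifiers_from_path and the example path are hypothetical, and only the sub-, sam-, and perf- prefixes described above are recognized.

```python
from pathlib import PurePosixPath

# Prefixes described on this page, mapped to the kind of folder they mark.
PREFIXES = {"sub-": "subject", "sam-": "sample", "perf-": "performance"}


def identifiers_from_path(path):
    """Return the subject, sample, and performance folder names in a file path."""
    found = {}
    # Skip the last component: it is the file itself, not a folder.
    for part in PurePosixPath(path).parts[:-1]:
        for prefix, kind in PREFIXES.items():
            if part.startswith(prefix):
                # Keep the first (outermost) folder of each kind.
                found.setdefault(kind, part)
    return found


# Hypothetical path, for illustration only.
print(identifiers_from_path("primary/sub-1/sam-2/image.tif"))
# prints {'subject': 'sub-1', 'sample': 'sam-2'}
```

The returned folder names can then be looked up in the subjects and samples metadata files to find details such as age, weight, or experimental group.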

Descriptive Folders

Protocols: This folder is optional and contains supplementary files to accompany the experimental protocols submitted to Protocols.io.

Docs folder: Here, you can find supplementary material necessary to understand the dataset, e.g., figures or diagrams.

Code folder: If code is part of the dataset, you can find it in the code folder. This folder also contains a README file that explains how to install and run the code and describes its inputs, outputs, expected results, and dependencies.

Manifest file

Within each top-level folder, a manifest file lists all files within the folder and provides additional information about the files.
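
Since the manifest lists the files in its folder, you can compare it against a downloaded copy to spot files it does not mention. This is a rough sketch under several assumptions that this page does not confirm: that you have saved the manifest as a CSV file, that it has a column named filename holding paths relative to the folder, and that files in subfolders are listed too. The function name files_missing_from_manifest is hypothetical.

```python
import csv
from pathlib import Path


def files_missing_from_manifest(folder, manifest_csv, column="filename"):
    """List files under folder that the manifest does not mention.

    Assumes a CSV manifest whose `column` holds paths relative to `folder`.
    """
    folder = Path(folder)
    manifest_csv = Path(manifest_csv)
    with manifest_csv.open(newline="") as handle:
        listed = {row[column].strip() for row in csv.DictReader(handle)}
    on_disk = {
        p.relative_to(folder).as_posix()
        for p in folder.rglob("*")
        # Ignore directories and the manifest file itself.
        if p.is_file() and p.resolve() != manifest_csv.resolve()
    }
    return sorted(on_disk - listed)
```

An empty result means every file found on disk appears in the manifest. If your manifest uses a different column name, pass it through the column argument.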

browsing files within a dataset

Viewing Files

You can select an individual file to open a file details page.

This page provides high-level information about the file and allows you to download it.

Additionally, for some file types (e.g., Microsoft Office, Biolucida), there is a viewer that allows you to open the file on the Portal.

Learn more about viewers and tools that allow you to explore data directly on the Portal.

File browser. If a viewer is available for a data type (e.g., mp4 horizontal arrow), it is indicated by the icon in the Action column (slanted arrow)

Working with a SPARC dataset

The SPARC Dataset viewer is a web-based tool, currently in beta, that lets you quickly visualize SPARC datasets in a graphical viewer. It gives you a visual overview of the folders, files, subjects, samples, and metadata associated with datasets that adhere to the SPARC Dataset Structure.

