Navigating Datasets on SPARC

Anatomy of a SPARC Dataset

The SPARC Portal hosts a growing collection of diverse datasets spanning anatomy, physiology, modeling, and simulation.

The Dataset Landing Page

The Dataset Landing Page (Figure 1.1) provides basic information about the dataset, such as its title and ORCID-linked contributors (1.1), its DOI and size (1.2), version (1.3), how to get access to the dataset (1.4), related publications (1.5), usage rights (1.6) and metrics (1.7), informational details tabs (1.8), and Curator's Notes (1.9).

Dataset Details

The informational tabs (1.8 in Figure 1.1) provide you with summary details about the dataset:

Abstract includes a curated summary of the experimental design, subject species, published protocol link, sample types, and data overview.
About lists dataset publication details, such as corresponding author, funding program and award, and associated projects.
Cite provides formatted citations to help you easily cite the dataset. Learn more about how to cite SPARC data and resources.
Files displays the dataset files and folders. SPARC implemented a file structure and naming convention, the SPARC Dataset Structure (SDS), to provide consistency across all datasets. Learn more about the SDS Standard and how it helps you browse files, below.
Gallery offers a web preview of the microscopy image data along with anatomical scaffolds, flatmaps, segmentation, and videos, if available. Learn more about how to use the Gallery Viewer.
References lists the primary publication associated with the dataset and any other related publications.
Versions shows the dataset's versions. A dataset version is a DOI-specific, version-controlled iteration of a dataset. Learn more about SPARC Publication and Versioning.
Metrics highlights key indicators such as citations, downloads, and protocol views, offering insight into how the dataset is being used and recognized within the scientific community.

Browsing Dataset Files

Dataset Details Files Tab

The Files tab lets you browse the available data files and metadata through the portal and download individual data files or the complete dataset. Learn more about downloading and using SPARC data.

File browser. The Files tab provides a browser that exposes the top-level hierarchy of the SDS. If a viewer is available for a data type (e.g., mp4, horizontal arrow), it is indicated by the icon in the Action column (slanted arrow).

SPARC Dataset Structure (SDS) Organization

Data files are organized into folders by the investigators and curated according to the SPARC Dataset Structure (SDS). Learn more about how the SDS is used in the data submission and curation process in the Submission Overview.

SPARC datasets must follow a Naming Convention and contain:

Example of data files at the end of the path.

Understanding SDS Folders and Metadata Files

A primary strength of SDS is that it is versatile enough to accommodate the variability in the multi-modal, multi-species data published on the SPARC Portal. Dataset folders contain data, code, and supporting materials. These folders are accompanied by documents and spreadsheets containing critical metadata needed to understand the content of the folders. The dataset's experimental approach and the nature of the data determine which folders and structured metadata files are required by the standard. Therefore, not all datasets have all of the folders and files.

Data Folders

primary: The primary folder is required for all experimental datasets and contains minimally processed (i.e., de-identified) versions of the main data products, e.g., images, spreadsheets, and physiological traces.
source: The source folder is only required if unaltered, raw files from an experiment are included in the dataset.
derivative: This folder contains products derived from the original data, e.g., measurements from images, 3D reconstructions, or converted files.

Example: Imaging data

primary: reconstructed DICOM or NIfTI files
source: “truly” raw k-space data for a Magnetic Resonance (MR) image that has not yet been reconstructed
derivative: analyses of the DICOM or NIfTI files
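
If you have downloaded a dataset, you can check which of these data folders it includes with a short script. The following is a minimal Python sketch, not an official SPARC tool. It assumes the dataset has been unpacked locally and that the data folders sit directly under the dataset's top-level directory with exactly the names primary, source, and derivative. The function name find_data_folders is hypothetical.

```python
from pathlib import Path

# Data folder names described in this section.
SDS_DATA_FOLDERS = ("primary", "source", "derivative")


def find_data_folders(dataset_root):
    """Report which SDS data folders exist directly under dataset_root.

    Returns a dict mapping each folder name to True (present) or False (absent).
    Assumption: the folders are at the top level of the unpacked dataset.
    """
    root = Path(dataset_root)
    return {name: (root / name).is_dir() for name in SDS_DATA_FOLDERS}
```

Calling find_data_folders on the path of an unpacked dataset returns one entry per folder. Because not all datasets have all folders, an absent source or derivative folder is not by itself a sign of a problem.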

Metadata files:

README: provides instructions on using the dataset, descriptions of the file directories, challenges and limitations in obtaining data, information on missing data points or dropped subjects, etc.
dataset_description: provides general information about the dataset
subjects: lists subjects by their identifiers along with key details such as age, weight, and experimental groups
samples (if applicable): lists specimens used in the study by their identifiers and key details
performances (if applicable): describes data that were gathered from multiple distinct performances of one type of experimental protocol on the same subject or same sample (i.e., multiple visits, runs, sessions, or executions)
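
To see which of the metadata files listed above are present in a downloaded dataset, a sketch along the following lines may help. It assumes the metadata files sit at the top level of the unpacked dataset and matches them by name only, ignoring the file extension, since the extension is not specified here. The function name find_metadata_files is hypothetical.

```python
from pathlib import Path

# Metadata file names (without extension) listed in this section.
SDS_METADATA_STEMS = (
    "README",
    "dataset_description",
    "subjects",
    "samples",
    "performances",
)


def find_metadata_files(dataset_root):
    """Map each metadata name to its file path, or None if it is absent.

    Assumption: metadata files are stored directly under dataset_root and
    are identified by their name without extension (case-sensitive).
    """
    root = Path(dataset_root)
    found = {stem: None for stem in SDS_METADATA_STEMS}
    for entry in sorted(root.iterdir()):
        # Keep the first match per name when several extensions exist.
        if entry.is_file() and entry.stem in found and found[entry.stem] is None:
            found[entry.stem] = entry
    return found
```

Entries that come back as None correspond to files the dataset does not include, for example samples or performances when they are not applicable.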

Inside the Data Folders

Within the data folders, data files are organized by subjects and/or samples. Each subject or sample has its own folder, named according to the identifiers found in the subjects and/or samples metadata files. The subject (sub-) folders contain data collected directly from the subject, e.g., recordings from the brainstem in vivo. Data collected at multiple time points can be found in the performances (perf-) folders and are accompanied by the performances metadata file. Data derived from specimens, e.g., microscopy images, are available in the sample (sam-) folders.

Investigators may further organize the data within the subject and/or sample folders, depending on the type of data.
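
The folder prefixes described above make it possible to recover the subject, sample, and performance a file belongs to from its path alone. The following Python sketch illustrates the idea. It assumes folder names begin with sub-, sam-, or perf- as described in this section; the function name and the example paths are hypothetical.

```python
from pathlib import PurePosixPath

# Folder-name prefixes described in this section.
PREFIXES = {"sub-": "subject", "sam-": "sample", "perf-": "performance"}


def identifiers_from_path(relative_path):
    """Extract subject, sample, and performance folder names from a file path.

    Only the folders in the path are inspected, not the file name itself.
    If a prefix occurs more than once, the outermost folder is kept.
    """
    result = {}
    for part in PurePosixPath(relative_path).parent.parts:
        for prefix, kind in PREFIXES.items():
            if part.startswith(prefix) and kind not in result:
                result[kind] = part
    return result
```

For a hypothetical path such as primary/sub-1/sam-2/image.tif, this returns the subject folder sub-1 and the sample folder sam-2, which can then be looked up in the subjects and samples metadata files.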

Descriptive Folders

Protocols: This folder is optional and contains supplementary files to accompany the experimental protocols submitted to Protocols.io.

Docs folder: Here, you can find supplementary material necessary to understand the dataset, e.g., figures or diagrams.

Code folder: If code is part of the dataset, you can find it in the code folder. This folder also contains a README file that explains how to install and run the code and describes its inputs, outputs, expected results, and dependencies.

Manifest file

Within each top-level folder, a manifest file lists all files within the folder and provides additional information about the files.
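
Because the manifest lists every file in its folder, it can be used to cross-check a downloaded folder. The sketch below is illustrative only and rests on several assumptions that are not stated in this guide: that the manifest has been saved as a CSV file named manifest.csv, that it has a column named filename, and that this column holds paths relative to the folder containing the manifest. Adjust these to match the manifest you actually have. The function names are hypothetical.

```python
import csv
from pathlib import Path


def read_manifest(manifest_path, name_column="filename"):
    """Read a manifest saved as CSV and return a dict of file name -> row.

    Assumption: the column given by name_column holds the file names.
    """
    with open(manifest_path, newline="", encoding="utf-8") as handle:
        return {row[name_column]: row for row in csv.DictReader(handle)}


def unlisted_files(folder, manifest_rows, manifest_name="manifest.csv"):
    """Return files found under folder that the manifest does not list.

    Paths are compared relative to folder, using forward slashes.
    """
    folder = Path(folder)
    on_disk = {
        path.relative_to(folder).as_posix()
        for path in folder.rglob("*")
        if path.is_file() and path.name != manifest_name
    }
    return sorted(on_disk - set(manifest_rows))
```

An empty result from unlisted_files means every file on disk has an entry in the manifest, so the additional information it provides is available for each file.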


Viewing Files

You can select an individual file to open a file details page.


This page provides high-level information about the file and lets you download it.


Additionally, for some file types (e.g., Microsoft Office, Biolucida), there is a viewer that allows you to open the file on the Portal.

Learn more about viewers and tools that allow you to explore data directly on the Portal.

Working with a SPARC dataset

The SPARC Dataset viewer is a web-based tool, currently in beta, that lets you quickly visualize SPARC datasets in a graphical viewer. It gives you a visual overview of the folders, files, subjects, samples, and metadata associated with datasets that adhere to the SPARC Dataset Structure.