SPARC Portal Data Repository Structure

Dataset storage: what is the difference between the SPARC Portal and Pennsieve?

Overview

This document describes where SPARC data is stored and how Pennsieve is related to:

  1. The Pennsieve Data Management Application
  2. The SPARC SODA Application
  3. The Pennsieve Discover Application
  4. The SPARC Portal

Pennsieve

The Pennsieve platform provides all core functionality for storing, managing, sharing, and publishing SPARC datasets, as well as datasets that are managed through other programs or by individual labs. The platform has been developed over the last six years to support multiple data repositories, or organizations, through a single set of services. This means that different institutions, labs, NIH programs, or foundations can use the same infrastructure to set up a repository and data management solution without having to build that infrastructure for each project independently. We typically refer to this as a multi-tenant data ecosystem.

Pennsieve is designed to be extended by external applications, and its API is publicly accessible for that purpose. This allows anyone to build applications that use the Pennsieve data ecosystem under the hood. The SPARC program uses this to:

  1. standardize data import through the SODA application, and
  2. develop a SPARC-specific view of all SPARC-related data in the SPARC Portal.

The image below shows the relationship between the various applications:

[Figure: SPARC Portal vs Discover diagram]

Non-public Data Management

Before datasets are published, there are mechanisms to upload, curate, and share datasets with select users. The Pennsieve data ecosystem provides a web application to manage these "workspaces." A separate, more opinionated mechanism was developed specifically for the SPARC program. Both are described below.

Pennsieve Data Management Application

The Pennsieve Data Management Application is the web application that you see when you go to https://app.pennsieve.io. This web application uses the Pennsieve APIs to interact with the Pennsieve data ecosystem and allows the user to create datasets, upload data, manage users, publish data, and more. The application is deliberately non-opinionated about how users describe their data: it gives users full flexibility to create their own metadata schema and imposes no restrictions on folder hierarchies.

The SPARC SODA App

The SPARC SODA app is a downloadable application that the SPARC program uses to interact with the Pennsieve data ecosystem. It is an opinionated mechanism that imposes structure on data uploaded to the Pennsieve data ecosystem so that datasets comply with the SPARC standards. It does not cover all the functionality of the Pennsieve Data Management Application; instead, it is specialized to make uploading SPARC datasets easier.
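The difference between a non-opinionated and an opinionated upload path can be illustrated with a small sketch. Everything in it is hypothetical: the function check_top_level and the folder names in ALLOWED_TOP_LEVEL are placeholders chosen for illustration. They are not SODA's implementation and not an authoritative statement of the SPARC standards, which should be consulted for the actual rules.

```python
from pathlib import PurePosixPath

# Hypothetical allow-list of top-level folders, for illustration only.
# The real rules are defined by the SPARC standards, not by this set.
ALLOWED_TOP_LEVEL = {"primary", "derivative", "source", "code", "docs", "protocol"}


def check_top_level(paths, allowed=None):
    """Return the sorted top-level folder names that are not in `allowed`.

    With allowed=None no rule is applied, which mirrors a non-opinionated
    upload path. Files that sit directly at the top level are ignored.
    """
    if allowed is None:
        return []
    top_levels = set()
    for p in paths:
        parts = PurePosixPath(p).parts
        if len(parts) > 1:  # only paths that live inside a folder
            top_levels.add(parts[0])
    return sorted(top_levels - allowed)


uploads = ["primary/sub-1/recording.dat", "my_notes/readme.txt"]

# Non-opinionated: any folder hierarchy is accepted.
print(check_top_level(uploads))

# Opinionated: folders outside the allow-list are reported.
print(check_top_level(uploads, ALLOWED_TOP_LEVEL))
```

In this sketch the first call reports nothing, while the second reports the folder my_notes, which is the kind of feedback an opinionated tool can give before data reaches the platform.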

Accessing Public Data

The Pennsieve Data Ecosystem provides a rich set of APIs and mechanisms to interact with public datasets. All Pennsieve public data can be accessed through the Pennsieve Discover web application. Over the last couple of years, the SPARC DRC developed a dedicated web application, the SPARC Portal, which hosts public SPARC-related data and services.

Pennsieve Discover

Pennsieve Discover is the web application that you see when you go to the public Pennsieve Discover site. It lets anybody find all datasets that are published through the Pennsieve Data Ecosystem. Pennsieve Discover has a deliberately narrow scope: it only provides access to datasets and offers no other functionality.

SPARC Portal

The SPARC Portal is the web application that you see when you go to https://sparc.science. It brings together information about the SPARC project, including a way to find public SPARC datasets that are hosted on the Pennsieve Data Ecosystem. In addition to listing public SPARC datasets, it contains information about SPARC events, news, visual maps, and other interactive components specific to the SPARC program.
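The relationship between the two public views can be summarized in a minimal sketch, assuming a simplified model in which every dataset belongs to one organization and is either published or not. The Dataset class, the two view functions, the organization label "SPARC", and the sample records are all made up for illustration; they do not reflect the Pennsieve API or real datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    organization: str  # the tenant (repository) that owns the dataset
    published: bool


def discover_view(datasets):
    """Every published dataset, regardless of organization."""
    return [d for d in datasets if d.published]


def sparc_portal_view(datasets, organization="SPARC"):
    """Only the published datasets that belong to one organization."""
    return [d for d in discover_view(datasets) if d.organization == organization]


# Made-up sample records, not real datasets.
catalog = [
    Dataset("sparc-dataset-a", "SPARC", True),
    Dataset("sparc-dataset-b-draft", "SPARC", False),
    Dataset("other-lab-dataset", "Example Lab", True),
]

print([d.name for d in discover_view(catalog)])
print([d.name for d in sparc_portal_view(catalog)])
```

Under these assumptions, the Discover-style view lists both published datasets, the Portal-style view lists only the published SPARC one, and the unpublished draft appears in neither.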