Public Datasets Download and Access

The SPARC Portal allows access to public datasets through a browser or Amazon S3.

Users can get access to public datasets directly through the SPARC Portal. All datasets, regardless of size, can be accessed on Amazon's S3 service using your own AWS account.

The SPARC Portal displays the most current version of a dataset and provides several ways to access and download files in any version of that dataset.

Navigating a SPARC Dataset

Downloading Full Datasets

Browser Download - Available only for Datasets under 5GB

For datasets and files under 5GB, SPARC provides users with a mechanism to download the dataset directly from the dataset landing page.

Clicking the Download full dataset button immediately starts the download in the web browser. Please note that the files will be compressed upon download. Datasets larger than 5GB can only be accessed through the Amazon Web Services (AWS) S3 Requester Pays service (details below).
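The 5GB rule can be sketched as a small check. This is only an illustration: the function name and constant are hypothetical, and it assumes "5GB" means 5 * 10**9 bytes and that a dataset of exactly 5GB goes to S3, neither of which is stated by the portal.

```python
# Assumption: "5GB" is read as 5 * 10**9 bytes; the portal may define it differently.
BROWSER_LIMIT_BYTES = 5 * 10**9


def download_route(size_bytes: int) -> str:
    """Return 'browser' for sizes under the limit, else 's3-requester-pays'."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < BROWSER_LIMIT_BYTES:
        return "browser"
    return "s3-requester-pays"
```

For example, `download_route(2 * 10**9)` selects the browser route, while `download_route(12 * 10**9)` selects the S3 Requester Pays route.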

Accessing Current Dataset Versions on Amazon Web Services (AWS) S3

The most current version of every dataset is immediately available on the Amazon Web Services (AWS) S3 Requester Pays service. This means that any costs associated with downloading the data will be charged to your AWS account. Please note that this is the only way to access full datasets that are larger than 5GB.

For up-to-date transfer pricing, please visit the AWS Pricing documentation. Step-by-step tutorials on how to create a free AWS Account and how to access datasets within S3 are available below.
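As a rough sketch of what Requester Pays access can look like from Python, the helpers below pass `RequestPayer="requester"` on each call, which is how the requester acknowledges the charges. They assume the third-party boto3 library and configured AWS credentials. The bucket name and prefix in `main` are hypothetical placeholders, not the real SPARC locations; use the values shown for the dataset you want.

```python
from typing import Any, Iterator


def iter_requester_pays_keys(client: Any, bucket: str, prefix: str) -> Iterator[str]:
    """Yield object keys under `prefix`; the caller's AWS account is billed."""
    kwargs = {"Bucket": bucket, "Prefix": prefix, "RequestPayer": "requester"}
    while True:
        page = client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            break
        # Results are paginated; continue from where the last page stopped.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def download_requester_pays(client: Any, bucket: str, key: str, dest: str) -> None:
    """Download one object, acknowledging Requester Pays through ExtraArgs."""
    client.download_file(bucket, key, dest, ExtraArgs={"RequestPayer": "requester"})


def main() -> None:
    import boto3  # third-party: pip install boto3

    s3 = boto3.client("s3")
    # Hypothetical placeholders: replace with the dataset's actual bucket and prefix.
    for key in iter_requester_pays_keys(s3, "example-bucket", "example-dataset/"):
        print(key)
```

Calling `main()` lists the keys under the placeholder prefix. The helpers take the client as an argument so they can be tried against a stub before any billable request is made.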

Accessing Older Versions of Full Datasets on Amazon Web Services (AWS) S3

The SPARC Portal's storage mechanism for published datasets is provided by DAT-Core's Pennsieve Platform, which ensures that published files are stored only once in the AWS cloud. Older versions of datasets smaller than 5GB and all versions of individual files are immediately available to access and download through the web browser. However, to access the full dataset of an older version, it must first be temporarily restored as a whole package before it can be accessed in its entirety on the AWS S3 Requester Pays service. To restore an older version of a dataset, navigate to the version that you would like to access and select Request Access.

After you submit your request, the Pennsieve Platform begins restoring the requested version. Restoring an older version may take up to 24 hours to complete, depending on the dataset's size.

Once the older version is ready, you will be notified via email from [email protected] with instructions on how to access it on AWS S3. Don't forget to check your spam folder!

📘

The older version of the dataset will be temporarily available for only 14 days.
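To turn the 14-day window into a concrete deadline, a small date calculation is enough. This sketch assumes the window starts when the version becomes available (for example, when the notification email arrives); the exact start of the window is not specified here, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

AVAILABILITY_DAYS = 14  # availability window stated for restored versions


def restored_version_expiry(available_from: datetime) -> datetime:
    """Return the assumed last moment the restored version can be accessed."""
    return available_from + timedelta(days=AVAILABILITY_DAYS)


def days_remaining(available_from: datetime, now: datetime) -> int:
    """Whole days left in the window (0 once it has passed)."""
    remaining = restored_version_expiry(available_from) - now
    return max(remaining.days, 0)
```

For instance, a version that became available on 1 January would, under this assumption, expire on 15 January.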


Downloading Files from a Dataset

Users can also download individual files and folders within a dataset.

Individual File Downloads

To download a single file, click on the down arrow icon in the Action column of a dataset's Files tab. For files smaller than 5GB, the file will immediately begin to download to your local machine. Please note that the files will be compressed upon download. Files larger than 5GB are made available directly on Amazon’s S3 service (see above for details).

For some file types, such as TIFF, JPEG2000, or JSON, clicking on the file name will open a new page listing the file details. From there, you can also click the Download button in the top right.

Multiple File Downloads

To download multiple files and folders at a time, click the checkbox next to each file to be downloaded. Next, click the Download Selected Files and Folders button in the bottom left of the Files tab.

A confirmation screen will appear. Name the ZIP folder and click Download to initiate your immediate download.

📘

Single files will be downloaded and saved in their original file format. Files and/or folders downloaded together will be saved as a ZIP folder containing the selected files/folders.
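Once a multi-file download has been saved as a ZIP, it can be unpacked with Python's standard `zipfile` module. The sketch below is illustrative only: the function name and the paths passed to it are hypothetical, and the internal layout of the ZIP is not assumed.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_download(zip_path: str, dest_dir: str) -> List[str]:
    """Extract a downloaded ZIP and return the names of the files it contained."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # Directory entries end with "/"; report only actual files.
        names = [name for name in zf.namelist() if not name.endswith("/")]
        zf.extractall(dest)
    return names
```

A call such as `extract_download("my-download.zip", "sparc-files")` unpacks the archive into `sparc-files` and returns the list of extracted file names.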

Download or Transfer Fees

For datasets and files under 5GB, SPARC covers the egress fees and provides users with a mechanism to download the dataset directly from the dataset landing page.

SPARC is built on AWS infrastructure and is subject to AWS egress fees any time data is downloaded or transferred from its AWS S3 storage bucket.

For up-to-date transfer pricing, please visit the AWS Pricing documentation.
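For a back-of-the-envelope estimate before starting a large transfer, the cost can be approximated as size times a per-GB rate. This sketch makes two simplifying assumptions: a single flat rate (actual AWS pricing can vary by region, destination, and volume tier) and 1 GB = 10**9 bytes. The rate is deliberately a parameter; take the current value from the AWS Pricing documentation rather than from this example.

```python
def estimate_transfer_cost(size_bytes: int, usd_per_gb: float) -> float:
    """Approximate transfer cost in USD for `size_bytes` at a flat per-GB rate."""
    if size_bytes < 0 or usd_per_gb < 0:
        raise ValueError("size_bytes and usd_per_gb must be non-negative")
    size_gb = size_bytes / 10**9  # assumption: decimal gigabytes
    return size_gb * usd_per_gb
```

The result is only an estimate of data transfer charges; request charges and any regional differences are not included.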