FAQs: SPARC Data Submission

Tips and suggestions for submitting datasets to SPARC

Uploading data

Can I upload to SPARC from the cloud? My data are in Box, Google Drive, or other cloud-based services.

We understand that you can't keep all of your data on your laptop. Many investigators use institutional resources such as Box, Google Drive, Amazon Web Services (AWS), and local storage servers. SPARC uses the Pennsieve platform (maintained by DAT-Core) for data upload, and Pennsieve relies on AWS. We are currently working on a solution for loading data directly from other AWS buckets. For questions about this process, please contact DAT-Core. If you'd like the Curation team to help you manage this today, please contact us ([email protected]). Our team is here to help you.

How do I upload large datasets?

Large datasets that are over 250GB and/or contain 1000 or more files or folders are becoming more and more common among SPARC researchers. The SPARC DRC is constantly improving our tools and resources to make data sharing more efficient and less painful for our users.
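
To check whether a local dataset folder crosses these thresholds before you start, a short script can total its size and count its files and folders. The following is a minimal sketch, not an official SPARC tool: the function names are hypothetical, and it assumes 1 GB means 10**9 bytes, which may differ from how your operating system or Pennsieve reports sizes.

```python
import os

# Thresholds taken from the text above.
# Assumption: 1 GB = 10**9 bytes.
LARGE_SIZE_BYTES = 250 * 10**9
LARGE_ENTRY_COUNT = 1000


def summarize_dataset(root):
    """Return (total_bytes, entry_count) for everything under root."""
    total_bytes = 0
    entry_count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Count every file and every subfolder once.
        entry_count += len(dirnames) + len(filenames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skips broken symbolic links
                total_bytes += os.path.getsize(path)
    return total_bytes, entry_count


def is_large_dataset(total_bytes, entry_count):
    """Apply the 'over 250GB and/or 1000 or more files or folders' rule."""
    return total_bytes > LARGE_SIZE_BYTES or entry_count >= LARGE_ENTRY_COUNT


if __name__ == "__main__":
    size, count = summarize_dataset(".")  # scan the current folder
    print(f"{size / 10**9:.1f} GB in {count} files and folders")
    print("Large dataset:", is_large_dataset(size, count))
```

Replace "." with the path to your dataset folder to scan a different location.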

If you are uploading a large dataset, keep in mind how much it will cost users to download your files.

How long will it take me to upload my dataset?

Uploading large datasets can sometimes take hours. Many factors influence how long your upload might take, such as the size and number of your dataset's files, your internet speed, your local computing hardware, and the connection to your data storage location. For example:

  • Uploading a folder of large imaging files totaling 308 GB over a 1000 Mbps hardwired internet connection took 1 hr 20 min.
  • Uploading a dataset of 55,000 files totaling 1.5 TB, stored on a USB-connected external hard drive attached to a local desktop, took about 18 hrs over a 1000 Mbps hardwired internet connection.
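
The two examples above imply average transfer rates well below the nominal 1000 Mbps. The sketch below works through that arithmetic and gives a rough lower bound on upload time for a given size and rate. The function names are hypothetical, 1 GB is assumed to be 10**9 bytes, and real uploads will be slower than the lower bound because of per-file overhead and storage read speed.

```python
def effective_mbps(size_gb, hours):
    """Average throughput in megabits per second for a finished upload."""
    bits = size_gb * 10**9 * 8      # assumption: 1 GB = 10**9 bytes
    seconds = hours * 3600
    return bits / seconds / 10**6


def estimate_hours(size_gb, mbps):
    """Best-case upload time in hours at a sustained rate of mbps."""
    bits = size_gb * 10**9 * 8
    return bits / (mbps * 10**6) / 3600


# Figures from the two examples above.
imaging_rate = effective_mbps(308, 1 + 20 / 60)   # roughly 513 Mbps
usb_drive_rate = effective_mbps(1500, 18)          # roughly 185 Mbps

# Best case for 308 GB at a full 1000 Mbps: about 0.68 hours (41 minutes).
best_case = estimate_hours(308, 1000)

print(f"Imaging folder: {imaging_rate:.0f} Mbps")
print(f"USB drive dataset: {usb_drive_rate:.0f} Mbps")
print(f"308 GB at 1000 Mbps: {best_case:.2f} h")
```

The lower rate in the second example is consistent with the many small files and the external drive limiting throughput, so estimating with a measured rate from a past upload is likely more realistic than using your nominal connection speed.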

Here are a few tips to keep upload time to a minimum:

  • Be sure you are using the most up-to-date version of the SPARC upload software: SODA and the Pennsieve Agent
  • Utilize your institution's hardwired internet connection to access transfer speeds of around 1000 Mbps (most wireless networks limit upload speed and total amount of data transferred within a given time)
  • Your computer or server will need to be on and connected to the internet for the entirety of the upload. Try to use computers that will not need to be interrupted for hours at a time. New feature alert: uploads that are stopped or interrupted can now be restarted without starting over from scratch!

Have a single file over 220 GB?

The standard settings of the SPARC upload software may need to be changed to allow files over 220 GB to be uploaded without interruption. Before you begin your upload, please email DAT-Core with your specific file sizes so we can help you configure your settings for a smooth upload experience.
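
To gather the file sizes for that email, you could list every file in your dataset folder that exceeds the limit. This is a minimal sketch with a hypothetical function name, assuming 1 GB means 10**9 bytes; it only reads file sizes and does not change any upload settings.

```python
import os

# 220 GB limit from the text above.
# Assumption: 1 GB = 10**9 bytes.
SINGLE_FILE_LIMIT_BYTES = 220 * 10**9


def find_oversized_files(root, limit_bytes=SINGLE_FILE_LIMIT_BYTES):
    """Return (path, size_in_bytes) pairs for files above limit_bytes, largest first."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skips broken symbolic links
                size = os.path.getsize(path)
                if size > limit_bytes:
                    oversized.append((path, size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, size in find_oversized_files("."):
        print(f"{size / 10**9:.1f} GB  {path}")
```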

Need help or have questions?

Email DAT-Core. Our team is here to help you.

How much will it cost for users to download my large dataset?

As you upload your dataset, keep in mind that people pay to download your dataset, which could influence its reuse. As of June 2024, downloading costs are ~$90/TB. To encourage reuse of large datasets, include meaningful descriptions, methods, and metadata, as well as opportunities for people to preview some of your data on the SPARC Portal.
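
As a rough illustration of what that rate means for a specific dataset, the sketch below multiplies dataset size by the approximate per-terabyte figure quoted above. The function name is hypothetical, 1 TB is assumed to be 1000 GB, and the rate is only the June 2024 approximation, so treat the results as ballpark values rather than a quote.

```python
# Approximate rate quoted above (as of June 2024); actual charges may differ.
COST_PER_TB_USD = 90.0


def estimate_download_cost(size_gb, cost_per_tb=COST_PER_TB_USD):
    """Rough download cost in US dollars, assuming 1 TB = 1000 GB."""
    return size_gb / 1000 * cost_per_tb


# Sizes borrowed from the upload examples earlier in this FAQ.
for label, size_gb in [("308 GB imaging folder", 308), ("1.5 TB dataset", 1500)]:
    print(f"{label}: about ${estimate_download_cost(size_gb):.2f}")
```

At this rate, the 308 GB example would cost a user about $28 to download and the 1.5 TB example about $135.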

Published Data

What if my data were published under embargo, but I need them to be available to the public immediately?

Please note: Only the embargoed dataset owner can request its release.

To request the release of an embargoed dataset, please:

  • Log in to your Pennsieve account.
  • Click Publishing on the left sidebar (or a Globe icon if your sidebar is collapsed).
  • Click the Publishing tab above the dataset titles.
  • Find the dataset you wish to release.
  • Click the Request Release option.

Preparing datasets for upload

Are there naming constraints for files and folders?

Yes, review File and Folder Naming Constraints for this information.

Does SPARC have recommendations for dataset titles and descriptions?

Yes. Visit Guidelines for Effective SPARC Dataset Titles and Descriptions for this information.

Where do I find information about how to prepare my dataset for sharing on the SPARC Portal?

Visit Data Submission Walkthrough Intro for the full process, and Organize Your Files for information about how to organize your files according to the SPARC Dataset Structure (SDS).

Questions related to protocols

What makes a successful protocol?

Well, one that others easily understand, of course!

The following tips will help ensure other investigators can understand your protocol:

  • Give step-by-step instructions to avoid lengthy blurbs of text.
  • Break methods into easy-to-follow sections and introduce relevant headings.
  • For experimental procedures requiring multiple protocols, create separate protocols and combine them into a collection.
    • These separate protocols can then be reused to support other datasets.
  • Indicate the chronology of steps and methods.
  • Emphasize the key steps in the protocol.
  • Add applicable warnings, tips, and safety information.
  • Provide additional links to manuscripts when applicable.
  • A picture is worth a thousand words. Add photos, graphics, and movies when possible.

Can I use the methods section in my published manuscript as a protocol?

No, materials and methods sections from published manuscripts or IRB applications cannot be accepted. A DOI or link to the published detailed protocol itself must be provided.

What if I published a protocol on another protocol platform?

While Protocols.io is preferred, we do accept protocols published through the following platforms as well:

  • Bio-protocol
  • BioTechniques
  • Current Protocols
  • Journal of Visualized Experiments (JoVE)
  • MethodsX
  • Nature Protocols
  • Springer Nature Experiments
  • STAR Protocols

What if my dataset is embargoed, and I do not want to make my protocol public until the dataset is published?

Protocols for embargoed datasets can be temporarily kept private. You can create them in your private folder in your dedicated SPARC or RE-JOIN workspace. Just be sure they are published a week before the embargo ends.

Publishing data will make them findable and available for anyone within and outside the SPARC or RE-JOIN workspace communities.

How do I edit a protocol with an assigned DOI in Protocols.io?

To edit a protocol with an assigned DOI, please:

  • Log in to protocols.io and select the protocol you wish to edit.
  • Click the COPY / FORK button to the right of the protocol title.
  • Follow the prompts.
  • Edit the protocol.
  • Before you publish the new version, email the shareable link to the curation team ([email protected]).
  • After the curation review, follow the steps described in Step 5.

How do I create a collection of protocols in Protocols.io?

Many experimental procedures will require multiple protocols. In this case, you can create separate protocols, then combine them into a collection. Each protocol in the collection will receive its own DOI (so they can be used to support other datasets), and the complete collection itself will receive its own DOI.

To create a collection:

  1. Create separate protocols following the steps described in Step 2: Create a New Protocol.
  2. Once all protocols are created, click the “NEW +” button in the upper-right corner of the screen.
  3. Choose “New Collection” from the drop-down menu, then click “Create”.
  4. Add a descriptive title for your collection.
    1. Avoid abbreviations of anatomical regions and techniques.
    2. Do not include the lab name or grant number.
    3. Include information about the technique used, species, and purpose.
  5. Click on the slider to select “Search your files”.
  6. Type the name of a protocol you wish to add to the collection, then select it from the list.
  7. Repeat steps 5 and 6 until all protocols have been added.
  8. Add Description, Guidelines & Warnings, and Material Sections as described in Step 2: Create a New Protocol.