SPARC/dbGaP Quick-Reference

This information is provided for general demonstration purposes and is not a substitute for regulatory guidance. Ultimate responsibility for compliance with all applicable mandates, including the NIH Genomic Data Sharing (GDS) Policy, rests solely with the investigator and Principal Investigator.


Table 1. SPARC and dbGaP Repositories at a Glance

| SPARC (open-access repository) | dbGaP (controlled-access repository) |
| --- | --- |
| Pennsieve / sparc.science | NIH GDS Policy |
| Non-identifying derived data, metadata, code, protocols | Human read-level, alignment, variant, and phenotype data |

Table 2. Deposit in SPARC

Open-access · Derived, processed, and non-identifying data

| Data category | Examples / content | Typical files / extensions |
| --- | --- | --- |
| Derived / processed transcriptomics (bulk RNA-seq) | Gene expression summaries (non-identifying) | counts.tsv / .csv; tpm.tsv / .csv; DEG_results.tsv / .csv; GSEA / GO / KEGG_results.tsv / .csv |
| Derived / processed single-cell & single-nucleus RNA-seq | Cell × gene matrices, cell-type labels, marker genes | matrix.mtx(.gz); features.tsv(.gz); barcodes.tsv(.gz); *_feature_bc_matrix.h5; cell_annotations.csv; markers.csv |
| Derived ATAC / accessibility outputs | Peaks and summarized accessibility | peaks.bed; peak_matrix.mtx / .h5; motif_enrichment.tsv; summary plots (.pdf / .png) |
| Visium spatial (Space Ranger summaries) | Spot × gene matrices + spatial metadata | filtered_feature_bc_matrix/*; raw_feature_bc_matrix/*; spatial/scalefactors_json.json; tissue_positions*.csv; .cloupe; web_summary.html |
| Xenium summaries | Cell × gene matrices, segmentations, QC, de-identified images | cell_feature_matrix/* (MEX/H5/Zarr); cell_boundaries.parquet; nucleus_boundaries.parquet; QC html / csv; de-ID images (PNG / TIFF / OME-TIFF / JP2) |
| QC reports (no reads) | Pipeline summaries and metrics | web_summary.html; metrics_summary.csv; multiqc_report.html; QC reports (.pdf) |
| Analysis code & workflows | Reproducibility artifacts | .R / .py / .ipynb; Dockerfile; environment.yml / requirements.txt; workflow configs (.yaml / .json) |
| Non-genomic modalities (de-identified) | Imaging, histology, microscopy; physiology / ephys | See SPARC accepted file formats |
| Documentation | Methods, protocols, README, crosswalk | README.md; protocol docs (.docx / .pdf); SDS metadata files |

Table 3. Deposit in dbGaP

Controlled-access · Human read-level, alignment, variant, and phenotype data

| Data category | Examples / content | Typical files / extensions |
| --- | --- | --- |
| Read-level sequencing | Raw reads + base quality scores | *.fastq / .fq / .fastq.gz / .fq.gz; BCL run folders |
| Alignments / near-raw | Aligned or unaligned reads | *.bam + index (.bai / .csi); *.cram + index (.crai); uBAM (unaligned .bam) |
| Variants | Genotype / variant calls | *.vcf / .vcf.gz + index (.tbi / .csi); *.g.vcf / .g.vcf.gz + index (.tbi / .csi) |
| Nanopore / PacBio raw | Long-read raw signal or subreads | Nanopore: .fast5 / .pod5; PacBio: subreads.bam / *.bax.h5 |
| Participant-level phenotypes & consents | Individual-level clinical / phenotype data + consent terms | Consent documents (.pdf / .docx / .xlsx) |

⚠️ Note: This table is a non-exhaustive quick-reference. Companion and index files (.bai, .csi, .tbi, .crai) that enable interpretation or reconstruction of underlying controlled-access genomic data are also subject to controlled-access deposition requirements. For ambiguous cases, consult your NIH Genomic Program Administrator (GPA) before depositing.
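
As a first-pass triage aid, the file extensions in Tables 2 and 3 can be turned into a simple lookup. The Python sketch below is illustrative only and is not part of any SPARC or dbGaP tooling: the function name suggest_repository, the suffix lists, and the "review" outcome are all assumptions made for this example. Extensions that appear in both tables or whose sensitivity depends on content (.pdf, .docx, .xlsx) are deliberately sent to "review", and a "SPARC" result still assumes the file holds only non-identifying, derived data. A .csv of participant-level phenotypes, for instance, would be misrouted by extension alone.

```python
from pathlib import PurePosixPath

# Controlled-access suffixes from Table 3, including companion index files.
# BCL run folders are directories and are not covered by this sketch.
DBGAP_SUFFIXES = (
    ".fastq", ".fq", ".fastq.gz", ".fq.gz",
    ".bam", ".bai", ".csi", ".cram", ".crai",
    ".vcf", ".vcf.gz", ".tbi",
    ".fast5", ".pod5", ".bax.h5",
)

# Suffixes that depend on content rather than extension
# (assumption: route these to a human reviewer).
REVIEW_SUFFIXES = (".pdf", ".docx", ".xlsx")

# Open-access suffixes and file names from Table 2.
SPARC_SUFFIXES = (
    ".tsv", ".csv", ".tsv.gz", ".mtx", ".mtx.gz", ".h5", ".bed",
    ".parquet", ".cloupe", ".html", ".png",
    ".r", ".py", ".ipynb", ".yml", ".yaml", ".json", ".md",
)
SPARC_NAMES = {"dockerfile", "requirements.txt"}


def suggest_repository(filename: str) -> str:
    """Return 'dbGaP', 'SPARC', or 'review' based on the file name only."""
    name = PurePosixPath(filename).name.lower()
    # Check controlled-access suffixes first so that, for example,
    # '.bax.h5' is caught before the generic '.h5' rule.
    if name.endswith(DBGAP_SUFFIXES):
        return "dbGaP"
    if name.endswith(REVIEW_SUFFIXES):
        return "review"
    if name in SPARC_NAMES or name.endswith(SPARC_SUFFIXES):
        return "SPARC"
    # Anything not listed in either table needs a human decision.
    return "review"
```

Under these assumptions, suggest_repository("sample1.bam.bai") returns "dbGaP", suggest_repository("spatial/scalefactors_json.json") returns "SPARC", and suggest_repository("consent_form.pdf") returns "review". Any "review" result, and any case the tables do not clearly cover, should go to your GPA rather than be resolved by the script.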