SPARC/dbGaP Quick-Reference
This information is provided for general demonstration purposes and is not a substitute for regulatory guidance. Ultimate responsibility for compliance with all applicable mandates — including the NIH Genomic Data Sharing Policy — rests solely with the investigator and Principal Investigator.
Table 1. SPARC and dbGaP Repositories at a Glance
| SPARC (open-access repository) | dbGaP (controlled-access repository) |
|---|---|
| Pennsieve / sparc.science | NIH GDS Policy |
| Non-identifying derived data, metadata, code, protocols | Human read-level, alignment, variant, and phenotype data |
Table 2. Deposit in SPARC
Open-access · Derived, processed, and non-identifying data
| Data category | Examples / content | Typical files / extensions |
|---|---|---|
| Derived / processed transcriptomics (bulk RNA-seq) | Gene expression summaries (non-identifying) | `counts.tsv` / `.csv`; `tpm.tsv` / `.csv`; `DEG_results.tsv` / `.csv`; GSEA / GO / `KEGG_results.tsv` / `.csv` |
| Derived / processed single-cell & single-nucleus RNA-seq | Cell × gene matrices, cell-type labels, marker genes | `matrix.mtx(.gz)`; `features.tsv(.gz)`; `barcodes.tsv(.gz)`; `*_feature_bc_matrix.h5`; `cell_annotations.csv`; `markers.csv` |
| Derived ATAC / accessibility outputs | Peaks and summarized accessibility | `peaks.bed`; `peak_matrix.mtx` / `.h5`; `motif_enrichment.tsv`; summary plots (`.pdf` / `.png`) |
| Visium spatial (Space Ranger summaries) | Spot × gene matrices + spatial metadata | `filtered_feature_bc_matrix/*`; `raw_feature_bc_matrix/*`; `spatial/scalefactors_json.json`; `tissue_positions*.csv`; `.cloupe`; `web_summary.html` |
| Xenium summaries | Cell × gene matrices, segmentations, QC, de-identified images | `cell_feature_matrix/*` (MEX/H5/Zarr); `cell_boundaries.parquet`; `nucleus_boundaries.parquet`; QC html / csv; de-ID images (PNG / TIFF / OME-TIFF / JP2) |
| QC reports (no reads) | Pipeline summaries and metrics | `web_summary.html`; `metrics_summary.csv`; `multiqc_report.html`; QC reports (`.pdf`) |
| Analysis code & workflows | Reproducibility artifacts | `.R` / `.py` / `.ipynb`; `Dockerfile`; `environment.yml` / `requirements.txt`; workflow configs (`.yaml` / `.json`) |
| Non-genomic modalities (de-identified) | Imaging, histology, microscopy; physiology / ephys | See SPARC accepted file formats |
| Documentation | Methods, protocols, README, crosswalk | `README.md`; protocol docs (`.docx` / `.pdf`); SDS metadata files |
Table 3. Deposit in dbGaP
Controlled-access · Human read-level, alignment, variant, and phenotype data
| Data category | Examples / content | Typical files / extensions |
|---|---|---|
| Read-level sequencing | Raw reads + base quality scores | `*.fastq` / `.fq` / `.fastq.gz` / `.fq.gz`; BCL run folders |
| Alignments / near-raw | Aligned or unaligned reads | `*.bam` + index (`.bai` / `.csi`); `*.cram` + index (`.crai`); uBAM (unaligned `.bam`) |
| Variants | Genotype / variant calls | `*.vcf` / `.vcf.gz` + index (`.tbi` / `.csi`); `*.g.vcf` / `.g.vcf.gz` + index (`.tbi` / `.csi`) |
| Nanopore / PacBio raw | Long-read raw signal or subreads | Nanopore: `.fast5` / `.pod5`; PacBio: `subreads.bam` / `*.bax.h5` |
| Participant-level phenotypes & consents | Individual-level clinical / phenotype data + consent terms | Consent documents (`.pdf` / `.docx` / `.xlsx`) |
⚠️ Note: This table is a non-exhaustive quick-reference. Companion and index files (.bai, .csi, .tbi, .crai) that enable interpretation or reconstruction of underlying controlled-access genomic data are also subject to controlled-access deposition requirements. For ambiguous cases, consult your NIH Genomic Program Administrator (GPA) before depositing.
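
As a rough illustration of how Tables 2 and 3 could be used for a first-pass triage of a file listing, the sketch below maps file names to a suggested repository by suffix. It is not an official SPARC or NIH tool. The function name `suggest_repository` and the three return labels are hypothetical, and the suffix lists are simply transcribed from the tables above. Extension alone cannot tell a consent document or a participant-level phenotype table from an ordinary `.pdf` or `.csv`, and BCL run folders are not handled, so open-access matches are only reported as candidates that still need a content check.

```python
from pathlib import PurePath

# Suffixes transcribed from Table 3 (controlled access), including the
# companion/index files called out in the note above.
DBGAP_SUFFIXES = (
    ".fastq", ".fq", ".fastq.gz", ".fq.gz",
    ".bam", ".bai", ".csi", ".cram", ".crai",
    ".vcf", ".vcf.gz", ".tbi",
    ".fast5", ".pod5", ".bax.h5",
)

# Suffixes transcribed from Table 2 (open access). A match here is only a
# candidate: the file content must still be non-identifying.
SPARC_SUFFIXES = (
    ".tsv", ".csv", ".tsv.gz", ".mtx", ".mtx.gz", ".h5", ".bed",
    ".parquet", ".json", ".cloupe", ".html", ".pdf", ".png",
    ".r", ".py", ".ipynb", ".yml", ".yaml", ".txt", ".md", ".docx",
)
SPARC_NAMES = ("dockerfile",)


def suggest_repository(filename: str) -> str:
    """Return 'dbGaP', 'SPARC-candidate', or 'review' for a file name.

    Hypothetical helper: the labels are illustrative, not policy terms.
    """
    name = PurePath(filename).name.lower()
    # Check controlled-access suffixes first so that, for example,
    # '.bax.h5' is not mistaken for an open-access '.h5' matrix.
    if name.endswith(DBGAP_SUFFIXES):
        return "dbGaP"
    if name in SPARC_NAMES or name.endswith(SPARC_SUFFIXES):
        return "SPARC-candidate"
    # Anything unrecognized should be checked by a person.
    return "review"


if __name__ == "__main__":
    for f in ("run1/sample1.fastq.gz", "aln.bam.bai",
              "filtered_feature_bc_matrix.h5", "consent.xlsx"):
        print(f, "->", suggest_repository(f))
```

Anything labeled `review`, and any `SPARC-candidate` that might contain individual-level information, still falls under the guidance in the note above: confirm with your GPA before depositing.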