File Formats Accepted by SPARC
File formats for SPARC
While the file extension list is thorough, it's not exhaustive. Given the ongoing advancements in technology, we anticipate the emergence of new file types not covered here. If you encounter file extensions not documented in this list, please reach out to the curation team ([email protected]) for further discussion. Presently, the DRC is discussing how to accommodate OME-Zarr.
File Type | Required | Preferred | Will accept | Will not accept | BRAIN Initiative |
---|---|---|---|---|---|
Cell Sorting | .fcs | ||||
Code | Python, MATLAB, R | Java, C/C++, Octave, OpenGL, and Fortran | |||
Documents | .txt, .md | .docx, .pdf, .rtf, .odl, .ods, or any format fully supported as a source for pandoc | .pages, .doc | ||
Figures | .tif, .tiff, .jpg, .jpeg, .png | .svg | .cdr or any proprietary format | ||
Generic Data | .hdf5, .mat, .xml, .json | ||||
Generic Images | JPEG2000, OMETIFF | Any that can be converted by MicroFile+ | |||
Microscopy Image data (raw, primary or derivative) | JPEG2000, OMETIFF¹ | Terabyte and large volume compatible formats | Any that can be converted by Bio-Formats²; any that can be converted by MicroFile+ | .sws | |
Morphology | .xml | Segmentation and skeletonization: xml .swc, .roi, .ims Any that can be converted currently by MBF; open formats for MRI data | .swc Morphology | ||
Neuroimaging | .nifti | .DICOM | Required by BIDS | ||
Presentation | ppt or .pptx | .key | |||
Sequence | fastq | bam, .vcf, cram | fasta | “If transcriptomic data is available, fastq files are the minimum version of data to deposit. BAM files are optional.” | |
System files | .ini, .db | ||||
Tabular | .csv³, .tsv | .xlsx, .ods | .numbers, .xls | ||
Time series | .nwb 2.0⁴ | Any format that can be converted using NEO (including generic open formats, e.g., csv) | .json, .adi | .nwb 2.0 | |
Vector drawings | svg | ||||
Video | .mp4 | .avi, .webm, .ogv |
1 https://docs.openmicroscopy.org/ome-model/6.1.0/ome-tiff/
2 https://www.openmicroscopy.org/bio-formats/
3 Best practices for formatting tabular data: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510
4 https://nwb-schema.readthedocs.io/en/latest/
Notes on file formats
Compressed files
Files are currently uploaded as .zip, .gz or .bz2. If compression is used, it must be lossless compression.
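One way to confirm that a compression step is lossless is to compress, decompress, and compare checksums. The following is a minimal sketch using Python's standard gzip and hashlib modules; the helper names and the sample bytes are illustrative assumptions, not part of any SPARC tooling.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as a hex string."""
    return hashlib.sha256(data).hexdigest()


def roundtrip_is_lossless(original: bytes) -> bool:
    """Compress with gzip, decompress, and compare digests with the original."""
    compressed = gzip.compress(original)
    restored = gzip.decompress(compressed)
    return sha256_hex(restored) == sha256_hex(original)


# Hypothetical sample content standing in for a real data file.
sample = b"time,voltage\n0.0,1.25\n0.1,1.31\n" * 100
print(roundtrip_is_lossless(sample))
```

The same comparison can be applied to .zip or .bz2 archives by swapping in the zipfile or bz2 module.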
Custom formats
The curation team will match all data files against the allowable file types and extensions.
- If it is a custom file format built on a generic data format (e.g., JSON), the underlying format must be identified and the files must be accompanied by documentation of the custom format.
- If the format is truly custom, that is, not built on a pre-existing format, it must be convertible into one of the open formats and should be accompanied by code that can perform this conversion and open the files. Code should be deposited to SPARC in the Code folder, and a link to any appropriate code repositories should be provided.
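As an illustration of the kind of conversion code that could accompany a truly custom format, the sketch below converts a made-up plain-text layout into JSON. The layout (header lines starting with `#` that hold `key: value` pairs, followed by whitespace-separated numeric rows), the function name, and the sample values are all hypothetical; a real converter would follow the investigator's own format documentation.

```python
import json


def convert_custom_to_json(text: str) -> str:
    """Convert a hypothetical custom text format to a JSON document.

    Assumed layout (illustrative only):
      - header lines start with '#' and hold 'key: value' pairs
      - remaining non-empty lines are whitespace-separated numbers
    """
    metadata = {}
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Split on the first colon only, so values may contain colons.
            key, _, value = line[1:].partition(":")
            metadata[key.strip()] = value.strip()
        else:
            rows.append([float(field) for field in line.split()])
    return json.dumps({"metadata": metadata, "rows": rows}, indent=2)


# Hypothetical file content, for demonstration only.
example = """# instrument: hypothetical-amp
# units: mV
0.0 1.25
0.1 1.31
"""
print(convert_custom_to_json(example))
```

Keeping the header fields in a separate `metadata` object means the documentation required for the custom format can describe each key directly.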
Image data
We recommend that all raw, primary or derivative image data for SPARC be made available in JPEG2000 and OMETIFF. Acceptable formats are those that can currently be converted by MBF tools. MBF offers a free tool, MicroFile+, that converts a number of microscopy image types to the SPARC standardized formats. A list of these is found at the MicroFile+ site. The converted files should be stored in the derivative data folder. Note that many investigators submit supporting images or figures. These will not be converted but must be in an acceptable format.
Morphology data
Morphology data is defined here as segmented data (e.g., surfaces, polygons, skeletons, binary masks) generated from primary imaging data (e.g., a neuron tree structure).
For neuron tree structures, MBF XML is a continuously supported and developed format that conforms to standard XML markup rules. As such, the data content is both machine- and human-readable and can be accessed using standard XML libraries and tools. SWC is also an open format. While SWC can represent tree structures, at the time of writing it has limited or no support for other specific structures (spines, somas, varicosities, puncta, and blood vessels) or for annotation elements such as text, markers, and regions. MBF tools can read SWC and convert it to XML.
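To make the tree-only nature of SWC concrete, here is a minimal sketch that reads SWC text into nodes and groups them by parent. It assumes the commonly used seven-column SWC layout (sample id, type code, x, y, z, radius, parent id, with -1 marking a root) and `#` comment lines; the class and function names and the three-node sample are illustrative, and this is not the conversion that MBF tools perform.

```python
from typing import Dict, List, NamedTuple


class SwcNode(NamedTuple):
    node_id: int
    type_code: int
    x: float
    y: float
    z: float
    radius: float
    parent_id: int  # -1 marks a root node


def parse_swc(text: str) -> List[SwcNode]:
    """Parse SWC text, skipping blank lines and '#' comment lines."""
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        n, t, x, y, z, r, p = line.split()[:7]
        nodes.append(
            SwcNode(int(n), int(t), float(x), float(y), float(z), float(r), int(p))
        )
    return nodes


def children_by_parent(nodes: List[SwcNode]) -> Dict[int, List[int]]:
    """Map each parent id to the ids of its child nodes."""
    tree: Dict[int, List[int]] = {}
    for node in nodes:
        tree.setdefault(node.parent_id, []).append(node.node_id)
    return tree


# Hypothetical three-node tree, for demonstration only.
sample_swc = """# id type x y z radius parent
1 1 0.0 0.0 0.0 5.0 -1
2 3 10.0 0.0 0.0 1.0 1
3 3 10.0 10.0 0.0 0.8 2
"""
print(children_by_parent(parse_swc(sample_swc)))
```

Each row carries only a position, a radius, a type code, and a parent link, which is why annotations such as text, markers, and regions have no natural place in the format.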
Time series data
NWB is our recommended format for time series data, as it is a BRAIN Initiative Standard and is supported by the DANDI archive and the Allen Institute for Brain Science. It is also endorsed by the INCF.
SPARC will take a phased approach towards adopting NWB. SPARC investigators are not yet submitting data in the .nwb format. Moreover, we have identified more than 20 different file extensions for physiological data coming from multiple commercial and open source packages, including Spike2, Neuralynx, BrainVision, Synapse, NeuroLexus, Plexon, and Axon pCLAMP.
At this time, we accept file types for which a converter into NWB is available, so that a user will not be locked into a proprietary system. The NWB site lists some recommended converters.
Recommendations
- In the short term, SPARC will accept time series data in both proprietary formats and open formats, as long as the proprietary format can be converted to NWB using an available tool.
- The tools used to produce the data must be provided as metadata.
- A list of formats currently supported by SpikeInterface is available on their site.
- In the long term, SPARC will consider requiring all time series data be submitted in NWB format.
Sequence data
The fastq file is required, but it is up to the investigator to include as much of the intermediary data as required for replicability.
System files
Investigators should remove various system-generated files (e.g., .ini, .db) from their datasets before release to the public.
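A short script can help locate such files before release. The sketch below walks a dataset folder and lists files whose extension matches a configurable set; the function name and the default set (just the two extensions named above) are assumptions for illustration, and the script only reports matches rather than deleting anything.

```python
from pathlib import Path
from typing import Iterable, List

# Extensions named in the text above; extend this set as needed.
SYSTEM_SUFFIXES = {".ini", ".db"}


def find_system_files(
    root: Path, suffixes: Iterable[str] = SYSTEM_SUFFIXES
) -> List[Path]:
    """Return all files under root whose extension is in suffixes.

    Matching is case-insensitive, so 'Thumbs.DB' is also reported.
    """
    wanted = {s.lower() for s in suffixes}
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in wanted
    )
```

For example, `find_system_files(Path("my_dataset"))` would return the matching paths under a hypothetical `my_dataset` folder, which can then be reviewed and removed by hand.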
Converted files
SPARC policy is to store native files acquired by the investigator in the Source folder as raw data. All data converted into an acceptable format will be in the Primary data folder.