SPARC Python Client
Introduction
To provide a more integrated way of interacting with SPARC resources in Python, SPARC has introduced its own Python library, sparc.client. The library allows some basic operations for unauthenticated users, such as browsing and querying publicly available datasets, listing records and files, or downloading data.
Example Use Cases
The library is a joint effort of multiple SPARC teams to provide a common library that allows for interaction with different SPARC functionalities. Typical use cases include:
- downloading publicly available data from SPARC
- listing datasets, files, records
- authentication across different independent modules
- uploading and managing datasets
- importing/exporting studies and files
- access to knowledge, capabilities, and resources, and organ scaffold mapping
Installation
To use SPARC's Python client, install the latest version of sparc.client from PyPI:
pip install sparc.client
SPARC Python Client has a modular structure; each module provides different functionality. The common configuration of the client is stored in a config.ini file.
Alternatively, after downloading the most recent version of sparc.client from the GitHub repository, the library can be installed from the repository root in editable mode:
pip install -e .
Setting up Authentication
For managing profiles, the library uses a config file in INI format. Providing the file is optional; if no configuration is provided, only limited functionality may be available to the user.
from sparc.client import SparcClient
client = SparcClient(connect=False, config_file='config.ini')
Authenticated users gain access to more advanced functionality, such as managing datasets or uploading files. These functionalities are provided by the Pennsieve Agent. For details, please follow the Pennsieve Agent tutorial: Uploading files to SPARC Portal.
Modules
The following modules are currently available:
- Pennsieve module:
- listing datasets
- listing records
- listing files
- downloading files
- using Pennsieve API
- uploading files
- O2sparc module
- Upload files to the o²S²PARC platform
- Create computational jobs
- Submit jobs to computational services
- Inspect the logs of computational services
Examples
Pennsieve module
Listing public datasets
Listing datasets that match a specific word, e.g. the last name of the PI or a medical term, can be done as follows:
response=client.pennsieve.list_datasets(query='cancer', limit=2)
response
For further reference, please read SPARC Python Client tutorial
Listing records
Apart from listing datasets, we can also zoom into the records of a given dataset for a specific model, for example exploring researchers within the SPARC project.
response=client.pennsieve.list_records(model='researcher', organization='SPARC')
response
For further reference, please read SPARC Python Client tutorial
Listing files
Similarly, we can query for files related to a given name or extension, e.g. files included in a specific dataset.
response=client.pennsieve.list_files(dataset_id=90, query='manifest', file_type='json')
response
For further reference, please read SPARC Python Client tutorial
Downloading files
All we need to do is list the file(s) to be downloaded and pass them to the download_file function.
The function will either download the file with its original extension (if output_name is not specified) or pack the files and download them in gzip format to the specified directory.
response=client.pennsieve.list_files(dataset_id=90, query='manifest', file_type='json')
client.pennsieve.download_file(file_list=response[1])
For further reference, please read SPARC Python Client tutorial
Using Pennsieve API
The SPARC Client can interact with the Pennsieve API and submit HTTP requests, e.g. GET or POST, for example:
# e.g. calling GET: https://docs.pennsieve.io/reference/browsefiles-1 for version 1 of dataset 90 with additional parameters
client.pennsieve.get('https://api.pennsieve.io/discover/datasets/90/versions/1/files/browse', params={'limit':2})
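For reference, the endpoint URL above can be assembled from its parts. The helper below is purely illustrative and not part of sparc.client; it just reproduces the URL used in the call above:

```python
# Illustrative helper (not part of sparc.client): build the Pennsieve Discover
# file-browse endpoint URL from a dataset id and version, as used above.
def browse_files_url(dataset_id: int, version: int) -> str:
    return (
        f"https://api.pennsieve.io/discover/datasets/{dataset_id}"
        f"/versions/{version}/files/browse"
    )
```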
For reference, please visit Pennsieve API reference
Uploading files
An example on how to upload files to SPARC is available here: SPARC Python Client tutorial
Uploading requires downloading and configuring the Pennsieve Agent. Instructions on configuring the agent can be found here: https://docs.pennsieve.io/docs/uploading-files-programmatically
Exporting Organ Scaffolds
An example on how to export organ scaffolds from the SPARC Portal to other common meshing formats is available here: SPARC Python Zinc Client tutorial
Export MBF Segmentation
An example on how to export MBF segmentations from the SPARC Portal to other common meshing formats is available here: SPARC Python Zinc Client tutorial
o²S²PARC module
Prerequisites:
- An account on osparc.io: you will need this to generate API tokens to use the o²S²PARC module. If you don't have an account, request one at o²S²PARC Support, as explained here.
- Have a configuration file (see here).
Installation
# import the necessary packages
import importlib, getpass

try:
    import sparc.client
except ImportError:
    ! pip install sparc.client

try:
    from tqdm import tqdm
except ImportError:
    ! pip install tqdm

from pprint import pprint
import os
from tqdm import tqdm
from sparc.client import SparcClient
from sparc.client.services.o2sparc import (
    O2SparcService,
    O2SparcSolver
)
from time import sleep
from pathlib import Path
from tempfile import TemporaryDirectory
Login to the o²S²PARC Python Client
Set up credentials for o²S²PARC in osparc.io as explained here. Then use the "Key" as the "Username" and the "Secret" as the "Password". Another option is to export them as environment variables:
os.environ["O2SPARC_HOST"] = getpass.getpass("osparc host, e.g. https://api.osparc.io:")
os.environ["O2SPARC_USERNAME"] = getpass.getpass("API Key:")
os.environ["O2SPARC_PASSWORD"] = getpass.getpass("API Secret:")
assert "O2SPARC_HOST" in os.environ, "O2SPARC_HOST must be exposed as an environment variable"
assert "O2SPARC_USERNAME" in os.environ, "O2SPARC_USERNAME must be exposed as an environment variable"
assert "O2SPARC_PASSWORD" in os.environ, "O2SPARC_PASSWORD must be exposed as an environment variable"
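The three checks above can also be wrapped in a small helper that reports everything missing at once. The function below is a hypothetical convenience, not part of sparc.client:

```python
import os

# Hypothetical helper mirroring the assertions above: report which of the
# required o2sparc variables are absent from an environment mapping.
REQUIRED_VARS = ("O2SPARC_HOST", "O2SPARC_USERNAME", "O2SPARC_PASSWORD")

def missing_o2sparc_vars(environ=os.environ):
    return [name for name in REQUIRED_VARS if name not in environ]
```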
Access to the computational services provided by the o2sparc submodule is provided via O2SparcService, an instance of which can be created as follows:
client = SparcClient(connect=False, config_file='config.ini')
o2sparc: O2SparcService = client.o2sparc
As a sanity check you can now contact the o²S²PARC server to check that you are logged in as the correct user:
print(o2sparc.get_profile())
# [email protected]
Run a computation
Now to the fun part :). In the following cell, we specify a job (or "computation") which we submit to a solver (or "computational service") on the o²S²PARC platform. More information on computational services is available here.
with TemporaryDirectory() as tmp_dir:
    input_file: Path = Path(tmp_dir) / "input_file.txt"
    input_file.write_text("3")

    job: dict = {
        "input_3": 0,
        "input_2": 3.0,
        "input_1": input_file
    }

    solver: O2SparcSolver = o2sparc.get_solver(
        solver_key="simcore/services/comp/itis/sleeper",
        solver_version="2.0.2"
    )
    job_id = solver.submit_job(job)

    pbar = tqdm(total=1.0)
    progress: float = 0
    while not solver.job_done(job_id):
        sleep(1)
        if solver.get_job_progress(job_id) > progress:
            pbar.update(solver.get_job_progress(job_id) - progress)
            progress = solver.get_job_progress(job_id)

    print("job outputs:")
    for output_name, result in solver.get_results(job_id).items():
        if isinstance(result, Path):
            print(f"{output_name}: {Path(result).read_text()}")
        else:
            print(f"{output_name}: {result}")

    print("job log:")
    log_dir: TemporaryDirectory = solver.get_job_log(job_id)
    for elm in Path(log_dir.name).rglob("*"):
        if elm.is_file():
            print(elm.read_text())
Understanding the steps
Let's break this into pieces.
First we specify the job/input to our solver. A job for an O2SparcSolver is nothing but a dictionary whose entries are one of the following types: str, int, float or pathlib.Path. Only pathlib.Path objects that point to existing files can be passed, like our input_1 above.
input_file: Path = Path(tmp_dir) / "input_file.txt"
input_file.write_text("3")

job: dict = {
    "input_3": 0,
    "input_2": 3.0,
    "input_1": input_file
}
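As a sketch, these type rules can be checked before submission. The validator below is hypothetical and not part of sparc.client; it simply mirrors the constraints just described:

```python
from pathlib import Path

# Hypothetical pre-flight check (not part of sparc.client): verify that every
# job entry has one of the accepted types and that Path entries point to
# existing files, as required above.
ACCEPTED_TYPES = (str, int, float, Path)

def validate_job(job: dict) -> list:
    problems = []
    for key, value in job.items():
        if not isinstance(value, ACCEPTED_TYPES):
            problems.append(f"{key}: unsupported type {type(value).__name__}")
        elif isinstance(value, Path) and not value.is_file():
            problems.append(f"{key}: file does not exist: {value}")
    return problems
```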
Second, we get the solver/computational resource we want to use by specifying its identifier and version. Once we have our solver we can submit the job to it.
solver: O2SparcSolver = o2sparc.get_solver(solver_key="simcore/services/comp/itis/sleeper",solver_version="2.0.2")
job_id = solver.submit_job(job)
Next, we need to wait for the job to finish:
pbar = tqdm(total=1.0)
progress: float = 0
while not solver.job_done(job_id):
    sleep(1)
    if solver.get_job_progress(job_id) > progress:
        pbar.update(solver.get_job_progress(job_id) - progress)
        progress = solver.get_job_progress(job_id)
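The polling loop can also be factored into a reusable function. The sketch below is illustrative and not part of sparc.client; it works with any object exposing job_done and get_job_progress, and fetches the progress once per iteration rather than three times:

```python
from time import sleep

# Illustrative generic polling loop (not part of sparc.client): wait until the
# solver reports the job done, reporting each monotonic progress increment via
# an optional callback. Returns the final progress value observed.
def wait_for_job(solver, job_id, poll_interval=1.0, on_progress=None):
    progress = 0.0
    while not solver.job_done(job_id):
        current = solver.get_job_progress(job_id)  # fetch once per iteration
        if current > progress:
            if on_progress is not None:
                on_progress(current - progress)
            progress = current
        sleep(poll_interval)
    return progress
```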
Once the solver is done performing its computation, we can get the results it produces:
print("job outputs:")
for output_name, result in solver.get_results(job_id).items():
    if isinstance(result, Path):
        print(f"{output_name}: {Path(result).read_text()}")
    else:
        print(f"{output_name}: {result}")
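Assuming the result shape used above (a dict mapping output names to plain values or pathlib.Path objects), the post-processing can be condensed into a small helper. This is an illustrative sketch, not part of sparc.client:

```python
from pathlib import Path

# Illustrative post-processing (not part of sparc.client): read file-backed
# outputs into strings and pass all other values through unchanged.
def collect_results(results: dict) -> dict:
    return {
        name: Path(value).read_text() if isinstance(value, Path) else value
        for name, value in results.items()
    }
```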
as well as the log it has produced
print("job log:")
log_dir: TemporaryDirectory = solver.get_job_log(job_id)
for elm in Path(log_dir.name).rglob("*"):
    if elm.is_file():
        print(elm.read_text())
Getting the log can be particularly useful for diagnosing issues encountered by the solver. For example, if the specified job is invalid and the solver cannot use it, the log will typically indicate this.
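The same traversal can be wrapped in a helper that collects the log texts for later inspection; again, this is an illustrative sketch and not part of sparc.client:

```python
from pathlib import Path

# Illustrative wrapper (not part of sparc.client) around the traversal above:
# gather the text of every file under a log directory, sorted by path so the
# order is deterministic.
def read_log_files(log_dir) -> list:
    return [
        path.read_text()
        for path in sorted(Path(log_dir).rglob("*"))
        if path.is_file()
    ]
```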
More resources
- More tutorials on the o²S²PARC Python Client are available here
- Documentation on the NIH SPARC Client and its sub-modules can be found here.
References
To learn more about the SPARC Python library, please refer to the following resources:
- Tutorial in Jupyter Notebook on how to use sparc.client: https://github.com/nih-sparc/sparc.client/blob/main/docs/tutorial.ipynb
- Tutorial in a Jupyter Notebook on how to analyse and export organ scaffolds using sparc.client: https://github.com/nih-sparc/sparc.client/blob/main/docs/tutorial-zinc.ipynb
- GitHub repository with the most current version of sparc.client : https://github.com/nih-sparc/sparc.client
- Code reference for the sparc.client : https://nih-sparc.github.io/sparc.client/sparc.client.html
- Uploading files to SPARC Portal (requires: Pennsieve Agent): https://docs.pennsieve.io/docs/uploading-files-programmatically
- Guide for contributors: https://github.com/nih-sparc/sparc.client#readme