SPARC Python Client

Introduction

To provide a more integrated way of interacting with SPARC resources in Python, SPARC has introduced its own Python library, sparc.client. The library allows some basic operations for unauthenticated users, such as browsing and querying publicly available datasets, listing records and files, or downloading data.

Example Use Cases

The library is a joint effort of multiple SPARC teams to provide a common interface for interacting with different SPARC functionalities. Typical use cases include:

  • downloading publicly available data from SPARC
  • listing datasets, files, records
  • authentication across different independent modules
  • uploading and managing datasets
  • importing/exporting studies and files
  • access to knowledge, capabilities, and resources, as well as organ scaffold mapping

Installation

To use the SPARC Python client, please install the latest version of sparc.client from PyPI:

pip install sparc.client

SPARC Python Client has a modular structure, with each module providing different functionality. The common configuration of the client is stored in a config.ini file.
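As an illustration, a minimal config.ini might look like the following. The section and key names here are assumptions for illustration only; consult the default config file shipped with the library for the authoritative layout.

```ini
; hypothetical minimal configuration -- check the library's default config.ini
[global]
default_profile = prod

[prod]
pennsieve_profile_name = prod
```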

Alternatively, after downloading the most recent version of sparc.client from the GitHub repository, the library can be installed in editable mode:

pip install -e .

Setting up Authentication

The library requires a config file (a default version is available here):

from sparc.client import SparcClient
client = SparcClient(connect=False, config_file='config.ini')

Authenticated users gain access to more advanced functionality, such as managing datasets or uploading files. These functionalities are provided by the Pennsieve Agent. For details, please follow the Pennsieve Agent tutorial: Uploading files to SPARC Portal.

Modules

The following modules are currently available:

  • Pennsieve module:
    • listing datasets
    • listing records
    • listing files
    • downloading files
    • using Pennsieve API
    • uploading files
  • O2sparc module:
    • uploading files to the o²S²PARC platform
    • creating computational jobs
    • submitting jobs to computational services
    • inspecting the logs of computational services

Examples

Pennsieve module

Listing public datasets

Listing datasets that match a specific word, e.g. the last name of a PI or a medical term, can be performed in the following way:

response = client.pennsieve.list_datasets(query='cancer', limit=2)
response

For further reference, please read SPARC Python Client tutorial

Listing records

Apart from listing datasets, we can also zoom into the records of a given dataset for a specific model, for example exploring researchers within the SPARC project.

response = client.pennsieve.list_records(model='researcher', organization='SPARC')
response

For further reference, please read SPARC Python Client tutorial

Listing files

Similarly, we can query for files that match a given name or extension, e.g. files that are included in a specific dataset.

response = client.pennsieve.list_files(dataset_id=90, query='manifest', file_type='json')
response

For further reference, please read SPARC Python Client tutorial

Downloading files

All we need to do is list the file(s) to be downloaded and pass them to the download_file function.

The function will either download the file with its original extension (if output_name is not specified) or pack the files and download them in gzip format to the specified directory.

response = client.pennsieve.list_files(dataset_id=90, query='manifest', file_type='json')
client.pennsieve.download_file(file_list=response[1])

For further reference, please read SPARC Python Client tutorial

Using Pennsieve API

The SPARC Client can interact with the Pennsieve API and submit HTTP requests such as GET or POST, for example:

#e.g. calling GET: https://docs.pennsieve.io/reference/browsefiles-1 for version 1 of dataset 90 with additional parameters
client.pennsieve.get('https://api.pennsieve.io/discover/datasets/90/versions/1/files/browse', params={'limit':2})

For reference, please visit Pennsieve API reference

Uploading files

An example on how to upload files to SPARC is available here: SPARC Python Client tutorial

Uploading requires downloading and configuring the Pennsieve Agent. Instructions on configuring the agent can be found here: https://docs.pennsieve.io/docs/uploading-files-programmatically

Exporting Organ Scaffolds

An example on how to export organ scaffolds from the SPARC Portal to other common meshing formats is available here: SPARC Python Zinc Client tutorial

Export MBF Segmentation

An example on how to export MBF segmentations from the SPARC Portal to other common meshing formats is available here: SPARC Python Zinc Client tutorial

o²S²PARC module

Prerequisites:

  • An account on osparc.io: you will need this to generate API tokens to use the o²S²PARC module. If you don't have an account, request one at o²S²PARC Support, as explained here.
  • Have a configuration file (see here).

Installation

# install sparc.client and tqdm if they are not already available
try:
  import sparc.client
except ImportError:
  ! pip install sparc.client
try:
  from tqdm import tqdm
except ImportError:
  ! pip install tqdm

# import the necessary packages
import getpass
import os
from pathlib import Path
from pprint import pprint
from tempfile import TemporaryDirectory
from time import sleep

from tqdm import tqdm
from sparc.client import SparcClient
from sparc.client.services.o2sparc import (
  O2SparcService,
  O2SparcSolver
)

Login to the o²S²PARC Python Client

Set up credentials for o²S²PARC on osparc.io as explained here. Then use the "Key" as "Username" and the "Secret" as "Password". Another option is to export them as environment variables:

os.environ["O2SPARC_HOST"] = getpass.getpass("osparc host, e.g. https://api.osparc.io:")
os.environ["O2SPARC_USERNAME"] = getpass.getpass("API Key:")
os.environ["O2SPARC_PASSWORD"] = getpass.getpass("API Secret:")
assert "O2SPARC_HOST" in os.environ, "O2SPARC_HOST must be exposed as an environment variable"
assert "O2SPARC_USERNAME" in os.environ, "O2SPARC_USERNAME must be exposed as an environment variable"
assert "O2SPARC_PASSWORD" in os.environ, "O2SPARC_PASSWORD must be exposed as an environment variable"

Access to the computational services of the o2sparc submodule goes through O2SparcService, an instance of which can be created as follows:

client = SparcClient(connect=False, config_file='config.ini')
o2sparc: O2SparcService = client.o2sparc

As a sanity check you can now contact the o²S²PARC server to check that you are logged in as the correct user:

print(o2sparc.get_profile())
# [email protected]

Run a computation

Now to the fun part :). In the following cell we specify a job (or "computation") which we submit to a solver (or "computational service") on the o²S²PARC platform. More information on computational services is available here.

with TemporaryDirectory() as tmp_dir:
  input_file: Path = Path(tmp_dir) / "input_file.txt"
  input_file.write_text("3")

  job: dict = {
    "input_3": 0,
    "input_2": 3.0,
    "input_1": input_file
  }

  solver: O2SparcSolver = o2sparc.get_solver(solver_key="simcore/services/comp/itis/sleeper", solver_version="2.0.2")

  job_id = solver.submit_job(job)

  pbar = tqdm(total=1.0)
  progress: float = 0
  while not solver.job_done(job_id):
    sleep(1)
    if solver.get_job_progress(job_id) > progress:
      pbar.update(solver.get_job_progress(job_id) - progress)
      progress = solver.get_job_progress(job_id)

  print("job outputs:")
  for output_name, result in solver.get_results(job_id).items():
    if isinstance(result,Path):
      print(f"{output_name}: {Path(result).read_text()}")
    else:
      print(f"{output_name}: {result}")


  print("job log:")
  log_dir: TemporaryDirectory = solver.get_job_log(job_id)
  for elm in Path(log_dir.name).rglob("*"):
    if elm.is_file():
      print(elm.read_text())
      

Understanding the steps

Let's break this into pieces.
First we specify the job, i.e. the input to our solver. A job for an O2SparcSolver is nothing but a dictionary whose entries are of one of the following types: str, int, float or pathlib.Path. A pathlib.Path entry must point to an existing file, like our input_1 above.

  input_file: Path = Path(tmp_dir) / "input_file.txt"
  input_file.write_text("3")

  job: dict = {
    "input_3": 0,
    "input_2": 3.0,
    "input_1": input_file
  }
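These type rules can be captured in a small validation helper. The validate_job function below is our own sketch, not part of sparc.client:

```python
from pathlib import Path


def validate_job(job: dict) -> bool:
    """Return True if every entry is a str, int, float, or a
    pathlib.Path pointing to an existing file."""
    for value in job.values():
        if isinstance(value, Path):
            if not value.is_file():
                return False  # Path entries must point to existing files
        elif not isinstance(value, (str, int, float)):
            return False
    return True
```

Checking a job this way before submission catches, for instance, a Path that points to a missing file.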

Second, we get the solver/computational resource we want to use by specifying its identifier and version. Once we have our solver we can submit the job to it.

  solver: O2SparcSolver = o2sparc.get_solver(solver_key="simcore/services/comp/itis/sleeper", solver_version="2.0.2")

  job_id = solver.submit_job(job)

Next, we need to wait for the job to finish:

  pbar = tqdm(total=1.0)
  progress: float = 0
  while not solver.job_done(job_id):
    sleep(1)
    if solver.get_job_progress(job_id) > progress:
      pbar.update(solver.get_job_progress(job_id) - progress)
      progress = solver.get_job_progress(job_id)
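The waiting pattern can be distilled into a reusable helper. The FakeSolver class below is a hypothetical stand-in for O2SparcSolver, used only so the sketch is self-contained; the helper also reads the progress once per iteration rather than three times:

```python
from time import sleep


class FakeSolver:
    """Hypothetical stand-in for O2SparcSolver: reports the job done
    after three polls, with progress as a fraction of those polls."""

    def __init__(self):
        self._polls = 0

    def job_done(self, job_id) -> bool:
        self._polls += 1
        return self._polls >= 3

    def get_job_progress(self, job_id) -> float:
        return min(self._polls / 3.0, 1.0)


def wait_for_job(solver, job_id, poll_interval: float = 1.0) -> float:
    """Block until the solver reports the job done; return the last
    observed progress."""
    progress = 0.0
    while not solver.job_done(job_id):
        sleep(poll_interval)
        current = solver.get_job_progress(job_id)  # read once per iteration
        if current > progress:
            progress = current
    return progress
```

With a real O2SparcSolver, wait_for_job(solver, job_id) would replace the explicit loop; a tqdm bar can be updated inside the loop exactly as above.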

Once the solver is done performing its computation, we can get the results it produces:

  print("job outputs:")
  for output_name, result in solver.get_results(job_id).items():
    if isinstance(result,Path):
      print(f"{output_name}: {Path(result).read_text()}")
    else:
      print(f"{output_name}: {result}")
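The dispatch on output types can be factored into a small helper (our own sketch, not part of the library): file outputs arrive as pathlib.Path objects and are read from disk, while all other values are shown inline.

```python
from pathlib import Path


def format_results(results: dict) -> list:
    """Render solver outputs: read Path results from disk,
    show all other values inline."""
    lines = []
    for name, result in results.items():
        if isinstance(result, Path):
            lines.append(f"{name}: {result.read_text()}")
        else:
            lines.append(f"{name}: {result}")
    return lines
```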

We can also retrieve the log it has produced:

  print("job log:")
  log_dir: TemporaryDirectory = solver.get_job_log(job_id)
  for elm in Path(log_dir.name).rglob("*"):
    if elm.is_file():
      print(elm.read_text())

Getting the log can be particularly useful for diagnosing issues encountered by the solver. For example, if the specified job is invalid and the solver cannot use it, the log will typically indicate this.
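A helper along these lines (hypothetical, not part of sparc.client) can gather every log file under the directory returned by get_job_log into one searchable string:

```python
from pathlib import Path


def collect_logs(log_dir: Path) -> str:
    """Concatenate the contents of every file found under log_dir."""
    return "\n".join(
        p.read_text() for p in sorted(log_dir.rglob("*")) if p.is_file()
    )
```

With the TemporaryDirectory returned by get_job_log, collect_logs(Path(log_dir.name)) then makes it easy to grep for error messages.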

More resources

  • More tutorials on the o²S²PARC Python Client are available here
  • Documentation on the NIH SPARC Client and its sub-modules can be found here.

References

To learn more about the SPARC Python library, please refer to the following websites: