Rosetta  2021.16
pyrosetta.distributed.cluster.core Namespace Reference

Classes

class  PyRosettaCluster
 

Variables

string __author__ = "Jason C. Klima"
 
string __email__ = "klima.jason@gmail.com"
 
tuple G = TypeVar("G")
 

Detailed Description

PyRosettaCluster is a class for reproducible, high-throughput job distribution
of user-defined PyRosetta protocols efficiently parallelized on the user's
local computer, high-performance computing (HPC) cluster, or elastic cloud
computing infrastructure with available compute resources.

Args:
    tasks: A `list` of `dict` objects, a callable or called function returning
        a `list` of `dict` objects, or a callable or called generator yielding
        a `list` of `dict` objects. Each dictionary object element of the list
        is accessible via kwargs in the user-defined PyRosetta protocols.
        To initialize PyRosetta with user-defined PyRosetta command line
        options at the start of each user-defined PyRosetta protocol,
        `extra_options` and/or `options` must be a key of each dictionary object,
        where the value is a `str`, `tuple`, `list`, `set`, or `dict` of
        PyRosetta command line options.
        Default: [{}]
    input_packed_pose: Optional input `PackedPose` object that is accessible via
        the first argument of the first user-defined PyRosetta protocol.
        Default: None
    seeds: A `list` of `int` objects specifying the random number generator seeds
        to use for each user-defined PyRosetta protocol. The number of seeds
        provided must be equal to the number of user-defined input PyRosetta
        protocols. Seeds are used in the same order that the user-defined PyRosetta
        protocols are executed.
        Default: None
    decoy_ids: A `list` of `int` objects specifying the decoy numbers to keep after
        executing user-defined PyRosetta protocols. User-provided PyRosetta
        protocols may return a list of `Pose` and/or `PackedPose` objects, or
        yield multiple `Pose` and/or `PackedPose` objects. To reproduce a
        particular decoy generated via the chain of user-provided PyRosetta
        protocols, the decoy number to keep for each protocol may be specified,
        where other decoys are discarded. Decoy numbers use zero-based indexing,
        so `0` is the first decoy generated from a particular PyRosetta protocol.
        The number of decoy_ids provided must be equal to the number of
        user-defined input PyRosetta protocols, so that one decoy is saved for each
        user-defined PyRosetta protocol. Decoy ids are applied in the same order
        that the user-defined PyRosetta protocols are executed.
        Default: None
    client: An initialized dask `distributed.client.Client` object to be used as
        the dask client interface to the local or remote compute cluster. If `None`,
        then PyRosettaCluster initializes its own dask client based on the
        `PyRosettaCluster(scheduler=...)` class attribute.
        Default: None
    scheduler: A `str` of either "sge" or "slurm", or `None`. If "sge", then
        PyRosettaCluster schedules jobs using `SGECluster` with `dask-jobqueue`.
        If "slurm", then PyRosettaCluster schedules jobs using `SLURMCluster` with
        `dask-jobqueue`. If `None`, then PyRosettaCluster schedules jobs using
        `LocalCluster` with `dask.distributed`. If `PyRosettaCluster(client=...)`
        is provided, then `PyRosettaCluster(scheduler=...)` is ignored.
        Default: None
    cores: An `int` object specifying the total number of cores per job, which
        is input to the `dask_jobqueue.SLURMCluster(cores=...)` argument.
        Default: 1
    processes: An `int` object specifying the total number of processes per job,
        which is input to the `dask_jobqueue.SLURMCluster(processes=...)` argument.
        This cuts the job up into this many processes.
        Default: 1
    memory: A `str` object specifying the total amount of memory per job, which
        is input to the `dask_jobqueue.SLURMCluster(memory=...)` argument.
        Default: "4g"
    scratch_dir: A `str` object specifying the path to a scratch directory where
        dask may write its temporary files.
        Default: "/temp" if it exists, otherwise the current working directory
    min_workers: An `int` object specifying the minimum number of workers to
        which to adapt during parallelization of user-provided PyRosetta protocols.
        Default: 1
    max_workers: An `int` object specifying the maximum number of workers to
        which to adapt during parallelization of user-provided PyRosetta protocols.
        Default: 1000 if the initial number of `tasks` is <1000, else the
            initial number of `tasks`
    dashboard_address: A `str` object specifying the port over which the dask
        dashboard is forwarded. This is particularly useful for diagnosing
        PyRosettaCluster performance in real time.
        Default: ":8787"
    nstruct: An `int` object specifying the number of repeats of the first
        user-provided PyRosetta protocol. The user can control the number of
        repeats of subsequent user-provided PyRosetta protocols via returning
        multiple clones of the output pose(s) from a user-provided PyRosetta
        protocol run earlier, or cloning the input pose(s) multiple times in a
        user-provided PyRosetta protocol run later.
        Default: 1
    compressed: A `bool` object specifying whether or not to compress the output
        .pdb files with bzip2, resulting in .pdb.bz2 files.
        Default: True
    system_info: A `dict` or `NoneType` object specifying the system information
        required to reproduce the simulation. If `None` is provided, then PyRosettaCluster
        automatically detects the platform and returns this attribute as a dictionary
        {'sys.platform': `sys.platform`} (for example, {'sys.platform': 'linux'}).
        If a `dict` is provided, then validate that the 'sys.platform' key has a value
        equal to the current `sys.platform`, and log a warning message if not.
        Additional system information such as Amazon Machine Image (AMI) identifier
        and compute fleet instance type identifier may be stored in this dictionary,
        but is not validated. This information is stored in the simulation records for
        accounting.
        Default: None
    pyrosetta_build: A `str` or `NoneType` object specifying the PyRosetta build as
        output by `pyrosetta._version_string()`. If `None` is provided, then PyRosettaCluster
        automatically detects the PyRosetta build and sets this attribute as the `str`.
        If a `str` is provided, then validate that the input PyRosetta build is equal
        to the active PyRosetta build, and log a warning message if not.
        Default: None
    sha1: A `str` or `NoneType` object specifying the git SHA1 hash string of the
        particular git commit being simulated. If a non-empty `str` object is provided,
        then it is validated to match the SHA1 hash string of the current HEAD,
        and then it is added to the simulation record for accounting. If an empty string
        is provided, then ensure that everything in the working directory is committed
        to the repository. If `None` is provided, then bypass SHA1 hash string
        validation and set this attribute to an empty string.
        Default: ""
    project_name: A `str` object specifying the project name of this simulation.
        This option just adds the user-provided project_name to the scorefile
        for accounting.
        Default: datetime.now().strftime("%Y.%m.%d.%H.%M.%S.%f") if not specified,
            else "PyRosettaCluster" if None
    simulation_name: A `str` object specifying the name of this simulation.
        This option just adds the user-provided simulation_name to the scorefile
        for accounting.
        Default: `project_name` if not specified, else "PyRosettaCluster" if None
    environment: A `NoneType` or `str` object specifying the active conda environment
        YML file string. If a `NoneType` object is provided, then generate a YML file
        string for the active conda environment and save it to the full simulation
        record. If a `str` object is provided, then validate it against the active
        conda environment YML file string and save it to the full simulation record.
        Default: None
    output_path: A `str` object specifying the full path of the output directory
        (to be created if it doesn't exist) where the output results will be saved
        to disk.
        Default: "./outputs"
    scorefile_name: A `str` object specifying the name of the output JSON-formatted
        scorefile. The scorefile location is always `output_path`/`scorefile_name`.
        Default: "scores.json"
    simulation_records_in_scorefile: A `bool` object specifying whether or not to
        write full simulation records to the scorefile. If `True`, then write
        full simulation records to the scorefile. This results in some redundant
        information on each line, allowing downstream reproduction of a decoy from
        the scorefile, but a larger scorefile. If `False`, then write
        curtailed simulation records to the scorefile. This results in minimally
        redundant information on each line, disallowing downstream reproduction
        of a decoy from the scorefile, but a smaller scorefile. If `False`, also
        write the active conda environment to a YML file in `output_path`. Full
        simulation records are always written to the output '.pdb' or '.pdb.bz2'
        file(s), which can be used to reproduce any decoy without the scorefile.
        Default: False
    decoy_dir_name: A `str` object specifying the directory name where the
        output decoys will be saved. The directory location is always
        `output_path`/`decoy_dir_name`.
        Default: "decoys"
    logs_dir_name: A `str` object specifying the directory name where the
        output log files will be saved. The directory location is always
        `output_path`/`logs_dir_name`.
        Default: "logs"
    logging_level: A `str` object specifying the logging level of Python tracer
        output written to the log file. Must be one of "NOTSET", "DEBUG", "INFO",
        "WARNING", "ERROR", or "CRITICAL". The output log file is always written
        to `output_path`/`logs_dir_name`/`simulation_name`.log on disk.
        Default: "INFO"
    ignore_errors: A `bool` object specifying whether or not PyRosettaCluster
        ignores errors raised in the user-provided PyRosetta protocols. This is
        useful when well-defined errors are sparse and sporadic (such as rare
        segmentation faults), and the user would like PyRosettaCluster to keep
        running without raising the errors.
        Default: False
    timeout: A `float` or `int` object specifying how many seconds PyRosettaCluster
        waits between check-ins on the running user-provided PyRosetta protocols.
        If each user-provided PyRosetta protocol is expected to run quickly, then
        0.1 seconds seems reasonable. If each user-provided PyRosetta protocol is
        expected to run slowly, then >1 second seems reasonable.
        Default: 0.5
    save_all: A `bool` object specifying whether or not to save all of the returned
        or yielded `Pose` and `PackedPose` objects from all user-provided
        PyRosetta protocols. This option may be used for checkpointing trajectories.
        To save arbitrary poses to disk, from within any user-provided PyRosetta
        protocol:
            `pose.dump_pdb(os.path.join(kwargs["output_path"], "checkpoint.pdb"))`
        Default: False
    dry_run: A `bool` object specifying whether or not to save .pdb files to
        disk. If `True`, then do not write .pdb or .pdb.bz2 files to disk.
        Default: False
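
As a minimal sketch of the `tasks` argument described above, the following builds a `list` of `dict` objects with a plain function. Only the `options` and `extra_options` keys come from the description. The `create_tasks` helper, the `scorefxn` and `n_repeats` keys, and the specific flag values are assumptions chosen for illustration.

```python
import itertools


def create_tasks(scorefxns=("ref2015", "ref2015_cart"), n_repeats=(1, 3)):
    """Return one task dictionary per parameter combination (hypothetical helper)."""
    tasks = []
    for scorefxn, repeats in itertools.product(scorefxns, n_repeats):
        tasks.append(
            {
                # PyRosetta command line options, given here as `str` values.
                "options": "-ex1",
                "extra_options": "-out:level 300",
                # Hypothetical user-defined keys; a protocol would read them
                # from its kwargs, e.g. kwargs["scorefxn"].
                "scorefxn": scorefxn,
                "n_repeats": repeats,
            }
        )
    return tasks


tasks = create_tasks()
```

Either the resulting list or the `create_tasks` function itself could then be supplied as `PyRosettaCluster(tasks=...)`.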

Returns:
    A PyRosettaCluster instance.
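
The output locations described for `output_path`, `scorefile_name`, `decoy_dir_name`, `logs_dir_name`, and `simulation_name` can be sketched with the standard library alone. The `expected_output_layout` helper and the example values passed to it are hypothetical. The sketch only joins path components as the argument descriptions state and does not call PyRosettaCluster itself.

```python
import os


def expected_output_layout(
    simulation_name,
    output_path="./outputs",
    scorefile_name="scores.json",
    decoy_dir_name="decoys",
    logs_dir_name="logs",
):
    """Return the documented output locations (hypothetical helper)."""
    logs_dir = os.path.join(output_path, logs_dir_name)
    return {
        # `output_path`/`scorefile_name`
        "scorefile": os.path.join(output_path, scorefile_name),
        # `output_path`/`decoy_dir_name`
        "decoy_dir": os.path.join(output_path, decoy_dir_name),
        # `output_path`/`logs_dir_name`
        "logs_dir": logs_dir,
        # `output_path`/`logs_dir_name`/`simulation_name`.log
        "log_file": os.path.join(logs_dir, simulation_name + ".log"),
    }


layout = expected_output_layout("design_01", output_path="/tmp/my_run")
for name, path in layout.items():
    print(f"{name}: {path}")
```

A helper like this may be handy for locating the scorefile and log file after a run without hard-coding paths in analysis scripts.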

Variable Documentation

string pyrosetta.distributed.cluster.core.__author__ = "Jason C. Klima"
string pyrosetta.distributed.cluster.core.__email__ = "klima.jason@gmail.com"
tuple pyrosetta.distributed.cluster.core.G = TypeVar("G")