You are here

Less silly way to get number of iterations of Relax

3 posts / 0 new
Last post
Less silly way to get number of iterations of Relax
#1

Currently to get the number of iterations a mover, such as FastRelax does, I enable the dump_trajectory scoretype and count the models in the PDBs (code below).

This seems very very weird way of doing this, so I was wondering if there is a less silly way of doing this?

Also, is there a direct way to open a multimodel file say to get the scores of each? (Rather than extracting the models, writing to file, reading them into PyRosetta and scoring them).

# ============ Methods ======================================
DumpTrajectoryEnergy = pyrosetta.rosetta.core.energy_methods.DumpTrajectoryEnergy
#formerly: DumpTrajectoryEnergy = pyrosetta.rosetta.core.scoring.util_methods.DumpTrajectoryEnergy
def enable_trajectory(scorefxn: pyrosetta.ScoreFunction,
                      prefix:str='pose',
                      stride:int=100,
                      gz:bool=False) -> DumpTrajectoryEnergy:
    dt_st = pyrosetta.rosetta.core.scoring.ScoreType.dump_trajectory
    scorefxn.set_weight(dt_st, 1)
    # ----- make the traj dump less intense -----
    # settable by cmd line options... maybe better
    old_traj_energy = [k for k in scorefxn.all_methods() if isinstance(k, DumpTrajectoryEnergy)][0]
    scorefxn.all_methods().remove(old_traj_energy)
    emo = pyrosetta.rosetta.core.scoring.methods.EnergyMethodOptions()
    if gz:
        # emo.dump_trajectory_gz(True)
        raise RuntimeError('this just outputs a text file with gz suffix... (?!)')
    emo.dump_trajectory_prefix(prefix)
    emo.dump_trajectory_stride(stride)
    new_traj_energy = DumpTrajectoryEnergy(emo)
    scorefxn.all_methods().append(new_traj_energy)
    return new_traj_energy

import os, gzip, re, pathlib, datetime
import pandas as pd

def get_traj_interations(prefix:str) -> pd.DataFrame:
    raw_trajs = []
    for file in os.listdir():
        if re.match(prefix, file):
            with  open(file,'r') as fh:
                file_content=fh.read()
            modstamp = pathlib.Path(file).stat().st_mtime
            modtime = datetime.datetime.fromtimestamp(modstamp)
            models = re.findall('MODEL', file_content)
            raw_trajs.append(dict(filename=file,
                               time=modtime,
                               n_models=len(models))
                            )
    trajs = pd.DataFrame(raw_trajs).sort_values('time')
    cat_df = uncontrajs.filename.str.extract(r'_([^_]+)\.pdb')
    trajs['short_name'] =  cat_df[0] + '_' + cat_df.groupby([0]).cumcount().add(1).astype(str)
    return trajs

# ============ Run ======================================
experiment_name = 'test123'
scorefxn = pyrosetta.get_fa_scorefxn()
new_traj_energy = enable_trajectory(scorefxn=scorefxn,
                      prefix=experiment_name,
                      stride=10_000)
relax = pyrosetta.rosetta.protocols.relax.FastRelax(scorefxn, 3)
relax.apply(pose)
trajs = get_traj_interations(experiment_name)
trajs.head()
  filename time n_models short_name
43 test_unconstrained__20210720141946221_score.pdb 2021-07-20 14:19:46.429851 4 score_1
27 test_unconstrained__20210720141946966_pack.pdb 2021-07-20 14:19:48.854521 28 pack_1
67 test_unconstrained__20210720141949017_score.pdb 2021-07-20 14:19:49.172959 4 score_2
30 test_unconstrained__20210720141949412_min.pdb 2021-07-20 14:19:52.569970 8 min_1
95 test_unconstrained__20210720141952687_score.pdb 2021-07-20 14:19:52.815413 4 score_3
Category: 
Post Situation: 
Tue, 2021-07-20 07:00
matteoferla

I'd back up for a minute and ask what you're looking to use this information for. Usually you just count the number of apply()'s of whatever top-level mover you're using as the number of iterations, and treat what things happen internally as something of a black box (modulo whatever settings are exposed by the configuration of the top-level mover). If you have particular reasons to dig deeper into the internals, you have to carefully define the bounds of what you're interested in monitoring, because depending on what you're doing, you might count as "changes" things which aren't actually the steps/iterations you're interested in, or you may miss some of the desired steps which are done "externally" or separately.

Regarding getting scores from a multi-model file, if you can't see scoring information in text form in the file, Rosetta probably hasn't dumped it out. (You can dump out structure information without dumping scoring information, depending on how you do it.) If that's the case, you'd need to rescore the structure to get the score.

Tue, 2021-07-20 08:25
rmoretti

Well, in the case I was interested today I wanted to see how many repacking interations a FastRelax run was doing with and without a particular set of constraints as there was a large running time difference and I was concerned that it was that affecting the run —whereas the distribution of repacking events did change across the gradient of the cycle, there was nothing to worry about and the time difference was probably just the overhead from all the PyRosetta-defined constraints. But in general, I would argue that testing something to make sure it is behaving as hoped is better than just assuming it is.

But actually in FastRelax, one can set the max_inter getter/setter of FastRelax but the apply method does not say how many were done nor, from what I can tell, is there is no counter in pose.energies() or similar or in the log... but I am guessing there maybe one given than one can set the max_iter.

About the multi-model file, I mean simply reading it into memory. The models in a trajectory PDB are not scored, so yes, I know I need to rescore them. If one reads a multimodel PDB file, for example the PDB:1L2Y (tryptophan minicage NMR) it loads the models as separate chains into a single pose, which needs to be split up. I.e. it does not load it as a vector1_core_pose_Pose of poses or similar.

original = pyrosetta.toolbox.rcsb.pose_from_rcsb('1L2Y')
poses = original.split_by_chain()

It seems rather dangerous and potentially complicated —multichain multimodel etc. Hence why I was wondering if there was a I/O specifically for multimodel PDBs.

 

 

Tue, 2021-07-20 13:44
matteoferla