Page added 22 October 2018 by Julia Koehler Leman.
Scientific benchmarks are tests that compare Rosetta-generated structure predictions with experimental observations. They are meant to measure the physical realism of the energy function and how well a protocol samples physically realistic structures.
Several tests are located in the `Rosetta/main/tests` directory. The directory structure is the following:

- `Rosetta/main/tests/integration` contains the integration tests
- `Rosetta/main/tests/benchmark` contains the files required for the benchmark test server, i.e. the framework that runs the scientific tests; you might have to look them over when debugging
- `Rosetta/main/tests/scientific` contains the scientific tests
- `Rosetta/main/tests/scientific/tests` contains the implementations of the tests, one directory per test
- `Rosetta/main/tests/scientific/data` is a submodule that contains the input data if it exceeds 5 MB per test
- `Rosetta/main/tests/scientific/tests/_template_` is a template directory with all necessary input files
Run `git submodule update --init --recursive` to get the submodule containing the input data. If you add input data to the submodule, commit it there first, then go back to the main repository (`cd ..`) and commit your changes again [if you run `git status`, you will notice when git complains about uncommitted files].
Copy the template directory with `cp -r Rosetta/main/tests/scientific/tests/_template_ <my_awesome_test>` and `cd <my_awesome_test>`. Tests are set up as individual, sequentially numbered steps so that they can be run one at a time without having to rerun the entire pipeline. Don't run anything yet (we'll get to that further below). Edit the files at the `==> EDIT HERE` tags:
- `0.compile.py`: this script is for compilation; you likely won't need to edit this file.
- `1.submit.py`: this contains the command line and the target proteins you are running. The debug option does not refer to the release vs. debug build of Rosetta, but rather to the debug version of the scientific test, for faster setup.
- `2.analyze.py`: this script analyzes the results from the score files (or whatever files you are interested in). It reads the cutoffs file, compares this run's results against them, and writes a `result.txt` file for easy reading. A few basic functions for data analysis are at the bottom of this script; if you need a specialized function, please add it there.
- `3.plot.py`: this script plots the results via matplotlib into `plot_results.png`, with subplots for each protein. It draws vertical and horizontal lines for the cutoffs.
- `9.finalize.py`: this script gathers all the information from the readme, the results plot and the results, and creates an HTML page `index.html` that displays everything.
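As a rough illustration of what the `3.plot.py` step does, here is a minimal, self-contained matplotlib sketch; the proteins, data points, cutoff values and layout are invented for illustration (the real script reads them from the run and from the `cutoffs` file):

```python
import matplotlib
matplotlib.use("Agg")  # render without a display, as on a test server
import matplotlib.pyplot as plt

# Hypothetical per-protein results: lists of (rmsd, score) samples, plus cutoffs.
results = {
    "1abc": {"rmsd": [1.2, 2.5, 0.8], "score": [-150, -140, -155]},
    "2xyz": {"rmsd": [3.1, 1.9, 2.2], "score": [-120, -130, -125]},
}
cutoffs = {"1abc": (2.0, -145), "2xyz": (2.5, -118)}

# One subplot per protein, as 3.plot.py does.
fig, axes = plt.subplots(1, len(results), figsize=(5 * len(results), 4))
for ax, (protein, data) in zip(axes, results.items()):
    ax.scatter(data["rmsd"], data["score"], s=10)
    rmsd_cut, score_cut = cutoffs[protein]
    ax.axvline(rmsd_cut, color="red", linestyle="--")    # vertical cutoff line
    ax.axhline(score_cut, color="blue", linestyle="--")  # horizontal cutoff line
    ax.set_title(protein)
    ax.set_xlabel("RMSD (Å)")
    ax.set_ylabel("score (REU)")

fig.tight_layout()
fig.savefig("plot_results.png")
```

Using the `Agg` backend lets the script run on a headless test server.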
- `cutoffs` file: the header line starts with `#`; there is one protein per row, with the different measures in columns.
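For illustration only, a `cutoffs` file following this layout could look like the sketch below; the protein names, measure columns, and values here are hypothetical, so use the measures your test actually reports:

```
#protein  rmsd    score
1abc      2.0    -145.5
2xyz      2.5    -118.0
```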
- `readme.md`: describe your test in as much detail as you can. Keep in mind that after you have left your lab, someone else in the community should be able to understand, run and maintain your test! This file is automatically linked to the Gollum wiki.
Edit the `run` function at the bottom of `Rosetta/main/tests/benchmark/tests/scientific/command.py` to hook your test into the test server framework.
Results are stored in `json` format because it's easy to read the data directly into Python dictionaries, but plots are highly encouraged, as digesting data visually is faster and easier for us when debugging.
The test framework runs on `python3`. Every Python script in the scientific tests needs the `python3` prefix to run properly!
## Set up a run on multiple cores
Edit `benchmark.linux.ini` (or the file for whatever your architecture is). Adjust the settings in this file (e.g. `memory`) as appropriate for your environment. If `hpc_driver = MultiCore`, jobs will be submitted to up to `cpu_count` cores without using an HPC job distributor.
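As a sketch, the relevant lines might look like the following; only `memory`, `cpu_count`, and `hpc_driver` are settings named here, and the exact layout and any other keys are assumptions, so go by what is already in your `.ini` file:

```ini
memory     = 4096       # memory per job; adjust for your machine
cpu_count  = 8          # jobs will be submitted to at most this many cores
hpc_driver = MultiCore  # run locally, without an HPC job distributor
```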
## Set up the run
```
python3 benchmark.py --compiler <clang or else> --skip \
```
- The `--skip` flag skips compilation; this is only recommended if you have an up-to-date version of master compiled in release mode (Sergey advises against skipping).
- The `--debug` flag runs in debug mode, which is highly recommended while debugging (i.e. you create 2 decoys instead of 1000s).
The run is carried out in `Rosetta/main/tests/benchmark/results/<os>.scientific.<my_awesome_test>`, where softlinks are created to the files in `Rosetta/main/tests/scientific/tests/<my_awesome_test>`; it will then likely crash in one way or another. `cd Rosetta/main/tests/benchmark/results/<os>.scientific.<my_awesome_test>` and debug each script individually, starting from the lowest number, by running it with the `python3` prefix (for instance `python3 2.analyze.py`).
In this results directory you will find:

- `config.json`, which contains the configuration settings
- an `output` directory that contains the subdirectories for each protein
- an `hpc-logs` directory that contains the Rosetta run logs; you might have to check them out to debug your run if it crashed in the Rosetta run step
- for each step, a `.json` file that contains the variables you want to carry over into the next step
- `9.finalize.output.json`, which contains all the variables and results that were saved
- `plot_results.png` with the results
- `index.html` with the gathered results, failures and details you have written up in the readme; while all the files are accessible on the test server later, this file is the results summary that people will look at
- `output.results.json`, which will tell you whether the tests passed or failed
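The per-step `.json` files are how one step hands its variables to the next. A minimal sketch of the pattern in Python (the file name and keys here are hypothetical, not the framework's actual schema):

```python
import json

# A step writes the variables the next step needs into its output json...
step_results = {"targets": ["1abc", "2xyz"], "failures": [], "nstruct": 1000}
with open("2.analyze.output.json", "w") as f:
    json.dump(step_results, f, indent=2)

# ...and the next step reads them straight back into a Python dictionary.
with open("2.analyze.output.json") as f:
    carried_over = json.load(f)
```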
Once you are finished debugging locally, commit all of your changes to your branch. You can lower `nstruct` for debugging your run on the test server; if you do that, don't forget to increase it again once the tests run successfully.
Once the tests run as you want, merge your branch into master. The `scientific` branch is an extra branch that grabs the latest master version every few weeks to run all the scientific tests. DO NOT MERGE YOUR BRANCH INTO THE SCIENTIFIC BRANCH!!!
Celebrate! Congrats, you have added a new scientific test and contributed to Rosetta’s greatness. :D
Frequently, a scientific test will aim to evaluate the quality of a folding funnel (a plot of Rosetta energy vs. RMSD to a native or designed structure). Many of the simpler ways of doing this suffer from the effects of stochastic changes to the sampling: the motion of a single sample can drastically alter the goodness-of-funnel metric. For example, one common approach is to divide the funnel into a "native" region (with an RMSD below some threshold value) and a "non-native" region (with an RMSD above the threshold), and to ask whether there is a large difference between the lowest energy in the "native" region and the lowest in the "non-native" region. A single low-energy point that drifts across the threshold from the "native" region to the "non-native" region can convert a high-quality funnel into a low one, by this metric.
To this end, the PNear metric was developed. PNear is an estimate of the Boltzmann-weighted probability of finding a system in or near its native state, with "native-ness" being defined fuzzily rather than with a hard cutoff. The expression for PNear is:
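Written out (reconstructed here from the description below and the Methods of Bhardwaj, Mulligan, Bahl et al. (2016)), for N samples with RMSDs r_i to the native state and energies E_i:

$$
P_\text{Near} = \frac{\sum_{i=1}^{N}\exp\left(-\frac{r_i^2}{\lambda^2}\right)\exp\left(-\frac{E_i}{k_B T}\right)}{\sum_{j=1}^{N}\exp\left(-\frac{E_j}{k_B T}\right)}
$$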
Intuitively, the denominator is the partition function, while the numerator is the sum of the Boltzmann probability of each individual sample multiplied by a weighting factor for the "native-ness" of that sample, which falls off as a Gaussian with RMSD. The expression takes two parameters: lambda (λ), which determines the breadth of the Gaussian for "native-ness" (with higher values allowing a more permissive notion of what is close to native), and kB*T, which determines how energies translate into probabilities (with higher values allowing states separated by small energy gaps to be closer in probability). Recommended values are lambda = 2 to 4, and kB*T = 1.0 (for ref2015) or 0.63 (for talaris2013).
For more information, see the Methods (online) of Bhardwaj, Mulligan, Bahl et al. (2016). Nature 538(7625):329-35.
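As a sketch, PNear can be computed directly from per-sample RMSDs and energies; the function below is a minimal standalone implementation of the definition described above (the function name and defaults are illustrative; for production use, prefer the official scripts mentioned in the update below):

```python
import math

def pnear(rmsds, energies, lam=2.0, kbt=1.0):
    """Estimate PNear, the Boltzmann-weighted probability of being near-native.

    rmsds    -- RMSD of each sample to the native structure
    energies -- Rosetta energy of each sample
    lam      -- breadth (in RMSD units) of the Gaussian defining "native-ness"
    kbt      -- Boltzmann factor; ~1.0 for ref2015, ~0.63 for talaris2013
    """
    # Shift energies so the minimum is zero to avoid overflow in exp();
    # the shift cancels between numerator and denominator.
    e_min = min(energies)
    boltz = [math.exp(-(e - e_min) / kbt) for e in energies]
    numerator = sum(math.exp(-(r * r) / (lam * lam)) * b
                    for r, b in zip(rmsds, boltz))
    denominator = sum(boltz)  # the partition function
    return numerator / denominator
```

A funnel whose lowest-energy samples sit at low RMSD yields PNear close to 1, while a run that never samples near the native structure yields PNear close to 0, with no hard RMSD cutoff involved.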
Update: As of 10 October 2019, a Python script to compute PNear is available in the tools repository (in `tools/analysis`). Instructions for its use are in the comments at the start of the script. In addition, `Rosetta/main/tests/benchmark/util/quality_measures.py` provides a function that can be used to compute PNear.
Please use this template to describe your scientific test in the `readme.md`, as described above. Also check out the `fast_relax` test for ideas of what we are looking for.
## AUTHOR AND DATE
#### Who set up the benchmark? Please add name, email, PI, month and year

## PURPOSE OF THE TEST
#### What does the benchmark test and why?

## BENCHMARK DATASET
#### How many proteins are in the set?
#### What dataset are you using? Is it published? If yes, please add a citation.
#### What are the input files? How were they created?

## PROTOCOL
#### State and briefly describe the protocol.
#### Is there a publication that describes the protocol?
#### How many CPU hours does this benchmark take approximately?

## PERFORMANCE METRICS
#### What are the performance metrics used and why were they chosen?
#### How do you define a pass/fail for this test?
#### How were any cutoffs defined?

## KEY RESULTS
#### What is the baseline to compare things to - experimental data or a previous Rosetta protocol?
#### Describe outliers in the dataset.

## DEFINITIONS AND COMMENTS
#### State anything you think is important for someone else to replicate your results.

## LIMITATIONS
#### What are the limitations of the benchmark? Consider dataset, quality measures, protocol etc.
#### How could the benchmark be improved?
#### What goals should be hit to make this a "good" benchmark?