You are here

Alignment and RMSD

6 posts / 0 new
Last post
Alignment and RMSD
#1

Hello,

I've been attempting measure some structures RMSDs by using the PyRosetta's functions all_atom_rmsd() and CA_rmsd(). I've also seen the function native_CA_rmsd(), and used it but I can't tell the diference between this one and CA_rmsd().

So far the functions I use are the ones I've been able to find by tabbing for iPython's autocomplete. I know that there is a native RMSD for all atom, but I've not been able to find it. Are these PyRosetta exposed methods listed somewhere?

Finally, is there a way to measure the RMSD of two structures of different length? I've tried it, but all_atom_rmsd() gives an unidentified C++ exeption.

Thanks in advance.

Post Situation: 
Wed, 2013-11-27 12:43
ñ

Most of the rmsd and alignment functions live in the core.scoring module. (There may be a select few that are exposed on a higher level, or some specialized ones which live elsewhere.)

For PyRosetta, there's no real difference between native_CA_rmsd() and CA_rmsd() - the former actually calls the latter in most cases. The only difference is that native_CA_rmsd() obeys the -in::file::native_exclude_res commandline option. So a native all atom rmsd is going to be equivalent to a regular all atom rmsd with the native structure as comparison.

For controlling which residues to compare, there are several varieties of CA_rmsd, depending on which ones are availible through PyRosetta (I'm not 100% sure all of them are). The most flexible is perhaps one where CA_rmsd() takes a third option which is a map of integers mapping the (pose numbered) residues of the first protein to the corresponding residue in the second protein - or just a vector1 of integers, where it's assumed that the numbers correspond. Alternatively, CA_rmsd() can also take two (pose numbered) integers indicating the start and end residues for the CA_rmsd as parameters 3 and 4, or a 5 parameter version which does start, end, and a vector1 of residues to exclude.

Other rmsd options are bb_rmsd() and bb_rmsd_including_O(), which do the obvious (but only have the two and five parameter version), the previously mentioned all_atom_rmsd() (which has only the two and three-parameter-with-list options), and all_atom_rmsd_nosuper() & all_scatom_rmsd_nosuper() (two parameter only) which do all atom and all sidechain atom rmsds, but without doing the best fit alignment first. There's other more flexible versions available at the C++ level, but they're not likely to be exposed at the Python level, due to technical limitations.

If your proteins are not the same length (and assuming that they're different on the N-terminus), only the three parameter mapping version of CA_rmsd() is likely going to work well. The way around it for all atom rmsd, etc. is to create a temporary shorter pose which only contains the residues of interest. You can either construct such a pose residue-by-residue yourself, or you could use the create_subpose(source,positions,foldtree,destination) utility function in the core.pose module. It's a little tricky, in that in addition to the vector1 of positions to take and an empty pose as the destination, you also have to provide it with a foldtree for the new pose. If all you're going to do with the new pose is use it for RMSD, you can just use the core.kinematics.Foldtree( N ) constructor to get a simple one, where n is the number of residues in the new pose (the length of your positions vector).

Fri, 2013-11-29 09:50
rmoretti

Thank you, your response was very helpful. However, I have come up with a related issue regarding gdtha() and gdtsc() functions. If I understand correctly, this functions require three arguments, two being Pose objects and the third being an std::map, yet I've not been able to figure out wich rosetta.utility.map_* functions to use. Maybe I'm mistaken and this is not what the gdt functions require, I've had a hard time trying to understand the C++ in the help messages.

How to use gdtsc() and gdtha()?

Mon, 2013-12-02 07:09
ñ

std::map and pythons dictionary are analogous. Real is Rosetta's double, Size is analogous to unsigned int. I've never used those functions, but if you paste the help message I can help construct the correct std::map.

Mon, 2013-12-02 13:11
jadolfbr

This is the C++ signature for gdtha(), gdtsc() has the same signature in it's help message:

C++ signature :
double gdtha(core::pose::Pose,core::pose::Pose,std::map, std::allocator > >)

Wed, 2013-12-04 06:05
ñ

Ick, that's completely garbled.

Looking at the C++ level code, it looks like gdtha() & gdtsc() are looking for a std::map as their third argument. This should be rosetta.utility.map_Size_Size object at the Python level.

Wed, 2013-12-04 08:10
rmoretti