Hi,
We are trying to get Rosetta set up for a user here at UT Southwestern, and have downloaded and built the 2015.39 release using the Intel Composer XE 2015 compiler suite, with MVAPICH2 2.1 as the MPI stack. This is the standard compiler/MPI stack we use for a large number of other software packages on the cluster. Compilation of Rosetta followed the site settings file distributed for the TACC Stampede cluster.
When the user runs a simple test of relax.linuxiccrelease:
${ROSETTA_BIN_DIR}/relax.linuxiccrelease -database ${ROSETTA_DATABASE_DIR} -in:file:s 1BE9_clean.pdb -in:file:fullatom -out:prefix relax_
We are seeing Inaccurate G! step errors, such as:
core.optimization.LineMinimizer: (0) Inaccurate G! step= 3.8147e-08 Deriv= -239.863 Finite Diff= 4.62715e+07
... and the output is not as expected. These errors don't occur with the binaries included in the download, or with a build here using gcc instead of the Intel compilers.
I'm wondering whether anyone else has seen such numerical issues with the Intel compiler, or whether there are any pointers for investigating this further?
Many Thanks,
DT
Inaccurate G! is either nonconcerning or nondiagnostic (your pick). In broad strokes, it means the minimizer is behaving badly: the analytic derivative and the finite-difference estimate disagree. Sometimes this is a false alarm (depending on minimizer settings); sometimes it's because you've hit a patch of a particular scorefunction's range where it behaves badly mathematically. Usually you can ignore these warnings - in most cases they indicate inefficiency, not error.
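For context, the Deriv and Finite Diff numbers in the log line above suggest the warning fires when an analytic derivative disagrees with a finite-difference estimate along the line-search direction. Here is a minimal, hypothetical sketch of that kind of consistency check (illustrative only, not Rosetta's actual code; the function names and tolerance are made up):

```python
def check_gradient(f, dfdx, x, step=1e-5, tol=0.1):
    """Compare an analytic derivative with a central finite difference,
    roughly the kind of consistency check behind 'Inaccurate G!'.
    Illustrative sketch only -- not Rosetta's implementation."""
    deriv = dfdx(x)
    finite_diff = (f(x + step) - f(x - step)) / (2.0 * step)
    rel_err = abs(deriv - finite_diff) / max(abs(deriv), abs(finite_diff), 1e-12)
    if rel_err > tol:
        print(f"Inaccurate G! step= {step:g} Deriv= {deriv:g} "
              f"Finite Diff= {finite_diff:g}")
    return deriv, finite_diff

# A smooth, well-behaved function passes the check quietly:
d, fd = check_gradient(lambda x: x**2, lambda x: 2 * x, 3.0)
```

Note how tiny the step in your log is (3.8e-08): at that scale a finite difference of a nearly flat or noisy energy function can be dominated by rounding error, which is one reason the warning can be a false alarm.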
You also say "the output is not as expected". What does that mean?
Thanks for the info. Honestly, I don't have an exact idea of what 'not as expected' means. I'll have to ask the user concerned to take part in this forum. Their original query to our HPC support team is below.
The concern is that the Inaccurate G! issue only occurs on the Intel-compiled version, and not on a gcc-compiled version. Could there be slight numerical differences from using Intel MKL or similar?
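Slight numerical differences between compilers are plausible in general: floating-point addition is not associative, so any compiler optimization that reorders a reduction (vectorization, reassociation) can change the low bits of a sum. A tiny demonstration of the underlying non-associativity (this illustrates why compiled code can differ, not that it is the cause here):

```python
# Floating-point addition is not associative: grouping changes the result.
a, b, c = 1e16, -1e16, 1.0
left = (a + b) + c   # (0.0) + 1.0  -> 1.0
right = a + (b + c)  # 1.0 is lost below the precision of 1e16 -> 0.0
print(left, right)   # 1.0 0.0
```

In an energy function evaluated millions of times inside a Monte Carlo trajectory, such tiny differences get amplified into entirely different trajectories, so bitwise-identical scores across compilers are not something to expect.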
I'll try to get more detail / ask the user to post here directly.
Many Thanks!
From the user...
"Output is not as expected" means that the total_score values at the end of the relax run are not the same. Perhaps the difference is not significant. When I run the gcc-compiled version, relax gives a final total_score below -200, while the Intel-compiled version gave me total_scores between -190 and -167. I could do several runs to generate statistics on the scores if that would help.
The score files look like this:
==> gcc_TestRun_Relax.sc <==
SEQUENCE:
SCORE: total_score dslf_fa13 fa_atr fa_dun fa_elec fa_intra_rep fa_rep fa_sol hbond_bb_sc hbond_lr_bb hbond_sc hbond_sr_bb omega p_aa_pp pro_close rama ref description
SCORE: -205.193 0.000 -413.155 104.034 -55.167 0.940 40.477 236.847 -17.384 -35.506 -11.994 -20.385 6.055 -25.877 0.234 -10.497 -3.814 relax_1BE9_clean_0001
==> intel_TestRun_Relax.sc <==
SEQUENCE:
SCORE: total_score dslf_fa13 fa_atr fa_dun fa_elec fa_intra_rep fa_rep fa_sol hbond_bb_sc hbond_lr_bb hbond_sc hbond_sr_bb omega p_aa_pp pro_close rama ref description
SCORE: -167.911 0.000 -413.977 79.443 -47.998 0.960 77.783 237.462 -11.119 -32.917 -8.990 -19.018 6.097 -22.813 0.512 -9.521 -3.814 relax_1BE9_clean_0001
A score difference of 40 units is surprisingly large, but within the boundaries of what the random sampling of Monte Carlo will do. I would 100% not expect you to get identical scores from this test (even with the same RNG seeds, I'd be maybe 75/25 on a different score due to compiler and processor differences). Run maybe 100 models (-nstruct 100) and see what the averages look like.
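To compare the averages, the score files can be parsed directly. A small sketch, assuming the whitespace-delimited .sc layout shown above (SCORE: header line followed by SCORE: data lines); the function name is mine:

```python
def average_total_score(lines):
    """Average the total_score column of a Rosetta-style score file,
    assuming the whitespace-delimited format shown in this thread."""
    scores = []
    idx = None
    for line in lines:
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE: and any other non-score lines
        fields = line.split()
        if idx is None:
            idx = fields.index("total_score")  # header line
            continue
        scores.append(float(fields[idx]))
    return sum(scores) / len(scores)

# Usage: average_total_score(open("gcc_TestRun_Relax.sc"))
lines = [
    "SEQUENCE:",
    "SCORE: total_score fa_atr description",
    "SCORE: -205.193 -413.155 relax_1BE9_clean_0001",
]
print(average_total_score(lines))  # -205.193
```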
Many thanks for the input. We have collected averages over 100 models, for both our compiled Intel/MVAPICH2 version and the gcc version in the download from this site.
The Intel-compiled version gives lower, and more variable, total_score values. Would appreciate any information to pass to the user regarding whether this is expected / within reason?
Many Thanks,