
Intel Compiler - Inaccurate G! step errors


Hi,

We are trying to get Rosetta set up for a user here at UT Southwestern, and have downloaded and built the 2015.39 release using the Intel Composer XE 2015 compiler suite, with MVAPICH2 2.1 as the MPI stack. This is the standard compiler/MPI stack we use for a large amount of other software on the cluster. Compilation of Rosetta followed the site settings file distributed for the TACC Stampede cluster.

When the user runs a simple test of relax.linuxiccrelease:

${ROSETTA_BIN_DIR}/relax.linuxiccrelease -database ${ROSETTA_DATABASE_DIR} -in:file:s 1BE9_clean.pdb -in:file:fullatom -out:prefix relax_

We are seeing Inaccurate G! step errors, such as:

core.optimization.LineMinimizer: (0) Inaccurate G! step= 3.8147e-08 Deriv= -239.863 Finite Diff= 4.62715e+07

... and the output is not as expected. These errors don't occur with the binaries included in the download, or with a build here using gcc instead of the Intel compilers.

I'm wondering if anyone else has seen such numerical issues when using the Intel compiler, or whether there are any pointers for investigating this further?

Many Thanks,

DT

 

Tue, 2016-01-19 07:33
dtrudg

Inaccurate G! is either nonconcerning or nondiagnostic (your pick).  In broad strokes, it means the minimizer is behaving badly.  Sometimes this is a false alarm (depending on minimizer settings), sometimes it's because you've hit a patch of a particular scorefunction's range where it behaves mathematically badly.  Usually you can ignore them - it means inefficiency not error in most cases.
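In case it helps to see where the numbers in that log line come from: the warning appears to compare an analytic directional derivative ("Deriv") against a finite-difference estimate ("Finite Diff") along the line-search direction. A minimal Python sketch of that kind of check (an illustration of the idea only, not Rosetta's actual implementation):

def directional_derivative_check(func, grad, x, direction, step=3.8147e-08, tol=1e-2):
    # Analytic directional derivative: dot product of the gradient with the
    # search direction (the "Deriv" value in the log line).
    analytic = sum(g * d for g, d in zip(grad(x), direction))
    # Finite-difference estimate along the same direction ("Finite Diff").
    f0 = func(x)
    f1 = func([xi + step * di for xi, di in zip(x, direction)])
    finite_diff = (f1 - f0) / step
    # If the two disagree badly, a minimizer could print a warning like
    # "Inaccurate G! step= ... Deriv= ... Finite Diff= ..."
    denom = max(abs(analytic), abs(finite_diff), 1.0)
    if abs(analytic - finite_diff) / denom > tol:
        print("Inaccurate G! step=", step, "Deriv=", analytic, "Finite Diff=", finite_diff)
    return analytic, finite_diff

With a step as small as ~3.8e-08, the finite-difference estimate is very sensitive to rounding in the two function evaluations, which fits the reading above that the warning usually signals inefficiency rather than error.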

You also say "the output is not as expected".  What does that mean?

Tue, 2016-01-19 07:45
smlewis

Thanks for the info. Honestly, I don't have an exact idea of what 'not as expected' means. I'll have to ask the user concerned to take part in this forum. Their original query to our HPC support team is below.

The concern is that the Inaccurate G! issue only occurs with the Intel-compiled version, and not with a gcc-compiled version. Could there be slight numerical differences from using Intel MKL or similar?
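As background (a generic illustration, nothing specific to Rosetta, icc, or MKL): floating-point addition is not associative, so a compiler that reorders or vectorizes summations can legitimately change the last few digits of a result. A tiny Python example:

# Summation order matters in floating point, so optimizations that reorder
# arithmetic can shift results slightly even though the inputs are identical.
vals = [1e16, 1.0, -1e16]
print(sum(vals))                     # 0.0 -- the 1.0 is lost when added to 1e16 first
print(vals[0] + vals[2] + vals[1])   # 1.0 -- same numbers, different order

Tiny per-term differences like that shouldn't change a score by themselves, but they can send a stochastic trajectory down a different path, so bit-identical output across compilers isn't something to expect.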

I'll try to get more detail / ask the user to post here directly.

Many Thanks!

I get some strange error during the run:

core.optimization.LineMinimizer: (0) Inaccurate G! step= 3.8147e-08 Deriv= -239.863 Finite Diff= 4.62715e+07

Otherwise the run finishes normally, but the scores from the optimization process are not accurate in the end. The collaborator cannot replicate this on his cluster with the same scripts that I am using. Could it be that something did not compile right? Do you perhaps have a log file that they could look at?

The collaborators are using the QB3 cluster (at UCSF) and they don’t use the MPI version. They have:

Tue, 2016-01-19 14:57
dtrudg

From the user...

"Output is not as expected" means that the total_score values at the end of the relax run are not the same. Perhaps the difference is not significant. When I run the gcc-compiled version, relax gives a final total_score below -200, while the Intel-compiled version gave me total_scores between -190 and -167. I could do several runs to generate statistics on the scores if that would help.

The score files look like this: 

==> gcc_TestRun_Relax.sc <==

SEQUENCE:

SCORE: total_score dslf_fa13    fa_atr    fa_dun   fa_elec fa_intra_rep       fa_rep       fa_sol hbond_bb_sc hbond_lr_bb    hbond_sc hbond_sr_bb       omega     p_aa_pp pro_close      rama       ref description

SCORE:    -205.193     0.000  -413.155   104.034   -55.167        0.940       40.477      236.847     -17.384     -35.506     -11.994     -20.385       6.055     -25.877     0.234   -10.497    -3.814 relax_1BE9_clean_0001

 

==> intel_TestRun_Relax.sc <==

SEQUENCE:

SCORE: total_score dslf_fa13    fa_atr    fa_dun   fa_elec fa_intra_rep       fa_rep       fa_sol hbond_bb_sc hbond_lr_bb    hbond_sc hbond_sr_bb       omega     p_aa_pp pro_close      rama       ref description

SCORE:    -167.911     0.000  -413.977    79.443   -47.998        0.960       77.783      237.462     -11.119     -32.917      -8.990     -19.018       6.097     -22.813     0.512    -9.521    -3.814 relax_1BE9_clean_0001

 

Wed, 2016-01-20 07:29
dtrudg

A score difference of 40 units is surprisingly large, but within the boundaries of what the random sampling of Monte Carlo will do.  I would 100% not expect you to get identical scores from this test (even with the same RNG seeds, I'd be maybe 75/25 on a different score due to compiler and processor differences).  Run maybe 100 models (-nstruct 100) and see what the averages look like.
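In case it's useful for collecting those averages, here is a short Python sketch that pulls total_score out of a Rosetta .sc score file like the ones posted above (it assumes the whitespace-delimited SCORE: lines shown earlier, with the first SCORE: line acting as the header):

import statistics
import sys

def total_scores(sc_path):
    # Yield total_score from every SCORE: data line; the first SCORE: line
    # is treated as the header that tells us which column to read.
    header = None
    with open(sc_path) as fh:
        for line in fh:
            if not line.startswith("SCORE:"):
                continue
            fields = line.split()
            if header is None:
                header = fields
                idx = header.index("total_score")
            else:
                yield float(fields[idx])

if __name__ == "__main__":
    scores = list(total_scores(sys.argv[1]))  # e.g. intel_TestRun_Relax.sc
    print("n =", len(scores))
    print("mean =", round(statistics.mean(scores), 3))
    print("stdev =", round(statistics.stdev(scores), 3))

Saved as, say, score_stats.py (just an example name), it can be run once per score file to compare the two builds.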

Thu, 2016-01-21 08:06
smlewis

Many thanks for the input. I have collected averages over 100 models for both our Intel/MVAPICH2 build and the gcc version from the download on this site.

The Intel-compiled version gives worse (less negative) and more variable 'total_score' values. I would appreciate any info to pass to the user regarding whether this is expected / within reason.

Many Thanks,

                   INTEL                 GCC                  DIFFERENCE (GCC - INTEL)
                   MEAN       STDEV      MEAN       STDEV     MEAN
total_score        -187.961   11.255    -200.810     3.913    -12.849
dslf_fa13             0.000    0.000       0.000     0.000      0.000
fa_atr             -404.379    7.544    -407.013     4.233     -2.634
fa_dun               79.907    1.738     103.700     1.742     23.793
fa_elec             -49.324    2.153     -55.068     2.176     -5.744
fa_intra_rep          0.981    0.020       0.959     0.017     -0.022
fa_rep               53.182   13.008      40.024     0.999    -13.158
fa_sol              233.381    4.160     232.609     3.195     -0.771
hbond_bb_sc         -11.173    1.367     -14.666     1.547     -3.493
hbond_lr_bb         -32.314    0.898     -34.760     0.796     -2.446
hbond_sc            -10.442    1.097     -12.090     1.257     -1.648
hbond_sr_bb         -19.517    0.793     -20.403     0.427     -0.887
omega                 7.391    1.565       5.558     0.452     -1.833
p_aa_pp             -22.739    0.687     -25.176     0.356     -2.437
pro_close             0.303    0.248       0.171     0.031     -0.132
rama                 -9.403    0.690     -10.839     0.569     -1.436
ref                  -3.814               -3.814                0.000
Thu, 2016-01-21 13:07
dtrudg