gcc 4.4 (4.4.2) compile error with mode=release related to -finline-limit=20000 compile option [SOLVED]

There are several reports of build failures with gcc 4.4 in these forums. I also had compile errors with gcc 4.4, but I have found a workaround (although I have yet to evaluate the impact of this workaround on performance).

Here is my situation:
- working on a cluster where I have little control over the available compilers
- readily available compiler is gcc 4.4.2
- working with Rosetta 3.1

Compile results with "./scons.py -j8 bin extras=static" :
scons: done building targets.
no errors
310 warnings

Compile results with "./scons.py -j8 bin mode=release extras=static" :
compile error:
src/protocols/toolbox/PoseMetricCalculators/ResidueDecompositionCalculator.cc: In member function ‘void protocols::toolbox::PoseMetricCalculators::ResidueDecompositionCalculator::residue_set_numbers_to_decomposition()’:
src/protocols/toolbox/PoseMetricCalculators/ResidueDecompositionCalculator.cc:155: internal compiler error: Segmentation fault

[ this is identical to post http://www.rosettacommons.org/node/1897 ]

Using log=environment, I get:
CCFLAGS = -pipe -ffor-scope -W -Wall -pedantic -Wno-long-long -O3 -ffast-math -funroll-loops -finline-functions -finline-limit=20000 -s -Wno-unused-variable

So I played around with compile options and found that the optimization "-finline-limit=20000" was causing the problem. I eventually found that src/protocols/toolbox/PoseMetricCalculators/ResidueDecompositionCalculator.cc only compiles for values of -finline-limit less than 1134 (so "-finline-limit=1133" works).
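For anyone who needs to find the cutoff for another file or compiler, the search can be done by bisection rather than by hand. This is only a sketch: the function names are made up for illustration, and it assumes that the compile succeeds below some threshold and fails at or above it. The stand-in predicate below simply mimics the 1133/1134 behaviour reported in this post; a real one would run the compiler on the failing file and check its exit status.

```python
def largest_working_limit(compiles, low, high):
    """Return the largest value in [low, high] for which compiles(value) is True.

    Assumes compiles(low) is True, compiles(high) is False, and that the
    result flips from True to False exactly once in between.
    """
    if not compiles(low) or compiles(high):
        raise ValueError("need compiles(low) == True and compiles(high) == False")
    while high - low > 1:
        mid = (low + high) // 2
        if compiles(mid):
            low = mid   # mid still compiles: the threshold is higher
        else:
            high = mid  # mid crashes the compiler: the threshold is lower
    return low


# Stand-in predicate (hypothetical) that mimics the behaviour reported above.
def fake_compiles(limit):
    return limit < 1134


print(largest_working_limit(fake_compiles, 1, 20000))  # prints 1133
```

Between 1 and 20000 this needs about 15 trial compiles of the one failing file.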

By replacing "-finline-limit=20000" by "-finline-limit=1133" in basic.settings, the whole compile with mode=release went to completion. So the CCFLAGS that make it work with gcc 4.4.2 are:
CCFLAGS = -pipe -ffor-scope -W -Wall -pedantic -Wno-long-long -O3 -ffast-math -funroll-loops -finline-functions -finline-limit=1133 -s -Wno-unused-variable
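The substitution itself is a one-flag change. As a rough illustration only (the helper name is invented here, and this is not how basic.settings stores or processes its flags), the same replacement on a list of flags could look like this:

```python
def set_inline_limit(ccflags, new_limit):
    """Return a copy of ccflags with every -finline-limit=N set to new_limit."""
    prefix = "-finline-limit="
    return [prefix + str(new_limit) if flag.startswith(prefix) else flag
            for flag in ccflags]


# The mode=release flags reported by log=environment in this post.
release_flags = ("-pipe -ffor-scope -W -Wall -pedantic -Wno-long-long -O3 "
                 "-ffast-math -funroll-loops -finline-functions "
                 "-finline-limit=20000 -s -Wno-unused-variable").split()

print(" ".join(set_inline_limit(release_flags, 1133)))
```

All other flags are left untouched and keep their order.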

Note that the above applies whether or not the gcc 4.1 flags in basic.settings are used, i.e., with or without the additional flags "--param inline-unit-growth=1000 --param large-function-growth=50000".

So, for gcc 4.4.2:

WORKS:
[debug compile] CCFLAGS = -pipe -ffor-scope -W -Wall -pedantic -Wno-long-long -O0 -g -ggdb -ffloat-store
[debug compile] CCFLAGS = -pipe -ffor-scope -W -Wall -pedantic -Wno-long-long --param inline-unit-growth=1000 --param large-function-growth=50000 -O0 -g -ggdb -ffloat-store
[mode=release] CCFLAGS = -pipe -ffor-scope -W -Wall -pedantic -Wno-long-long -O3 -ffast-math -funroll-loops -finline-functions -finline-limit=1133 -s -Wno-unused-variable
[mode=release] CCFLAGS = -pipe -ffor-scope -W -Wall -pedantic -Wno-long-long --param inline-unit-growth=1000 --param large-function-growth=50000 -O3 -ffast-math -funroll-loops -finline-functions -finline-limit=1133 -s -Wno-unused-variable

DOES NOT WORK:
[mode=release] CCFLAGS = -pipe -ffor-scope -W -Wall -pedantic -Wno-long-long -O3 -ffast-math -funroll-loops -finline-functions -finline-limit=20000 -s -Wno-unused-variable
[mode=release] CCFLAGS = -pipe -ffor-scope -W -Wall -pedantic -Wno-long-long --param inline-unit-growth=1000 --param large-function-growth=50000 -O3 -ffast-math -funroll-loops -finline-functions -finline-limit=20000 -s -Wno-unused-variable

It would be greatly appreciated if someone could comment on the reduction of -finline-limit from 20000 to 1133. Appropriate? Potential effect? Should it be included by default in basic.settings for gcc 4.4?

Wed, 2011-02-02 10:25
smg3d

We had communication six days ago with the GCC folks who suggested we lower finline-limit to 487, for a similar failure in a different part of the code. It looks like the GCC folks have fixed their bug causing this crash, but of course they can't get their fix on your cluster.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47478

I get the impression they (GCC) think inline-limit and unroll-loops are too high anyway; of course they may be unaware of how inner-loop-bound Rosetta truly is. I've posted to the developer list your findings and asked for comment.

I can guarantee that changing inline-limit and unroll-loops for ResidueDecompositionCalculator won't affect performance; it's not low-level enough to be called very frequently. Changing it for the whole build system may have effects...

Wed, 2011-02-02 10:55
smlewis

Copying the email sent to the developers' list:

A particularly perspicacious user has duplicated this fix independently and wants to know if and how it affects the efficiency of the code:

http://www.rosettacommons.org/node/2274

Does anybody know how inline-limit and unroll-loops got set as high as they are? Do they need to be that high? Has anybody tested it? Are the existing performance tests smart enough to know if changing these values is a good idea?

Wed, 2011-02-02 10:58
smlewis

OK, our suggestion is:

A) use the handy bug report above to lean on your sysadmin to upgrade to a non-broken GCC (4.5).

B) Lowering the inline limit is known to negatively affect performance on older versions of Rosetta. At some point a software engineer tested to see how high it needed to be before it stopped improving performance. We are many Rosetta and GCC versions advanced since then, so maybe you'll be lucky and it won't have an effect. Nobody's tried it yet.

C) We can tweak it in basic.settings, but then people might non-obviously get "slow rosetta" if they happened to have 4.4 installed, and "fast rosetta" if they have 4.5 or 4.3. It seems better to get them to not use the buggy compiler.

D) If you aren't USING ResidueDecompositionCalculator - why not just dummy out all its functions? If you want to try this I'll help you track where it's used.
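To make option C concrete, a version-dependent tweak might look roughly like the sketch below. Everything in it is hypothetical: the function name is invented, this is not the logic in basic.settings, and the value 1133 is only the limit that happened to work for one file in this thread (the GCC folks suggested 487 for a different failure). The warning is there because of the concern above about people silently getting a differently optimized build.

```python
import warnings


def inline_limit_for(gcc_version):
    """Pick an -finline-limit value from a version string such as '4.4.2'.

    Hypothetical helper: expects at least 'major.minor' in the string.
    """
    major, minor = (int(part) for part in gcc_version.split(".")[:2])
    if (major, minor) == (4, 4):
        # Make the reduced limit visible instead of applying it silently.
        warnings.warn("GCC 4.4 detected: using reduced -finline-limit=1133")
        return 1133
    return 20000


print(inline_limit_for("4.5.0"))
```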

Wed, 2011-02-02 11:11
smlewis

After reading a few random forums regarding the use of -finline-limit, I got the feeling that large values were required for "older" compilers, but that the heuristics used by more recent gcc versions have gotten better at deciding which functions should be inlined...

I will do some quick benchmarks for my typical use (AbinitioRelax) and post.

Wed, 2011-02-02 11:19
smg3d

You *may* wish to run the "benchmark" app as well (should build by default). I've never used it (I have it running now in my 3.1 to see if it works); I think all its options are hardcoded so you run it in its code folder with a database path:

rosetta-3.1/rosetta_source/src/apps/benchmark> ../../../bin/benchmark.linuxgccrelease -database ../../../../rosetta_database/

EDIT: no, this doesn't work in 3.1, leave it be.

Wed, 2011-02-02 11:29
smlewis

Yup, I had tried the benchmark on my laptop in the past and it failed... Good to know it does not work for others...

Thanks for your suggestions
A) Thanks for pointing me to that bug report. Yes, I will certainly forward it to the sysadmin, but the upgrade most likely will not happen quickly (national facility)... However, as shown in my next post, things now work great with gcc 4.4.2 and inline-limit=1133 as far as abinitio and relax computations go.

B) Please see next post (in an hour or so...). No apparent hit on performance. Actually an improvement compared to GCC 4.1.2 with inline-limit=20000.

C) In light of the benchmarking I did, I will go with the basic.settings tweak for now and use the available gcc 4.4.2. As far as I can tell, the 4.4 bug leads to an ICE with certain optimization/code combinations, but not to buggy programs. Once 4.5 is available on my cluster, I will revisit this issue.

D) I could do so, but ResidueDecompositionCalculator is not even halfway into the compile... and maybe there are other functions that would also crash with inline-limit=20000 and gcc 4.4.2. I will stick with the original source code for now.

Cheers,

Stéphane

Thu, 2011-02-03 09:40
smg3d

Benchmark) I'm not aware of anyone, anyplace, anytime actually running that code - only the benchmarking server. There's just some pathing we're missing or something. It failed for me on the zinc ion residue type, of all things...

National Lab) I feel your pain. My lab has access to a BlueGene we can't use, because the admins won't update the XLC compiler. It's not really a bug we can fix - the code it can't handle is our implementation of owning pointers, which are so widespread in the code that "fixing it" would be forking Rosetta wholesale. The Baker lab has access to a different BlueGene that works fine, grumble grumble. Most of the stuff my lab does requires enough memory that BlueGenes wouldn't work anyway.

Thu, 2011-02-03 10:54
smlewis

Here are my benchmark results regarding the use of a lower value for -finline-limit in order to compile with gcc 4.4.2.

System: 90-residue protein

Protocols tested :

  1. "AbinitioRelax -abinitio -out:nstruct 100"
  2. "relax -relax" on above abinitio structures
  3. "relax -relax:fast" on above abinitio structures

Benchmarks were all done on a single node (dual Intel Nehalem-EP at 2.8 GHz, 24 GB RAM), with a single process per node (i.e. yes, I am wasting 7 cores... until I get the pseudo-mpi version of Rosetta compiled).
Linux r107-n37 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

All compiles were static and included the following CCFLAGS
CCFLAGS = -pipe -ffor-scope -W -Wall -pedantic -Wno-long-long -O3 -ffast-math -funroll-loops -finline-functions -s -Wno-unused-variable

The following were tested (GCC version and additional CCFLAGS):

A) GCC 4.4.2 -finline-limit=1133 --param inline-unit-growth=1000 --param large-function-growth=50000
B) GCC 4.4.2 -finline-limit=1133
C) GCC 4.1.2 -finline-limit=20000 --param inline-unit-growth=1000 --param large-function-growth=50000
D) GCC 4.1.2 -finline-limit=20000
E) GCC 4.1.2 -finline-limit=1133 --param inline-unit-growth=1000 --param large-function-growth=50000
F) GCC 4.4.2 (no additional flags, i.e. -finline-limit not set)

The following computation times are average user CPU time in seconds per structure for the three protocols above:

      -abinitio    -relax    -relax:fast
A)       26.2       84.0        30.2
B)       26.0       99.1        35.5
C)       28.4      110.3        33.6
D)       28.4      102.5        38.3
E)       28.4      110.1        33.2
F)       35.9      101.2        35.9

In summary:

  • there is no negative effect on performance associated with -finline-limit=1133 and GCC 4.4.2 (A) compared with the default 4.1.2 compilation (C). In fact, (A) is 8%, 24%, and 10% faster than (C) for "-abinitio", "-relax" and "-relax:fast", respectively.
  • the "--param inline-unit-growth=1000 --param large-function-growth=50000" appears to improve performance in most cases, with one exception ("-relax" and 4.1.2)
  • even with gcc 4.1.2, -finline-limit 20000 (C) or 1133 (E) does not affect performance. From above comment, things were different with older compilers (3.x)
  • there is a significant performance cost if -finline-limit is not set (F)
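For anyone who wants to check the percentages quoted for (A) versus (C), they are the reduction in time per structure relative to (C). A small sketch of that arithmetic, using the timings posted above (the function name is just for illustration):

```python
# Average user CPU seconds per structure: (-abinitio, -relax, -relax:fast)
times = {
    "A": (26.2, 84.0, 30.2),   # GCC 4.4.2, -finline-limit=1133, with --param flags
    "C": (28.4, 110.3, 33.6),  # GCC 4.1.2, -finline-limit=20000, with --param flags
}


def percent_less_time(fast, slow):
    """Percent reduction in time of `fast` relative to `slow`, per protocol."""
    return [round(100 * (s - f) / s) for f, s in zip(fast, slow)]


print(percent_less_time(times["A"], times["C"]))  # prints [8, 24, 10]
```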

To conclude, there is no apparent performance cost associated with compiling using gcc 4.4.2 and -finline-limit=1133.

Please remember that the above benchmarks only cover the AbinitioRelax and relax protocols, gcc 4.4.2 and 4.1.2, and my hardware. Other settings may give different results.

Hopefully this will be useful to other Rosetta users who may have to compile with GCC 4.4, or maybe to developers that may be interested in revisiting inline optimization with newer compilers.

Cheers,

Stéphane

Thu, 2011-02-03 13:04
smg3d

I am echoing this very useful data to the developers' list. Hopefully someone will become interested in rigorously re-examining the inlining across several executables and compilers.

Thu, 2011-02-03 13:23
smlewis

This may be trivial (but for me it was not) so:
the file to modify for the compilation to use finline-limit=1133 is $PHENIX_ROSETTA_PATH/rosetta_source/tools/build/basic.settings

Fri, 2011-06-03 02:40
pietro1968