mpi installation issues

hi all

I am trying to build the MPI version of Rosetta 3.7 on a machine running Red Hat Linux (I have no problem installing on Ubuntu). I copied the topsail site.settings file and commented out the path to INCLUDE, because Rosetta initially complained about a missing INCLUDE parameter.

I compile with scons (scons -j8 bin mode=release extras=mpi) and the build finishes fine, with no errors and no stalling, but when I run the executables (for instance fixbb) I get the message below:

fixbb.linuxgccrelease: route/tc.c:973: rtnl_tc_register: Assertion `0' failed.

fixbb.linuxgccrelease:8222 terminated with signal 6 at PC=2af1e093a5f7 SP=7fff24ad2568.  Backtrace:
/lib64/libc.so.6(gsignal+0x37)[0x2af1e093a5f7]
/lib64/libc.so.6(abort+0x148)[0x2af1e093bce8]
/lib64/libc.so.6(+0x2e566)[0x2af1e0933566]
/lib64/libc.so.6(+0x2e612)[0x2af1e0933612]
/lib64/libnl-route-3.so.200(+0x21249)[0x2af1e6a62249]
/lib64/ld-linux-x86-64.so.2(+0xf3a3)[0x2af1d17163a3]
/lib64/ld-linux-x86-64.so.2(+0x13ab6)[0x2af1d171aab6]
/lib64/ld-linux-x86-64.so.2(+0xf1b4)[0x2af1d17161b4]
/lib64/ld-linux-x86-64.so.2(+0x131ab)[0x2af1d171a1ab]
/lib64/libdl.so.2(+0x102b)[0x2af1e0ecf02b]
/lib64/ld-linux-x86-64.so.2(+0xf1b4)[0x2af1d17161b4]
/lib64/libdl.so.2(+0x162d)[0x2af1e0ecf62d]
/lib64/libdl.so.2(dlopen+0x31)[0x2af1e0ecf0c1]
/usr/lib64/openmpi/lib/libopen-pal.so.13(+0x58f34)[0x2af1e13a6f34]
/usr/lib64/openmpi/lib/libopen-pal.so.13(+0x3b891)[0x2af1e1389891]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_component_find+0x78a)[0x2af1e138ae0a]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_components_register+0x56)[0x2af1e1394d46]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_register+0x196)[0x2af1e13951f6]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_open+0x12)[0x2af1e1395252]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_open+0x66)[0x2af1e13952a6]
/usr/lib64/openmpi/lib/libmpi.so.12(ompi_mpi_init+0x476)[0x2af1dfc2a2f6]
/usr/lib64/openmpi/lib/libmpi.so.12(MPI_Init+0x193)[0x2af1dfc4c4e3]
/home/enztech/rosetta_bin_linux_2016.32.58837_bundle/main/source/build/src/release/linux/3.10/64/x86/gcc/4.8/mpi/libcore.5.so(_ZN4core4init8init_mpiEiPPc+0x33)[0x2af1da00c313]
/home/enztech/rosetta_bin_linux_2016.32.58837_bundle/main/source/build/src/release/linux/3.10/64/x86/gcc/4.8/mpi/libcore.5.so(_ZN4core4init4initEiPPc+0x1a)[0x2af1da00ef1a]
./fixbb.linuxgccrelease[0x40c53e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2af1e0926b15]
./fixbb.linuxgccrelease[0x40d4b1]

fixbb.linuxgccrelease:8222 terminated with signal 11 at PC=2af1e6a62258 SP=7fff24ad1bf8.  Backtrace:
/lib64/libnl-route-3.so.200(rtnl_tc_unregister+0x8)[0x2af1e6a62258]
/lib64/ld-linux-x86-64.so.2(+0xfa1a)[0x2af1d1716a1a]
/lib64/libc.so.6(+0x38e69)[0x2af1e093de69]
/lib64/libc.so.6(+0x38eb5)[0x2af1e093deb5]
/lib64/libinfinipath.so.4(+0x426f)[0x2af1e6eea26f]
/lib64/libpthread.so.0(+0xf100)[0x2af1e06f8100]
/lib64/libc.so.6(gsignal+0x37)[0x2af1e093a5f7]
/lib64/libc.so.6(abort+0x148)[0x2af1e093bce8]
/lib64/libc.so.6(+0x2e566)[0x2af1e0933566]
/lib64/libc.so.6(+0x2e612)[0x2af1e0933612]
/lib64/libnl-route-3.so.200(+0x21249)[0x2af1e6a62249]
/lib64/ld-linux-x86-64.so.2(+0xf3a3)[0x2af1d17163a3]
/lib64/ld-linux-x86-64.so.2(+0x13ab6)[0x2af1d171aab6]
/lib64/ld-linux-x86-64.so.2(+0xf1b4)[0x2af1d17161b4]
/lib64/ld-linux-x86-64.so.2(+0x131ab)[0x2af1d171a1ab]
/lib64/libdl.so.2(+0x102b)[0x2af1e0ecf02b]
/lib64/ld-linux-x86-64.so.2(+0xf1b4)[0x2af1d17161b4]
/lib64/libdl.so.2(+0x162d)[0x2af1e0ecf62d]
/lib64/libdl.so.2(dlopen+0x31)[0x2af1e0ecf0c1]
/usr/lib64/openmpi/lib/libopen-pal.so.13(+0x58f34)[0x2af1e13a6f34]
/usr/lib64/openmpi/lib/libopen-pal.so.13(+0x3b891)[0x2af1e1389891]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_component_find+0x78a)[0x2af1e138ae0a]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_components_register+0x56)[0x2af1e1394d46]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_register+0x196)[0x2af1e13951f6]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_open+0x12)[0x2af1e1395252]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_open+0x66)[0x2af1e13952a6]
/usr/lib64/openmpi/lib/libmpi.so.12(ompi_mpi_init+0x476)[0x2af1dfc2a2f6]
/usr/lib64/openmpi/lib/libmpi.so.12(MPI_Init+0x193)[0x2af1dfc4c4e3]
/home/enztech/rosetta_bin_linux_2016.32.58837_bundle/main/source/build/src/release/linux/3.10/64/x86/gcc/4.8/mpi/libcore.5.so(_ZN4core4init8init_mpiEiPPc+0x33)[0x2af1da00c313]
/home/enztech/rosetta_bin_linux_2016.32.58837_bundle/main/source/build/src/release/linux/3.10/64/x86/gcc/4.8/mpi/libcore.5.so(_ZN4core4init4initEiPPc+0x1a)[0x2af1da00ef1a]
./fixbb.linuxgccrelease[0x40c53e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2af1e0926b15]
./fixbb.linuxgccrelease[0x40d4b1]

this happens whether i use the mpi version or just the normal version.

Could someone help me out with this? Thank you very much.

Tue, 2016-10-04 18:48
banshee

"this happens whether i use the mpi version or just the normal version."

Normal version = fixbb.default.linuxgccrelease, or just fixbb.linuxgccrelease? If it's the latter, it's probably symlinked to the MPI version. If it's the former, it's very odd that you're getting an MPI error from a build compiled without MPI. The one with no intervening .default. or .mpi. is symlinked to whatever was last built. You probably know that, but I wrote version 1 of this comment without checking the question's author.
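
One way to check which build the un-suffixed name currently points at is to resolve the symlink. This is only a minimal Python sketch: `linked_build` is a hypothetical helper name, the `.default.` and `.mpi.` infixes are the ones mentioned above, and the path you pass in depends on where your checkout keeps its executables.

```python
import os
import sys

def linked_build(path):
    """Return 'mpi' or 'default' for a Rosetta-style executable name.

    Hypothetical helper: it resolves symlinks and then looks only at the
    final file name. If neither infix is present, the resolved file name
    itself is returned so you can inspect it.
    """
    target = os.path.basename(os.path.realpath(path))
    for extras in ("mpi", "default"):
        if f".{extras}." in target:
            return extras
    return target

if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_link.py fixbb.linuxgccrelease
    print(linked_build(sys.argv[1]))
```

If this reports `mpi` for fixbb.linuxgccrelease, then running the un-suffixed name is really running the MPI build.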

Broadly speaking, I have no idea what the error is. It looks like it's failing in an MPI library during Rosetta initialization, so all I can suggest is to try a different version of openmpi and/or make sure that the mpirun binaries, MPI build libraries, compilers, etc. are all compatible with each other. Do any other MPI tools work on the machine? Could it be some sort of permissions issue with the MPI communication channels?
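
To see at a glance which libraries the crash passes through, the backtrace can be reduced to the list of binaries and shared objects it touches. The following is a rough Python sketch, not a Rosetta or Open MPI tool: the regular expression assumes frames in the `path(symbol+offset)[address]` form shown in the question, and `objects_in_order` is a made-up helper name.

```python
import re

# Matches frames such as
#   /lib64/libc.so.6(abort+0x148)[0x2af1e093bce8]
#   ./fixbb.linuxgccrelease[0x40c53e]
FRAME = re.compile(r"^(?P<obj>\S+?)(?:\((?P<sym>[^)]*)\))?\[0x[0-9a-fA-F]+\]$")

def objects_in_order(backtrace):
    """List each binary or shared object once, innermost frame first."""
    seen = []
    for line in backtrace.splitlines():
        m = FRAME.match(line.strip())
        if m:
            name = m.group("obj").rsplit("/", 1)[-1]
            if name not in seen:
                seen.append(name)
    return seen

# A few frames copied from the first backtrace in the question.
sample = """\
/lib64/libc.so.6(abort+0x148)[0x2af1e093bce8]
/lib64/libnl-route-3.so.200(+0x21249)[0x2af1e6a62249]
/lib64/libdl.so.2(dlopen+0x31)[0x2af1e0ecf0c1]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_component_find+0x78a)[0x2af1e138ae0a]
/usr/lib64/openmpi/lib/libmpi.so.12(MPI_Init+0x193)[0x2af1dfc4c4e3]
./fixbb.linuxgccrelease[0x40c53e]
"""

if __name__ == "__main__":
    for name in objects_in_order(sample):
        print(name)
```

Read from the bottom up, the frames go from fixbb into MPI_Init, then into Open MPI's component loading, then through dlopen into libnl-route-3, where the assertion fires. That seems consistent with the failure happening while Open MPI loads one of its plugins, before Rosetta does any real work.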

Wed, 2016-10-05 09:30
smlewis

Thanks for your reply. I got the wording confused: I meant the static MPI version, not the normal version.

I read another question on the forum describing what looks to me like similar, but not identical, library-related errors when running the unit tests with MPI. You mentioned that the version of openmpi or the compiler might be more recent than what was tested with Rosetta. Could that also be the case here? You said

GCC: 4.8.3, Open MPI: 1.6.4 was what was tested. I checked the versions on my computer and they are gcc 4.8.5 and openmpi 3.0.2.

thanks steven!

Wed, 2016-10-05 18:46
banshee

3.7 was before the Cxx11 changeover (we switched just after 3.7), so knowing what I have now is less useful than it might be. I am using gcc 5.4.0 and openmpi 1.10.2. The openmpi web site (https://www.open-mpi.org/) does not suggest to me that their versions go as high as 3 yet (I'm guessing it's a package renumbering by the Linux distro's package manager).
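
To sidestep any package renumbering, it may help to ask mpirun itself which version it is. A hedged Python sketch follows. It assumes the version banner looks like `mpirun (Open MPI) 1.10.2`, which is what I would expect from Open MPI's `mpirun --version` but have not checked against the Red Hat package; both function names are made up for illustration.

```python
import re
import subprocess

def parse_open_mpi_version(text):
    """Pull 'X.Y.Z' out of a banner like 'mpirun (Open MPI) 1.10.2'.

    The banner format is an assumption; returns None if it does not match.
    """
    m = re.search(r"\(Open MPI\)\s+(\d+(?:\.\d+)+)", text)
    return m.group(1) if m else None

def installed_open_mpi_version(mpirun="mpirun"):
    """Run `mpirun --version` and parse the result; None if mpirun is missing."""
    try:
        out = subprocess.run([mpirun, "--version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    # Check both streams, since the banner may be written to either one.
    return parse_open_mpi_version(out.stdout + out.stderr)

if __name__ == "__main__":
    print(installed_open_mpi_version())
```

If the number printed here differs from the one the package manager reports, the package version is probably not the upstream Open MPI version.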

Googling around shows similar-looking errors due to ???? somewhere in MPI (https://www.mail-archive.com/devel@lists.open-mpi.org/msg18181.html) - not that I know what to do with that data.  The most interesting thing from that email thread is 

"The main change appears to be a switch from a MOFED-based install to the OFED packaged with RHEL7."

That suggests to me that the package you are getting from Red Hat is bad; maybe try (shudder, this NEVER works) building mpi yourself?  (We just had a long thread with someone who'd built it themselves and the solution was "use the package instead", I think....)   Maybe Red Hat can give you a different version of the openmpi package?

Thu, 2016-10-06 08:53
smlewis