Error when using next-gen KIC (Signal 6)


Dear all,

I keep receiving a very strange error message when using NGK on a 24-core (48 MPI threads) workstation. The error usually occurs around output file 180 (once I made it past 600, but I am aiming for at least 2000). I am trying to model up to 4 loops simultaneously, but the error also sometimes appears if I model only one loop. I also tried reducing to no more than 20 threads, but the error still appeared.

After googling a bit I tried to set:

ulimit -s unlimited

but that didn't change anything. I also read something about a porting problem between Windows and Linux (I am using Ubuntu 14.04 LTS with Xubuntu), but I did the pre-minpack on the Linux machine, so I don't really think that this could be the problem.

Example error output (mpirun -np 20):

terminate called after throwing an instance of 'std::length_error'

what(): vector::_M_fill_insert

[franklin:13828] *** Process received signal ***

[franklin:13828] Signal: Aborted (6)

[franklin:13828] Signal code: (-6)

[franklin:13828] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x37000) [0x7ffd04c62000]

[franklin:13828] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39) [0x7ffd04c61f89]

[franklin:13828] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7ffd04c65398]

[franklin:13828] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x155) [0x7ffd052676b5]

[franklin:13828] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e836) [0x7ffd05265836]

[franklin:13828] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e863) [0x7ffd05265863]

[franklin:13828] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5eaa2) [0x7ffd05265aa2]

[franklin:13828] [ 7] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt20__throw_length_errorPKc+0x67) [0x7ffd052b7537]

[franklin:13828] [ 8] /home/jperthold/rosetta-3.5/rosetta_source/build/src/release/linux/3.13/64/x86/gcc/4.8/mpi/libprotocols_b.5.so(_ZNSt6vectorIN7utility7vector1IdSaIdEEESaIS3_EE14_M_fill_insertEN9__gnu_cxx17__normal_iteratorIPS3_S5_EEmRKS3_+0xa1b) [0x7ffd063a06ab]

[franklin:13828] [ 9] /home/jperthold/rosetta-3.5/rosetta_source/build/src/release/linux/3.13/64/x86/gcc/4.8/mpi/libnumeric.so(_ZN7numeric17kinematic_closure5dixonERKN7utility7vector1INS2_IdSaIdEEESaIS4_EEES8_S8_S8_RKNS2_IiSaIiEEERS6_SD_SD_Ri+0xbf33) [0x7ffd00546a23]

[franklin:13828] [10] /home/jperthold/rosetta-3.5/rosetta_source/build/src/release/linux/3.13/64/x86/gcc/4.8/mpi/libnumeric.so(_ZN7numeric17kinematic_closure13bridgeObjectsERKN7utility7vector1INS2_IdSaIdEEESaIS4_EEERKS4_SA_SA_RKNS2_IiSaIiEEESE_RS6_SF_SF_Ri+0x38e0) [0x7ffd0052a140]

[franklin:13828] [11] /home/jperthold/rosetta-3.5/rosetta_source/build/src/release/linux/3.13/64/x86/gcc/4.8/mpi/libprotocols.3.so(_ZN9protocols5loops12loop_closure17kinematic_closure14KinematicMover5applyERN4core4pose4PoseE+0x19b8) [0x7ffd035954a8]

[franklin:13828] [12] /home/jperthold/rosetta-3.5/rosetta_source/build/src/release/linux/3.13/64/x86/gcc/4.8/mpi/libprotocols.3.so(_ZN9protocols5loops10loop_mover6refine20LoopMover_Refine_KIC5applyERN4core4pose4PoseE+0x1cc7) [0x7ffd036b2d37]

[franklin:13828] [13] /home/jperthold/rosetta-3.5/rosetta_source/build/src/release/linux/3.13/64/x86/gcc/4.8/mpi/libprotocols_g.4.so(_ZN9protocols20comparative_modeling14LoopRelaxMover5applyERN4core4pose4PoseE+0x65d1) [0x7ffd042bd0b1]

[franklin:13828] [14] /home/jperthold/rosetta-3.5/rosetta_source/build/src/release/linux/3.13/64/x86/gcc/4.8/mpi/libprotocols_b.5.so(_ZN9protocols10loop_build14LoopBuildMover5applyERN4core4pose4PoseE+0x68b) [0x7ffd0617a9eb]

[franklin:13828] [15] /home/jperthold/rosetta-3.5/rosetta_source/build/src/release/linux/3.13/64/x86/gcc/4.8/mpi/libprotocols.1.so(_ZN9protocols3jd214JobDistributor7go_mainEN7utility7pointer10owning_ptrINS_5moves5MoverEEE+0x309) [0x7ffd02e0f299]

[franklin:13828] [16] /home/jperthold/rosetta-3.5/rosetta_source/build/src/release/linux/3.13/64/x86/gcc/4.8/mpi/libprotocols.1.so(_ZN9protocols3jd225MPIWorkPoolJobDistributor8slave_goEN7utility7pointer10owning_ptrINS_5moves5MoverEEE+0x3d) [0x7ffd02e253ed]

[franklin:13828] [17] /home/jperthold/rosetta-3.5/rosetta_source/build/src/release/linux/3.13/64/x86/gcc/4.8/mpi/libprotocols.1.so(_ZN9protocols3jd225MPIWorkPoolJobDistributor2goEN7utility7pointer10owning_ptrINS_5moves5MoverEEE+0x90) [0x7ffd02e25380]

[franklin:13828] [18] /home/jperthold/rosetta-3.5/rosetta_source/build/src/release/linux/3.13/64/x86/gcc/4.8/mpi/libprotocols_b.5.so(_ZN9protocols10loop_build14LoopBuild_mainEb+0x730) [0x7ffd06175720]

[franklin:13828] [19] /home/jperthold/rosetta-3.5/rosetta_source/bin/loopmodel.mpi.linuxgccrelease() [0x4086a6]

[franklin:13828] [20] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7ffd04c4cec5]

[franklin:13828] [21] /home/jperthold/rosetta-3.5/rosetta_source/bin/loopmodel.mpi.linuxgccrelease() [0x4087af]

[franklin:13828] *** End of error message ***

protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (21) new centroid perturb rmsd: 5.28133
protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (17) new centroid perturb rmsd: 7.82976
protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (21) new centroid perturb rmsd: 5.29043
protocols.loops.loop_mover.refine.LoopMover_Refine_KIC: (9) RMS to native after accepted kinematic round 1 move on loop 1: 5.19023
protocols.loops.loop_mover.refine.LoopMover_Refine_KIC: (9) energy after accepted move: -200.737
protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (21) new centroid perturb rmsd: 5.2741
protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (16) new centroid perturb rmsd: 1.30019
protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (21) new centroid perturb rmsd: 5.21618
protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (21) new centroid perturb rmsd: 5.21773
protocols.loops.loop_mover.refine.LoopMover_Refine_KIC: (23) refinement cycle (outer/inner): 4/5 126/490
protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (14) new centroid perturb rmsd: 4.32998
protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (14) new centroid perturb rmsd: 4.34061
protocols.loops.loop_mover.refine.LoopMover_Refine_KIC: (6) refinement cycle (outer/inner): 4/5 279/490
protocols.loops.loop_mover.refine.LoopMover_Refine_KIC: (9) RMS to native after accepted kinematic round 2 move on loop 1: 5.19097
protocols.loops.loop_mover.refine.LoopMover_Refine_KIC: (9) energy after accepted move: -200.924
protocols.loops.loop_mover.refine.LoopMover_Refine_KIC: (9) refinement cycle (outer/inner): 1/5 322/490
core.pack.pack_rotamers: (19) IG: 16417000 bytes
protocols.loops.loop_mover.refine.LoopMover_Refine_KIC: (19) energy after design: -200.041
protocols.moves.MonteCarlo: (19) MonteCarlo:: last_accepted_score,lowest_score: -200.16 -209.532
protocols.loops.loop_mover.refine.LoopMover_Refine_KIC: (19) energy after repack: -200.16
protocols.loops.loop_mover.refine.LoopMover_Refine_KIC: (13) refinement cycle (outer/inner): 1/5 389/490
protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (21) new centroid perturb rmsd: 5.20974
protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (21) new centroid perturb rmsd: 5.20886
protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (21) new centroid perturb rmsd: 5.29811
protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (21) new centroid perturb rmsd: 6.1993
protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (21) new centroid perturb rmsd: 6.19936
protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (21) new centroid perturb rmsd: 6.19506
protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (21) new centroid perturb rmsd: 6.20512
protocols.loops.loop_mover.perturb.LoopMover_Perturb_KIC: (21) new centroid perturb rmsd: 6.20607
protocols.loops.loop_mover.refine.LoopMover_Refine_KIC: (20) refinement cycle (outer/inner): 4/5 122/490
protocols.loops.loop_mover.refine.LoopMover_Refine_KIC: (20) RMS to native after accepted kinematic round 1 move on loop 4: 4.53692
protocols.loops.loop_mover.refine.LoopMover_Refine_KIC: (20) energy after accepted move: -173.389
protocols.loops.loop_mover.refine.LoopMover_Refine_KIC: (20) refinement cycle (outer/inner): 4/5 123/490

--------------------------------------------------------------------------
mpirun noticed that process rank 7 with PID 13828 on node franklin exited on signal 6 (Aborted).

Example flag file:

-database /home/jperthold/rosetta-3.5/rosetta_database
-loops:loop_file ./input.loop
-loops:remodel perturb_kic
-loops:refine refine_kic
-loops:max_kic_build_attempts 10000
-loops:outer_cycles 5
-loops:max_inner_cycles 960
-loops:ramp_fa_rep
-loops:ramp_rama
-loops:kic_rama2b
-loops:kic_omega_sampling
-allow_omega_move true
-kic_min_after_repack true
-kic_bump_overlap_factor 0.36
-legacy_kic false
-corrections:score:use_bicubic_interpolation false
-in:file:fullatom
-in:file:native ./input.pdb
-in:file:s ./input.pdb
-out:file:fullatom
-out:overwrite
-out:prefix task1_
-out:path:pdb ./output
-ex1
-ex2
-extrachi_cutoff 0
-out:nstruct 2000

I would be very happy if someone could help me with this issue. Thank you in advance!

Regards,

Jan

Tue, 2014-08-26 08:36
janwp

Hi Jan,

None of the KIC/NGK developers have run into this. Have you tried running without MPI to see if that prevents the issue?
If you keep getting the same error, it would be great if you could provide your input files for debugging.

thanks,
Amelie

Tue, 2014-09-30 14:29
amelie