espressomd-users

Re: [ESPResSo] Self-avoiding random walk SAW


From: Axel Arnold
Subject: Re: [ESPResSo] Self-avoiding random walk SAW
Date: Wed, 11 Aug 2010 11:00:55 +0200
User-agent: KMail/1.10.3 (Linux/2.6.27.48-0.2-default; KDE/4.1.3; x86_64; ; )

On Tuesday 10 August 2010 23:52:24 you wrote:
> Dear Dr. Axel Arnold,
>
> Thank you again for your help with Espresso! I really appreciate it, and I
> am trying really hard to resolve most of the problems myself :)
> I've been trying to run the simulation in parallel recently because I
> needed to run larger simulations. I installed OpenMPI, and
> Espresso seems to work, at least on two cores on my computer. The problem is
> that the modified Espresso (after I added all that stuff to the code) gives
> me errors during the simulation. So it seems like what I added to the
> code doesn't run in parallel (it fails very quickly after I start the
> simulation), even though it ran perfectly fine on a single processor. Here
> are the errors that I got. I was wondering if you had any idea what can cause
> these errors, or could at least point me in the right direction where to look
> for an answer.
>
> [bender:21480] *** Process received signal ***
> [bender:21480] Signal: Floating point exception (8)
> [bender:21480] Signal code: Integer divide-by-zero (1)
> [bender:21480] Failing at address: 0x442cbf
> [bender:21480] [ 0] /lib/libpthread.so.0 [0x7f8f3f2d3190]
> [bender:21480] [ 1] ./obj-Core-pc-linux/Espresso_bin(force_calc+0x19f) [0x442cbf]
> [bender:21480] [ 2] ./obj-Core-pc-linux/Espresso_bin(integrate_vv+0x63a) [0x4385fa]
> [bender:21480] [ 3] ./obj-Core-pc-linux/Espresso_bin(mpi_integrate_slave+0xb) [0x4142cb]
> [bender:21480] [ 4] ./obj-Core-pc-linux/Espresso_bin(mpi_loop+0x2d) [0x41304d]
> [bender:21480] [ 5] ./obj-Core-pc-linux/Espresso_bin(main+0x40) [0x4100a0]
> [bender:21480] [ 6] /lib/libc.so.6(__libc_start_main+0xfd) [0x7f8f3ef72abd]
> [bender:21480] [ 7] ./obj-Core-pc-linux/Espresso_bin [0x40ff89]
> [bender:21480] *** End of error message ***
> mpiexec noticed that job rank 0 with PID 21479 on node bender exited on signal 15 (Terminated).
> 1 additional process aborted (not shown)

Well, somewhere you divide by an integer zero (this can also be a modulo 
operation). The error happens near the beginning of force_calc, so it is 
likely in the code that you added.
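A typical way this shows up only in parallel: a count that is never zero in a serial run (for example the number of particles on "this" CPU) can be zero on some processor once the system is split up, and then a `/` or `%` by that count raises exactly this SIGFPE. A minimal sketch of guarding such an operation, assuming the divisor is a per-processor count; the helpers `checked_mod` and `pick_index` and the variable `n_local` are hypothetical and not part of ESPResSo:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: integer modulo that refuses a zero divisor instead of
   letting the CPU raise SIGFPE ("Integer divide-by-zero").
   Returns 0 on success, -1 if b is zero (result is left untouched). */
static int checked_mod(int a, int b, int *result)
{
  assert(result != NULL);
  if (b == 0) {
    return -1;
  }
  *result = a % b;
  return 0;
}

/* Hypothetical example: pick one of the n_local particles on this CPU.
   In a serial run n_local is the total particle count, but in a parallel
   run it can be 0 on some processor, so the zero case must be handled. */
static int pick_index(int step, int n_local, int *index)
{
  if (checked_mod(step, n_local, index) != 0) {
    fprintf(stderr, "pick_index: no local particles on this node, skipping\n");
    return -1;
  }
  return 0;
}
```

The point is less the helper itself than the habit: every divisor that depends on how particles are distributed over processors needs an explicit zero check.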

Do you actually loop over all cells, or only over local_cells? Parallel MD is 
somewhat more complicated: in ESPResSo, there is always exactly one processor 
responsible for a particle, and this processor has the full information about 
it. However, each processor also has information about particles from 
neighboring CPUs that are close to its own domain. These particles are called 
ghost particles, and they are stored in ghost cells, as opposed to the local 
cells, which contain the particles that belong to the local CPU. The bond 
information of particles in ghost cells is not set, and you cannot add bonds 
to these particles; they are just copies of particles on other processors.
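To illustrate the difference, here is a minimal sketch of a loop that touches only the particles this CPU owns. The struct layouts below are simplified stand-ins modeled loosely on ESPResSo's cell system (a list of cell pointers, each cell holding an array of particles); the real structs have many more fields, and the `n_bonds` counter and `bond_local_particles` function are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins, not the real ESPResSo definitions. */
typedef struct {
  int identity;
  int ghost;   /* 1 for a ghost copy, 0 for a particle owned by this CPU */
  int n_bonds; /* hypothetical per-particle bond counter */
} Particle;

typedef struct {
  Particle *part; /* particles stored in this cell */
  int n;          /* number of particles in this cell */
} Cell;

typedef struct {
  Cell **cell; /* pointers to the cells in this list */
  int n;       /* number of cells */
} CellPList;

/* Modify only particles owned by this CPU: traverse local_cells and never
   the ghost cells. Returns the number of particles changed. */
static int bond_local_particles(const CellPList *local_cells)
{
  int changed = 0;
  for (int c = 0; c < local_cells->n; c++) {
    Cell *cell = local_cells->cell[c];
    for (int i = 0; i < cell->n; i++) {
      Particle *p = &cell->part[i];
      assert(!p->ghost); /* local cells should never contain ghosts */
      p->n_bonds++;
      changed++;
    }
  }
  return changed;
}
```

On a single CPU, looping over all cells and looping over the local cells can look equivalent, which is why a bug of this kind often only appears in a parallel run.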

Another source of errors might be that you use the local_particles array. On a 
single CPU, all particles are local, so all of them appear there; in a 
multiprocessor run, some particles are not present on the local CPU, which is 
indicated by a NULL pointer in this array. If you try to modify such a 
particle, e.g. add a bond, that will not work. Moreover, even if the pointer is 
non-NULL, it might still point to a ghost particle, to which you cannot add 
bonds. You can check this with part->l.ghost; if it is 1, the particle is a 
ghost, and you should not modify it.

Otherwise, I am afraid you will need to use the debugger to track down the 
problem. To this end, recompile Espresso with './configure CFLAGS="-g" 
--disable-processor-optimization && make'. The resulting code will be pretty 
slow, but better suited for debugging. Then run again and note down the address 
of the failure, in your case 0x442cbf, although it will change after 
recompiling. Then start the debugger with 'gdb ./obj-Core-pc-linux/Espresso_bin' 
and type "list *0x442cbf" (or whatever the address is) to see the source line 
where the problem occurs.

Cheers,
Axel

--
JP Dr. Axel Arnold Tel: +49 711 685 67609
ICP, Universität Stuttgart      Email: address@hidden
Pfaffenwaldring 27
70569 Stuttgart, Germany



