help-octave

Re: Problem with MPITB on IA64 arch


From: Gianvito Quarta
Subject: Re: Problem with MPITB on IA64 arch
Date: Thu, 19 Jan 2006 18:20:29 +0100

Hi Javier,
now I can reply to the question:

5.- copy-paste a screen dump with the same Octave command sequence I showed above

this is the command sequence:

address@hidden ~]$ octave
Set SSI rpi to tcp with the command:
  putenv('LAM_MPI_SSI_rpi','tcp'), MPI_Init
Help on MPI: help mpi
octave-2.1.72:1> MPI_COMM_WORLD
ans =  2.3058e+18
octave-2.1.72:2> MPI_Init
ans = 0
octave-2.1.72:3> a=MPI_COMM_WORLD
a =  2.3058e+18
octave-2.1.72:4> whos a

*** local user variables:

  Prot Name        Size                     Bytes  Class
  ==== ====        ====                     =====  =====
   rwd a           1x1                          8  scalar

Total is 1 element using 8 bytes

octave-2.1.72:5> MPI_Finalize
ans = 0
octave-2.1.72:6>

Thanks, G. QUARTA

At 20.30 18/01/2006, you wrote:
Hi, Gianvito

Gianvito Quarta wrote:

Hi,
I'm trying to set up a parallel octave environment on an Itanium II, IA64, 128 cpu cluster.
I have some problems during the MPITB recompilation because,
on the IA64 arch, the cast from pointer to int causes trouble
(compiling with gcc 3.2.3 gives the error:
reinterpret_cast from `_comm*' to `int' loses precision
).
I'm sorry I was not able to reply in time to your e-mail sent at 15:29, and this question reached the help list at 17:22. Most people here probably won't be interested in MPITB compilation problems. If you don't mind, I'd rather continue this dialog by personal e-mail instead of on the help mailing list.

Thanks for using MPITB. I'm pleasantly surprised you managed to get that far. I have never used any IA64, but perhaps with a little bit of help you can manage to build a working MPITB version for that platform.

Please search for "size" and "alignment" in your LAM config.log file. I'm mostly interested in the "int" and "void*" types size and alignment on your IA64 architecture. Also check the endianness. In my IA32 PC I have this:
________________
configure:5363: checking size of int
...
configure:5408: result: 4
configure:5436: checking size of long
...
configure:5481: result: 4
configure:5509: checking size of long long
...
configure:5554: result: 8
...
configure:5801: checking size of void *
...
configure:5846: result: 4
...
configure:6111: checking alignment of int
...
configure:6172: result: 4
...
configure:6265: checking alignment of long long
...
configure:6326: result: 4
...
configure:6573: checking alignment of void *
...
configure:6634: result: 4
...
configure:19090: checking whether byte ordering is bigendian
...
configure:19301: result: no
________________

So on IA32 all alignments are 4 and only "long long" has size 8. That's why I chose to cast LAM communicators (_comm*) to C ints. BTW, when returned to Octave they become "flints", so MPITB communicators are Octave scalars (doubles). You are not expected to do any maths with them, so when later reused they can be cast again from flints back to C ints and void*.
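The round trip through an Octave scalar can be illustrated in plain C++, outside Octave. This is only a minimal sketch: the helper names through_double and survives_double are hypothetical and are not part of MPITB or Octave. It relies on the fact that a double represents every integer exactly only up to 2^53, so a 32-bit value always survives, while a 64-bit value survives only if its low bits happen to be zero.

```cpp
#include <cstdint>

// Hypothetical helper (not MPITB code): mimic storing an integer handle
// in an Octave scalar (a C++ double) and reading it back as an integer.
inline std::uint64_t through_double(std::uint64_t handle)
{
    const double as_scalar = static_cast<double>(handle);  // value held as a "flint"
    return static_cast<std::uint64_t>(as_scalar);           // value recovered later
}

// True when the handle comes back unchanged from the double round trip.
inline bool survives_double(std::uint64_t handle)
{
    return through_double(handle) == handle;
}
```

With these helpers, survives_double(1099670176ULL) holds, as it does for any value below 2^53. A value such as 2^61 + 8 comes back as 2^61 under the usual round-to-nearest mode, because doubles near 2^61 are spaced 512 apart. Whether that matters on a given 64-bit platform depends on the actual pointer values, which this sketch does not try to predict.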

Your error message makes me suspect that IA64 void* is size 8, or at least greater than 4. In order to be able to cast LAM pointers to Octave <integers, flints, scalars, whatever>, I would need to know which integer type is compatible under IA64. BTW, you can also look for the same information in Octave's own config.log file. I have:
________________
ac_cv_sizeof_int=4
ac_cv_sizeof_long=4
ac_cv_sizeof_long_long=8
ac_cv_sizeof_short=2
________________

Tell me the alignment and size of your GCC integer types and void* type so we can choose which one matches best. You can find that information in the LAM config.log file. E-mail me directly; we can summarize later here on the list if you succeed in getting MPITB working under IA64.
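If config.log is not at hand, the same sizes, alignments and byte order can be queried from the compiler itself. The following is a minimal sketch, assuming a compiler that supports C++11 alignof (older GCC releases spell it __alignof__). The function names type_report, can_hold_pointer and is_little_endian are hypothetical and come from neither LAM nor MPITB.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// True when integer type T is at least as wide as a data pointer.
template <typename T>
bool can_hold_pointer()
{
    return sizeof(T) >= sizeof(void*);
}

// Inspect the first byte of a known 32-bit value to detect byte order.
inline bool is_little_endian()
{
    const std::uint32_t probe = 1;
    return *reinterpret_cast<const unsigned char*>(&probe) == 1;
}

// Build a short report similar to the config.log lines quoted above.
inline std::string type_report()
{
    std::ostringstream out;
    out << "int:       size " << sizeof(int)       << ", alignment " << alignof(int)       << '\n'
        << "long:      size " << sizeof(long)      << ", alignment " << alignof(long)      << '\n'
        << "long long: size " << sizeof(long long) << ", alignment " << alignof(long long) << '\n'
        << "void *:    size " << sizeof(void*)     << ", alignment " << alignof(void*)     << '\n'
        << "bigendian: " << (is_little_endian() ? "no" : "yes") << '\n';
    return out.str();
}
```

Printing type_report() from a small main() gives the figures requested here, and can_hold_pointer<long>() answers directly whether long is wide enough for a void* on the machine where it is compiled.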

I tried changing the pointer casts to long,
and then I compiled MPITB successfully.
Perhaps sizeof(long)==8 in IA64 ?!?
I assume you have edited just MPI_COMM_WORLD.cc, on line 33
from
RET_1_ARG(reinterpret_cast<int>( NAME )) // defined -> expanded
to
RET_1_ARG(reinterpret_cast<long>( NAME )) // defined -> expanded

If you haven't modified that line, or have modified others, please let me know. There is no hint in your original e-mail about which files/lines you have edited.

Unfortunately some problems occur at run time,
...
[info rank]=MPI_Comm_rank(MPI_COMM_WORLD)% rank=0
MPI process rank 0 (n0, p31218) caught a SIGSEGV in MPI_Comm_rank.
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (0, MPI_COMM_WORLD):  - MPI_Comm_rank()
Rank (0, MPI_COMM_WORLD):  - main()
I think the SigSegV may come from the communicator argument, since that's what you have edited (if I correctly guessed above).

So MPI_Init is working ?!? Great!!! It seems you can also invoke MPI_COMM_WORLD without any problems. Try it out. I get this:
________________
$ octave
Set SSI rpi to tcp with the command:
 putenv('LAM_MPI_SSI_rpi','tcp'), MPI_Init
Help on MPI: help mpi
octave:1> MPI_COMM_WORLD
ans = 1099670176
octave:2> MPI_Init
ans = 0
octave:3> a=MPI_COMM_WORLD
a = 1099670176
octave:4> whos a

*** local user variables:

 Prot Name        Size                     Bytes  Class
 ==== ====        ====                     =====  =====
  rwd a           1x1                          8  scalar

Total is 1 element using 8 bytes

octave:5> MPI_Finalize
ans = 0
octave:6> quit
address@hidden mpitb]$
________________

So the pointer becomes a flint a=1099670176. Send me a copy of your output for this command sequence. Of course, if a=0 that's where the SigSegV comes from. Perhaps the pointer is being correctly cast to long (if you were lucky with your long decision), but it is not being correctly cast back to a pointer, since it's using this code:
________________
       MPI_Comm comm = (MPI_Comm) args(ARGN).int_value();
________________
That's my fault. Right now I cannot remember why I didn't write any XXX_cast keyword there. When I learned that one shouldn't use C-style casts in C++, I started to use static_cast and reinterpret_cast. Perhaps I wrote that line before I learned that. I have forgotten C++ again, so I guess I must re-read chapter 6.2.7 of Stroustrup's "The C++ Programming Language" once more... sigh!
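The asymmetry described here (widening the pointer on the way out, but going through an int-sized value on the way back) can be sketched in isolation. This is not MPITB code: FakeComm and FakeCommHandle are hypothetical stand-ins for LAM's _comm and MPI_Comm, std::intptr_t is one possible pointer-sized integer type rather than the choice made in MPITB, and keep_low_32_bits only imitates what a 32-bit intermediate retains.

```cpp
#include <cstdint>

struct FakeComm { int id; };        // hypothetical stand-in for LAM's _comm
typedef FakeComm* FakeCommHandle;   // hypothetical stand-in for MPI_Comm

// Pointer -> pointer-sized integer, written as an explicit reinterpret_cast.
inline std::intptr_t handle_to_integer(FakeCommHandle comm)
{
    return reinterpret_cast<std::intptr_t>(comm);
}

// Pointer-sized integer -> pointer, the matching cast in the other direction.
inline FakeCommHandle integer_to_handle(std::intptr_t value)
{
    return reinterpret_cast<FakeCommHandle>(value);
}

// What survives when a 64-bit value passes through a 32-bit intermediate:
// only the low 32 bits (unsigned conversion is well defined, modulo 2^32).
inline std::uint64_t keep_low_32_bits(std::uint64_t value)
{
    return static_cast<std::uint32_t>(value);
}
```

As long as both directions use the same pointer-sized type, integer_to_handle(handle_to_integer(p)) gives back p. A 32-bit intermediate is harmless for a value like 1099670176 but reduces 2^61 + 512 to 512, which no longer refers to the original object.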

Ok, summarizing: this is your homework :-)
0.- reply directly to me, not to the mailing list
1.- copy-paste LAM config.log lines related to int and void* sizes, alignments and endianness
2.- copy-paste Octave config.log lines related to int sizes
3.- tell me if you modified the line I mentioned (MPI_COMM_WORLD.cc, on line 33)
4.- tell me if you modified (and how) any other line
5.- copy-paste a screen dump with the same Octave command sequence I showed above
6.- (just a joke) locate in the sources the last line of code shown, the one with the bad C-style typecast

When I have all that information I'll suggest that you change the typecast to reinterpret_cast<> (gcc will complain, as it would have from the start if I had written it correctly); then I'll suggest that you cast from long instead of from int... and so on until it works (I hope :-)

-javier




-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------


