Re: Problem with MPITB on IA64 arch

help-octave

From:	Javier Fernandez Baldomero
Subject:	Re: Problem with MPITB on IA64 arch
Date:	Sun, 05 Feb 2006 12:37:24 +0100
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225


Hi all,

just in case our results are useful to anyone, we summarize the workaround for this MPITB problem.



Gianvito Quarta wrote:

Hi,
I'm trying to set up a parallel octave environment on an Itanium II, IA64, 128 cpu cluster. I have some problem during the mpitb re-compilation because for IA64 arch, the cast from pointer to int gives problem (during the compilation with gcc 3.2.3 the error:
reinterpret_cast from `_comm*' to `int' loses precision
occurs).

Yes, the MPI pointer types were being systematically translated to Octave flints (floating-point integers) with code like this:

RET_2_ARG (info,reinterpret_cast<int>(comm));


where, as one could guess,
______________________________
#define RET_2_ARG(ARG0,ARG1)            \
        octave_value_list retval;       \
        retval(0) = ARG0;               \
        retval(1) = ARG1;               \
        return retval;
______________________________

LAM/MPI uses pointers to describe internal objects such as process groups, communicators, reduction operations, callback functions, etc., used as arguments in MPI calls. The C bindings use C pointers to pass these arguments. The FORTRAN bindings use integers. It seemed normal to use Octave flints to store the C pointers, being thus able to store returned pointers and pass them to subsequent MPI calls.

On IA-32, the constructor used for retval(1) in the code above converts the 32-bit int (obtained from the reinterpret_cast further above) to an Octave flint. In particular, the place in ov.h where these constructors are declared reads like this:


______________________________
 octave_value (int i);
 octave_value (unsigned int i);
 octave_value (long int i);
 octave_value (unsigned long int i);

 // XXX FIXME XXX -- these are kluges. They turn into doubles
 // internally, which will break for very large values.  We just use
 // them to store things like 64-bit ino_t, etc, and hope that those
 // values are never actually larger than can be represented exactly
 // in a double.

#if defined (HAVE_LONG_LONG_INT)
 octave_value (long long int i);
#endif
#if defined (HAVE_UNSIGNED_LONG_LONG_INT)
 octave_value (unsigned long long int i);
#endif
______________________________

Gianvito obtained the following error message (during the compilation with gcc 3.2.3), due to pointers being 8B wide on IA-64:

reinterpret_cast from `_comm*' to `int' loses precision



These are the sizes on each architecture:
         int   long   void*   double
IA-32    4B    4B     4B      8B
IA-64    4B    8B     8B      8B
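
The sizes in the table can be checked on any machine with a few lines of standard C++. This is only a sketch: pointer_fits_in, print_sizes and to_integer are names made up for the illustration, not part of MPITB or Octave. It uses std::uintptr_t, the standard integer type intended to hold a pointer, instead of guessing between int and long.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// True when integer type T is at least as wide as an object pointer.
// (Illustrative helper, not an MPITB or Octave function.)
template <typename T>
constexpr bool pointer_fits_in() {
    return sizeof(T) >= sizeof(void*);
}

// uintptr_t is the type meant for this job, so this should hold everywhere.
static_assert(pointer_fits_in<std::uintptr_t>(),
              "uintptr_t must be able to hold a pointer");

// Print the row of the table for the platform this is compiled on.
void print_sizes() {
    std::printf("int=%zuB long=%zuB void*=%zuB double=%zuB\n",
                sizeof(int), sizeof(long), sizeof(void*), sizeof(double));
}

// Convert a pointer to an integer without dropping any bits.
std::uintptr_t to_integer(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}
```

On a platform where int is 4B and pointers are 8B, pointer_fits_in<int>() is false, which is the situation the gcc error message complains about.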

Changing to reinterpret_cast<long>(comm) was an excellent try. I thought it should work.

I tried to change the casting of pointers to long and then I have successfully compiled MPITB.


Unfortunately some problems occur at run time,
...
[info rank]=MPI_Comm_rank(MPI_COMM_WORLD)% rank=0
MPI process rank 0 (n0, p31218) caught a SIGSEGV in MPI_Comm_rank.
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (0, MPI_COMM_WORLD):  - MPI_Comm_rank()
Rank (0, MPI_COMM_WORLD):  - main()



But I forgot 2 details:
*****
* 1.- *
*****

not all 64 bits of an 8B double are devoted to the mantissa: sign+exp+man=1+11+52. So if the address expressed in the pointer requires more than 53 bits (there is an implicit 1) there will be some rounding and/or truncation. When we later try to access that address, we'll access a location not intended, and SegFault.

That should be investigated. Are addresses really that big? The "FIXME" comment about 64-bit ino_t would make one think that you should have... errr... 2^52 = 4 PetaBytes of memory before such an address would show up. These are the results from Gianvito:

address@hidden ~]$ octave
Set SSI rpi to tcp with the command:
  putenv('LAM_MPI_SSI_rpi','tcp'), MPI_Init
Help on MPI: help mpi
octave-2.1.72:1> MPI_COMM_WORLD

ans = 2.3058e+18

Hmpf, I didn't instruct him to use format long so I could see all those 18 decimal digits. Anyway, 2*10^18 is a huge address. Since 1K=2^10~10^3, coarsely 10^18 ~ 2^60, which fits in 64b but not in the 53 bits of a double's mantissa.
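
The back-of-the-envelope estimate can be made exact by counting bits. A minimal sketch in standard C++ follows; bit_width and guaranteed_exact_in_double are illustrative names, and 2300000000000000000 is just the displayed 2.3e+18 rounded, not the real pointer value.

```cpp
#include <cassert>
#include <cstdint>

// Number of binary digits needed to write v (0 for v == 0).
int bit_width(std::uint64_t v) {
    int n = 0;
    while (v != 0) {
        ++n;
        v >>= 1;
    }
    return n;
}

// A double carries a 53-bit significand (52 stored bits plus the implicit 1),
// so any integer needing at most 53 bits is stored without rounding.
bool guaranteed_exact_in_double(std::uint64_t v) {
    return bit_width(v) <= 53;
}
```

A value near 2.3e+18 needs 61 bits, so it cannot be trusted to survive a trip through a double, whereas any 32-bit pointer from IA-32 does.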

octave-2.1.72:2> MPI_Init
ans = 0
octave-2.1.72:3> a=MPI_COMM_WORLD
a =  2.3058e+18
octave-2.1.72:4> whos a

*** local user variables:

  Prot Name        Size                     Bytes  Class
  ==== ====        ====                     =====  =====
   rwd a           1x1                          8  scalar

Total is 1 element using 8 bytes

The communicator was being translated to a scalar double. Here I'm a bit embarrassed to confess that it took me some 6 to 8 trials to discover the right sequence of typecasts to get the pointer stored in an Octave uint64, and back to a C pointer. Gianvito proved to have an enduring, unbreakable patience. Some of the casts we tried are:

     RET_1_ARG(reinterpret_cast<octave_uint64>( comm ))
     MPI_Comm comm = reinterpret_cast<MPI_Comm>( args(ARGN).uint64_scalar_value() );

Wrong, octave_uint64 is an object. Changing to octave_uint64_t does not help, since uint64_scalar_value() returns an object too.

     RET_1_ARG(reinterpret_cast<octave_uint64_t>( comm ))
    MPI_Comm comm = reinterpret_cast<MPI_Comm>( args(ARGN).ulong_value() );


Wrong too: the value becomes a flint under Octave, since the octave_value constructor for 8B ints is used.
The final solution is:

#if __ia64__
#define MPITB_OctPtrTyp         octave_uint64           // IA-64
#define MPITB_OctIntFcn         uint64_scalar_value
#else                                                   // #elif __i386__
#define MPITB_OctPtrTyp         octave_uint32           // IA-32
#define MPITB_OctIntFcn         uint32_scalar_value
#endif
#define MPITB_isOPtrTyp(ov) ((ov.is_scalar_type())&&(ov.is_numeric_type()))
#define MPITB_intcast(cptr) MPITB_OctPtrTyp (reinterpret_cast <unsigned long> (cptr) )
#define MPITB_ptrcast(typ,ov) reinterpret_cast <typ> (ov.MPITB_OctIntFcn().value() )


so the casts become:

     RET_1_ARG(octave_uint64(reinterpret_cast<unsigned long>( comm )))
     MPI_Comm comm = reinterpret_cast<MPI_Comm>( args(ARGN).uint64_scalar_value().value() );
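
The same round trip can be tried outside Octave. The sketch below does not use the Octave headers: Uint64Box is a hypothetical stand-in for octave_uint64 (an object wrapping a 64-bit integer, read back through value()), FakeComm stands in for LAM's opaque _comm structure, and box_pointer/unbox_pointer play the roles of MPITB_intcast and MPITB_ptrcast. It only illustrates why the pointer has to go through a plain integer type before it is wrapped, and come out through value() before it is cast back.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for octave_uint64: an object, not a plain integer,
// so reinterpret_cast cannot target it directly.
class Uint64Box {
public:
    explicit Uint64Box(std::uint64_t v) : v_(v) {}
    std::uint64_t value() const { return v_; }
private:
    std::uint64_t v_;
};

// Hypothetical stand-in for an opaque MPI object such as LAM's _comm.
struct FakeComm {
    int rank;
};

// Pointer -> plain integer -> wrapper object (the role of MPITB_intcast).
Uint64Box box_pointer(FakeComm* p) {
    static_assert(sizeof(std::uintptr_t) <= sizeof(std::uint64_t),
                  "pointer does not fit in 64 bits");
    return Uint64Box(reinterpret_cast<std::uintptr_t>(p));
}

// Wrapper object -> plain integer -> pointer (the role of MPITB_ptrcast).
FakeComm* unbox_pointer(const Uint64Box& b) {
    return reinterpret_cast<FakeComm*>(static_cast<std::uintptr_t>(b.value()));
}
```

Since no floating-point value is involved at any step, all 64 bits of the address survive, which is what the double-based version could not guarantee.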



*****
* 2.- *
*****

the args error-checking code was not accepting those new 8B integers as valid MPI objects. So we also included the MPITB_isOPtrTyp() macro to replace the previous ov.is_scalar() test.


-javier

P.S.:
Out of curiosity, these are the pointer values causing the SegFault

_____________________________________________________
octave-2.1.72:4> a=MPI_COMM_WORLD
a = 2.3058430092933637e+18
octave-2.1.72:5> whos a

*** local user variables:

 Prot Name        Size                     Bytes  Class
 ==== ====        ====                     =====  =====
  rwd a           1x1                          8  scalar

Total is 1 element using 8 bytes

octave-2.1.72:6> [info rank]=MPI_Comm_rank(MPI_COMM_WORLD)
MPI process rank 0 (n0, p8187) caught a SIGSEGV in MPI_Comm_rank.

_____________________________________________________

octave-2.1.72:4> a=MPI_COMM_WORLD
a = 2305843009293363640
octave-2.1.72:5> whos a

*** local user variables:

 Prot Name        Size                     Bytes  Class
 ==== ====        ====                     =====  =====
  rwd a           1x1                          8  uint64 scalar

Total is 1 element using 8 bytes

octave-2.1.72:6> [info rank]=MPI_Comm_rank(MPI_COMM_WORLD)
error: MPI_Comm_rank: required arg#1: comm(int)

_____________________________________________________

Can you see the rounding up (not truncation) when translated to double? I didn't expect such a huge address.


a = 2.3058430092933637e+18

a = 2305843009293363640
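
The rounding can be reproduced with the value printed above. This is a sketch in standard C++, assuming the usual IEEE 754 double with round-to-nearest conversion; through_double and kSpacing are illustrative names. The pointer value lies between 2^61 and 2^62, where consecutive doubles are 2^(61-52) = 512 apart, so the stored value can only be a multiple of 512.

```cpp
#include <cassert>
#include <cstdint>

// Store a 64-bit integer in a double and read it back, as happens when a
// pointer is kept in an Octave flint.
std::uint64_t through_double(std::uint64_t v) {
    double d = static_cast<double>(v);     // rounds if v needs more than 53 bits
    return static_cast<std::uint64_t>(d);  // exact: d holds an integer value
}

// Distance between neighbouring doubles in the range [2^61, 2^62).
constexpr std::uint64_t kSpacing = std::uint64_t{1} << (61 - 52);
```

With these assumptions, through_double(2305843009293363640) gives 2305843009293363712, which is 72 bytes above the real address and agrees with the 2.3058430092933637e+18 displayed by Octave.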




-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------
