Re: Problem with MPITB on IA64 arch


From: Javier Fernandez Baldomero
Subject: Re: Problem with MPITB on IA64 arch
Date: Sun, 05 Feb 2006 12:37:24 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225


Hi all,

just in case our results are useful to anyone, we summarize the workaround for this MPITB problem.


Gianvito Quarta wrote:

Hi,
I'm trying to set up a parallel Octave environment on an Itanium II (IA-64), 128-CPU cluster. I have some problems during the MPITB recompilation, because on the IA-64 arch the cast from pointer to int gives problems (during compilation with gcc 3.2.3 the error:
reinterpret_cast from `_comm*' to `int' loses precision
occurs).


Yes, the MPI pointer types were being systematically translated to Octave flints (floating-point integers) with code like this:
RET_2_ARG (info,reinterpret_cast<int>(comm));


where, as one could guess,
______________________________
#define RET_2_ARG(ARG0,ARG1)                        \
       octave_value_list retval;                                       \
               retval(0) = ARG0;                                       \
               retval(1) = ARG1;                                       \
       return  retval;
______________________________


LAM/MPI uses pointers to describe internal objects such as process groups, communicators, reduction operations, callback functions, etc., used as arguments in MPI calls. The C bindings pass these arguments as C pointers; the FORTRAN bindings use integers. It seemed natural to use Octave flints to store the C pointers, so returned pointers could be stored and passed on to subsequent MPI calls.
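For reference, the IA-32-era round trip looked roughly like the sketch below (illustrative helper names, not MPITB's literal source; the int_value() accessor on the way back in is an assumption). Note that it only compiles where pointers are 4 bytes, which is precisely the problem discussed below.

______________________________
#include <octave/oct.h>
#include <mpi.h>

// C pointer -> Octave flint (what RET_2_ARG above ends up doing)
static octave_value comm_to_octave (MPI_Comm comm)
{
  return octave_value (reinterpret_cast<int> (comm));   // 4-byte pointer fits an int
}

// Octave flint -> C pointer, ready for the next MPI call
static MPI_Comm comm_from_octave (const octave_value &ov)
{
  return reinterpret_cast<MPI_Comm> (ov.int_value ());
}
______________________________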

On IA-32, the default constructor for retval(1) in the code above converts the 32-bit int (obtained from the reinterpret_cast above) to an Octave flint. Specifically, the place in ov.h where these constructors are declared reads like this:

______________________________
 octave_value (int i);
 octave_value (unsigned int i);
 octave_value (long int i);
 octave_value (unsigned long int i);
 // XXX FIXME XXX -- these are kluges. They turn into doubles
 // internally, which will break for very large values.  We just use
 // them to store things like 64-bit ino_t, etc, and hope that those
 // values are never actually larger than can be represented exactly
 // in a double.
#if defined (HAVE_LONG_LONG_INT)
 octave_value (long long int i);
#endif
#if defined (HAVE_UNSIGNED_LONG_LONG_INT)
 octave_value (unsigned long long int i);
#endif
______________________________


Gianvito obtained the following error message because pointers are 8 bytes (8B) wide on IA-64:

(during the compilation with gcc 3.2.3 the error:
reinterpret_cast from `_comm*' to `int' loses precision
occurs).


These are the sizes on each architecture:

            int     long    void*   double
  IA-32     4B      4B      4B      8B
  IA-64     4B      8B      8B      8B
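These can be double-checked on any box with a short program such as the following (plain C++, nothing MPITB-specific):

______________________________
#include <cstdio>

int main ()
{
  std::printf ("int=%lu long=%lu void*=%lu double=%lu\n",
               (unsigned long) sizeof (int),    (unsigned long) sizeof (long),
               (unsigned long) sizeof (void *), (unsigned long) sizeof (double));
  return 0;
}
______________________________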

Changing to reinterpret_cast<long>(comm) was an excellent try. I thought it should work.

I tried to change the casting of pointers to long and then I successfully compiled MPITB.

Unfortunately, some problems occur at run time,
...
[info rank]=MPI_Comm_rank(MPI_COMM_WORLD)% rank=0
MPI process rank 0 (n0, p31218) caught a SIGSEGV in MPI_Comm_rank.
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (0, MPI_COMM_WORLD):  - MPI_Comm_rank()
Rank (0, MPI_COMM_WORLD):  - main()


But I forgot 2 details:
*****
* 1.- *
*****
Not all 64 bits of an 8-byte double are devoted to the mantissa: sign + exponent + mantissa = 1 + 11 + 52 bits. So if the address held in the pointer needs more than 53 significant bits (there is an implicit leading 1), there will be some rounding and/or truncation; when we later try to access that address, we access a location we never intended, and SegFault. That had to be investigated: are addresses really that big? The "FIXME" comment about 64-bit ino_t would make one think you would need... errr... 2^53 bytes, some 8 petabytes of memory, before such an address showed up. These are the results from Gianvito:

address@hidden ~]$ octave
Set SSI rpi to tcp with the command:
  putenv('LAM_MPI_SSI_rpi','tcp'), MPI_Init
Help on MPI: help mpi
octave-2.1.72:1> MPI_COMM_WORLD
ans = 2.3058e+18

Hmpf, I didn't instruct him to use format long, so I couldn't see all those 18 decimal digits. Anyway, 2x10^18 is a huge address. Since 1K = 2^10 ~ 10^3, coarsely 10^18 ~ 2^60, which fits in 64 bits but not in 52.
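The loss can be reproduced without Octave or MPI at all; a minimal sketch, using the exact pointer value shown in the P.S. below:

______________________________
#include <cstdio>

int main ()
{
  // MPI_COMM_WORLD on Gianvito's IA-64 box (see the P.S. below): needs ~61 bits.
  unsigned long long addr = 2305843009293363640ULL;

  double d = static_cast<double> (addr);               // what an Octave flint stores
  unsigned long long back = static_cast<unsigned long long> (d);

  // The round trip comes back off by a few tens of bytes, because only 52+1
  // mantissa bits survive: dereferencing 'back' is the SegFault.
  std::printf ("original   %llu\n", addr);
  std::printf ("via double %llu   (off by %lld bytes)\n",
               back, (long long) (back - addr));
  return 0;
}
______________________________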

octave-2.1.72:2> MPI_Init
ans = 0
octave-2.1.72:3> a=MPI_COMM_WORLD
a =  2.3058e+18
octave-2.1.72:4> whos a

*** local user variables:

  Prot Name        Size                     Bytes  Class
  ==== ====        ====                     =====  =====
   rwd a           1x1                          8  scalar

Total is 1 element using 8 bytes

The communicator was being translated to a scalar double. Here I'm a bit embarrassed to confess it took me some 6 to 8 attempts to find the right sequence of typecasts to get the pointer stored in an Octave uint64, and back to a C pointer. Gianvito proved to have an enduring, unbreakable patience. Some of the casts we tried are:

     RET_1_ARG(reinterpret_cast<octave_uint64>( comm ))
     MPI_Comm comm = reinterpret_cast<MPI_Comm>( args(ARGN).uint64_scalar_value() );

Wrong, octave_uint64 is an object. Changing to octave_uint64_t does not help, since uint64_scalar_value() returns an object too.

     RET_1_ARG(reinterpret_cast<octave_uint64_t>( comm ))
    MPI_Comm comm = reinterpret_cast<MPI_Comm>( args(ARGN).ulong_value() );

Wrong: under Octave it becomes a flint again, since the default constructor for the 8-byte int is used.
The final solution is:

#if __ia64__
#define MPITB_OctPtrTyp         octave_uint64           // IA-64
#define MPITB_OctIntFcn         uint64_scalar_value
#else                                                   // #elif __i386__
#define MPITB_OctPtrTyp         octave_uint32           // IA-32
#define MPITB_OctIntFcn         uint32_scalar_value
#endif
#define MPITB_isOPtrTyp(ov)   ((ov.is_scalar_type()) && (ov.is_numeric_type()))
#define MPITB_intcast(cptr)   MPITB_OctPtrTyp (                         \
                                reinterpret_cast <unsigned long> ( cptr ) )
#define MPITB_ptrcast(typ,ov) reinterpret_cast <typ> ( ov.              \
                                MPITB_OctIntFcn().value() )

so the casts become:

     RET_1_ARG(octave_uint64(reinterpret_cast<unsigned long>( comm )))
     MPI_Comm comm = reinterpret_cast<MPI_Comm>( args(ARGN).uint64_scalar_value().value() );
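Put together, a wrapper built on these macros looks roughly like the sketch below (under the definitions above; the function name and doc string are made up, this is not MPITB's actual source):

______________________________
#include <octave/oct.h>
#include <mpi.h>
// MPITB_OctPtrTyp / MPITB_OctIntFcn / MPITB_intcast / MPITB_ptrcast
// are assumed to be in scope, exactly as #define'd above.

DEFUN_DLD (comm_dup_sketch, args, ,
           "[info, newcomm] = comm_dup_sketch (comm)")
{
  octave_value_list retval;

  // uint64 handle (uint32 on IA-32) coming from Octave  ->  C pointer
  MPI_Comm comm = MPITB_ptrcast (MPI_Comm, args(0));

  MPI_Comm newcomm;
  int info = MPI_Comm_dup (comm, &newcomm);

  retval(0) = info;
  retval(1) = MPITB_intcast (newcomm);   // C pointer -> uint64 handle, back to Octave
  return retval;
}
______________________________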


*****
* 2.- *
*****
The args error-checking code was rejecting those new 8-byte integers as valid MPI objects, so we also introduced the MPITB_isOPtrTyp() macro (above) to replace the previous ov.is_scalar() test.
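Inside each wrapper, before the casts, that boils down to something along these lines (a sketch; the error text is modelled on the message shown in the P.S.):

______________________________
  if (! MPITB_isOPtrTyp (args(0)))      // was: if (! args(0).is_scalar ())
    {
      error ("MPI_Comm_rank: required arg#1: comm (uint32/uint64 scalar)");
      return retval;
    }
______________________________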

-javier

P.S.:
Out of curiosity, these are the pointer values causing the SegFault:

_____________________________________________________
octave-2.1.72:4> a=MPI_COMM_WORLD
a = 2.3058430092933637e+18
octave-2.1.72:5> whos a

*** local user variables:

 Prot Name        Size                     Bytes  Class
 ==== ====        ====                     =====  =====
  rwd a           1x1                          8  scalar

Total is 1 element using 8 bytes

octave-2.1.72:6> [info rank]=MPI_Comm_rank(MPI_COMM_WORLD)
MPI process rank 0 (n0, p8187) caught a SIGSEGV in MPI_Comm_rank.

_____________________________________________________

octave-2.1.72:4> a=MPI_COMM_WORLD
a = 2305843009293363640
octave-2.1.72:5> whos a

*** local user variables:

 Prot Name        Size                     Bytes  Class
 ==== ====        ====                     =====  =====
  rwd a           1x1                          8  uint64 scalar

Total is 1 element using 8 bytes

octave-2.1.72:6> [info rank]=MPI_Comm_rank(MPI_COMM_WORLD)
error: MPI_Comm_rank: required arg#1: comm(int)

_____________________________________________________
Can you see the rounding up (not truncation) when translated to double? I didn't expect such a huge address.

a = 2.3058430092933637e+18

a = 2305843009293363640






