Re: [Gcl-devel] HEAD Maxima and HEAD trad GCL


From: Camm Maguire
Subject: Re: [Gcl-devel] HEAD Maxima and HEAD trad GCL
Date: 06 Jan 2004 12:39:15 -0500
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Greetings!

"Mike Thomas" <address@hidden> writes:

> Hi Camm.
> 
> | > OK, but defpackage in the HEAD CLtL1 build is missing.
> | > .....
> |
> I had previously believed that defpackage was part of CLtL1 - I didn't
> realise it was something you had added later to GCL.
> 

OK.  So I take it there is no problem here.

> |  We should definitely get to the bottom of
> | this.  Vadim had done some work in this regard, but I think most of
> | the fruit of that labor was the expansion of the heap size.  (Speaking
> | of which, we really should integrate unexw32.c at some point.)  I had
> | suspected that the core was growing into already used areas, but this
> | seems less likely now.  We may be overrunning the *link-array* or
> | otherwise corrupting the C stack at the offending function call.  In
> | any case, I need a refresher on this.  If you and/or Vadim could post
> | a gdb backtrace (again) at the point of segfault with fast-links left
> | on, do a frame 1, disassemble, and report the register contents at the
> | point of the call to the offending function, this would be a start.
> 
> Sorry but I'm not fully understanding what you're saying - I've included a
> cut and paste below - run,  frame 1, bt, info frame and a quick run up the
> stack.
> 
> Is the current stack frame (frame 1) the arguments of the current function
> present on the stack?  In any event, none of the addresses seem to be inside
> functions.
> 

OK, thank you for this detailed output.  Comments below.  In general
on x86, the stack pointer is kept in the register esp, and one can
examine the call stack by treating its value as an integer array
address, and incrementing forward or backward depending on the stack
increment direction and inspecting the values at that address.  For
example:

i reg esp
p/x *((int *)<value>-16)@32

You don't need this for the below.

> 
> | If memory serves, we had pinned the problem down to the very function
> | call itself with our printf/format debugging.  I'm guessing now that
> | somehow a bad value is getting written to the function pointers stored
> | in compiled lisp files via the link-array.  (The way this works
> | basically is that these pointers are initially set to stubs present in
> | the same generated C file, which in turn call call_or_link et al. with
> | the address to the original function pointer as one argument.  When
> | fast-linking is on, the function pointer is reset from the stub to the
> | real function address which call_or_link finds, so that call_or_link
> | is invoked only once, with subsequent calls going straight to the
> | intended function as in C.)
> 
> What is the link-array - is it the result of loading the "*.data" files
> produced by the compiler?
> 

No -- it is simply an array containing pointer pairs.  The first
element of the pair is the address of the function pointer used in the
compiled lisp code.  The second value is the address of the static C
function stub.  After these values are saved, the real address of the
desired function is saved at the former address.  The link array
stores the information necessary to reverse this process when
(use-fast-links nil) is invoked.  Pretty amazing design if you ask
me.
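
A minimal sketch of that bookkeeping in plain C, in case it makes the
pairs more concrete.  This is not the GCL source: the names link_pair,
fast_link, unlink_all, stub_fn, real_fn and call_site are all made up
for illustration, and the real array is a lisp vector rather than a
fixed-size C array.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*fn_t)(int);

/* One link-array entry (hypothetical layout): where the call-site
   pointer lives, and the stub it held before it was overwritten. */
struct link_pair {
    fn_t *slot;   /* address of the function pointer used by compiled code */
    fn_t  stub;   /* original stub value, kept so the link can be undone */
};

#define MAX_LINKS 16
struct link_pair link_array[MAX_LINKS];
size_t n_links;

/* Save the pair first, then store the real function address in the slot. */
void fast_link(fn_t *slot, fn_t real)
{
    assert(n_links < MAX_LINKS);   /* writing past the end would corrupt memory */
    link_array[n_links].slot = slot;
    link_array[n_links].stub = *slot;
    n_links++;
    *slot = real;
}

/* Rough analogue of (use-fast-links nil): put every saved stub back. */
void unlink_all(void)
{
    while (n_links > 0) {
        n_links--;
        *link_array[n_links].slot = link_array[n_links].stub;
    }
}

int stub_fn(int x) { return -x; }      /* stands in for a generated stub */
int real_fn(int x) { return x * 2; }   /* stands in for the real function */
fn_t call_site = stub_fn;              /* pointer the "compiled code" calls through */
```

Calling fast_link(&call_site, real_fn) makes call_site(3) go straight
to real_fn; unlink_all() afterwards restores the stub.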

> What do you mean by stubs in the C file? (A specific name in a particular C
> file would help to make this a little more concrete for me.)
> 

Look at the bottom of pcl_boot.c:

static object  LnkTLI495(object first,...){object V1;va_list ap;va_start(ap,first);
V1=call_proc_new(VV[495],(void **)(void *)&LnkLI495,5,first,ap);va_end(ap);return V1;}
/* OPTIMIZE-GENERIC-FUNCTION-CALL */

is the stub.  A call to the function appears earlier as in 

        base[3]= (*(LnkLI495))((V1337),(base0[5]->c.c_car),(V1339),(base0[4]->c.c_car),(base0[3]->c.c_car));

The value of this pointer is initialized to the stub statically in
pcl_boot.h:

static object  LnkTLI495(object,...);
static object  (*LnkLI495)() = (object (*)()) LnkTLI495;

This value (i.e. the address of the stub) will only be used once when
fast links are on.
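
To illustrate the "used once" behavior, here is a stripped-down sketch
in plain C.  It is only an analogy to the generated code: the names
stub_square, link_square, real_square, call_or_link_sketch,
linker_calls and fast_links_on are invented, and the real stub is
variadic and goes through call_proc_new with a symbol lookup.

```c
#include <assert.h>

typedef int (*fn_t)(int);

int linker_calls;                 /* how many times the slow path ran */
int fast_links_on = 1;            /* hypothetical switch, like use-fast-links */

int real_square(int x) { return x * x; }

int stub_square(int x);           /* the stub, playing the role of LnkTLI495 */
fn_t link_square = stub_square;   /* the call-site pointer, like LnkLI495 */

/* Simplified stand-in for call_or_link: find the real function,
   optionally patch the caller's pointer, then make the call. */
int call_or_link_sketch(fn_t *link, int x)
{
    linker_calls++;
    fn_t real = real_square;      /* the real code looks this up via the symbol */
    if (fast_links_on)
        *link = real;             /* later calls bypass the stub entirely */
    return real(x);
}

/* The stub only forwards, passing the address of its own call-site pointer. */
int stub_square(int x)
{
    return call_or_link_sketch(&link_square, x);
}
```

The first (*link_square)(3) goes through the stub and bumps
linker_calls; after that link_square points at real_square and the
counter stays put.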


> | I've occasionally run into these as well, so this may not be Windows
> | specific.  If you find a reproducible instance which does not appear
> | on Linux, I'd be interested.
> 
> I'll try and find one for you.
> 

Thanks!  A negative result is also of interest.

> | I'd be very alarmed if at least this behavior were not attainable.
> | Nothing should have changed to make this impossible at present.
> 
> I'm alarmed almost every time I look at the internals of GCL!!
> 

It is a bit from a different era, but as I work with it, I'm
repeatedly impressed by the sophistication of some of the key
algorithms, like this one.  In principle, this feature, together with
adequate proclaiming, makes a lisp function call as fast as one in C,
as Schelter originally claimed on his website.

Further steps outlined below.

> Cheers
> 
> Mike Thomas.
> 
> 
> 
> ===============DEBUGGER OUTPUT========================================
> 
> .....
> Loading binary of PCL_DFUN...
> Loading binary of PCL_FAST_INIT...
> Loading binary of PCL_BRAID...
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x1030c13d in ?? ()
> (gdb) frame 1
> #1  0x00000640 in ?? ()
> (gdb) bt
> #0  0x1030c13d in ?? ()
> #1  0x00000640 in ?? ()
> #2  0x00000640 in ?? ()
> #3  0xffffffff in ?? ()
> #4  0x102d6104 in ?? ()
> #5  0x005b0e94 in value_stack ()
> #6  0x1021d0b4 in ?? ()
> #7  0x0022be88 in ?? ()
> #8  0x005b0e98 in value_stack ()
> #9  0x005b0e78 in value_stack ()
> #10 0x005b0e90 in value_stack ()
> #11 0x0022be88 in ?? ()

OK, this is the location of the corruption, given the value of
fun->cf.cf_self you report below.

> #12 0x004314ac in call_or_link (sym=0x1022c3a8, link=0x1031d57c)
>     at funlink.c:71
> #13 0x1031cb77 in ?? ()
> #14 0x1022c3a8 in ?? ()
> #15 0x1031d57c in ?? ()
> #16 0x10268858 in ?? ()
> #17 0x1030e4b7 in ?? ()
> #18 0x10242c3c in ?? ()
> #19 0x005a57e0 in small_fixnum_table ()
> #20 0x1031f054 in ?? ()
> #21 0x1030e4eb in ?? ()
> ---Type <return> to continue, or q <return> to quit---q
> Quit (expect signal SIGINT when the program is resumed)
> (gdb) info frame
> Stack level 1, frame at 0x22be58:
>  eip = 0x640; saved eip 0x640
>  called by frame at 0x22be5c, caller of frame at 0x22be54
>  Arglist at 0x22be50, args:
>  Locals at 0x22be50, Previous frame's sp is 0x22be58
>  Saved registers:
>   eip at 0x22be54

In all the frames in which you attempt disassembly, the program is in
a corrupted state.  The first frame at this point where disassemble
would work is frame 12.  With no arguments, disassemble dumps the
function containing the pc of the selected frame, which is how I
always use it.

What you want to do now is set a breakpoint at funlink.c:71 (though
in my source the line you appear to be at is 55), and make it
conditional as follows:

cond <break number> fun->cf.cf_self == 0x1030c130

Also set a break at the troublesome address:

b *0x1030c130

run, and at the first breakpoint, please do

p fun->cf.cf_name->st

and then continue once, where you should be at the address above.
Then you can do disassemble.  You or we can then look for suspicious
points at which the program jumps to the spurious 0x0022be88 listed in
frame 11.

<snip>

> (gdb) p *sLAlink_arrayA
> $8 = {FIX = {t = 8 '\b', flag = 0 '\0', s = 0 '\0', m = 0 '\0',
>     FIXVAL = 269964752}, big = {t = 8 '\b', flag = 0 '\0', s = 0 '\0',
>     m = 0 '\0', big_mpz_t = {_mp_alloc = 269964752, _mp_size = 5509808,
>       _mp_d = 0x52c3f3}}, rat = {t = 8 '\b', flag = 0 '\0', s = 0 '\0',
>     m = 0 '\0', rat_den = 0x101755d0, rat_num = 0x5412b0}, SF = {t = 8 '\b',
>     flag = 0 '\0', s = 0 '\0', m = 0 '\0', SFVAL = 2.98456067e-029}, LF = {
>     t = 8 '\b', flag = 0 '\0', s = 0 '\0', m = 0 '\0',
>     LFVAL = 4.175446029447999e-307}, cmp = {t = 8 '\b', flag = 0 '\0',
>     s = 0 '\0', m = 0 '\0', cmp_real = 0x101755d0, cmp_imag = 0x5412b0}, ch = {
>     t = 8 '\b', flag = 0 '\0', s = 0 '\0', m = 0 '\0', ch_code = 21968,
>     ch_font = 23 '\027', ch_bits = 16 '\020'}, s = {t = 8 '\b', flag = 0 '\0',
>     s = 0 '\0', m = 0 '\0', s_dbind = 0x101755d0,
>     s_sfdef = 0x5412b0 <Cnil_body>, st_self = 0x52c3f3 "*LINK-ARRAY*",
>     st_fillp = 12, s_gfdef = 0x0, s_plist = 0x5412b0, s_hpack = 0x10103fa4,
>     s_stype = 2, s_mflag = 0}, p = {t = 8 '\b', flag = 0 '\0', s = 0 '\0',
>     m = 0 '\0', p_name = 0x101755d0, p_nicknames = 0x5412b0,
>     p_shadowings = 0x52c3f3, p_uselist = 0xc, p_usedbylist = 0x0,
>     p_internal = 0x5412b0, p_external = 0x10103fa4, p_internal_size = 2,
>     p_external_size = 8, p_internal_fp = 0, p_external_fp = 5509808,
>     p_link = 0x52c3ed}, c = {t = 8 '\b', flag = 0 '\0', s = 0 '\0',
>     m = 0 '\0', c_cdr = 0x101755d0, c_car = 0x5412b0}, ht = {t = 8 '\b',
>     flag = 0 '\0', s = 0 '\0', m = 0 '\0', ht_self = 0x101755d0,
>     ht_rhsize = 0x5412b0, ht_rhthresh = 0x52c3f3, ht_nent = 12, ht_size = 0,
> ---Type <return> to continue, or q <return> to quit---
>     ht_test = 4784}, a = {t = 8 '\b', flag = 0 '\0', s = 0 '\0', m = 0 '\0',
>     a_displaced = 0x101755d0, a_rank = 4784, a_elttype = 84,
>     a_self = 0x52c3f3, a_adjustable = 12, a_offset = 0, a_dim = 0,
>     a_dims = 0x5412b0}, v = {t = 8 '\b', flag = 0 '\0', s = 0 '\0',
>     m = 0 '\0', v_displaced = 0x101755d0, v_hasfillp = 4784, v_elttype = 84,
>     v_self = 0x52c3f3, v_fillp = 12, v_dim = 0, v_adjustable = 4784,
>     v_offset = 84}, st = {t = 8 '\b', flag = 0 '\0', s = 0 '\0', m = 0 '\0',
>     st_displaced = 0x101755d0, st_hasfillp = 4784, st_adjustable = 84,
>     st_self = 0x52c3f3 "*LINK-ARRAY*", st_fillp = 12, st_dim = 0}, ust = {
>     t = 8 '\b', flag = 0 '\0', s = 0 '\0', m = 0 '\0',
>     ust_displaced = 0x101755d0, ust_hasfillp = 4784, ust_adjustable = 84,
>     ust_self = 0x52c3f3 "*LINK-ARRAY*", ust_fillp = 12, ust_dim = 0}, bv = {
>     t = 8 '\b', flag = 0 '\0', s = 0 '\0', m = 0 '\0',
>     bv_displaced = 0x101755d0, bv_hasfillp = 4784, bv_elttype = 84,
>     bv_self = 0x52c3f3 "*LINK-ARRAY*", bv_fillp = 12, bv_dim = 0,
>     bv_adjustable = 4784, bv_offset = 84}, str = {t = 8 '\b', flag = 0 '\0',
>     s = 0 '\0', m = 0 '\0', str_def = 0x101755d0, str_self = 0x5412b0}, sm = {
>     t = 8 '\b', flag = 0 '\0', s = 0 '\0', m = 0 '\0', sm_fp = 0x101755d0,
>     sm_object0 = 0x5412b0, sm_object1 = 0x52c3f3, sm_int0 = 12, sm_int1 = 0,
>     sm_buffer = 0x5412b0 "\b", sm_mode = -92 'n', sm_flags = 63 '?',
>     sm_fd = 4112}, rnd = {t = 8 '\b', flag = 0 '\0', s = 0 '\0', m = 0 '\0',
>     rnd_value = 269964752}, rt = {t = 8 '\b', flag = 0 '\0', s = 0 '\0',
>     m = 0 '\0', rt_self = 0x101755d0}, pn = {t = 8 '\b', flag = 0 '\0',
> ---Type <return> to continue, or q <return> to quit---
>     s = 0 '\0', m = 0 '\0', pn_host = 0x101755d0, pn_device = 0x5412b0,
>     pn_directory = 0x52c3f3, pn_name = 0xc, pn_type = 0x0,
>     pn_version = 0x5412b0}, cf = {t = 8 '\b', flag = 0 '\0', s = 0 '\0',
>     m = 0 '\0', cf_name = 0x101755d0, cf_self = 0x5412b0 <Cnil_body>,
>     cf_data = 0x52c3f3}, cc = {t = 8 '\b', flag = 0 '\0', s = 0 '\0',
>     m = 0 '\0', cc_name = 0x101755d0, cc_self = 0x5412b0 <Cnil_body>,
>     cc_env = 0x52c3f3, cc_data = 0xc, cc_envdim = 0, cc_turbo = 0x5412b0},
>   cl = {t = 8 '\b', flag = 0 '\0', s = 0 '\0', m = 0 '\0',
>     cl_name = 0x101755d0, cl_self = 0x5412b0 <Cnil_body>, cl_data = 0x52c3f3,
>     cl_argd = 12, cl_envdim = 0, cl_env = 0x5412b0}, sfn = {t = 8 '\b',
>     flag = 0 '\0', s = 0 '\0', m = 0 '\0', sfn_name = 0x101755d0,
>     sfn_self = 0x5412b0 <Cnil_body>, sfn_data = 0x52c3f3, sfn_argd = 12},
>   vfn = {t = 8 '\b', flag = 0 '\0', s = 0 '\0', m = 0 '\0',
>     vfn_name = 0x101755d0, vfn_self = 0x5412b0 <Cnil_body>,
>     vfn_data = 0x52c3f3, vfn_minargs = 12, vfn_maxargs = 0}, cfd = {
>     t = 8 '\b', flag = 0 '\0', s = 0 '\0', m = 0 '\0',
>     cfd_start = 0x101755d0 "\r", cfd_size = 5509808, cfd_fillp = 5424115,
>     cfd_self = 0xc}, spc = {t = 8 '\b', flag = 0 '\0', s = 0 '\0', m = 0 '\0',
>     spc_dummy = 269964752}, d = {t = 8 '\b', flag = 0 '\0', s = 0 '\0',
>     m = 0 '\0'}, fixa = {t = 8 '\b', flag = 0 '\0', s = 0 '\0', m = 0 '\0',
>     fixa_displaced = 0x101755d0, fixa_rank = 4784, fixa_elttype = 84,
>     fixa_self = 0x52c3f3, fixa_adjustable = 12, fixa_offset = 0, fixa_dim = 0,
>     fixa_dims = 0x5412b0}, sfa = {t = 8 '\b', flag = 0 '\0', s = 0 '\0',
> ---Type <return> to continue, or q <return> to quit---
>     m = 0 '\0', sfa_displaced = 0x101755d0, sfa_rank = 4784, sfa_elttype = 84,
>     sfa_self = 0x52c3f3, sfa_adjustable = 12, sfa_offset = 0, sfa_dim = 0,
>     sfa_dims = 0x5412b0}, lfa = {t = 8 '\b', flag = 0 '\0', s = 0 '\0',
>     m = 0 '\0', lfa_displaced = 0x101755d0, lfa_rank = 4784, lfa_elttype = 84,
>     lfa_self = 0x52c3f3, lfa_adjustable = 12, lfa_offset = 0, lfa_dim = 0,
>     lfa_dims = 0x5412b0}}
> (gdb) whatis *sLAlink_arrayA

When printing lisp objects in gdb, it is easier to inspect their type
first.  For example:

p (enum type)sLAlink_arrayA->d.t

and assuming t_vector comes back, you can

p sLAlink_arrayA->v

You can look at the elements which are lisp objects in like manner.
Frequently used types in integral form are

8 t_symbol
13 t_string
0 t_cons
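
The reason checking d.t first works is that every member of the object
union begins with the same tag byte.  A cut-down sketch in C, with the
caveat that the struct layouts and the helper name_of below are
simplified and hypothetical; only the three tag values listed above
and the field names st_self, st_fillp, c_car and c_cdr come from the
gdb output.

```c
#include <assert.h>
#include <stddef.h>

/* Only the three tag values quoted above; the real enum has many more. */
enum type { t_cons = 0, t_symbol = 8, t_string = 13 };

/* Simplified layouts: each one starts with the same tag byte t. */
struct dummy  { unsigned char t; };
struct cons   { unsigned char t; void *c_cdr, *c_car; };
struct symbol { unsigned char t; const char *st_self; int st_fillp; };
struct string { unsigned char t; const char *st_self; int st_fillp; };

/* Printing the whole union shows every view at once, which is why the
   raw dump is so long; d.t says which view is the meaningful one. */
union lispunion {
    struct dummy  d;
    struct cons   c;
    struct symbol s;
    struct string st;
};

/* Return the name of a symbol or string, or NULL for anything else. */
const char *name_of(const union lispunion *x)
{
    switch ((enum type)x->d.t) {
    case t_symbol: return x->s.st_self;
    case t_string: return x->st.st_self;
    default:       return NULL;   /* e.g. a cons has no name field */
    }
}
```

In the dump above, t = 8 in every member says the object is a symbol,
so only the s = {...} view is worth reading.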

I can only imagine three possibilities right now, with the third the
most likely:

1) The address reported for fun->cf.cf_self is wrong/corrupted.  This
   could be due to a relocation error when the module containing it
   was loaded.  I can show you how to trap the address at relocation
   time if this appears to be the case.

2) There is a missing flush of the data cache as is required on some
   architectures in calling addresses dynamically loaded into the
   .data section (e.g. arm, several others).  This is definitely not
   needed on Linux x86 and sparc (as well as several others), and when
   it does show up, one often gets non-reproducible behavior, so I'm
   doubting this one.

3) Some earlier corruption has been introduced into the body of this
   function (or perhaps its address) as a result of earlier resettings
   of function pointers in call_link -- after all this problem goes
   away without fast-linking.  When we get the function name and
   disassembly, we can set hardware watchpoints to detect when the
   spurious values get introduced.

Take care,
-- 
Camm Maguire                                            address@hidden
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah



