[Gcl-devel] Re: PLT code and Mach-O


From: Camm Maguire
Subject: [Gcl-devel] Re: PLT code and Mach-O
Date: 09 Mar 2004 09:53:54 -0500
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Greetings!  OK, perhaps together we can work out a scheme which is as
robust and platform independent as possible.  Toward this end, it is
probably useful to review the nature of the problem that necessitated
this plt stuff in the first place (just so we are all on the same
page).

When GCL loads a compiled .o file, it copies it into its heap, and
relocates the symbols referred to therein to their proper addresses in
the running executable.  These symbols are in 3 broad types:

1) symbols provided explicitly by GCL, e.g. vs_base

2) symbols explicitly written by the lisp compiler which refer to
   functions in external dynamically linked libraries,
   e.g. _setjmp,cos.

3) Various platform-specific symbols referring to functions in
   internal gcc libraries written by the gcc optimizer, e.g. __moddi3
   (i386), .div (sparc), etc.

To find the proper addresses of these symbols, whether relocating via
custreloc or bfd, gcl in the past relied purely on the symbol *values*
in the symbol table of the running executable.  

Defined symbols, e.g. 1), have always had properly set up 'values', and
I think it is reasonable to assume they always will in future ld
development, though I don't know of any reason why ld *must* leave the
symbol table in the executable, i.e. after linking, at all.  Perhaps
someone could enlighten/assure me on this point.

Up until the latest binutils, undefined symbols, e.g. 2) and 3), also
had properly set up values, at least on platforms capable of native
object relocation at present.  These values referred to an internal
*plt table*, which is just a section of the executable containing jump
instructions to addresses that the dynamic linker/loader relocates to
the correct values in the used external shared libraries at runtime.
With the latest binutils, on i386, symbols as in 2) have had their
values zeroed out as a dynamic linker/loader optimization.  Symbols 3)
apparently still have values, though I am not sure if this is required
in future binutils development.  Here again, it would behoove us to
understand, likely with explicit clarification from the binutils
people, what features of the executable can be considered as
permanent.   If any of you can help with this, it would be most
appreciated. 

Since we apparently cannot rely on symbol values being set, at least
for the symbols in 2), we've put in a two-tiered strategy to find
valid relocation addresses by other means.

        a) When the linker, run with the -Map option, writes a map
        file describing a .plt section, we parse this file in parse_plt
        and set up an alist mapping symbol strings to addresses in the .plt
        section.  Where this works, this appears to be the most
        comprehensive solution, as one is guaranteed to get all
        symbols referred to in any way by the running executable, and
        allows seamless future GCL lisp compiler optimizations to
        proceed without danger.  (I.e. someday the lisp compiler might
        optimize functions to include explicit calls to strstr and the
        like.)  If this table is empty, everything is still OK due to
        b) below.  I'd appreciate knowing if it is empty, though, on
        mingw. 

        b) In any case, we provide a second minimal table of
        string/address pairs for functions as in 2) known to be
        explicitly called by C code output by the lisp compiler at the
        present time.  Even when there is no plt table, this table
        alone will suffice at present to build maxima, acl2, and
        axiom.  These known functions are listed in plttest.c.  This
        test file is compiled, and the symbol names actually placed in
        the .o file by gcc (i.e. with any name mangling) are read via
        nm and written to the table in plt.h.  The table effectively
        looks like this:

typedef struct {
const char *n;
unsigned long ad;
} Plt;

#include <math.h>
static Plt mplt[]={{"cos",(unsigned long)(void *)cos}};

        This example above would appear to be purely portable C.
        Vadim has been reporting errors on mingw which would appear to
        indicate that the compiler is incapable on that platform of
        processing this code.  I'm skeptical.  Vadim, could you please
        try copying the above into a dummy test C file, adding 'int
        main() {return 0;}' at the bottom, and try to compile it by
        hand with gcc?  If this does not work, I'd greatly appreciate
        help from Mike in understanding why not.
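
        As a minimal sketch of how the two tiers might be consulted
        together: the names plt_search and plt_lookup are hypothetical
        (they are not functions in the GCL sources), strstr stands in
        for cos only so that the file links without the math library,
        and trying the parsed table before the fallback is an
        assumption about the order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *n;      /* symbol name as it appears in the loaded .o */
    unsigned long ad;   /* address that references get relocated to */
} Plt;

/* Tier b): the small hand-maintained table.  strstr is a stand-in. */
static Plt mplt[] = {{"strstr", (unsigned long)(void *)strstr}};

/* Linear search of one table; returns 0 when the name is absent. */
static unsigned long plt_search(const Plt *t, size_t len, const char *name) {
    size_t i;
    for (i = 0; i < len; i++)
        if (strcmp(t[i].n, name) == 0)
            return t[i].ad;
    return 0;
}

/* Hypothetical two-tier lookup: the table parsed from the map file
 * (tier a, possibly empty) is tried first, then the fallback table. */
static unsigned long plt_lookup(const Plt *parsed, size_t nparsed,
                                const char *name) {
    unsigned long ad;
    assert(name != NULL);
    ad = plt_search(parsed, nparsed, name);
    if (ad == 0)
        ad = plt_search(mplt, sizeof(mplt) / sizeof(mplt[0]), name);
    return ad;
}
```

        With an empty parsed table, plt_lookup(NULL, 0, "strstr")
        falls through to the fallback entry, which is the situation
        expected on platforms without a usable map file.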

        Aurelien, my guess is that on macosx, this simple test above
        would work.  If so, then even without the .plt section, you
        should be good to go providing

                1) We get the symbol name mangling right
                2) We remove the -Map ...map option in your case 
                   and just touch an empty map.

        With these steps, I think we should be able to get the same
        code base to compile on macosx without change.

        We can go beyond this and try to incorporate the tables
        referred to by Aurelien below on macosx.  While not necessary
        at present, doing so would, to my understanding, also give this
        platform enhanced protection against future ld changes
        which might remove the symbol values written by ld into the
        executable.  As we are now defaulting to bfd on macosx,
        perhaps we could parse the table from the opened bfd 'bself'
        in sfasli.c.  The .plt section cannot be thus parsed, as it is
        only code (with latest binutils) with no symbol names
        associated.  Barring this possibility, we could provide
        makedefs in macosx.defs to replace the -Map call with a run of
        the tool you show below, and either massage its output or
        modify parse_plt to read it in.  
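
        A rough sketch of what reading that tool's output could look
        like, assuming data lines of the form "address index name" as
        in the otool -IV listing quoted below.  The function name
        parse_indirect_line is hypothetical, and this is not the
        existing parse_plt.

```c
#include <stdio.h>
#include <string.h>

/* Parse one data line of `otool -IV` output, e.g.
 *   "0x00001de0   117 _exit"
 * On success, store the address and symbol name and return 1.
 * Return 0 for section headers, column headers and anything else.
 * name must point to a buffer of at least 256 bytes. */
static int parse_indirect_line(const char *line, unsigned long *addr,
                               char *name) {
    /* Data lines start with a hex address; headers do not. */
    if (strncmp(line, "0x", 2) != 0)
        return 0;
    /* %*u skips the index column without storing it. */
    return sscanf(line, "%lx %*u %255s", addr, name) == 2;
}
```

        Since the same names appear once per section in that listing,
        a caller would also have to track the "Indirect symbols for"
        header lines and keep only the entries from the stub sections.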

        The problem with this last part brings up something we should
        keep in mind -- that any changes to the code used to produce
        raw_gcl et al. in unixport/makefile must be synchronized with
        compiler::link in gcl_cmpmain.lsp.  I do not like having to
        specify things in more than one place, but we have to be able
        to rebuild gcl images using the native ld as well as
        save-system, though this is only strictly true when native
        relocation is not supported and one is using dlopen.  As
        dlopen is an option everywhere, I'd like to make sure that
        compiler::link works everywhere too in any case.  So running
        the external macosx table parsing tool will require us to push
        the feature :macosx and write a section in compiler::link with
        #+macosx.  Again, providing the second small, explicit table
        can be correctly compiled in, we can delay the plt-analog
        symbol table reading for later if we choose.


Aurelien, this lush/dldbfd stuff is amazing.  It will take me a while
to analyze, but at first glance I don't see any explicit solution to
the problem above.  Rather, they appear to have written a much nicer
sfaslbfd.c, and one that works on mips(!).  Quite impressive, and I'm
sure very useful to us.  The problem, to my understanding, is that
there is *no information in the bfd* corresponding to executables
written by the latest binutils which can give one the address of
_setjmp, for example.  When there is a plt-like table statically
written by ld, we can still survive by extracting the table externally
in some fashion.  We cannot use dlsym, as we'd be writing an address
into an object in the statically allocated and saved heap which would
change with each image restart.  So if there is no plt table, nor an
ability to compile in an example like our second table above, we will
have to provide our own stubs for each needed function:

double
my_cos(double x) {
        return cos(x);
}

static Plt mplt[]={{"cos",(unsigned long)(void *)my_cos}};

If I am misunderstanding in any way, please enlighten me.  As stated
earlier I've only just begun to look at the dldbfd stuff.  What is the
license?  

The last possibility is that we put in some configure magic to skip
the plt stuff on windows and macosx.  I don't like this much, as I
fear that eventually the binutils change will catch up there too.

Take care,


Aurelien Chanudet <address@hidden> writes:

> Hi Camm,
> 
> A couple of things :
> 
> 1- I tried out the latest plt code as you instructed. Unfortunately,
> it does not work for Mach-O. Removing the leading underscore from
> symbol names in o/plt.h would not make it work. As stated in a mail I
> sent yesterday, Mach-O has no Procedure Linkage Table, although it has
> a slightly equivalent table serving the same purpose, namely the
> indirect symbol table. The Apple linker does not support the -Map
> option, but there's an external object file analysis tool which can be
> used to get the indirect symbol table. Here is such a table for a
> really simplistic executable file (the __picsymbol_stub, __symbol_stub
> and __picsymbolstub1 entries are akin to the PLT and the
> __la_symbol_ptr and __nl_symbol_ptr entries are akin to the GOT) :
> 
> $ otool -IV a.out
> a.out:
> Indirect symbols for (__TEXT,__picsymbol_stub) 0 entries
> address    index name
> Indirect symbols for (__TEXT,__symbol_stub) 0 entries
> address    index name
> Indirect symbols for (__TEXT,__picsymbolstub1) 12 entries
> address    index name
> 0x00001de0   117 _exit
> 0x00001e00   114 _atexit
> 0x00001e20   105 ___keymgr_dwarf2_register_sections
> 0x00001e40   109 __dyld_register_func_for_remove_image
> 0x00001e60   108 __dyld_register_func_for_add_image
> 0x00001e80   110 __init_keymgr
> 0x00001ea0   118 _free
> 0x00001ec0   113 _abort
> 0x00001ee0   112 __keymgr_set_and_unlock_processwide_ptr
> 0x00001f00   111 __keymgr_get_and_lock_processwide_ptr
> 0x00001f20   115 _calloc
> 0x00001f40   120 _printf
> Indirect symbols for (__DATA,__la_symbol_ptr) 12 entries
> address    index name
> 0x00002020   117 _exit
> 0x00002024   114 _atexit
> 0x00002028   105 ___keymgr_dwarf2_register_sections
> 0x0000202c   109 __dyld_register_func_for_remove_image
> 0x00002030   108 __dyld_register_func_for_add_image
> 0x00002034   110 __init_keymgr
> 0x00002038   118 _free
> 0x0000203c   113 _abort
> 0x00002040   112 __keymgr_set_and_unlock_processwide_ptr
> 0x00002044   111 __keymgr_get_and_lock_processwide_ptr
> 0x00002048   115 _calloc
> 0x0000204c   120 _printf
> Indirect symbols for (__DATA,__nl_symbol_ptr) 5 entries
> address    index name
> 0x00002050   116 _errno
> 0x00002054   107 __cthread_init_routine
> 0x00002058   119 _mach_init_routine
> 0x0000205c   106 ___keymgr_global
> 0x00002060   101 _environ
> 
> 2- I happened to discuss dynamic code loading with someone working
> on Lush. Lush (the Lisp Universal SHell, see
> http://lush.sourceforge.net/) has the ability to load code dynamically
> in much the same manner as GCL does. Long ago, Lush used the Dld
> dynamic link editor :
> 
> http://swissnet.ai.mit.edu/~jaffer/dld_toc.html
> 
> Alas, Dld only supports a.out. For this reason, the Lush people wrote
> their own dynamic link editor using BFD :
> 
> http://cvs.sourceforge.net/viewcvs.py/lush/lush/include/dldbfd.h
> http://cvs.sourceforge.net/viewcvs.py/lush/lush/src/dldbfd.c
> 
> Lush now supports various Linux platforms, including ELF/MIPS which is
> known to have odd BFD features, and Cygwin. An effort to support
> Mach-O is underway.
> 
> Aurelien
> 
> 
> 
> 

-- 
Camm Maguire                                            address@hidden
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah



