Re: [lmi] Sporadic segfault during OnInit()


From: Vadim Zeitlin
Subject: Re: [lmi] Sporadic segfault during OnInit()
Date: Thu, 6 Feb 2014 19:37:05 +0100

On Wed, 05 Feb 2014 16:41:36 +0000 Greg Chicares <address@hidden> wrote:

GC> Here is a sporadic segfault that occurs about twice in every ten trials.
GC> To reproduce:

 Unfortunately I haven't been able to reproduce it, either with MSVC or with
the build produced by our autotools-based build process (which I can easily
build), after running each version 20 times (and, towards the end, seriously
regretting not having automated dismissing the message boxes somehow). I
will try to reproduce it later using LMI built with the official build
system, for which I have a special VM, but I am not sure I'll really be able
to do anything about it even if I do manage to see it with that build,
because of the considerations below.

GC> By inserting temporary print statements, I determined where the
GC> segfault arises--working from outer through inner frames:
GC>   in 'main_wx.cpp':
GC>     authenticate_system();
GC>   in 'authenticity.cpp':
GC>     system_command("md5sum --check --status " + std::string(md5sum_file()));
GC>   in 'system_command_wx.cpp':
GC>     fatal_error() << std::flush;
GC>   in 'alert.cpp':
GC>     fatal_error_alert_function(alert_string());
GC> 
GC> The outermost frame in which the segfault arises is in 'main_wx.cpp',
GC> inside Skeleton::OnInit(). If I move the offending line in that function:
GC>     authenticate_system();
GC> down, to follow this line:
GC>     frame_->CreateStatusBar();
GC> then no segfault occurs. Could this possibly suggest that there's some
GC> instability in wx while the application is initializing?

 There are some things which can't be done (successfully) until the end of
the initialization. Notably, images can't be loaded before the call to
wxInitAllImageHandlers(), which happens later, and nothing can be loaded
from wxXmlResource before it is initialized, which also happens later. But I
see absolutely nothing which could result in a crash like this.
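
 To illustrate the ordering issue only, here is a minimal sketch of
postponing a check until all initialization steps have run. It is not lmi or
wx code: MockApp, defer(), on_init() and the step names are all made up for
the example, and it assumes a C++11 compiler.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an application object; not a wx class.
class MockApp
{
  public:
    // Queue work that must not run before initialization is complete.
    void defer(std::function<void()> task)
    {
        deferred_.push_back(std::move(task));
    }

    // Perform the (made-up) initialization steps in order, then run
    // whatever was deferred.
    void on_init()
    {
        log_.push_back("create frame");
        log_.push_back("create status bar");
        log_.push_back("init image handlers");
        for(auto const& task : deferred_) {task();}
        deferred_.clear();
    }

    void record(std::string const& s) {log_.push_back(s);}
    std::vector<std::string> const& log() const {return log_;}

  private:
    std::vector<std::function<void()>> deferred_;
    std::vector<std::string> log_;
};

// Returns the order in which the steps ran when the check is deferred.
std::vector<std::string> run_with_deferred_check()
{
    MockApp app;
    app.defer([&app] {app.record("authenticate");});
    app.on_init();
    return app.log();
}
```

 Calling run_with_deferred_check() yields the three initialization steps
followed by "authenticate", i.e. the check runs last regardless of when it
was registered.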

GC> I found it hard to reproduce the segfault in gdb, but eventually did
GC> stumble on a rare success:
GC> 
GC> gdb ./lmi_wx_shared
GC> set arg --data_path=/opt/lmi/data/bogus
GC> handle SIGTRAP nostop noprint
GC> run
GC> Program received signal SIGSEGV, Segmentation fault.
GC> 0x0022edf7 in ?? ()
GC> (gdb) bt
GC> #0  0x0022edf7 in ?? ()
GC> #1  0x004ba4a5 in __cxa_throw ()
GC> #2  0x00404caf in fatal_error_alert (s=...) at /lmi/src/lmi/alert_wx.cpp:138
GC> #3  0x01adff06 in fatal_error_buf::raise_alert (this=0x0)
GC>     at C:/opt/lmi/MinGW-20090203/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/bits/basic_string.h:1456
GC> #4  0x01b7e421 in alert_buf::sync (this=0x0) at /lmi/src/lmi/alert.cpp:144
GC> #5  0x00531277 in std::ostream::flush() ()
GC> #6  0x004a34dd in (anonymous namespace)::concrete_system_command (
GC>     command_line=...)
GC>     at C:/opt/lmi/MinGW-20090203/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/bits/ostream.tcc:63
GC> #7  0x0199938e in Authenticity::Assay (candidate=..., data_path=...)
GC>     at /lmi/src/lmi/authenticity.hpp:89
GC> #8  0x019999c3 in authenticate_system () at /lmi/src/lmi/authenticity.cpp:283
GC> #9  0x0045d523 in Skeleton::OnInit (this=0x3a4cb80)
GC>     at /lmi/src/lmi/main_wx.cpp:646
GC> #10 0x004f8c41 in wxAppConsoleBase::CallOnInit (this=0x0)
GC>     at /opt/lmi/local/include/wx-2.9/wx/app.h:94
GC> #11 0x6c2d4a56 in wxEntryReal(int&, wchar_t**) ()
GC>    from /opt/lmi/local/lib/wxmsw295u_gcc_gcc-345-e98c5f92805493f150656403ffef3bb0.dll
GC> #12 0x6c45b613 in wxEntry(HINSTANCE__*, HINSTANCE__*, char*, int) ()
GC> ---Type <return> to continue, or q <return> to quit---
GC>    from /opt/lmi/local/lib/wxmsw295u_gcc_gcc-345-e98c5f92805493f150656403ffef3bb0.dll
GC> #13 0x00450e9d in WinMain (hInstance=0x0, hPrevInstance=0x0, lpCmdLine=0x0,
GC>     nCmdShow=0) at /lmi/src/lmi/main_wx.cpp:215
GC> #14 0x004c3028 in main ()
GC> 
GC> Studying lmi's 'fatal_error' code, I was unable to determine how
GC> this is possible.

 It really shouldn't be. Throwing an exception simply must not crash inside
__cxa_throw(). Worse, I have absolutely no idea how to debug it when it
does. I guess we should look at the disassembly of __cxa_throw() (because
you don't have the source information for it, do you?) and try to understand
what's going on there. But upgrading to a newer MinGW/MinGW-W64/TDM-GCC
version looks incomparably more appealing...
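
 For reference, the path in frames #2 to #5 of the backtrace (flush() calls
the buffer's sync(), which throws) can be reduced to something like the
sketch below. This is not the code from 'alert.cpp': throwing_buf and
flush_and_catch() are hypothetical names, and I am assuming the stream has
badbit enabled in its exception mask, because otherwise flush() swallows the
exception and only sets badbit.

```cpp
#include <ios>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical buffer: flushing it raises the accumulated text as an
// exception instead of writing it anywhere.
class throwing_buf : public std::stringbuf
{
  protected:
    int sync() override
    {
        std::string const message = str();
        str(std::string()); // Reset the buffer before throwing.
        throw std::runtime_error(message);
    }
};

// Writes 'text' to a stream using throwing_buf, flushes it, and returns
// the message carried by the resulting exception.
std::string flush_and_catch(std::string const& text)
{
    throwing_buf buf;
    std::ostream os(&buf);
    // Let exceptions thrown by sync() propagate out of flush().
    os.exceptions(std::ios_base::badbit);
    try
    {
        os << text << std::flush;
    }
    catch(std::runtime_error const& e)
    {
        return e.what();
    }
    return std::string();
}
```

 With a correctly working compiler and runtime, flush_and_catch("boom")
simply returns "boom": the throw inside sync() unwinds through
std::ostream::flush() to the handler. If a reduced test like this could be
made to crash with the old toolchain, it would at least rule out wx.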

GC> Initially I wondered whether the cause might lie somewhere in the
GC> rather old version of libstdc++ we're using; but the 'fatal_error'
GC> stuff always works everywhere else...and it works here, too, if I
GC> rearrange the code as above, so that the code leading to the crash is
GC> invoked later in OnInit(). Do you have any idea what's really happening
GC> here?

 Sorry, really none at all. I am almost sure that it's a bug somewhere in
the compiler or libstdc++, because there is no way for something like this
to happen if they work correctly: even if we had somehow corrupted the
exception handling information -- which doesn't seem to be the case
according to the memory debugging tools I used -- it should still result in
a crash only when the exception is handled, not when it's thrown. But I have
no idea why it works most of the time and fails just in this particular case
(and then only about twice out of 10 tries).

 My only idea is to look at the disassembly and try to understand where
exactly the crash happens, but I'd definitely try upgrading the toolset
first.

 Sorry for lack of more help,
VZ
