Re: [lmi] "Stack overflow" resolved [Was: Restructuring code that...]


From: Greg Chicares
Subject: Re: [lmi] "Stack overflow" resolved [Was: Restructuring code that...]
Date: Sat, 23 Feb 2019 13:23:29 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0

On 2019-02-23 00:26, Vadim Zeitlin wrote:
> On Fri, 22 Feb 2019 23:00:37 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> On 2019-02-22 03:12, Greg Chicares wrote:
> GC> [...]
> GC> > Unfortunately, I've run into a problem. Today I added a couple dozen new
> GC> > products that are to be introduced later this year, and now 'make install'
> GC> > fails as follows on this line
> GC> >         @cd $(datadir); $(PERFORM) $(bindir)/product_files$(EXEEXT)
> GC> > of its recipe:
> GC> > 
> GC> > wine: Unhandled stack overflow at address 0x6d0d37fc (thread 01db), starting debugger...
> GC> > 01db:err:seh:setup_exception_record stack overflow 880 bytes in thread 01db eip 7bc2baed esp 007b0f
> GC> > 
> GC> > 'product_files$(EXEEXT)' is a 32-bit msw binary that doesn't use SEH, so I
> GC> > think the "seh:setup_exception_record" part must indicate a defect in 'wine'
> GC> > rather than in 'product_files$(EXEEXT)'.
> GC> 
> GC> My current theory is that this was an actual overflow in my 32-bit
> GC> msw binary (obscured by being wrapped in a 'wine' SEH message).
> 
>  I didn't have time to try this yet, but if this is indeed the case, I
> should be seeing it under native MSW, right?

No, because you don't have the proprietary sources.

> Would you like me to test this
> assumption or is it better to skip testing it to avoid disappointment if
> the result doesn't conform to our expectations?

I always prefer to know the truth. However...

> If so, could you please
> make available your binary somewhere so that I could run it?

...that binary and its sources embody a particular insurance company's
proprietary trade secrets. Anyone who sees them becomes subject to
bureaucratic rules that you really, really want to avoid.

Maybe Kim will be able to test it sometime next week, if convenient.
The question is whether any error such as "stack overflow" occurs with
this SHA1 in the proprietary repository [this commit message contains
no trade secrets]...

  commit dd7d79db9eb409ce6458aac8e44dfe9f6f83acd4
  Author: Gregory W. Chicares <address@hidden>
  Date:   2019-02-22T14:20:29+00:00

    Write '.database' files for new 2017 CSO products
    
    This may very well work already for native msw; for 'wine', it appears
    that building with special flags can solve the problem noted yesterday.

...before updating lmi to this SHA1 in the public repository:

  commit c1feb6d07cb91efa8b32e7e70722f0eacf46a178
  Author: Gregory W. Chicares <address@hidden>
  Date:   2019-02-22T23:35:45+00:00

    Prevent a stack overflow

> GC> The proprietary 'my_db.cpp' and 'my_prod.cpp' files are both
> GC> enormous.
> 
>  Can I ask the obvious question: what prevents you from splitting them into
> several parts?

It is very convenient to have all products in a single source file.
That makes it easier to compare one product to another, or one
family of products to another. It also makes maintenance easier:
for instance, monthly crediting-rate updates require editing only
one 80-line portion of one file. And it also facilitated this month's
grand refactoring, which has reduced the size of the source code by
two-thirds even though the number of products has simultaneously
grown by one-fifth:

  'du -b proprietary/src/my_db.cpp'
  as of various dates
  -------------------
  2016-09-23  569827
  2018-03-03  629575
  2019-02-22  196241

> GC> .../src/my_prod.cpp:1362:6: note: variable tracking size limit
> GC>  exceeded with -fvar-tracking-assignments, retrying without
> GC>  void product_data::write_proprietary_policy_files()
> GC>       ^~~~~~~~~~~~
> GC> 
> GC> [...] I used '-fno-var-tracking-assignments' to suppress it.
> 
>  OK, it looks like -fvar-tracking-assignments (about whose existence I had been
> blissfully unaware until today) is enabled by default by -Os and so it
> makes sense to disable it if it doesn't work anyhow.

In 'workhorse.make', I've specified
  -Os -fno-var-tracking-assignments
which, though harmlessly redundant, emphasizes that the latter flag is
required if the '-Os' part is changed. And I did experiment with other '-O'
flags that don't necessarily incorporate '-fno-var-tracking-assignments'.
Since some results are still in scrollback, I'll paste them here for the
record...

Results of two successive runs of this command:
  time make $coefficiency install check_physical_closure
All proprietary object files were removed before each "first" run, which
represents time to build and execute. The second run compiles nothing and
merely executes the binary.

-O2 -fno-var-tracking-assignments
  43.14s user 2.28s system 159% cpu 28.493 total
  2.59s user 0.19s system 92% cpu 3.004 total

-O1 -fno-var-tracking-assignments
  26.44s user 1.20s system 189% cpu 14.569 total
  2.62s user 0.19s system 92% cpu 3.037 total

-Os -fno-var-tracking-assignments
  27.04s user 1.62s system 165% cpu 17.289 total
  2.55s user 0.18s system 92% cpu 2.962 total

The time it takes to run the program varies only slightly. Build time,
as expected, is ordered
  -O1 < -Os < -O2
because '-Os' is like '-O2' with some optimizations inhibited. The build
time is double for '-O2', but comparable for '-O1' and '-Os'. I chose
to commit '-Os' mainly because it seems to express the intention better.
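
As a rough cross-check of those figures, here is a minimal sketch that
subtracts each second-run total from the corresponding first-run total to
approximate compile time alone. The totals are copied from the measurements
above; the names 'timing', 'measured_timings', 'approximate_build_time',
and 'build_ratio' are made up for illustration and are not part of lmi.
The subtraction assumes that the execute step takes about the same time
in both runs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Wall-clock totals in seconds, copied from the measurements above.
// All names here are hypothetical; none of them exists in lmi.
struct timing
{
    std::string flags;
    double build_and_run; // first run: compile, then execute
    double run_only;      // second run: execute only
};

inline std::vector<timing> measured_timings()
{
    return
        {{"-O2", 28.493, 3.004}
        ,{"-O1", 14.569, 3.037}
        ,{"-Os", 17.289, 2.962}
        };
}

// Approximate compile time: first-run total minus second-run total.
// Assumption: executing the binary costs the same in both runs.
inline double approximate_build_time(timing const& t)
{
    assert(t.run_only <= t.build_and_run);
    return t.build_and_run - t.run_only;
}

// Ratio of one configuration's approximate build time to a baseline's.
inline double build_ratio(timing const& t, timing const& baseline)
{
    return approximate_build_time(t) / approximate_build_time(baseline);
}
```

Under that assumption, '-O2' takes a bit more than twice as long to compile
as '-O1', and '-Os' roughly a quarter longer than '-O1'.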

> GC> Now here's something weird. The diagnostics for each file were
> GC>   my_db.cpp  : "stack overflow"
> GC>   my_prod.cpp: "variable tracking size limit exceeded"
> GC> with '-O0'; but with '-O2 -fno-var-tracking-assignments', there
> GC> are no diagnostics at all. Which of those two flags prevented
> GC> the stack overflow at run time? Testing both separately shows
> GC> that it's '-O2'. Thus, '-O2' or '-Os' (above) prevent the stack
> GC> overflow, which occurs only with '-O0'. Thus, for gcc-3.4.5
> GC> (with an older version of this code), forbidding optimization
> GC> ('-O0') was a necessary workaround; but today that workaround
> GC> has become the real problem ("stack overflow"), and the new
> GC> workaround is to require optimization.
> 
>  It's a bit surprising that unoptimized code dies with a stack overflow,
> which normally happens when there is deep recursion which the compiler
> shouldn't be able to optimize anyhow, but I guess it may be possible...

Without testing under native msw, we can't know for sure whether the
"stack overflow" was in 'product_files.exe' or in 'wine'. But if it
was in 'product_files.exe' as I conjecture, then it's not at all the
kind of stack overflow I normally encounter (infinite recursion due
to a coding mistake)--instead, it's the result of pushing N bytes
onto an M-byte stack where N was just slightly less than M last week,
but exceeded M after I added new products this week. I think this
would be the first time I've ever seen a non-infinite stack overflow.
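
To make that conjecture concrete, here is a toy model of a stack frame that
grows with the number of products. It is only a sketch under stated
assumptions, not a measurement: the stack size (2 MiB), the bytes of
temporaries per product (16 KiB), and the product counts (120, then 144)
are all invented, because the thread gives none of those figures, and every
function name is hypothetical. The model also assumes one plausible
mechanism that was not tested here: an optimizing compiler may let
temporaries whose lifetimes don't overlap share stack slots, while an
unoptimized build typically gives each its own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical model: each element is the number of bytes of temporaries
// that the code for one product needs on the stack.

// Without slot sharing, every temporary gets its own slot, so the frame
// is the sum of all of them.
inline std::size_t frame_without_sharing(std::vector<std::size_t> const& temps)
{
    return std::accumulate(temps.begin(), temps.end(), std::size_t{0});
}

// With slot sharing, temporaries whose lifetimes don't overlap can reuse
// one slot, so the frame need be only as large as the largest of them.
inline std::size_t frame_with_sharing(std::vector<std::size_t> const& temps)
{
    return temps.empty()
        ? std::size_t{0}
        : *std::max_element(temps.begin(), temps.end());
}

// A frame overflows when it needs more bytes than the stack provides.
inline bool overflows(std::size_t frame_bytes, std::size_t stack_bytes)
{
    return stack_bytes < frame_bytes;
}

// Illustrative numbers only; none of them comes from the thread.
inline void illustrate()
{
    std::size_t const stack_bytes = 2u * 1024u * 1024u; // assumed stack size
    std::size_t const per_product = 16u * 1024u;        // assumed temporaries
    std::vector<std::size_t> const last_week(120, per_product);
    std::vector<std::size_t> const this_week(144, per_product);
    assert(!overflows(frame_without_sharing(last_week), stack_bytes));
    assert( overflows(frame_without_sharing(this_week), stack_bytes));
    assert(!overflows(frame_with_sharing   (this_week), stack_bytes));
}
```

With these made-up numbers, N fits just under M for 120 products and
exceeds it for 144, with no recursion involved; with slot sharing the same
144 products fit easily.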


