Re: Reboots?
From: Marcus Brinkmann
Subject: Re: Reboots?
Date: Mon, 2 Apr 2001 05:43:13 +0200
User-agent: Mutt/1.3.15i
On Sun, Apr 01, 2001 at 06:14:27PM -0400, Roland McGrath wrote:
> > Ok. I put it in proc's demuxer, so we miss out anything that is processed
> > earlier. I will move the code into libproc before I do another test (and
> > increase the buffer a bit).
>
> That will have a better chance of catching the crash location, but
> it still could easily be obscured if the corruption is sufficient.
Yes. I have mixed feelings about my latest results. Obviously, it is not
telling us much more than we already know, but it is one more data point as
a comparison, so here it comes. There is a similarity. The previous crash
was in/after set_arg_locations which followed a setmsgport on one local
port, and get_arg_locations on another local port.
This crash is in/after a setmsgport on one local port, followed by a
get_arg_locations on another local port.
I am not proposing that this vague similarity is pointing at the bug. But
the _arg_locations functions are really very simple, and setmsgport is the
very last function in the bunch that has the potential to do a complex job
(if checkmsghangs is true).
> If you made your buffer really big, it might be that the trace of messages
> would show us something particularly unusual that we could suspect. But I
> would not hold out a lot of hope that the problem will suddenly become
> apparent just because we can see more of the past RPCs.
Right.
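For reference, a message trace of this kind can be kept in a fixed-size ring
buffer. The following is only a minimal sketch in C, not the code that was
actually added to proc: struct trace_entry, trace_log, trace_snapshot and
TRACE_SIZE are hypothetical names, and it assumes the caller already holds
whatever lock serializes the demuxer.

```c
#include <stddef.h>

/* Hypothetical trace record: one entry per message seen by the demuxer. */
struct trace_entry {
    unsigned long seqno;  /* position in the overall message stream */
    int msg_id;           /* RPC message id */
    unsigned int port;    /* local port the message arrived on */
    int is_reply;         /* 0 = request received, 1 = reply sent */
};

#define TRACE_SIZE 8      /* tiny for illustration; a real buffer would be larger */

static struct trace_entry trace_buf[TRACE_SIZE];
static unsigned long trace_next;  /* total number of entries ever logged */

/* Record one message, overwriting the oldest entry once the buffer is full.
   Assumption: the caller holds a lock, so no two threads log concurrently. */
void trace_log(int msg_id, unsigned int port, int is_reply)
{
    struct trace_entry *e = &trace_buf[trace_next % TRACE_SIZE];
    e->seqno = trace_next;
    e->msg_id = msg_id;
    e->port = port;
    e->is_reply = is_reply;
    trace_next++;
}

/* Copy the surviving entries into out, oldest first; return how many. */
size_t trace_snapshot(struct trace_entry *out, size_t max)
{
    unsigned long count = trace_next < TRACE_SIZE ? trace_next : TRACE_SIZE;
    unsigned long start = trace_next - count;
    size_t n = 0;

    for (unsigned long i = start; i < trace_next && n < max; i++)
        out[n++] = trace_buf[i % TRACE_SIZE];
    return n;
}
```

A bigger TRACE_SIZE only extends how far back the history reaches. It does not
protect the buffer itself from being clobbered by the same corruption.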
> There isn't any reliable way to predict which thread will wake up to take
> the next message from the portset. The best thing I can think of is to
> arrange that there only be one thread waiting in the portset at a time, and
> that each thread completely unwind its stack and die after handling one
> request. Then it should crash immediately during that unwind if the
> clobberation is of the sort I have been describing. This will slow the
> server down a lot.
I think this might be worth a try. We will see if the bug is reproducible
with such a change or not. I hope I am able to hack it up following your
description.
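To make the idea concrete, here is a rough sketch of the one-request-per-thread
scheme as I understand it. It is an illustration under assumptions, not the
proc server's code: it uses POSIX threads rather than the server's own thread
library, and struct request, receive_one, handle and serve_all are hypothetical
stand-ins, with a static array playing the role of the portset. It is also
stricter than the description, because it joins each thread before starting
the next, so handling is fully serialized as well.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a message taken from the portset. */
struct request { int msg_id; };

/* A static array plays the role of the portset in this sketch. */
static struct request pending[] = { {1}, {2}, {3} };
static size_t next_request;
static int handled_count;

/* Fetch the next pending request; return 0 when none are left. */
static int receive_one(struct request *req)
{
    if (next_request >= sizeof pending / sizeof pending[0])
        return 0;
    *req = pending[next_request++];
    return 1;
}

/* Placeholder for the real RPC handler. */
static void handle(const struct request *req)
{
    (void) req;
    handled_count++;
}

/* Each worker takes exactly one request and then returns, so its stack is
   unwound completely after every request. A clobbered return address
   should therefore show up right away instead of many requests later. */
static void *one_shot_worker(void *arg)
{
    int *got = arg;
    struct request req;

    *got = receive_one(&req);
    if (*got)
        handle(&req);
    return NULL;
}

/* Run one worker at a time until no requests remain.
   Returns the number of requests served, or -1 on thread creation failure. */
int serve_all(void)
{
    int served = 0;

    for (;;) {
        pthread_t t;
        int got = 0;

        if (pthread_create(&t, NULL, one_shot_worker, &got) != 0)
            return -1;
        pthread_join(t, NULL);  /* only one thread ever exists at a time */
        if (!got)
            break;
        served++;
    }
    return served;
}
```

Creating and destroying a thread per request is what makes this slow, as
expected.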
If this working assumption is correct, it doesn't make sense to log reply
messages, right? The proc global lock doesn't prevent a new thread from
coming up, so not seeing a reply logged immediately before the crash does
not mean it is this function that crashed when returning and unwinding a
part of the stack. On the other hand, always seeing a reply message logged
means that the function returned correctly, which means that this bit of the
stack is okay. Mmmh. I will just add reply msg logging to be sure. We'll see
what happens.
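As a small illustration of what the reply log can and cannot tell us, the
sketch below scans a trace and lists the requests that never got a reply
logged. The event layout and the names struct event and unreplied_requests are
hypothetical, and it assumes each log entry records which thread handled the
message. With several threads active, more than one request can be outstanding
at the time of the crash, so the result is a set of suspects rather than a
single culprit.

```c
#include <stddef.h>

enum event_kind { EV_REQUEST, EV_REPLY };

/* Hypothetical log entry; assumes the handling thread is recorded. */
struct event {
    enum event_kind kind;
    int thread;   /* id of the thread that handled the message */
    int msg_id;   /* RPC message id */
};

/* Store in out the msg_id of every request that has no later reply from the
   same thread. Returns the number of ids stored (at most max). */
size_t unreplied_requests(const struct event *log, size_t n,
                          int *out, size_t max)
{
    size_t found = 0;

    for (size_t i = 0; i < n; i++) {
        int replied = 0;

        if (log[i].kind != EV_REQUEST)
            continue;
        /* Look for the matching reply later in the log. */
        for (size_t j = i + 1; j < n; j++) {
            if (log[j].kind == EV_REPLY && log[j].thread == log[i].thread) {
                replied = 1;
                break;
            }
        }
        if (!replied && found < max)
            out[found++] = log[i].msg_id;
    }
    return found;
}
```

Every request that does have a reply in the log can be ruled out, which is the
useful half of the information.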
Thanks,
Marcus
--
`Rhubarb is no Egyptian god.' Debian http://www.debian.org brinkmd@debian.org
Marcus Brinkmann GNU http://www.gnu.org marcus@gnu.org
Marcus.Brinkmann@ruhr-uni-bochum.de
http://www.marcus-brinkmann.de