bug-hurd

Bug#77857: hurd: write_node assertion failed building emacs


From: Marcus Brinkmann
Subject: Bug#77857: hurd: write_node assertion failed building emacs
Date: Fri, 24 Nov 2000 01:38:43 +0100
User-agent: Mutt/1.1.4i

On Thu, Nov 23, 2000 at 07:13:51PM -0500, Roland McGrath wrote:
> Please try to get a stack trace from the assertion failure.  You could
> attach gdb in noninvasive mode and hit it, or you could just hack the code
> to use glibc's backtrace (execinfo.h) function and print it out rather than
> using assert.

I have put in a sleep(3600), which was sufficient to attach gdb when it hit
the bug.
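
For reference, the other route suggested above (printing a backtrace rather
than asserting) might look roughly like the sketch below. It is not code
from the Hurd tree: dump_backtrace, check_or_trace and CHECK are
hypothetical names made up for illustration. Only backtrace and
backtrace_symbols_fd, declared in glibc's execinfo.h, are real calls.

```c
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Print the current call stack to stderr.
   Returns the number of frames that were captured. */
static int dump_backtrace(void)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    /* Writes one line per frame directly to the fd, without malloc. */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}

/* Hypothetical stand-in for assert(): on failure, report the expression
   and dump the stack instead of aborting.
   Returns 1 if the condition held, 0 otherwise. */
static int check_or_trace(int cond, const char *expr,
                          const char *file, int line)
{
    if (cond)
        return 1;
    fprintf(stderr, "%s:%d: check failed: %s\n", file, line, expr);
    dump_backtrace();
    return 0;
}

#define CHECK(expr) check_or_trace((expr) != 0, #expr, __FILE__, __LINE__)
```

A call such as CHECK(some_condition) would then replace the assert at the
failing spot. Linking with -rdynamic usually helps glibc print function
names rather than bare addresses.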

The complete stack trace and data content are attached as a script session
(does gdb have a way to write its output to a file?). I have not come very
far in interpreting what I see, but there are a couple of things that are
easy to get from it:

* The assertion failure (and thus the write_node) happens during a normal
sync process.

* The inode which has such time flags set is the inode temacs is dumping the
emacs binary file to. The flags set are dn_set_mtime and dn_set_ctime (while
dn_set_atime is cleared).

* The file size is the final file size, so the file is completely written by
that time.

* Maybe important: The file emacs is hardlinked to emacs-20.7 as well!
(The hard link or the creation of it might be relevant). I don't know if it
is first written and then hard linked, or first hard linked and then written,
or if this matters.

Random thoughts:
* The node->lock is held, which should probably prevent syncing?
* Not all parts of the system which set some dn_set_?time flag call
diskfs_node_update consecutively (for example, write_symlink). I don't know
if this is a requirement and possible point of failure. When nodes can be
written (synced) while they are locked, there is a lot of room for race
conditions (everywhere where dn_set_?time is set, even when it is directly
followed by a diskfs_node_update).
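
To make the suspected invariant concrete, here is a minimal,
single-threaded sketch of it. Everything in it is an assumption for
illustration: struct sketch_node and the sketch_* functions are
hypothetical and do not reproduce the real libdiskfs structures or the
real write_node. It only shows that a node with a pending time flag fails
the kind of check that write_node appears to make until the flags are
folded into the timestamps.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the per-node state discussed above;
   not the real libdiskfs layout. */
struct sketch_node {
    bool set_atime, set_mtime, set_ctime;  /* pending "stamp now" requests */
    time_t atime_val, mtime_val, ctime_val;
};

/* Fold pending requests into the timestamps and clear the flags. */
static void sketch_update_times(struct sketch_node *np, time_t now)
{
    if (np->set_atime)
        np->atime_val = now;
    if (np->set_mtime)
        np->mtime_val = now;
    if (np->set_ctime)
        np->ctime_val = now;
    np->set_atime = np->set_mtime = np->set_ctime = false;
}

/* Returns true if no time flag is pending, i.e. the node is in the
   state that the writer is assumed to require. */
static bool sketch_can_write(const struct sketch_node *np)
{
    return !np->set_atime && !np->set_mtime && !np->set_ctime;
}
```

If a sync could run between the moment a flag is set and the moment the
update runs, it would see exactly the state in which sketch_can_write
returns false.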

This seems to be some race condition between the sync thread and other
dn_set_?time mangling stuff. It's only strange that it never happened
before, and that building emacs is such a reproducible test case (huge file
with a hardlink? Can't be the only reason, as it only happens with a full
build, not with an interrupted and restarted build).

I can reproduce this easily, so if more testing and debugging is required, I
am happy to do that.

Thanks,
Marcus




-- 
`Rhubarb is no Egyptian god.' Debian http://www.debian.org brinkmd@debian.org
Marcus Brinkmann              GNU    http://www.gnu.org    marcus@gnu.org
Marcus.Brinkmann@ruhr-uni-bochum.de
http://www.marcus-brinkmann.de
