Bug#77857: hurd: write_node assertion failed building emacs
From: Marcus Brinkmann
Subject: Bug#77857: hurd: write_node assertion failed building emacs
Date: Fri, 24 Nov 2000 03:51:55 +0100
User-agent: Mutt/1.1.4i

Hi,
On Thu, Nov 23, 2000 at 07:57:08PM -0500, Roland McGrath wrote:
> > This seems to be some race condition between the sync thread and other
> > dn_set_?time mangling stuff.
>
> I would tend to agree. Notice for example thread 3, which appears to be a
> peropen in the process of dying. That is probably the temacs open file
> descriptor on the file being written, being closed within the few seconds
> while your sleep call was blocking ext2fs from crashing. Something to
> think about is that (last I knew) temacs is writing the data with mmap
> rather than write; that indicates the possibility of the file pager being
> the suspect agent interacting with the sync thread.
My new findings seem to support this. I followed your advice and found
that the positive ctime and mtime settings come from block_getblk in
ext2fs/getblk.c (only the end of the function is shown, with my change
of 1 to 14):

      node->dn_set_ctime = node->dn_set_mtime = 14;
      node->dn_stat.st_blocks += 1 << log2_stat_blocks_per_fs_block;
      node->dn_stat_dirty = 14;
      return 0;
    }

I did this test twice, with the same result.
You'll find the 14's in the attached stack trace. This seems to point at
diskfs_grow/pager_unlock_page vs sync thread (again! ;). I want to point out
that the above does not call diskfs_node_update, while inode_getblk does in
at least some cases:

      node->dn_set_ctime = node->dn_set_mtime = 13;
      node->dn_stat.st_blocks += 1 << log2_stat_blocks_per_fs_block;
      node->dn_stat_dirty = 13;
      if (diskfs_synchronous || node->dn->info.i_osync)
        diskfs_node_update (node, 1);
      return 0;
    }

Note that the assertion failure comes shortly after the message
"Dumping under names..." appears on the screen. dump-emacs calls
map_out_data, but a grep for map+out+data turned up nothing further,
so I don't know where it is defined... (strange).
> A wacky idea
> that might help narrow down quickly is to frob every place that sets
> dn_set_?time so that instead of setting them to 1 it uses a unique nonzero
> value in each location (or maybe fetches the caller's PC or something);
> then you should be able to see what code touched it last in the race.
Okay, that's done. I could probably make the value dependent on the caller by
passing down an arg from pager_unlock_page/diskfs_grow or somewhere, if this
might be helpful.
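
To make the tagging idea concrete, here is a minimal sketch of it in isolation. It is not the ext2fs code: struct model_node, mark_times and site_name are hypothetical names invented for illustration, and only the values 13 and 14 correspond to the changes quoted above.

```c
#include <assert.h>

/* Hypothetical stand-in for the few fields of the real node that matter
   here; this is NOT the libdiskfs struct node.  */
struct model_node
{
  int dn_set_ctime;
  int dn_set_mtime;
  int dn_stat_dirty;
};

/* One distinct nonzero tag per place that sets the flags.  13 and 14
   match the values used in the snippets above; the enum itself is an
   assumption made for this sketch.  */
enum set_time_site
{
  SITE_NONE = 0,
  SITE_INODE_GETBLK = 13,
  SITE_BLOCK_GETBLK = 14
};

/* Set the flags to the tag of the calling site instead of plain 1.
   Any nonzero value still reads as "flag set" to code that only tests
   for truth.  */
void
mark_times (struct model_node *np, enum set_time_site site)
{
  np->dn_set_ctime = np->dn_set_mtime = site;
  np->dn_stat_dirty = site;
}

/* Map a tag found in a crashed process back to the site that wrote it.  */
const char *
site_name (int tag)
{
  switch (tag)
    {
    case SITE_NONE:
      return "clear";
    case SITE_INODE_GETBLK:
      return "inode_getblk";
    case SITE_BLOCK_GETBLK:
      return "block_getblk";
    default:
      return "unknown";
    }
}
```

Whichever site wrote last wins, so the tag that shows up in the stack trace names the code that touched the flags most recently.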
> > * The node->lock is held, which should probably avoid syncing?
>
> > * Not all parts of the system which set some dn_set_?time flag call
> > diskfs_node_update consecutively (for example, write_symlink).
>
> I don't think that ought to be a problem; in fact, it would be seriously
> inefficient to make them do so. This just means that the time fields don't
> need to be updated (e.g. atime for a read) until the node is making its
> normal way out to be synchronized. Whenever someone does call
> diskfs_node_update, that will take care of dn_set_?time. But I'm not
> entirely clear on all this code.
Same here. In particular, I am wondering what *should* guarantee that the
dn_set_?time flags are cleared before write_node is entered. I can't see
anything in the code that takes care of that. write_node is called by

1. diskfs_write_disknode, which is only called from diskfs_node_update,
   which calls diskfs_set_node_times, which clears the flags (OKAY); and
2. write_one_disknode (in write_all_disknodes), which calls
   diskfs_set_node_times, then pokel_sync indirect blocks (with wait=1),
   then write_node.

What we see is a failure in case 2. Between write_one_disknode and
write_node, while pokel_sync runs, it seems that the file is grown and
block_getblk called. Note the utter lack of any locking here.
So what happens when we extend the file while its indirect block list is
pokel_sync'ed to the disk? Is this actually possible?
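
The sequence suspected in case 2 can be modeled in a few lines. This is only a single-threaded sketch of the hypothesis, not the real code: every model_* name is hypothetical, and the hook stands in for whatever another thread might do while the blocking sync is in progress.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two flags under discussion.  */
struct model_node
{
  int dn_set_ctime;
  int dn_set_mtime;
};

/* Hook representing activity by another thread during the sync window.  */
typedef void (*window_hook) (struct model_node *);

/* Models the effect attributed to diskfs_set_node_times: clear the flags.  */
void
model_set_node_times (struct model_node *np)
{
  np->dn_set_ctime = np->dn_set_mtime = 0;
}

/* Models a file grow reaching the end of block_getblk (tag 14 as above).  */
void
model_grow (struct model_node *np)
{
  np->dn_set_ctime = np->dn_set_mtime = 14;
}

/* Models the order of steps in case 2.  Returns 1 if the flags are still
   clear at the point where write_node would be entered, 0 if they were
   set again in between.  */
int
model_write_one_disknode (struct model_node *np, window_hook during_sync)
{
  model_set_node_times (np);   /* step 1: flags cleared */
  if (during_sync != NULL)
    during_sync (np);          /* step 2: stands in for the blocking sync */
  /* step 3: the condition write_node is expected to find on entry */
  return np->dn_set_ctime == 0 && np->dn_set_mtime == 0;
}
```

Calling model_write_one_disknode (&n, NULL) returns 1, while model_write_one_disknode (&n, model_grow) returns 0 and leaves 14 in the flags, which is the pattern visible in the stack trace. Whether the real code can actually interleave this way is exactly the open question.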
Thanks,
Marcus
--
`Rhubarb is no Egyptian god.' Debian http://www.debian.org brinkmd@debian.org
Marcus Brinkmann GNU http://www.gnu.org marcus@gnu.org
Marcus.Brinkmann@ruhr-uni-bochum.de
http://www.marcus-brinkmann.de
Attachment: emacs3 (text document)