bug-hurd
Bug#77857: hurd: write_node assertion failed building emacs


From: Marcus Brinkmann
Subject: Bug#77857: hurd: write_node assertion failed building emacs
Date: Fri, 24 Nov 2000 03:51:55 +0100
User-agent: Mutt/1.1.4i

Hi,

On Thu, Nov 23, 2000 at 07:57:08PM -0500, Roland McGrath wrote:
> > This seems to be some race condition between the sync thread and other
> > dn_set_?time mangling stuff. 
> 
> I would tend to agree.  Notice for example thread 3, which appears to be a
> peropen in the process of dying.  That is probably the temacs open file
> descriptor on the file being written, being closed within the few seconds
> while your sleep call was blocking ext2fs from crashing.  Something to
> think about is that (last I knew) temacs is writing the data with mmap
> rather than write; that indicates the possibility of the file pager being
> the suspect agent interacting with the sync thread.

This is probably strengthened by my new info. I followed your advice and
found out that the positive ctime and mtime settings come from block_getblk
in ext2fs/getblk.c: (Only the end of the function, with my change 1->14)

  node->dn_set_ctime = node->dn_set_mtime = 14;
  node->dn_stat.st_blocks += 1 << log2_stat_blocks_per_fs_block;
  node->dn_stat_dirty = 14;

  return 0;
}

I did this test twice, with the same result.
You'll find the 14's in the attached stack trace. This seems to point at
diskfs_grow/pager_unlock_page vs sync thread (again! ;). I want to point out
that the above does not call diskfs_node_update, while inode_getblk does in
at least some cases:

  node->dn_set_ctime = node->dn_set_mtime = 13;
  node->dn_stat.st_blocks += 1 << log2_stat_blocks_per_fs_block;
  node->dn_stat_dirty = 13;

  if (diskfs_synchronous || node->dn->info.i_osync)
    diskfs_node_update (node, 1);

  return 0;
}

Note that the assertion failure comes shortly after the message
"Dumping under names..." appears on the screen. dump-emacs calls
map_out_data, but a grep turned up nothing further about map_out_data,
and I don't know where it is defined... (strange).

> A wacky idea
> that might help narrow down quickly is to frob every place that sets
> dn_set_?time so that instead of setting them to 1 it uses a unique nonzero
> value in each location (or maybe fetches the caller's PC or something);
> then you should be able to see what code touched it last in the race.

Okay, that's done. I could probably make the value depend on the caller by
passing an argument down from pager_unlock_page/diskfs_grow or thereabouts,
if that would be helpful.
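
For illustration, here is roughly how a caller-dependent tag could look. This
is only a sketch under my own assumptions: struct fake_node stands in for the
three struct node fields quoted above, 13 and 14 are the site values from the
experiment, and make_tag, mark_times, the caller enum and the bit layout are
all invented names, not anything in ext2fs or libdiskfs.

```c
#include <assert.h>

/* Hypothetical stand-in for the few struct node fields discussed here. */
struct fake_node
{
  int dn_set_ctime;
  int dn_set_mtime;
  int dn_stat_dirty;
};

/* Site tags 13 and 14 are the values used in the experiment; the caller
   tags and every name below are invented for this sketch.  */
enum site { SITE_INODE_GETBLK = 13, SITE_BLOCK_GETBLK = 14 };
enum caller
{
  CALLER_UNKNOWN = 0,
  CALLER_DISKFS_GROW = 1,
  CALLER_PAGER_UNLOCK_PAGE = 2
};

/* Pack caller and site into one nonzero int: the site goes in the low
   8 bits, the caller in the bits above.  */
static int
make_tag (enum site site, enum caller caller)
{
  assert (site > 0 && site < 256);	/* Zero must keep meaning "clear".  */
  return ((int) caller << 8) | (int) site;
}

/* What each flag-setting site would do instead of storing a plain 1.  */
static void
mark_times (struct fake_node *node, enum site site, enum caller caller)
{
  int tag = make_tag (site, caller);
  node->dn_set_ctime = node->dn_set_mtime = tag;
  node->dn_stat_dirty = tag;
}

/* Decode the site part of a tag found in a crashed node.  */
static const char *
site_name (int tag)
{
  switch (tag & 0xff)
    {
    case 0: return "clear";
    case SITE_INODE_GETBLK: return "inode_getblk";
    case SITE_BLOCK_GETBLK: return "block_getblk";
    default: return "unknown";
    }
}

/* Decode the caller part of a tag found in a crashed node.  */
static const char *
caller_name (int tag)
{
  switch (tag >> 8)
    {
    case CALLER_DISKFS_GROW: return "diskfs_grow";
    case CALLER_PAGER_UNLOCK_PAGE: return "pager_unlock_page";
    default: return "unknown";
    }
}
```

Since the flags are only ever tested for being nonzero, any nonzero encoding
should behave like the original 1; the value left in the node at the time of
the crash then names both the last setter and its caller.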

> > * The node->lock is held, which should probably avoid syncing?
> 
> > * Not all parts of the system which set some dn_set_?time flag call
> > diskfs_node_update consecutively (for example, write_symlink). 
> 
> I don't think that ought to be a problem; in fact, it would be seriously
> inefficient to make them do so.  This just means that the time fields don't
> need to be updated (e.g. atime for a read) until the node is making its
> normal way out to be synchronized.  Whenever someone does call
> diskfs_node_update, that will take care of dn_set_?time.  But I'm not
> entirely clear on all this code.

Me neither. In particular, I am wondering what *should* guarantee that the
dn_set_?time flags are cleared before write_node is entered. I can't see
anything in the code that takes care of that. write_node is called by

1. diskfs_write_disknode, which is only called from diskfs_node_update,
   which calls diskfs_set_node_times, which clears the flags (OKAY); and

2. write_one_disknode (in write_all_disknodes), which calls
   diskfs_set_node_times, then pokel_sync indirect blocks (with wait=1),
   then write_node.

What we see is a failure in case 2. Between the diskfs_set_node_times call
in write_one_disknode and the call to write_node, while pokel_sync runs, it
seems that the file is grown and block_getblk is called, which sets the
flags again. Note the utter lack of any locking here.

So what happens when we extend the file while its indirect block list is
being pokel_sync'ed to the disk? Is this actually possible?
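
To make the suspected ordering concrete, here is a single-threaded model of
case 2. It is a sketch, not the real code: the fake_* functions are invented
stand-ins, the second thread is reduced to a grow_during_sync parameter, and I
am assuming that the failing assertion in write_node checks that the
dn_set_?time flags are zero.

```c
#include <assert.h>

/* Hypothetical stand-in for the few struct node fields discussed here. */
struct fake_node
{
  int dn_set_ctime;
  int dn_set_mtime;
  int dn_stat_dirty;
};

/* Models diskfs_set_node_times: pending time updates are applied and
   the flags cleared.  */
static void
fake_set_node_times (struct fake_node *node)
{
  node->dn_set_ctime = node->dn_set_mtime = 0;
}

/* Models the tail of block_getblk as patched in the experiment.  */
static void
fake_block_getblk (struct fake_node *node)
{
  node->dn_set_ctime = node->dn_set_mtime = 14;
  node->dn_stat_dirty = 14;
}

/* Returns 1 where write_node is assumed to hit its assertion, that is,
   when a time flag is still set on entry; returns 0 otherwise.  */
static int
fake_write_node_would_fail (const struct fake_node *node)
{
  return node->dn_set_ctime != 0 || node->dn_set_mtime != 0;
}

/* Models write_one_disknode.  A nonzero grow_during_sync simulates
   another thread growing the file while pokel_sync is blocking.  */
static int
fake_write_one_disknode (struct fake_node *node, int grow_during_sync)
{
  fake_set_node_times (node);
  assert (!fake_write_node_would_fail (node));	/* Flags are clear here.  */

  /* pokel_sync with wait=1 would block at this point; nothing in this
     model prevents the node from being modified in the meantime.  */
  if (grow_during_sync)
    fake_block_getblk (node);

  return fake_write_node_would_fail (node);
}
```

Calling fake_write_one_disknode with grow_during_sync set to 0 returns 0,
while setting it to 1 returns 1 and leaves 14 in the flags, which matches the
14's in the trace. Whether the real grow path can actually run in that window
is exactly the open question.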

Thanks,
Marcus

-- 
`Rhubarb is no Egyptian god.' Debian http://www.debian.org brinkmd@debian.org
Marcus Brinkmann              GNU    http://www.gnu.org    marcus@gnu.org
Marcus.Brinkmann@ruhr-uni-bochum.de
http://www.marcus-brinkmann.de

Attachment: emacs3
Description: Text document

