RE: patches for PR 218, index corruption

help-gnats

From:	Dirk Bergstrom
Subject:	RE: patches for PR 218, index corruption
Date:	Thu, 8 Nov 2001 13:05:00 -0800

> However, I don't think the problem is completely gone:
>     DB> i modified getFirstPr() so that it always calls checkPRChain()
>     DB>  to check mtime of the index on disk before using data in
>     DB>  memory; if cached data is stale, the index is re-read.  also
>     DB>  added a final check in the writeIndex routine to catch
>     DB>  problems and alert the administrator.
> The granularity of time_t on GNU systems is one second, so the mtime
> test is better than nothing, but it doesn't ensure the index data
> isn't stale when writing the index.  Wouldn't it be better to always
> reread the index before altering it?

[hmmm, after writing the three paragraphs below, i realized you might be
making a different point.  so i have two separate answers to your
question.]

*) first answer:
well, in theory.  but it's not possible to read the index instantly, so
there's always going to be a race there.  we need to trust that the
database lock will prevent other gnats processes from altering the index
between read & write.  the problem that my patch addresses is that gnatsd
was reading the index *before* locking the db, leaving a fairly large
window for other processes to change the index.

with the patch, any routine that accesses the index will trigger an
mtime check, and, if necessary, a re-read.  this means that routines
that change the index (edit, submit, etc.), which only run after a
database lock, will be guaranteed a stable index to work from.  the
(presumably race-proof) database lock is our insurance against a race in
index mtimes.
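
a minimal C sketch of the check-then-reread idea described above.  this
is not the actual patch: `struct index_cache`, `ensure_fresh_index()` and
`reread_index()` are made-up names for illustration, and the "parser" only
counts reads.  it assumes the caller already holds the database lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical in-memory cache; not the real GNATS data structures. */
struct index_cache {
    time_t loaded_mtime;  /* mtime of the index file when last read */
    long   reads;         /* number of times the file was re-read */
    bool   loaded;        /* false until the first successful read */
};

/* Stand-in for the real index parser: only records that a read happened. */
int reread_index(struct index_cache *c, const char *path, time_t mtime)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    /* ... a real implementation would parse the entries here ... */
    fclose(fp);
    c->loaded_mtime = mtime;
    c->loaded = true;
    c->reads++;
    return 0;
}

/* Compare the on-disk mtime with the cached one and re-read on mismatch.
 * Returns 0 if the cache is usable afterwards, -1 on error. */
int ensure_fresh_index(struct index_cache *c, const char *path)
{
    struct stat st;

    assert(c != NULL && path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    if (c->loaded && st.st_mtime == c->loaded_mtime)
        return 0;  /* mtime unchanged: keep using the cached data */
    return reread_index(c, path, st.st_mtime);
}
```

calling `ensure_fresh_index()` twice on an unchanged file re-reads only
once; the second call costs a single stat().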

the final check in writeIndex() is a sanity check, meant to alert the
admin if something Very Bad has happened.  it's not there to prevent
anything.

*) second answer:
oh, wait, i understand the problem.  mtime only has one-second
resolution, so we could read the index, lock the db, and check the mtime,
and it would look fine, but we'd miss it if someone had written a new
index later in the same second that we read it in.  the mtime reads the
same, but the index is different.  hmmm, yes, big problem.  still a race,
but a different race.  sneaky, these bugs...  nice work picking up on
this, milan.
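
to make the failure mode concrete, here is a small C sketch.  the names
(`struct index_stamp`, `looks_fresh()`, `is_fresh()`) are invented for
illustration, and `version` is just a stand-in for the real file
contents; nothing here comes from the gnats source.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical snapshot of what a process knows about the index file. */
struct index_stamp {
    time_t   mtime;    /* whole seconds, as reported by stat() */
    unsigned version;  /* stand-in for the actual file contents */
};

/* The mtime-only test: "fresh" if the whole-second timestamps agree. */
bool looks_fresh(const struct index_stamp *cached,
                 const struct index_stamp *on_disk)
{
    assert(cached != NULL && on_disk != NULL);
    return cached->mtime == on_disk->mtime;
}

/* Ground truth: the cached contents really match what is on disk. */
bool is_fresh(const struct index_stamp *cached,
              const struct index_stamp *on_disk)
{
    assert(cached != NULL && on_disk != NULL);
    return cached->version == on_disk->version;
}
```

if one process reads the index and another rewrites it within the same
second, the two stamps carry the same `mtime` but different `version`
values: `looks_fresh()` returns true while `is_fresh()` returns false.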

i really don't want to force an index read *every* time we access the
index.  that could be a big performance hit.  i counted seven
getFirstPR() calls in
an edit transaction, and if we required an index read for every one,
that could get messy -- the index for our main database is four megs
(and growing by over 200K/month) and reading that seven times would
surely slow things down...

ok, here's a solution:

put a call to getIndex() in lock_gnats(), so that after a successful
lock, we are guaranteed a freshly read index.  this adds one extra index
read per lock, but it should solve the problem.

what do you think?

--
Dirk Bergstrom               address@hidden
_____________________________________________
Juniper Networks Inc.,          Computer Geek
Tel: 707.433.0564           Fax: 707.433.0769

Current Thread

RE: patches for PR 218, index corruption, Dirk Bergstrom, 2001/11/01
- Re: patches for PR 218, index corruption, Milan Zamazal, 2001/11/08
- RE: patches for PR 218, index corruption, Dirk Bergstrom <=
  - Re: patches for PR 218, index corruption, Milan Zamazal, 2001/11/09
