[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[bug#42023] [PATCH] database: register-items: reduce transaction scope.
From: |
Ludovic Courtès |
Subject: |
[bug#42023] [PATCH] database: register-items: reduce transaction scope. |
Date: |
Thu, 25 Jun 2020 09:45:14 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) |
Hi,
Caleb Ristvedt <caleb.ristvedt@cune.org> skribis:
> Ludovic Courtès <ludo@gnu.org> writes:
[...]
>> Two questions:
>>
>> 1. Can another process come and fiddle with TO-REGISTER while we’re
>> still in ‘reset-timestamps’? Or can GC happen while we’re in
>> ‘reset-timestamps’ and delete TO-REGISTER and remove it from the
>> database?
>
>>
>> I think none of these scenarios can happen, as long as we’ve taken the
>> .lock file for TO-REGISTER before, like ‘finalize-store-file’ does.
>
> The lock file has no bearing on liveness of the path it locks, though
> the liveness of the path it locks *does* count as liveness for the lock
> file and for the .chroot file; see tryToDelete() in nix/libstore/gc.cc.
>
> (Well, semi-liveness; .lock and .chroot files won't show up in a list of
> live paths, but they will still be protected from deletion if their
> associated store item is a temp root)
>
> So general "fiddling" can't happen, but GC can. The responsibility for
> both locking and registering as temporary roots falls on the caller,
> currently, as I believe it should. We may want to document this
> responsibility in the docstring for register-items, though.
Yes, it would be good to add a sentence to document it.
> So currently finalize-store-file is safe from clobbering by another
> process attempting to finalize to the same path, but not safe from
> garbage collection between when the temporary store file is renamed and
> when it is registered. It needs an add-temp-root prior to renaming.
Ah, so we’d need to do that before applying the patch that reduces the
scope of the transaction, right?
>> 2. After the transaction, TO-REGISTER is considered valid. But are
>> the effects of the on-going deduplication observable, due to
>> non-atomicity of some operation?
>
> Subdirectories of store items need to be made writable prior to renaming
> the temp link, so there will necessarily be a window of time during
> which various subdirectories will appear writable. I don't think this
> should cause problems; we already assume that the .lock file is held, so
> we should be the only process attempting to deduplicate it. On a related
> note, we should probably use dynamic-wind for setting and restoring the
> permissions in replace-with-link.
Yes. Here’s a patch for ‘dynamic-wind’:
diff --git a/.dir-locals.el b/.dir-locals.el
index 92979fc5ed..155255a770 100644
--- a/.dir-locals.el
+++ b/.dir-locals.el
@@ -37,6 +37,7 @@
(eval . (put 'with-file-lock 'scheme-indent-function 1))
(eval . (put 'with-file-lock/no-wait 'scheme-indent-function 1))
(eval . (put 'with-profile-lock 'scheme-indent-function 1))
+ (eval . (put 'with-writable-file 'scheme-indent-function 1))
(eval . (put 'package 'scheme-indent-function 0))
(eval . (put 'origin 'scheme-indent-function 0))
diff --git a/guix/store/deduplication.scm b/guix/store/deduplication.scm
index 6784ee0b92..af52c03370 100644
--- a/guix/store/deduplication.scm
+++ b/guix/store/deduplication.scm
@@ -94,6 +94,20 @@ LINK-PREFIX."
(try (tempname-in link-prefix))
(apply throw args))))))
+(define (call-with-writable-file file thunk)
+ (let ((stat (lstat file)))
+ (dynamic-wind
+ (lambda ()
+ (make-file-writable file))
+ thunk
+ (lambda ()
+ (set-file-time file stat)
+ (chmod file (stat:mode stat))))))
+
+(define-syntax-rule (with-writable-file file exp ...)
+ "Make FILE writable for the dynamic extent of EXP..."
+ (call-with-writable-file file (lambda () exp ...)))
+
;; There are 3 main kinds of errors we can get from hardlinking: "Too many
;; things link to this" (EMLINK), "this link already exists" (EEXIST), and
;; "can't fit more stuff in this directory" (ENOSPC).
@@ -120,20 +134,14 @@ Note: TARGET, TO-REPLACE, and SWAP-DIRECTORY must be on
the same file system."
;; If we couldn't create TEMP-LINK, that's OK: just don't do the
;; replacement, which means TO-REPLACE won't be deduplicated.
(when temp-link
- (let* ((parent (dirname to-replace))
- (stat (stat parent)))
- (make-file-writable parent)
+ (with-writable-file (dirname to-replace)
(catch 'system-error
(lambda ()
(rename-file temp-link to-replace))
(lambda args
(delete-file temp-link)
(unless (= EMLINK (system-error-errno args))
- (apply throw args))))
-
- ;; Restore PARENT's mtime and permissions.
- (set-file-time parent stat)
- (chmod parent (stat:mode stat)))))
+ (apply throw args)))))))
(define* (deduplicate path hash #:key (store %store-directory))
"Check if a store item with sha256 hash HASH already exists. If so,
> Also, replace-with-link doesn't check whether the "containing directory"
> is the store like optimisePath_() does, so in theory if another process
> tried to make a permanent change to the store's permissions it could be
> clobbered when replace-with-link restores the permissions. I don't know
> of any instance where this could happen currently, but it's something to
> keep in mind.
I guess we should also avoid changing permissions on /gnu/store, it
would be wiser.
> And, of course, there's the inherent visible change of deduplication -
> possible modification of inode number, but I don't see how that could
> cause problems with any reasonable programs.
Yes, that’s fine.
>> I think the ‘replace-with-link’ dance is atomic, so we should be fine.
>>
>> Thoughts?
>
> replace-with-link is atomic, but it can fail if the "canonical" link in
> .links is deleted by the garbage collector between when its existence is
> checked in 'deduplicate' and when temp link creation in
> replace-with-link happens. Then ENOENT would be returned, and
> 'deduplicate' wouldn't catch that exception, nor would optimisePath_()
> in nix/libstore/optimise-store.cc.
>
> The proper behavior there, in my opinion, would be to retry the
> deduplication. Attached is a patch that makes 'deduplicate' do that.
>
> Also, while I'm perusing optimisePath_(), there's a minor bug in which
> EMLINK generated by renaming the temp link may still be treated as a
> fatal error. This is because errno may be clobbered by unlink() prior to
> being checked (according to the errno man page it can still be modified
> even if the call succeeds).
Indeed, good catch!
> From 461064da9e169df3dd939b734bb2860598d113f4 Mon Sep 17 00:00:00 2001
> From: Caleb Ristvedt <caleb.ristvedt@cune.org>
> Date: Wed, 24 Jun 2020 00:56:50 -0500
> Subject: [PATCH 1/2] deduplication: retry on ENOENT.
>
> It's possible for the garbage collector to remove the "canonical" link after
> it's been detected as existing by 'deduplicate'. This would cause an ENOENT
> error when replace-with-link attempts to create the temporary link. This
> changes it so that it will properly handle that by retrying.
>
> * guix/store/deduplication.scm (deduplicate): retry on ENOENT.
> ---
> guix/store/deduplication.scm | 64 +++++++++++++++++++++++-------------
> 1 file changed, 41 insertions(+), 23 deletions(-)
>
> diff --git a/guix/store/deduplication.scm b/guix/store/deduplication.scm
> index 6784ee0b92..b6d94e49c2 100644
> --- a/guix/store/deduplication.scm
> +++ b/guix/store/deduplication.scm
> @@ -161,26 +161,44 @@ under STORE."
> (scandir* path))
> (let ((link-file (string-append links-directory "/"
> (bytevector->nix-base32-string
> hash))))
> - (if (file-exists? link-file)
> - (replace-with-link link-file path
> - #:swap-directory links-directory)
> - (catch 'system-error
> - (lambda ()
> - (link path link-file))
> - (lambda args
> - (let ((errno (system-error-errno args)))
> - (cond ((= errno EEXIST)
> - ;; Someone else put an entry for PATH in
> - ;; LINKS-DIRECTORY before we could. Let's use it.
> - (replace-with-link path link-file
> - #:swap-directory
> links-directory))
> - ((= errno ENOSPC)
> - ;; There's not enough room in the directory index
> for
> - ;; more entries in .links, but that's fine: we can
> - ;; just stop.
> - #f)
> - ((= errno EMLINK)
> - ;; PATH has reached the maximum number of links,
> but
> - ;; that's OK: we just can't deduplicate it more.
> - #f)
> - (else (apply throw args)))))))))))
> + (let retry ()
> + (if (file-exists? link-file)
> + (catch 'system-error
> + (lambda ()
> + (replace-with-link link-file path
> + #:swap-directory links-directory))
> + (lambda args
> + (if (and (= (system-error-errno args) ENOENT)
> + ;; ENOENT can be produced because:
> + ;; - LINK-FILE has missing directory components
> + ;; - LINKS-DIRECTORY has missing directory
> + ;; components
> + ;; - PATH has missing directory components
> + ;;
> + ;; the last two are errors, the first just
> + ;; requires a retry.
> + (file-exists? (dirname path))
> + (file-exists? links-directory))
> + (retry)
> + (apply throw args))))
I feel like there are TOCTTOU issues here and redundant ‘stat’ calls,
plus the risk of catching a ‘system-error’ coming from “somewhere else.”
What about baking this logic directly in ‘replace-with-link’, and
replacing ‘file-exists?’ calls by 'system-error handling?
> From a14ff0a9dab4d73680befaf9d244d6cce0a73ab3 Mon Sep 17 00:00:00 2001
> From: Caleb Ristvedt <caleb.ristvedt@cune.org>
> Date: Wed, 24 Jun 2020 01:00:40 -0500
> Subject: [PATCH 2/2] .dir-locals.el: fix call-with-{retrying-}transaction
> indenting.
>
> * .dir-locals.el (call-with-transaction, call-with-retrying-transaction):
> change scheme-indent-function property from 2 to 1.
LGTM!
Thanks for your quick feedback and thorough analyses!
Ludo’.