emacs-orgmode
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Re: tangle option to not write a file with same contents?


From: Max Nikulin
Subject: Re: [PATCH] Re: tangle option to not write a file with same contents?
Date: Tue, 17 May 2022 22:39:30 +0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1

On 08/05/2022 11:42, Ihor Radchenko wrote:
On 28/10/2021 11:04, Greg Minshall wrote:

i wonder if it would be reasonable to add an option such that, when
tangling, `org-babel-tangle` would not write a file with the
already-existing contents of the target file?
[...]
The patch is attached.

On SSD, when tangling into ~200 files, the patch speeds up tangling
by almost 2x: before 7.6 sec; after 4.4 sec.

By mistake I sent my reply to Ihor off-list, so a part of discussion is missed in the list archives. The only excuse is that a copy of message received as Cc does not have List-Post header, so reply action works as for private messages.

Ihor Radchenko. [BUG] org-babel-load-file can not compile file. Fri, 13 May 2022 18:38:14 +0800. https://list.orgmode.org/878rr5ra3t.fsf@localhost

This is a patch from another thread that should be a part of this change.

diff --git a/lisp/org.el b/lisp/org.el
index 47a16e94b..09a001414 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -256,6 +256,11 @@ (defun org-babel-load-file (file &optional compile)
             tangled-file
             (file-attribute-modification-time
              (file-attributes (file-truename file))))
+      ;; Make sure that tangled file modification time is
+      ;; updated even when `org-babel-tangle-file' does not make changes.
+      ;; This avoids re-tangling changed FILE where the changes did
+      ;; not affect the tangled code.
+      (set-file-times tangled-file)
       (org-babel-tangle-file file
                              tangled-file
                              (rx string-start

`set-file-times' signals if the file does not exist, so I expect that the call should be after `org-babel-tangle-file' otherwise first invocation for a new org file will fail. I would prefer to avoid touching the tangled file at all, but it makes impossible to check if the file is up to date (at least without saving hashes somewhere, that is unnecessary complication here). With optimizing of writhing of the tangled file overall behavior is rather close to original approach, so `set-file-times' should be OK.

Ihor Radchenko, off-list [PATCH v2] Re: tangle option to not write a file with same contents? Mon, 09 May 2022 21:22:55 +0800.

diff --git a/lisp/ob-tangle.el b/lisp/ob-tangle.el
index 16d938afb..76243f83f 100644
--- a/lisp/ob-tangle.el
+++ b/lisp/ob-tangle.el
@@ -282,11 +282,24 @@ (defun org-babel-tangle (&optional arg target-file 
lang-re)
                    lspecs)
                   (when make-dir
                     (make-directory fnd 'parents))
-                   ;; erase previous file
-                   (when (file-exists-p file-name)
-                     (delete-file file-name))
-                  (write-region nil nil file-name)
-                  (mapc (lambda (mode) (set-file-modes file-name mode)) modes)
+                   (unless
+                       (when (file-exists-p file-name)
+                         (let ((tangle-buf (current-buffer)))
+                           (with-temp-buffer
+                             (insert-file-contents file-name)
+                             (and
+                              (equal (buffer-size)
+                                     (buffer-size tangle-buf))
+                              (= 0
+                                 (let (case-fold-search)
+                                   (compare-buffer-substrings
+                                    nil nil nil
+                                    tangle-buf nil nil)))))))
+                     ;; erase previous file
+                     (when (file-exists-p file-name)
+                       (delete-file file-name))
+                    (write-region nil nil file-name)
+                    (mapc (lambda (mode) (set-file-modes file-name mode)) 
modes))
                    (push file-name path-collector))))))
         (if (equal arg '(4))
             (org-babel-tangle-single-block 1 t)

I do not like (unless (when ...)) composition. If I remember correctly, `when' should be used for side effects, so `and' may be more suitable here. Otherwise it looks like what Greg suggested and should work faster than first variant of this patch.

My fault caused significant delay, so feel free to ignore comments below.

I still had a hope that `org-babel-load-file' might be improved a bit by using `byte-recompile-file' with 0 passed for ARG (previously I incorrectly wrote FORCE). The goal is to avoid recompiling the tangled .el file if it is not changed.

I am still curious if it is reliable to compare file size from `file-attributes' with (+ 1 (bufferpos-to-filepos (buffer-size))) for tangle result prior to loading existing file. I am unsure due to variations in encodings and newline formats, however it might further improve performance then tangle result changes.

I have noticed that `org-babel-tangle-file' may create empty org file if it does not exist. From my point of view it is questionable behavior.

Finally some comments on performance numbers. Ihor, your test simulates iterative debugging. Tangle results were likely in disk caches. Another case may give different numbers. Consider single pass after small modification of the source .org file. For comparison existing files are mostly should be loaded from disk. I did not mean disabling disk caches completely. After tangling it may take some time till files are actually written to disk. I am unsure if during repetitive benchmarking some files may be replaced in caches without writing to disk at all, likely timeout for dirty cache pages is small enough.

Outline of more fair performance test (however I do not think that such accuracy is really required):
- purge disk caches, so earlier tangled files have to be loaded from disk
- tangle
- flush caches (sync) to complete cycle.

And of course, tangling to single large file is not the same as multiple small ones.

Leaving aside further changes and details of benchmarking, I hope these 2 patches will improve experience for make users and will not break anything in Org.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]