Re: Most used words in current buffer
From: Eric Abrahamsen
Subject: Re: Most used words in current buffer
Date: Sat, 21 Jul 2018 09:15:28 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

Udyant Wig <udyantw@gmail.com> writes:
> On 07/20/2018 08:38 AM, Bob Newell wrote:
>> By the way on a 2 MB file the elisp version runs in a few seconds.
>> Hats off to the coder.
>
> I am still looking to improve it. For example, on a 4.5 MB text file,
> the original version takes over 5 seconds to run, as measured using the
> functions #'benchmark-run and #'benchmark-run-compiled.
>
> Is it feasible to read words from the buffer and hash them directly from
> there? Or, going further, is there a better way to do this -- counting
> words and producing the N most used -- using some other design, maybe
> with some other data structure?

Interesting... In general I think Emacs is highly optimized to use the
buffer, rather than a string, as its textual data structure,
particularly when the code is byte-compiled (many of the text-movement
functions have their own opcodes). I wrote the following two functions
to collect words from a novel in an Org file, and the one that uses
`forward-word' and `buffer-substring' runs around twice as fast as the
one that uses `split-string'. Of course, they don't collect the same
list of words! But even if you add more code for trimming, etc., the
buffer version will still likely be faster than operating on a string.

(defun test-string (&optional f)
  (let ((file (or f "/home/eric/org/hollowmountain.org"))
        str lst)
    (with-temp-buffer
      (insert-file-contents file)
      (setq str (split-string (buffer-string)))
      (dolist (word str)
        (push word lst)))
    (length lst)))

(defun test-buffer (&optional f)
  (let ((file (or f "/home/eric/org/hollowmountain.org"))
        pnt lst)
    (with-temp-buffer
      (insert-file-contents file)
      (goto-char (point-min))
      (setq pnt (point))
      (while (forward-word)
        (push (buffer-substring pnt (point)) lst)
        (setq pnt (point))))
    (length lst)))
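
As for hashing words directly from the buffer, here is a rough, unbenchmarked sketch of how that might look. The names `my-count-words-in-buffer' and `my-most-used-words' are made up for illustration, and it uses `skip-syntax-forward' rather than `forward-word' so that each substring holds only word characters. Treat it as one possible design, not a tuned implementation.

```elisp
;;; -*- lexical-binding: t -*-

(defun my-count-words-in-buffer ()
  "Return a hash table mapping each downcased word in the buffer to its count.
A word here is a run of characters with word syntax in the current
syntax table."
  (let ((counts (make-hash-table :test #'equal)))
    (save-excursion
      (goto-char (point-min))
      ;; Skip over non-word characters; stop when the buffer is exhausted.
      (while (progn (skip-syntax-forward "^w")
                    (not (eobp)))
        (let ((start (point)))
          ;; Point is on a word character, so this moves at least one char.
          (skip-syntax-forward "w")
          (let ((word (downcase
                       (buffer-substring-no-properties start (point)))))
            (puthash word (1+ (gethash word counts 0)) counts)))))
    counts))

(defun my-most-used-words (n)
  "Return the N most frequent words in the buffer as (WORD . COUNT) pairs."
  (let (pairs)
    (maphash (lambda (word count) (push (cons word count) pairs))
             (my-count-words-in-buffer))
    ;; Sort by descending count, then keep only the first N pairs.
    (setq pairs (sort pairs (lambda (a b) (> (cdr a) (cdr b)))))
    (butlast pairs (max 0 (- (length pairs) n)))))
```

Something like (benchmark-run 1 (my-most-used-words 10)) in the buffer of interest would show whether this is actually any faster on a given file. Words with equal counts come out in no particular order.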