Re: Most used words in current buffer
From: Eric Abrahamsen
Subject: Re: Most used words in current buffer
Date: Sat, 21 Jul 2018 21:05:48 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)
Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>
>> Udyant Wig <udyantw@gmail.com> writes:
>>
>>> On 07/21/2018 09:45 PM, Eric Abrahamsen wrote:
>>>> Interesting... In general I think Emacs is highly optimized to use the
>>>> buffer as its textual data structure, more so than a string.
>>>> Particularly when the code is compiled (many of the text-movement
>>>> commands have opcodes). I made the following two commands to collect
>>>> words from a novel in an Org file, and the one that uses
>>>> `forward-word' and `buffer-substring' runs around twice as fast as the
>>>> `split-string'.
>>>>
>>>> Of course, they don't collect the same list of words! But even if you
>>>> add more code for trimming, etc., it will still likely be faster than
>>>> operating on a string.
>>>> [snip code]
>>>
>>> I have acted upon the advice (yours and Stefan Monnier's) to operate on
>>> the buffer directly using BUFFER-SUBSTRING. Please see my follow up to
>>> Stefan's message.
>>>
>>> BUFFER-SUBSTRING did gain me (somewhat) better performance.
>>
>> As Stefan said, going character by character is going to be slow... But
>> my example with `forward-word' collects a lot of cruft. So I would
>> suggest doing what `forward-word' does internally and move by syntax.
>
> Actually I think alternating `forward-word' with `forward-to-word' might
> do the exact same thing as alternating (skip-syntax-forward "w") with
> (skip-syntax-forward "^w"), and might get you some extra... stuff. Maybe
> worth benchmarking!
And, because apparently my Saturday nights are slow:
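(require 'cl-lib)  ; for `cl-incf'
(require 'misc)    ; `forward-to-word' is defined in misc.el in Emacs 27

(defun test-buffer (f)
  "Return a hash table counting the downcased words in file F."
  (let ((counts (make-hash-table :test #'equal))
        pnt)
    (with-temp-buffer
      (insert-file-contents f)
      (goto-char (point-min))
      ;; Move to the start of the first word.  Calling (forward-to-word 1)
      ;; here would skip past the first word if the file begins with one.
      (skip-syntax-forward "^w")
      (setq pnt (point))
      ;; `forward-word' returns nil once no further word is found.
      (while (and (not (eobp)) (forward-word))
        (cl-incf (gethash (downcase (buffer-substring pnt (point))) counts 0))
        ;; Jump to the start of the next word and remember where it begins.
        (forward-to-word 1)
        (setq pnt (point))))
    counts))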
Seems to go pretty quick on my test file, though it's only 220K.
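For comparison, here is a minimal sketch of the other variant mentioned above: alternating (skip-syntax-forward "w") with (skip-syntax-forward "^w") instead of `forward-word' / `forward-to-word'. The function names `my-count-words-by-syntax' and `my-top-words' are hypothetical, and "word" here simply means a maximal run of word-constituent characters in the temp buffer's (fundamental-mode) syntax table. The second helper turns either hash table into a list of the most frequent words.

```elisp
(require 'cl-lib)  ; for `cl-incf'
(require 'seq)     ; for `seq-take'

(defun my-count-words-by-syntax (file)
  "Return a hash table mapping downcased words in FILE to their counts.
Hypothetical helper: a word is a maximal run of characters with
word syntax in the temporary buffer's syntax table."
  (let ((counts (make-hash-table :test #'equal)))
    (with-temp-buffer
      (insert-file-contents file)
      (goto-char (point-min))
      ;; Skip any leading non-word characters.
      (skip-syntax-forward "^w")
      (while (not (eobp))
        (let ((start (point)))
          ;; Point is on a word character: move over the whole word.
          (skip-syntax-forward "w")
          (cl-incf (gethash (downcase (buffer-substring-no-properties
                                       start (point)))
                            counts 0))
          ;; Then move over the non-word gap before the next word.
          (skip-syntax-forward "^w"))))
    counts))

(defun my-top-words (counts n)
  "Return the N most frequent (WORD . COUNT) pairs from hash table COUNTS."
  (let (pairs)
    (maphash (lambda (word count) (push (cons word count) pairs)) counts)
    ;; Sort by descending count; the order of ties is unspecified.
    (seq-take (sort pairs (lambda (a b) (> (cdr a) (cdr b)))) n)))
```

Both counters return the same kind of hash table, so something like (my-top-words (test-buffer "~/novel.org") 20) works with either one (the file name is just an example), and (benchmark-run 10 (test-buffer f)) versus (benchmark-run 10 (my-count-words-by-syntax f)) would be one way to do the benchmarking suggested earlier.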