bug-texinfo

Re: Texinfo 7.1 released


From: Gavin Smith
Subject: Re: Texinfo 7.1 released
Date: Sun, 22 Oct 2023 13:35:19 +0100

On Sun, Oct 22, 2023 at 12:06:21PM +0300, Eli Zaretskii wrote:
>   . makeinfo is painfully slow.  For example, building the ELisp
>     manual that is part of Emacs takes a whopping 82.3 sec.  By
>     contrast, Texinfo-7.0.3 takes just 20.7 sec.  And this is with
>     Perl extensions being used!  What could explain such a performance
>     regression? perhaps the use of libunistring or some other code
>     that handles non-ASCII characters?

It could be the use of Unicode collation for sorting document indices.
All the testing I did showed that it didn't make much difference to run
times, but it could be different on Windows.

First, check that the Perl extension modules are actually being used.  Try
setting the TEXINFO_XS environment variable to "require" or "debug".
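
A minimal sketch of that check, assuming a POSIX shell.  The helper name
with_xs_mode and the file name MANUAL.texi are only illustrative; the
TEXINFO_XS variable and its "require" and "debug" values are the ones
named above.

```shell
# Run a command with TEXINFO_XS set for that one invocation only.
# with_xs_mode is a hypothetical helper name, not part of Texinfo.
with_xs_mode() {
  xs_mode=$1
  shift
  env "TEXINFO_XS=$xs_mode" "$@"
}

# Example invocations (not run here; MANUAL.texi is a placeholder):
#   with_xs_mode require texi2any MANUAL.texi
#   with_xs_mode debug   texi2any MANUAL.texi
```

Using env keeps the setting out of the calling shell, so later runs are
not affected by a leftover value.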

Otherwise, the easiest way of turning off the Unicode collation is
patching the source code:

--- a/tp/Texinfo/Structuring.pm
+++ b/tp/Texinfo/Structuring.pm
@@ -2604,7 +2604,7 @@ sub setup_sortable_index_entries($$$$$;$)
   my $collator;
   eval { require Unicode::Collate; Unicode::Collate->import; };
   my $unicode_collate_loading_error = $@;
-  if ($unicode_collate_loading_error eq '') {
+  if (0 && $unicode_collate_loading_error eq '') {
     $collator = Unicode::Collate->new(%collate_options);
   } else {
     $collator = Texinfo::CollateStub->new();

This should use the 'cmp' Perl operator instead of the more complicated
Unicode collation algorithm.
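
To give a rough idea of what plain code-point ordering looks like, here
is a shell sketch using sort in the C locale.  It is only an analogy for
Perl's 'cmp' (byte order via sort(1), not texi2any's actual code path),
and the sample entries are made up.

```shell
# Sort entries by raw byte value, ignoring locale collation rules.
# codepoint_sort is a hypothetical helper; the entries are invented sample data.
codepoint_sort() {
  printf '%s\n' "$@" | LC_ALL=C sort
}

codepoint_sort apple Zebra buffer
```

In this ordering "Zebra" comes first, because uppercase ASCII letters
have lower code points than lowercase ones.  A collation-aware sort would
normally place "apple" first instead.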

If that doesn't make a difference, you could then profile texi2any with
NYTProf (assuming this works on MS-Windows).  After installing the
Devel::NYTProf module, run texi2any as "perl -d:NYTProf texi2any MANUAL.texi".
Then run "nytprofhtml --open" to see where the execution time is going.

(Incidentally, 20.7 seconds for Texinfo 7.0.3 is still longer than I would
expect.  On my system the same manual is processed in 5-6 seconds, on
GNU/Linux on a fairly cheap Acer laptop.)

>   . makeinfo seems to ignore @documentencoding, at least in some
>     places.  Specifically, it consistently produces ASCII equivalents
>     of some punctuation characters, like quotes “..” and ’, en-dash –,
>     etc.  Curiously, other punctuation characters, and even the above
>     ones in some contexts, _are_ produced.  As an example, makeinfo
>     7.1 produces
> 
>        If you don't customize ‘auth-sources’, you'll have to live with the
>       defaults: the unencrypted netrc file ‘~/.authinfo’ will be used for any
>       host and any port.
> 
>     where 7.0.3 produced
> 
>        If you don’t customize ‘auth-sources’, you’ll have to live with the
>       defaults: the unencrypted netrc file ‘~/.authinfo’ will be used for any
>       host and any port.
> 
>     Note how ’ in "don’t" and "you’ll" produced the ASCII ', whereas
>     ‘auth-sources’ and ‘~/.authinfo’ are quoted with non-ASCII quote
>     characters.  Why this difference?  Texinfo 7.0.3 produces
>     non-ASCII quotes in both cases.

This was a deliberate choice.  It is a bad idea to give output like
'don’t' in my opinion, as it means you cannot search for that word
easily unless your browser has support for finding ’ with '.
The same applies to hyphenated words.
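
The search problem can be shown with a small shell sketch.  The helper
name count_matches is hypothetical, and grep stands in here for whatever
search facility the Info browser provides.

```shell
# Count the lines of TEXT that contain the literal PATTERN.
# Usage: count_matches PATTERN TEXT
# grep exits non-zero when nothing matches, so "|| true" keeps the
# helper usable under "set -e".
count_matches() {
  printf '%s\n' "$2" | grep -c -F -- "$1" || true
}

count_matches "don't" "If you don't customize"   # text has ASCII apostrophe
count_matches "don't" "If you don’t customize"   # text has U+2019
```

The first call prints 1 and the second prints 0: a search typed with the
ASCII apostrophe does not find the word written with U+2019.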

The output of @code and @samp is different, as Emacs Info might use these
directional quotes for highlighting the enclosed text.

This was mentioned in the NEWS file:

 . Info output:
    . new variable ASCII_DASHES_AND_QUOTES, on by default,
      outputs ASCII characters for literal quote or hyphen characters
      in source, rather than UTF-8.  this makes it easier to search
      Info files.

You can turn it off by passing '-c ASCII_DASHES_AND_QUOTES=0' on the
texi2any command line.

See the following discussions:

https://lists.gnu.org/archive/html/automake-patches/2022-12/msg00000.html
https://lists.gnu.org/archive/html/automake-patches/2022-12/msg00019.html
https://lists.gnu.org/archive/html/bug-texinfo/2023-06/msg00000.html

> The above basically means I'm unable to upgrade to 7.1, and will need
> to keep using v7.0.3 for the time being.
> 
> I'm sorry I didn't try this version on the Emacs docs when it was in
> pretest.  To my defense, I never before saw such issues once the test
> suite runs successfully.  Any suggestions for debugging the above two
> issues will be welcome.

It's why we do pretest releases, to try to avoid this kind of reception.
At least the dashes and quotes issue is not a mystery.


