bug-texinfo

Nick Dokos: texi2dvi egrep regexp


From: Nick Dokos
Subject: Nick Dokos: texi2dvi egrep regexp
Date: Fri, 08 Oct 2010 14:26:01 -0400

[Try again: Misspelt the mailing list name the first time]

------- Forwarded Message

Date:    Fri, 08 Oct 2010 13:57:43 -0400
From:    Nick Dokos <address@hidden>
To:      address@hidden
cc:      address@hidden, "Eric S. Fraga" <address@hidden>,
         Suvayu Ali <address@hidden>
Subject: texi2dvi egrep regexp

There was a discussion about some problems with the egrep regexp
that texi2dvi uses back in March 2010 in the thread entitled

     texi2dvi: locale-dependent error in egrep [A-z]

(see http://lists.gnu.org/archive/html/bug-texinfo/2010-03/msg00031.html
and following).

Has anything come of that? The reason I am asking is that recently emacs
org-mode tried to switch to texi2dvi for org->pdf exporting and several
people have reported this problem. The underlying reason seems to be
that recent versions of egrep check range expressions more strictly:
e.g. Fedora 13 uses grep version 2.6.3 and egrep fails the range check.
OTOH, Ubuntu 10.04 uses grep version 2.5.4: egrep does not fail there.

The egrep manual page says:

       Within a bracket expression, a range expression consists of two
       characters separated by a hyphen.  It matches any single
       character that sorts between the two characters, inclusive, using
       the locale’s collating sequence and character set.  For example,
       in the default C locale, [a-d] is equivalent to [abcd].  Many
       locales sort characters in dictionary order, and in these locales
       [a-d] is typically not equivalent to [abcd]; it might be
       equivalent to [aBbCcDd], for example.  To obtain the traditional
       interpretation of bracket expressions, you can use the C locale
       by setting the LC_ALL environment variable to the value C.

       Finally, certain named classes of characters are predefined
       within bracket expressions, as follows.  Their names are self
       explanatory, and they are [:alnum:], [:alpha:], [:cntrl:],
       [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:],
       [:upper:], and [:xdigit:].  For example, [[:alnum:]] means
       [0-9A-Za-z], except the latter form depends upon the C locale and
       the ASCII character encoding, whereas the former is independent
       of locale and character set.  (Note that the brackets in these
       class names are part of the symbolic names, and must be included
       in addition to the brackets delimiting the bracket expression.)
       Most meta-characters lose their special meaning inside bracket
       expressions.  To include a literal ] place it first in the list.
       Similarly, to include a literal ^ place it anywhere but first.
       Finally, to include a literal - place it last.
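
The two points in the excerpt above can be tried out directly. This is only a
minimal sketch: grep_c is a hypothetical helper name, not something from
texi2dvi, and it assumes a grep that accepts -E (the option form of egrep).

```shell
# Hypothetical helper: filter stdin with an extended regexp under the C locale.
grep_c() {
  LC_ALL=C grep -E "$1"
}

# Traditional range: in the C locale [a-d] is exactly a, b, c, d.
printf '%s\n' a B c D e | grep_c '^[a-d]$'

# Named class: note the doubled brackets, [[:alpha:]] rather than [:alpha:].
printf '%s\n' x 7 _ Q | grep_c '^[[:alpha:]]$'
```

With LC_ALL=C the first command should print a and c, and the second x and Q.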

Given that, would it make sense to replace the egrep invocation in
texi2dvi with

         egrep '^(/|[[:alpha:]]:/)'

which would be valid under any locale? It does not include the
ASCII characters between 'Z' and 'a', which (I was surprised to find
out from Eli's response) could be drive letters, but as Eli also
points out, those are probably never used nowadays.
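
To make the proposal concrete, here is a rough sketch of the check. It is an
illustration only: is_absolute_path is a made-up name, the sample paths are
invented, and the class is written as [[:alpha:]] (with the doubled brackets
the man page asks for). It uses grep -E, which is what egrep runs.

```shell
# Hypothetical helper: succeeds if the argument looks like an absolute path,
# either POSIX-style (/...) or DOS-style with a drive letter (c:/...).
is_absolute_path() {
  printf '%s\n' "$1" | grep -E -q '^(/|[[:alpha:]]:/)'
}

# Invented sample paths, just to exercise the pattern.
for p in /usr/share/texmf c:/texmf Z:/tex relative/path ':/odd'; do
  if is_absolute_path "$p"; then
    echo "absolute: $p"
  else
    echo "relative: $p"
  fi
done
```

No range expression is involved, so the stricter range check in newer grep
versions should not come into play.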

Thanks,
Nick


------- End of Forwarded Message



