
From: GNU bug Tracking System
Subject: bug#65496: closed (30.0.50; Issue with the regexp used to auto-detect PBM image data)
Date: Wed, 06 Sep 2023 16:02:02 +0000

Your message dated Wed, 06 Sep 2023 19:00:28 +0300
with message-id <83il8n1ecj.fsf@gnu.org>
and subject line Re: bug#65496: 30.0.50; Issue with the regexp used to 
auto-detect PBM image data
has caused the debbugs.gnu.org bug report #65496,
regarding 30.0.50; Issue with the regexp used to auto-detect PBM image data
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs@gnu.org.)


-- 
65496: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=65496
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems
--- Begin Message ---
Subject: 30.0.50; Issue with the regexp used to auto-detect PBM image data
Date: Thu, 24 Aug 2023 12:55:03 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.14.0

Hello,

While experimenting with code that creates images from data, I
encountered an issue with the regexp in `image-type-header-regexps' used
to auto-detect the PBM image type from the first bytes of image data.
That is:

"\\`P[1-6]\\(?:\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[[:space:]]\\)+\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[0-9]\\)+\
\\)\\{2\\}"

Here is a simple recipe to illustrate the issue:

In *scratch* buffer eval:
-------------------------
;; Get content of a pbm file.
(setq test-data
      (with-current-buffer
          (find-file-noselect "[YourEmacsPath]/etc/images/splash.pbm")
        (prog1 (buffer-substring-no-properties (point-min) (point-max))
          (kill-buffer (current-buffer)))))

;; Detecting the type from the string data fails for the pbm image type!
(image-type-from-data test-data)
nil
;; With a temp buffer current, the same test works!
(with-temp-buffer
 (image-type-from-data test-data))
pbm
-------------------------

After further digging, I found that the problem seems to be caused by
the [:space:] character class, whose meaning, according to the manual,
depends on which characters have whitespace syntax in the current
buffer's syntax table.  So replacing the character class with an
explicit list of characters seems to solve the issue:

(setcar (nth 1 image-type-header-regexps)
        "\\`P[1-6]\\(?:\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[ \t\r\n]\\)+\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[0-9]\\)+\
\\)\\{2\\}")

(image-type-from-data test-data)
pbm
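
To make the dependency on the syntax table concrete, here is a minimal
sketch (it is not part of the patch).  It uses only built-in functions;
`my-space-class-demo' and `table' are hypothetical names chosen for
illustration.  It gives newline comment-end syntax, which as far as I
know is what Lisp modes such as the one used in *scratch* do, and then
compares the character class with the explicit character set:

```elisp
(defun my-space-class-demo ()
  "Match a newline under different syntax tables.
Hypothetical helper, for illustration only."
  (let ((table (make-syntax-table)))   ; inherits from the standard table
    ;; Give newline comment-end syntax instead of whitespace syntax.
    (modify-syntax-entry ?\n ">" table)
    (list
     ;; Standard syntax table: newline has whitespace syntax.
     (with-syntax-table (standard-syntax-table)
       (string-match-p "[[:space:]]" "\n"))
     ;; Modified table: newline no longer matches [[:space:]].
     (with-syntax-table table
       (string-match-p "[[:space:]]" "\n"))
     ;; An explicit character set ignores the syntax table.
     (with-syntax-table table
       (string-match-p "[ \t\r\n]" "\n")))))
```

Evaluating (my-space-class-demo) should return (0 nil 0): the class
matches under the standard table, fails under the modified one, and the
explicit set matches either way.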

I have attached a proposed patch.
I hope it helps.
Regards


In GNU Emacs 30.0.50 (build 3, x86_64-pc-linux-gnu, GTK+ Version
 3.24.38, cairo version 1.17.8) of 2023-08-23
Repository revision: 26ca3e84e167f975afb4e9e9a838935bfe4a19a7
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12014000
System Description: Fedora Linux 38 (KDE Plasma)

Configured using:
 'configure --with-x-toolkit=gtk3
 --with-native-compilation=no
 PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/usr/lib/pkgconfig'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY
INOTIFY PDUMPER PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP X11 XDBE XIM XINPUT2 XPM GTK3 ZLIB

Important settings:
  value of $LC_TIME: fr_FR.utf8
  value of $LANG: fr_FR.UTF-8
  locale-coding-system: utf-8-unix

Attachment: image-type-header-regexps-patch-V0.patch
Description: Text Data


--- End Message ---
--- Begin Message ---
Subject: Re: bug#65496: 30.0.50; Issue with the regexp used to auto-detect PBM image data
Date: Wed, 06 Sep 2023 19:00:28 +0300

> Date: Wed, 6 Sep 2023 16:05:39 +0200
> Cc: 65496@debbugs.gnu.org
> From: David Ponce <da_vid@orange.fr>
> 
> >> Maybe for the use case of auto-detecting image type from image data,
> >> my proposed patch to replace character class by a list of unambiguous
> >> explicit character values in the regexp could make sense?
> > 
> > Yes, it makes sense, but are you sure you mention there all the
> > characters that can happen in PBM images, and only those characters?
> 
> Yes, according to the specification of pbm available at
> <https://netpbm.sourceforge.net/doc/pbm.html>:
> 
>    "Each PBM image consists of the following:
> 
>      * A "magic number" for identifying the file type.
>        A pbm image's magic number is the two characters "P4".
> 
> ==> * Whitespace (blanks, TABs, CRs, LFs). <==
> 
>      * The width in pixels of the image, formatted as ASCII characters
>        in decimal.
> 
>      ..."

Thanks, I've now installed your patch on the emacs-29 branch, and I'm
closing this bug.


--- End Message ---
