* bug#65496: 30.0.50; Issue with the regexp used to auto-detect PBM image data
@ 2023-08-24 10:55 David Ponce
2023-09-04 16:32 ` David Ponce
0 siblings, 1 reply; 7+ messages in thread
From: David Ponce @ 2023-08-24 10:55 UTC (permalink / raw)
To: 65496
Hello,
While experimenting with code to create an image from data, I encountered
an issue with the regexp in `image-type-header-regexps' used to
auto-detect the PBM image type from the first bytes of image data. That is:
"\\`P[1-6]\\(?:\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[[:space:]]\\)+\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[0-9]\\)+\
\\)\\{2\\}"
Here is a simple recipe to illustrate the issue:
In the *scratch* buffer, evaluate:
-------------------------
;; Get the content of a PBM file.
(setq test-data
      (with-current-buffer
          (find-file-noselect "[YourEmacsPath]/etc/images/splash.pbm")
        (prog1 (buffer-substring-no-properties (point-min) (point-max))
          (kill-buffer (current-buffer)))))
;; Detecting the image type from the string data fails!
(image-type-from-data test-data)
>>> nil
;; With a temp buffer current, the same test works!
(with-temp-buffer
  (image-type-from-data test-data))
>>> pbm
-------------------------
After further digging, I found that the problem seems to be the use
of the [:space:] character class, whose meaning, according to the manual,
depends on which characters have whitespace syntax in the current buffer.
So, using explicit characters in place of the character class seems to
solve the issue:
(setcar (nth 1 image-type-header-regexps)
"\\`P[1-6]\\(?:\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[ \t\r\n]\\)+\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[0-9]\\)+\
\\)\\{2\\}")
(image-type-from-data test-data)
>>> pbm
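The buffer dependence can be seen in isolation with a minimal sketch.
It assumes that `emacs-lisp-mode-syntax-table' (the table that Lisp
buffers such as *scratch* build on) gives newline comment-end syntax
rather than whitespace syntax; `my-newline-is-space-p' is a hypothetical
helper, not part of Emacs:

```elisp
;; Hypothetical helper, for illustration only.
(defun my-newline-is-space-p (table)
  "Return t if [[:space:]] matches a newline under syntax TABLE."
  (with-syntax-table table
    ;; [[:space:]] matches characters with whitespace syntax in the
    ;; syntax table that is current when the match is performed.
    (and (string-match-p "\\`[[:space:]]\\'" "\n") t)))

;; Newline has whitespace syntax in the standard table.
(my-newline-is-space-p (standard-syntax-table))

;; Under the Emacs Lisp table the same regexp is expected not to match.
(my-newline-is-space-p emacs-lisp-mode-syntax-table)
```

If that assumption holds, the first call returns t and the second nil,
which would be consistent with the recipe above.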
I attached a patch proposal.
Hope it will help.
Regards
In GNU Emacs 30.0.50 (build 3, x86_64-pc-linux-gnu, GTK+ Version
3.24.38, cairo version 1.17.8) of 2023-08-23
Repository revision: 26ca3e84e167f975afb4e9e9a838935bfe4a19a7
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12014000
System Description: Fedora Linux 38 (KDE Plasma)
Configured using:
'configure --with-x-toolkit=gtk3
--with-native-compilation=no
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/usr/lib/pkgconfig'
Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY
INOTIFY PDUMPER PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP X11 XDBE XIM XINPUT2 XPM GTK3 ZLIB
Important settings:
value of $LC_TIME: fr_FR.utf8
value of $LANG: fr_FR.UTF-8
locale-coding-system: utf-8-unix
[-- Attachment #2: image-type-header-regexps-patch-V0.patch --]
[-- Type: text/x-patch, Size: 429 bytes --]
diff --git a/lisp/image.el b/lisp/image.el
index 08190cf86bc..e20fbcf4c98 100644
--- a/lisp/image.el
+++ b/lisp/image.el
@@ -38,7 +38,7 @@ image
(defconst image-type-header-regexps
`(("\\`/[\t\n\r ]*\\*.*XPM.\\*/" . xpm)
("\\`P[1-6]\\(?:\
-\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[[:space:]]\\)+\
+\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[ \t\r\n]\\)+\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[0-9]\\)+\
\\)\\{2\\}" . pbm)
("\\`GIF8[79]a" . gif)
* bug#65496: 30.0.50; Issue with the regexp used to auto-detect PBM image data
2023-08-24 10:55 bug#65496: 30.0.50; Issue with the regexp used to auto-detect PBM image data David Ponce
@ 2023-09-04 16:32 ` David Ponce
2023-09-04 17:36 ` Eli Zaretskii
0 siblings, 1 reply; 7+ messages in thread
From: David Ponce @ 2023-09-04 16:32 UTC (permalink / raw)
To: 65496
Some additions.
Basic string matching recipe:
In the *scratch* buffer, evaluate:
-------------------------
(let ((re "\\`P[1-6]\\(?:\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[[:space:]]\\)+\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[0-9]\\)+\
\\)\\{2\\}")
      (text "P4
333 233"))
  (string-match-p re text))
>>> nil
(with-syntax-table (standard-syntax-table)
  (let ((re "\\`P[1-6]\\(?:\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[[:space:]]\\)+\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[0-9]\\)+\
\\)\\{2\\}")
        (text "P4
333 233"))
    (string-match-p re text)))
>>> 0
I wonder whether it is expected that matching a regular expression against a
string object depends on the syntax table set up in the current buffer?
Shouldn't (standard-syntax-table) be implied when matching a regexp against a
string object, that is, regardless of any buffer context?
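For callers that want buffer-independent behavior today, one possible
workaround is a small wrapper. This is only a sketch, and
`my-string-match-standard-p' is a hypothetical name, not an existing
Emacs function:

```elisp
;; Hypothetical wrapper, for illustration only.
(defun my-string-match-standard-p (regexp string)
  "Like `string-match-p', but match under the standard syntax table.
Syntax-dependent constructs such as [[:space:]] then no longer depend
on the buffer that happens to be current."
  (with-syntax-table (standard-syntax-table)
    (string-match-p regexp string)))

;; Gives the same answer whatever buffer is current.
(my-string-match-standard-p "\\`P4[[:space:]]" "P4\n333 233")
```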
Regards
* bug#65496: 30.0.50; Issue with the regexp used to auto-detect PBM image data
2023-09-04 16:32 ` David Ponce
@ 2023-09-04 17:36 ` Eli Zaretskii
[not found] ` <6e4af25a-03b1-ef82-b1c0-2da81938e215@orange.fr>
0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2023-09-04 17:36 UTC (permalink / raw)
To: David Ponce; +Cc: 65496
> Date: Mon, 4 Sep 2023 18:32:22 +0200
> From: David Ponce <da_vid@orange.fr>
>
> I wonder if it is expected that matching a regular expression
> against a string object depends on the syntax-table setup in current
> buffer? Shouldn't (standard-syntax-table) be implied when matching a
> regexp against a string object, that is, regardless of any buffer
> context?
Not necessarily, because you wouldn't expect, say, looking-at to
return a different result than (string-match-p (buffer-string)), would
you?
This belongs to the gray areas of Emacs. The same situation exists
with functions like downcase, which use the buffer-local value of
case-table.
* bug#65496: 30.0.50; Issue with the regexp used to auto-detect PBM image data
[not found] ` <6e4af25a-03b1-ef82-b1c0-2da81938e215@orange.fr>
@ 2023-09-05 11:08 ` Eli Zaretskii
2023-09-06 14:05 ` David Ponce
0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2023-09-05 11:08 UTC (permalink / raw)
To: David Ponce; +Cc: 65496
[I presume you didn't intend to discuss this only with me in private.]
> Date: Mon, 4 Sep 2023 23:43:56 +0200
> From: David Ponce <da_vid@orange.fr>
>
> On 04/09/2023 19:36, Eli Zaretskii wrote:
> >> Date: Mon, 4 Sep 2023 18:32:22 +0200
> >> From: David Ponce <da_vid@orange.fr>
> >>
> >> I wonder if it is expected that matching a regular expression
> >> against a string object depends on the syntax-table setup in current
> >> buffer? Shouldn't (standard-syntax-table) be implied when matching a
> >> regexp against a string object, that is, regardless of any buffer
> >> context?
> >
> > Not necessarily, because you wouldn't expect, say, looking-at to
> > return a different result than (string-match-p (buffer-string)), would
> > you?
>
> Sure, from this perspective you are right. However, for other cases
> where the string object is not related to a buffer value, it's not so
> clear ;-)
>
> > This belongs to the gray areas of Emacs. The same situation exists
> > with functions like downcase, which use the buffer-local value of
> > case-table.
>
> I can understand that. Many things are not only black or white ;-)
>
> Maybe for the use case of auto-detecting image type from image data,
> my proposed patch to replace character class by a list of unambiguous
> explicit character values in the regexp could make sense?
Yes, it makes sense, but are you sure you mention there all the
characters that can happen in PBM images, and only those characters?
* bug#65496: 30.0.50; Issue with the regexp used to auto-detect PBM image data
2023-09-05 11:08 ` Eli Zaretskii
@ 2023-09-06 14:05 ` David Ponce
2023-09-06 16:00 ` Eli Zaretskii
0 siblings, 1 reply; 7+ messages in thread
From: David Ponce @ 2023-09-06 14:05 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 65496
On 05/09/2023 13:08, Eli Zaretskii wrote:
> [I presume you didn't intend to discuss this only with me in private.]
Hi Eli,
You are right, my mistake: I used reply instead of reply to all :-\
I am sorry.
> Yes, it makes sense, but are you sure you mention there all the
> characters that can happen in PBM images, and only those characters?
Yes, according to the specification of pbm available at
<https://netpbm.sourceforge.net/doc/pbm.html>:
"Each PBM image consists of the following:
* A "magic number" for identifying the file type.
A pbm image's magic number is the two characters "P4".
==> * Whitespace (blanks, TABs, CRs, LFs). <==
* The width in pixels of the image, formatted as ASCII characters in decimal.
..."
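As a rough check of the explicit character list, the patched regexp can
be tried against a few header variants. This is only a sketch:
`my-pbm-header-regexp' and `my-pbm-header-p' are hypothetical names, and
the sample headers are made up for illustration:

```elisp
;; Hypothetical copy of the patched PBM entry, for experimentation only.
(defconst my-pbm-header-regexp
  "\\`P[1-6]\\(?:\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[ \t\r\n]\\)+\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[0-9]\\)+\
\\)\\{2\\}")

(defun my-pbm-header-p (data)
  "Return t if the string DATA starts like a PBM header."
  (and (string-match-p my-pbm-header-regexp data) t))

(my-pbm-header-p "P4\n333 233\n")         ; LF and blank as separators
(my-pbm-header-p "P1\t2\r\n3 ")           ; TAB and CR LF as separators
(my-pbm-header-p "P4\n# comment\n16 16")  ; comment line before the width
(my-pbm-header-p "P7\n1 1")               ; magic number outside P1..P6
```

The first three calls should return t and the last nil. Because the
regexp no longer uses a syntax-dependent character class, the results
should not change with the current buffer.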
Thanks
* bug#65496: 30.0.50; Issue with the regexp used to auto-detect PBM image data
2023-09-06 14:05 ` David Ponce
@ 2023-09-06 16:00 ` Eli Zaretskii
2023-09-06 16:19 ` David Ponce
0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2023-09-06 16:00 UTC (permalink / raw)
To: David Ponce; +Cc: 65496-done
> Date: Wed, 6 Sep 2023 16:05:39 +0200
> Cc: 65496@debbugs.gnu.org
> From: David Ponce <da_vid@orange.fr>
>
> >> Maybe for the use case of auto-detecting image type from image data,
> >> my proposed patch to replace character class by a list of unambiguous
> >> explicit character values in the regexp could make sense?
> >
> > Yes, it makes sense, but are you sure you mention there all the
> > characters that can happen in PBM images, and only those characters?
>
> Yes, according to the specification of pbm available at
> <https://netpbm.sourceforge.net/doc/pbm.html>:
>
> "Each PBM image consists of the following:
>
> * A "magic number" for identifying the file type.
> A pbm image's magic number is the two characters "P4".
>
> ==> * Whitespace (blanks, TABs, CRs, LFs). <==
>
> * The width in pixels of the image, formatted as ASCII characters in decimal.
>
> ..."
Thanks, I've now installed your patch on the emacs-29 branch, and I'm
closing this bug.
* bug#65496: 30.0.50; Issue with the regexp used to auto-detect PBM image data
2023-09-06 16:00 ` Eli Zaretskii
@ 2023-09-06 16:19 ` David Ponce
0 siblings, 0 replies; 7+ messages in thread
From: David Ponce @ 2023-09-06 16:19 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 65496-done
On 06/09/2023 18:00, Eli Zaretskii wrote:
[...]
>
> Thanks, I've now installed your patch on the emacs-29 branch, and I'm
> closing this bug.
Great! Thank you very much!