unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#58718: Incorrect regex in nXML URI check
@ 2022-10-22 12:58 Martin Jerabek
       [not found] ` <handler.58718.B.16664517395858.ack@debbugs.gnu.org>
  2022-10-24 10:51 ` bug#58718: Incorrect regex in nXML URI check Mattias Engdegård
  0 siblings, 2 replies; 4+ messages in thread
From: Martin Jerabek @ 2022-10-22 12:58 UTC (permalink / raw)
  To: 58718

[-- Attachment #1: Type: text/plain, Size: 3212 bytes --]

Hi!

In the function rng-uri-file-name-1 (file lisp/nxml/rng-uri.el line 71)
the regular expression to check the passed URI is wrong. It is intended
to make sure that special characters in the URI are correctly encoded,
i.e. that the percent sign is followed by exactly two hex digits. The
relevant part of the regular expression is

%[[:xdigit:]]{2}

However, the curly braces only have their special meaning of specifying
the number of repetitions if they are escaped by a backslash. Otherwise
they are interpreted as literal braces, so the current regular
expression would only match a percent sign followed by one hex digit
followed by the literal string "{2}".

I stumbled upon this problem when trying to edit an XML file located in
a path whose name contained space characters. Associating a RELAX-NG
schema with this file (in the schemas.xml file) and reloading the file
with rng-auto-set-schema-and-validate resulted in the error message

"Bad escapes in URI 'file:///home/user/path%20with%20spaces/foo.rnc'"

I used a path without spaces to work around this problem for the time
being. As far as I can tell, this bug has existing since the first
version of rng-uri.el, i.e. it has never worked as intended (unless the
Emacs regular expression syntax changed in the meantime).

I am using Emacs 28.1 from the current Fedora 36 package but I checked
out the master branch of the Emacs source code to make sure that the
problem still exists there.

Find attached a patch to fix this problem. I hope it is trivial enough
to be applied without a formal copyright assignment (it just adds four
backslashes).

Best regards
Martin Jerabek


In GNU Emacs 28.1 (build 1, x86_64-redhat-linux-gnu, GTK+ Version
3.24.34, cairo version 1.17.6)
 of 2022-07-15 built on buildhw-x86-02.iad2.fedoraproject.org Windowing
system distributor 'The X.Org Foundation', version 11.0.12014000 System
Description: Fedora Linux 36 (Workstation Edition)

Configured using:
 'configure --build=x86_64-redhat-linux-gnu
 --host=x86_64-redhat-linux-gnu --program-prefix=
 --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
 --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc
 --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64
 --libexecdir=/usr/libexec --localstatedir=/var sharedstatedir=/var/lib
 ----mandir=/usr/share/man infodir=/usr/share/info --with-dbus with-gif
 ------with-jpeg --with-png with-rsvg --with-tiff --with-xpm
 ----with-x-toolkit=gtk3 --with-gpm=no with-xwidgets --with-modules
 ----with-harfbuzz --with-cairo --with-json with-native-compilation
 --build_alias=x86_64-redhat-linux-gnu
 host_alias=x86_64-redhat-linux-gnu CC=gcc 'CFLAGS=-DMAIL_USE_LOCKF -O2
 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches
pipe
 --Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
 -Wp,-D_GLIBCXX_ASSERTIONS specs=/usr/lib/rpm/redhat/redhat-hardened-
cc1
 --fstack-protector-strong specs=/usr/lib/rpm/redhat/redhat-annobin-cc1
 --m64 -mtune=generic fasynchronous-unwind-tables
 --fstack-clash-protection -fcf-protection'
 LDFLAGS=-Wl,-z,relro
 PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'


[-- Attachment #2: rng-uri.patch --]
[-- Type: text/x-patch, Size: 571 bytes --]

diff --git a/lisp/nxml/rng-uri.el b/lisp/nxml/rng-uri.el
index 77fed8c32d..0a6fa39acb 100644
--- a/lisp/nxml/rng-uri.el
+++ b/lisp/nxml/rng-uri.el
@@ -68,7 +68,7 @@ Signal an error if URI is not a valid file URL."
 
 ;; pattern is either nil or match or replace
 (defun rng-uri-file-name-1 (uri pattern)
-  (unless (string-match "\\`\\(?:[^%]\\|%[[:xdigit:]]{2}\\)*\\'" uri)
+  (unless (string-match "\\`\\(?:[^%]\\|%[[:xdigit:]]\\{2\\}\\)*\\'" uri)
     (rng-uri-error "Bad escapes in URI `%s'" uri))
   (setq uri (rng-uri-unescape-multibyte uri))
   (let* ((components

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* bug#58718: Acknowledgement (Incorrect regex in nXML URI check)
       [not found] ` <handler.58718.B.16664517395858.ack@debbugs.gnu.org>
@ 2022-10-23 10:46   ` Martin Jerabek
  0 siblings, 0 replies; 4+ messages in thread
From: Martin Jerabek @ 2022-10-23 10:46 UTC (permalink / raw)
  To: 58718

[-- Attachment #1: Type: text/plain, Size: 859 bytes --]

It turned out that my patch was incomplete. Fixing the bug in
rng-uri-file-name-1 revealed another bug in rng-uri-unescape-unibyte.
The anonymous function passed to replace-regexp-in-string returned a
number (e.g. 32 for space) but replace-regexp-in-string expected a
string as the return value.

I therefore added a call to "string" to convert the number back to a
string (e.g. " ").

With this fix I was finally able to use a RELAX-NG schema file in a
path containing spaces.

I made the same fix in rng-uri-unescape-unibyte-match because it was
also obviously wrong (regexp-quote expects a string). However,
it would be good if someone more familiar with this code (and Lisp in
general) could check all the other occurrences of
replace-regexp-in-string to make sure they do not contain the same bug.

Find attached the expanded patch.


[-- Attachment #2: rng-uri.patch --]
[-- Type: text/x-patch, Size: 1188 bytes --]

diff --git a/lisp/nxml/rng-uri.el b/lisp/nxml/rng-uri.el
index 77fed8c32d..242238e0d1 100644
--- a/lisp/nxml/rng-uri.el
+++ b/lisp/nxml/rng-uri.el
@@ -68,7 +68,7 @@ Signal an error if URI is not a valid file URL."
 
 ;; pattern is either nil or match or replace
 (defun rng-uri-file-name-1 (uri pattern)
-  (unless (string-match "\\`\\(?:[^%]\\|%[[:xdigit:]]{2}\\)*\\'" uri)
+  (unless (string-match "\\`\\(?:[^%]\\|%[[:xdigit:]]\\{2\\}\\)*\\'" uri)
     (rng-uri-error "Bad escapes in URI `%s'" uri))
   (setq uri (rng-uri-unescape-multibyte uri))
   (let* ((components
@@ -312,7 +312,7 @@ Both FULL and BASE must be absolute URIs."
 (defun rng-uri-unescape-unibyte (str)
   (replace-regexp-in-string "%[0-7][[:xdigit:]]"
 			    (lambda (h)
-			      (string-to-number (substring h 1) 16))
+			      (string (string-to-number (substring h 1) 16)))
 			    str
 			    t
 			    t))
@@ -325,8 +325,8 @@ Both FULL and BASE must be absolute URIs."
 				(regexp-quote
 				 (if (= (length match) 1)
 				     match
-				   (string-to-number (substring match 1)
-						     16)))))
+				   (string (string-to-number (substring match 1)
+						     16))))))
 			    str
 			    t
 			    t))

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* bug#58718: Incorrect regex in nXML URI check
  2022-10-22 12:58 bug#58718: Incorrect regex in nXML URI check Martin Jerabek
       [not found] ` <handler.58718.B.16664517395858.ack@debbugs.gnu.org>
@ 2022-10-24 10:51 ` Mattias Engdegård
  2022-10-24 12:26   ` Robert Pluim
  1 sibling, 1 reply; 4+ messages in thread
From: Mattias Engdegård @ 2022-10-24 10:51 UTC (permalink / raw)
  To: Martin Jerabek; +Cc: 58718-done

I concur with the assessment. Your fixes are straightforward and have now been pushed to master.

I also did a quick scan for similar errors (missing backslashes before curly brackets) but found no obvious cases of the same bug in the Emacs tree. A check could be added to the regexp linter but it's hard to guard against false positives.

Thanks for the report and patch. This code hasn't seen much changes since it was added to Emacs, so if you find something else to improve you are more than welcome to contribute to further development.






^ permalink raw reply	[flat|nested] 4+ messages in thread

* bug#58718: Incorrect regex in nXML URI check
  2022-10-24 10:51 ` bug#58718: Incorrect regex in nXML URI check Mattias Engdegård
@ 2022-10-24 12:26   ` Robert Pluim
  0 siblings, 0 replies; 4+ messages in thread
From: Robert Pluim @ 2022-10-24 12:26 UTC (permalink / raw)
  To: 58718; +Cc: om, mattias.engdegard

>>>>> On Mon, 24 Oct 2022 12:51:30 +0200, Mattias Engdegård <mattias.engdegard@gmail.com> said:

    Mattias> I concur with the assessment. Your fixes are straightforward and have
    Mattias> now been pushed to master.

    Mattias> I also did a quick scan for similar errors (missing backslashes before
    Mattias> curly brackets) but found no obvious cases of the same bug in the
    Mattias> Emacs tree. A check could be added to the regexp linter but it's hard
    Mattias> to guard against false positives.

    Mattias> Thanks for the report and patch. This code hasn't seen much changes
    Mattias> since it was added to Emacs, so if you find something else to improve
    Mattias> you are more than welcome to contribute to further development.

If youʼre [1] looking at the code anyway, there might be opportunities to
use `url-unhex' and similar from 'url-utils.el' rather than
re-implementing the same function in nxml.

Robert

Footnotes:
[1]  For the grammar pedants, of which Emacs appears to have many,
     thatʼs a plural 'you' 😀

-- 





^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-10-24 12:26 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-22 12:58 bug#58718: Incorrect regex in nXML URI check Martin Jerabek
     [not found] ` <handler.58718.B.16664517395858.ack@debbugs.gnu.org>
2022-10-23 10:46   ` bug#58718: Acknowledgement (Incorrect regex in nXML URI check) Martin Jerabek
2022-10-24 10:51 ` bug#58718: Incorrect regex in nXML URI check Mattias Engdegård
2022-10-24 12:26   ` Robert Pluim

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).