unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#34469: 26.1; EWW stops renderring web page on null byte
       [not found] <CGME20190213122718eucas1p26156656a2376e5055452ac4d0385fc6d@eucas1p2.samsung.com>
@ 2019-02-13 12:27 ` Lukasz Pawelczyk
  2019-02-14  4:44   ` Nicholas Drozd
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Lukasz Pawelczyk @ 2019-02-13 12:27 UTC (permalink / raw)
  To: 34469


As in the topic. See this page:
http://blog.eduardofleury.com/archives/2007/09/13
There is a string with a null byte at the beginning. Firefox renders
the page past this point. EWW stops on:
sock.bind(“



In GNU Emacs 26.1 (build 1, x86_64-redhat-linux-gnu, GTK+ Version
3.23.2)
 of 2018-08-13 built on buildvm-13.phx2.fedoraproject.org
Windowing system distributor 'Fedora Project', version 11.0.12003000
Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Contacting host: blog.eduardofleury.com:80
scroll-up-command: End of buffer [2 times]
Configured using:
 'configure --build=x86_64-redhat-linux-gnu
 --host=x86_64-redhat-linux-gnu --program-prefix=
 --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
 --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc
 --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64
 --libexecdir=/usr/libexec --localstatedir=/var
 --sharedstatedir=/var/lib --mandir=/usr/share/man
 --infodir=/usr/share/info --with-dbus --with-gif --with-jpeg --with-png
 --with-rsvg --with-tiff --with-xft --with-xpm --with-x-toolkit=gtk3
 --with-gpm=no --with-xwidgets --with-modules
 build_alias=x86_64-redhat-linux-gnu host_alias=x86_64-redhat-linux-gnu
 'CFLAGS=-DMAIL_USE_LOCKF -O2 -g -pipe -Wall -Werror=format-security
 -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions
 -fstack-protector-strong -grecord-gcc-switches
 -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
 -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
 LDFLAGS=-Wl,-z,relro
 PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND DBUS GSETTINGS NOTIFY ACL
LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 MODULES THREADS XWIDGETS LCMS2

Important settings:
  value of $LC_COLLATE: C
  value of $LC_CTYPE: pl_PL.UTF-8
  value of $LC_MONETARY: en_US.UTF-8
  value of $LC_NUMERIC: en_US.UTF-8
  value of $LC_TIME: en_US.UTF-8
  value of $LANG: C
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

Major mode: eww

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message dired dired-loaddefs rfc822 mml
mml-sec epa derived epg epg-config mm-decode mm-bodies mm-encode
mailabbrev gmm-utils mailheader sendmail cl-extra help-mode
network-stream starttls url-http tls gnutls mail-parse rfc2231 url-gw
nsm rmc url-cache url-auth eww easymenu puny mm-url gnus nnheader
gnus-util rmail rmail-loaddefs rfc2047 rfc2045 ietf-drums mail-utils
wid-edit mm-util mail-prsvr url-queue url url-proxy url-privacy
url-expand url-methods url-history url-cookie url-domsuf url-util
url-parse auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs
password-cache url-vars mailcap shr svg xml seq byte-opt gv bytecomp
byte-compile cconv dom browse-url format-spec cl-loaddefs cl-lib
elec-pair time-date mule-util tooltip eldoc electric uniquify ediff-hook
vc-hooks lisp-float-type mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode elisp-mode lisp-mode prog-mode register page
menu-bar rfn-eshadow isearch timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core term/tty-colors frame cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote dbusbind inotify lcms2
dynamic-setting system-font-setting font-render-setting xwidget-internal
move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 137138 10359)
 (symbols 48 23803 2)
 (miscs 40 59 148)
 (strings 32 40308 1635)
 (string-bytes 1 1174212)
 (vectors 16 17956)
 (vector-slots 8 544601 12850)
 (floats 8 73 241)
 (intervals 56 3447 0)
 (buffers 992 12))
-- 
Lukasz Pawelczyk
Samsung R&D Institute Poland
Samsung Electronics









* bug#34469: 26.1; EWW stops renderring web page on null byte
  2019-02-13 12:27 ` bug#34469: 26.1; EWW stops renderring web page on null byte Lukasz Pawelczyk
@ 2019-02-14  4:44   ` Nicholas Drozd
  2019-02-14 19:14     ` Eli Zaretskii
  2019-02-16 18:13   ` Nicholas Drozd
  2019-02-28  1:52   ` Paul Eggert
  2 siblings, 1 reply; 15+ messages in thread
From: Nicholas Drozd @ 2019-02-14  4:44 UTC (permalink / raw)
  To: l.pawelczyk, 34469

This looks like a problem with libxml-parse-html-region (or maybe even
lower than that, I have no idea). Put the following in a buffer

  <p>sock.bind(&#8220;\0MyBindName&#8221;)</p>

and execute

  (libxml-parse-html-region (point-min) (point-max))

This returns

  (html nil (body nil (p nil "sock.bind(“")))
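For a self-contained variant of the recipe above, a minimal sketch
follows.  It assumes that the \0 in the snippet stands for a literal
NUL character and that Emacs was built with libxml2.  The helper name
my/parse-html-string is hypothetical.  The result depends on the Emacs
and libxml2 versions in use, so no expected output is shown.

```elisp
(defun my/parse-html-string (html)
  "Parse HTML (a string) with libxml and return the DOM, or nil.
Hypothetical helper for experimenting; not part of Emacs."
  (when (fboundp 'libxml-parse-html-region)
    (with-temp-buffer
      (insert html)
      (libxml-parse-html-region (point-min) (point-max)))))

;; (string 0) builds a one-character string holding NUL.
(my/parse-html-string
 (concat "<p>sock.bind(&#8220;" (string 0) "MyBindName&#8221;)</p>"))
```

Evaluating the last form in a scratch buffer makes it easy to compare
what different Emacs builds return for the same input.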






* bug#34469: 26.1; EWW stops renderring web page on null byte
  2019-02-14  4:44   ` Nicholas Drozd
@ 2019-02-14 19:14     ` Eli Zaretskii
  0 siblings, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2019-02-14 19:14 UTC (permalink / raw)
  To: Nicholas Drozd; +Cc: 34469, l.pawelczyk

> From: Nicholas Drozd <nicholasdrozd@gmail.com>
> Date: Wed, 13 Feb 2019 22:44:50 -0600
> 
> This looks like a problem with libxml-parse-html-region (or maybe even
> lower than that, I have no idea).

libxml-parse-html-region calls parse_region, which passes a C string
to libxml functions.  So there can be no embedded null bytes.

Does libxml have facilities to deal with such cases?  If not, maybe
this should be taken up with libxml developers.
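
To illustrate the truncation described above, here is a minimal sketch
that only mimics, in Lisp, what a consumer treating NUL as a string
terminator would see.  It does not call libxml, and the helper name
my/c-string-view is hypothetical.

```elisp
(defun my/c-string-view (string)
  "Return the part of STRING that precedes its first NUL character.
Hypothetical helper; it imitates a NUL-terminated reader in Lisp."
  (let ((nul (string-match "\0" string)))
    (if nul
        (substring string 0 nul)
      string)))

;; Everything from the NUL onward is invisible to such a reader.
(my/c-string-view "sock.bind(\0MyBindName)")
```

This matches the symptom in the report, where the rendered text ends
right after the opening quote.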






* bug#34469: 26.1; EWW stops renderring web page on null byte
  2019-02-13 12:27 ` bug#34469: 26.1; EWW stops renderring web page on null byte Lukasz Pawelczyk
  2019-02-14  4:44   ` Nicholas Drozd
@ 2019-02-16 18:13   ` Nicholas Drozd
  2019-02-19  1:12     ` Glenn Morris
  2019-02-28  1:52   ` Paul Eggert
  2 siblings, 1 reply; 15+ messages in thread
From: Nicholas Drozd @ 2019-02-16 18:13 UTC (permalink / raw)
  To: 34469, eliz

This is a known issue with libxml, or at least it was at some point.
Here's a thread from 2008:
https://mail.gnome.org/archives/xml/2008-August/msg00008.html






* bug#34469: 26.1; EWW stops renderring web page on null byte
  2019-02-16 18:13   ` Nicholas Drozd
@ 2019-02-19  1:12     ` Glenn Morris
  2019-02-19 10:06       ` Robert Pluim
  0 siblings, 1 reply; 15+ messages in thread
From: Glenn Morris @ 2019-02-19  1:12 UTC (permalink / raw)
  To: Nicholas Drozd; +Cc: 34469


Perhaps eww-display-html should replace null bytes (with whatever the
html standard says is appropriate) before calling
libxml-parse-html-region. It already replaces CRLF.






* bug#34469: 26.1; EWW stops renderring web page on null byte
  2019-02-19  1:12     ` Glenn Morris
@ 2019-02-19 10:06       ` Robert Pluim
  2019-02-19 16:30         ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Robert Pluim @ 2019-02-19 10:06 UTC (permalink / raw)
  To: Glenn Morris; +Cc: 34469, Nicholas Drozd

Glenn Morris <rgm@gnu.org> writes:

> Perhaps eww-display-html should replace null bytes (with whatever the
> html standard says is appropriate) before calling
> libxml-parse-html-region. It already replaces CRLF.

Chrome at least just strips the null byte completely.

There is apparently a class of attacks that uses the null character
for nefarious purposes, so how about something like this:

diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index 1cc4557ce1..9b57bc43e4 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -448,8 +448,8 @@ eww-display-html
 		    (decode-coding-region (point) (point-max) encode)
 		  (coding-system-error nil))
                 (save-excursion
-                  ;; Remove CRLF before parsing.
-                  (while (re-search-forward "\r$" nil t)
+                  ;; Remove CRLF and NULL before parsing.
+                  (while (re-search-forward "\r$\\|\000" nil t)
                     (replace-match "" t t)))
 		(libxml-parse-html-region (point) (point-max))))))
 	(source (and (null document)






* bug#34469: 26.1; EWW stops renderring web page on null byte
  2019-02-19 10:06       ` Robert Pluim
@ 2019-02-19 16:30         ` Eli Zaretskii
  2019-02-19 17:37           ` Robert Pluim
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2019-02-19 16:30 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 34469, nicholasdrozd

> From: Robert Pluim <rpluim@gmail.com>
> Date: Tue, 19 Feb 2019 11:06:37 +0100
> Cc: 34469@debbugs.gnu.org, Nicholas Drozd <nicholasdrozd@gmail.com>
> 
> Glenn Morris <rgm@gnu.org> writes:
> 
> > Perhaps eww-display-html should replace null bytes (with whatever the
> > html standard says is appropriate) before calling
> > libxml-parse-html-region. It already replaces CRLF.
> 
> Chrome at least just strips the null byte completely.
> 
> There is apparently a class of attacks that uses the null character
> for nefarious purposes, so how about something like this:
> 
> diff --git a/lisp/net/eww.el b/lisp/net/eww.el
> index 1cc4557ce1..9b57bc43e4 100644
> --- a/lisp/net/eww.el
> +++ b/lisp/net/eww.el
> @@ -448,8 +448,8 @@ eww-display-html
>  		    (decode-coding-region (point) (point-max) encode)
>  		  (coding-system-error nil))
>                  (save-excursion
> -                  ;; Remove CRLF before parsing.
> -                  (while (re-search-forward "\r$" nil t)
> +                  ;; Remove CRLF and NULL before parsing.
> +                  (while (re-search-forward "\r$\\|\000" nil t)
>                      (replace-match "" t t)))

It is un-Emacsy, IMO, to remove content without a trace.  (CR is
different: we simply convert text to Unix LF-only EOL format.)  So I'd
suggest replacing it with "^@" or "\000" or "NUL" or something to that
effect.  Even U+FFFD would be better than removing.

(We could get fancy and have a defcustom for those who do want the
null bytes removed.)

Thanks.






* bug#34469: 26.1; EWW stops renderring web page on null byte
  2019-02-19 16:30         ` Eli Zaretskii
@ 2019-02-19 17:37           ` Robert Pluim
  2019-02-19 18:11             ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Robert Pluim @ 2019-02-19 17:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 34469, nicholasdrozd

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Robert Pluim <rpluim@gmail.com>
>> Date: Tue, 19 Feb 2019 11:06:37 +0100
>> Cc: 34469@debbugs.gnu.org, Nicholas Drozd <nicholasdrozd@gmail.com>
>> 
>> Glenn Morris <rgm@gnu.org> writes:
>> 
>> > Perhaps eww-display-html should replace null bytes (with whatever the
>> > html standard says is appropriate) before calling
>> > libxml-parse-html-region. It already replaces CRLF.
>> 
>> Chrome at least just strips the null byte completely.
>> 
>> There is apparently a class of attacks that uses the null character
>> for nefarious purposes, so how about something like this:
>> 
>> diff --git a/lisp/net/eww.el b/lisp/net/eww.el
>> index 1cc4557ce1..9b57bc43e4 100644
>> --- a/lisp/net/eww.el
>> +++ b/lisp/net/eww.el
>> @@ -448,8 +448,8 @@ eww-display-html
>>  		    (decode-coding-region (point) (point-max) encode)
>>  		  (coding-system-error nil))
>>                  (save-excursion
>> -                  ;; Remove CRLF before parsing.
>> -                  (while (re-search-forward "\r$" nil t)
>> +                  ;; Remove CRLF and NULL before parsing.
>> +                  (while (re-search-forward "\r$\\|\000" nil t)
>>                      (replace-match "" t t)))
>
> It is un-Emacsy, IMO, to remove content without a trace.  (CR is
> different: we simply convert text to Unix LF-only EOL format.)  So I'd
> suggest replacing it with "^@" or "\000" or "NUL" or something to that
> effect.  Even U+FFFD would be better than removing.
>

Since this is all due to a C-ism in the handling of content, Iʼd vote
for "\0", although this is inside Emacs, so perhaps "^@" is best.

> (We could get fancy and have a defcustom for those who do want the
> null bytes removed.)

I really donʼt think this is something that needs to be configurable.

Robert






* bug#34469: 26.1; EWW stops renderring web page on null byte
  2019-02-19 17:37           ` Robert Pluim
@ 2019-02-19 18:11             ` Eli Zaretskii
  2019-02-20 18:48               ` Robert Pluim
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2019-02-19 18:11 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 34469, nicholasdrozd

> From: Robert Pluim <rpluim@gmail.com>
> Cc: 34469@debbugs.gnu.org,  nicholasdrozd@gmail.com
> Date: Tue, 19 Feb 2019 18:37:26 +0100
> 
> Since this is all due to a C-ism in the handling of content, Iʼd vote
> for "\0", although this is inside Emacs, so perhaps "^@" is best.

Either is fine with me.

> > (We could get fancy and have a defcustom for those who do want the
> > null bytes removed.)
> 
> I really donʼt think this is something that needs to be configurable.

Neither do I.






* bug#34469: 26.1; EWW stops renderring web page on null byte
  2019-02-19 18:11             ` Eli Zaretskii
@ 2019-02-20 18:48               ` Robert Pluim
  2019-02-27 11:31                 ` Robert Pluim
  0 siblings, 1 reply; 15+ messages in thread
From: Robert Pluim @ 2019-02-20 18:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 34469, nicholasdrozd

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Robert Pluim <rpluim@gmail.com>
>> Cc: 34469@debbugs.gnu.org,  nicholasdrozd@gmail.com
>> Date: Tue, 19 Feb 2019 18:37:26 +0100
>> 
>> Since this is all due to a C-ism in the handling of content, Iʼd vote
>> for "\0", although this is inside Emacs, so perhaps "^@" is best.
>
> Either is fine with me.

Since the web page that triggered this was showing C code, Iʼve gone
for the "\0" option.

2019-02-20  Robert Pluim  <rpluim@gmail.com>

	* lisp/net/eww.el (eww-display-html): Replace NULL characters with
	"\0", as libxml can't handle embedded NULLs.
diff --git i/lisp/net/eww.el w/lisp/net/eww.el
index 555b3bd591..06075b1ebd 100644
--- i/lisp/net/eww.el
+++ w/lisp/net/eww.el
@@ -462,10 +462,12 @@ eww-display-html
 		(condition-case nil
 		    (decode-coding-region (point) (point-max) encode)
 		  (coding-system-error nil))
-                (save-excursion
-                  ;; Remove CRLF before parsing.
-                  (while (re-search-forward "\r$" nil t)
-                    (replace-match "" t t)))
+		(save-excursion
+		  ;; Remove CRLF and NULL before parsing.
+                  (while (re-search-forward "\\(\r$\\)\\|\\(\000\\)" nil t)
+                    (replace-match (if (match-beginning 1)
+                                       ""
+                                     "\\0") t t)))
 		(libxml-parse-html-region (point) (point-max))))))
 	(source (and (null document)
 		     (buffer-substring (point) (point-max)))))






* bug#34469: 26.1; EWW stops renderring web page on null byte
  2019-02-20 18:48               ` Robert Pluim
@ 2019-02-27 11:31                 ` Robert Pluim
  2019-02-27 15:55                   ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Robert Pluim @ 2019-02-27 11:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 34469, nicholasdrozd

Robert Pluim <rpluim@gmail.com> writes:

Ping!

Eli, release or master?

> 2019-02-20  Robert Pluim  <rpluim@gmail.com>
>
> 	* lisp/net/eww.el (eww-display-html): Replace NULL characters with
> 	"\0", as libxml can't handle embedded NULLs.
> diff --git i/lisp/net/eww.el w/lisp/net/eww.el
> index 555b3bd591..06075b1ebd 100644
> --- i/lisp/net/eww.el
> +++ w/lisp/net/eww.el
> @@ -462,10 +462,12 @@ eww-display-html
>  		(condition-case nil
>  		    (decode-coding-region (point) (point-max) encode)
>  		  (coding-system-error nil))
> -                (save-excursion
> -                  ;; Remove CRLF before parsing.
> -                  (while (re-search-forward "\r$" nil t)
> -                    (replace-match "" t t)))
> +		(save-excursion
> +		  ;; Remove CRLF and NULL before parsing.
> +                  (while (re-search-forward "\\(\r$\\)\\|\\(\000\\)" nil t)
> +                    (replace-match (if (match-beginning 1)
> +                                       ""
> +                                     "\\0") t t)))
>  		(libxml-parse-html-region (point) (point-max))))))
>  	(source (and (null document)
>  		     (buffer-substring (point) (point-max)))))






* bug#34469: 26.1; EWW stops renderring web page on null byte
  2019-02-27 11:31                 ` Robert Pluim
@ 2019-02-27 15:55                   ` Eli Zaretskii
  2019-02-27 16:21                     ` Robert Pluim
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2019-02-27 15:55 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 34469, nicholasdrozd

> From: Robert Pluim <rpluim@gmail.com>
> Cc: 34469@debbugs.gnu.org,  nicholasdrozd@gmail.com
> Date: Wed, 27 Feb 2019 12:31:45 +0100
> 
> Robert Pluim <rpluim@gmail.com> writes:
> 
> Ping!
> 
> Eli, release or master?

Master, please.






* bug#34469: 26.1; EWW stops renderring web page on null byte
  2019-02-27 15:55                   ` Eli Zaretskii
@ 2019-02-27 16:21                     ` Robert Pluim
  0 siblings, 0 replies; 15+ messages in thread
From: Robert Pluim @ 2019-02-27 16:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 34469, nicholasdrozd

tags 34469 fixed
close 34469 27.1
quit

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Robert Pluim <rpluim@gmail.com>
>> Cc: 34469@debbugs.gnu.org,  nicholasdrozd@gmail.com
>> Date: Wed, 27 Feb 2019 12:31:45 +0100
>> 
>> Robert Pluim <rpluim@gmail.com> writes:
>> 
>> Ping!
>> 
>> Eli, release or master?
>
> Master, please.

Done as d07f3aae48
Closing.

Robert






* bug#34469: 26.1; EWW stops renderring web page on null byte
  2019-02-13 12:27 ` bug#34469: 26.1; EWW stops renderring web page on null byte Lukasz Pawelczyk
  2019-02-14  4:44   ` Nicholas Drozd
  2019-02-16 18:13   ` Nicholas Drozd
@ 2019-02-28  1:52   ` Paul Eggert
  2019-02-28  8:46     ` Robert Pluim
  2 siblings, 1 reply; 15+ messages in thread
From: Paul Eggert @ 2019-02-28  1:52 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 34469, Lukasz Pawelczyk, Nicholas Drozd

[-- Attachment #1: Type: text/plain, Size: 995 bytes --]

Thanks for fixing that bug. However, replacing NUL with \0 sounds iffy.
Even if we assume that a web page contains C-like code, the replacement
would mishandle a NUL followed by an octal digit, since the replacement
would look like \07 which would be interpreted as a BEL character, not
as a NUL followed by the digit 7. And web pages do not typically contain
C code, so the replacement \0 might cause other trouble.
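
The ambiguity can be seen with a small sketch.  Emacs Lisp string
syntax uses the same octal escapes as C, so it serves as a stand-in
here.  The helper name my/show-nul-as-backslash-zero is hypothetical.

```elisp
(defun my/show-nul-as-backslash-zero (string)
  "Return STRING with each NUL replaced by the two characters \\ and 0.
Hypothetical helper imitating the earlier replacement."
  ;; LITERAL is non-nil, so the replacement text is inserted verbatim.
  (replace-regexp-in-string "\0" "\\0" string t t))

;; "\0007" is the escape \000 (NUL) followed by the digit 7.
;; The result is the three characters \, 0 and 7.
(my/show-nul-as-backslash-zero "\0007")

;; Read back as an escape sequence, \07 is one character, code 7 (BEL).
(string-to-list "\07")
```

The first form yields text that looks like an escape for BEL even
though the input held a NUL and a separate digit.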

Instead, it sounds better to replace NUL with the four-character
sequence "&#0;", as this is a standard HTML way to represent a NUL
character. I installed the attached patch to do this.

In my little tests with this patch, libxml2 typically handled &#0; by
discarding it and continuing to parse, which is better than ignoring the
rest of the input. In some cases libxml2 handles &#0; by discarding
later input up to a delimiter; although this is bad, it's a libxml2 bug
that attackers can exploit independently of what Emacs does with NUL,
since attackers can simply use &#0;.
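
The same replacement can be tried outside eww with a minimal sketch.
It reuses the regexp and replacement from the patch but wraps them in
a hypothetical helper, my/escape-nul, that works on a string in a
temporary buffer rather than on the page buffer.

```elisp
(defun my/escape-nul (string)
  "Return STRING with CR at end of line removed and NUL shown as &#0;.
Hypothetical helper; the loop mirrors the one in the patch."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (while (re-search-forward "\\(\r$\\)\\|\0" nil t)
      ;; Group 1 matched means a trailing CR; otherwise it was a NUL.
      (replace-match (if (match-beginning 1) "" "&#0;") t t))
    (buffer-string)))

(my/escape-nul "a\0b\r\nc")
```

The last form returns "a&#0;b\nc": the NUL becomes a character
reference and the CR before the newline is dropped.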


[-- Attachment #2: 0001-Escape-HTML-NUL-as-0-in-eww.patch --]
[-- Type: text/x-patch, Size: 1254 bytes --]

From f7c4d5ce2399fc86b130fd55d3da2c313403f638 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 27 Feb 2019 14:35:51 -0800
Subject: [PATCH] Escape HTML NUL as &#0; in eww

* lisp/net/eww.el (eww-display-html): Escape NUL as &#0; as this
is more appropriate for HTML.
---
 lisp/net/eww.el | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index 3ec6c1cfd3..3e9334532c 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -471,11 +471,9 @@ eww-display-html
 		    (decode-coding-region (point) (point-max) encode)
 		  (coding-system-error nil))
 		(save-excursion
-		  ;; Remove CRLF and NULL before parsing.
-                  (while (re-search-forward "\\(\r$\\)\\|\\(\000\\)" nil t)
-                    (replace-match (if (match-beginning 1)
-                                       ""
-                                     "\\0") t t)))
+		  ;; Remove CRLF and replace NUL with &#0; before parsing.
+		  (while (re-search-forward "\\(\r$\\)\\|\0" nil t)
+		    (replace-match (if (match-beginning 1) "" "&#0;") t t)))
 		(libxml-parse-html-region (point) (point-max))))))
 	(source (and (null document)
 		     (buffer-substring (point) (point-max)))))
-- 
2.20.1



* bug#34469: 26.1; EWW stops renderring web page on null byte
  2019-02-28  1:52   ` Paul Eggert
@ 2019-02-28  8:46     ` Robert Pluim
  0 siblings, 0 replies; 15+ messages in thread
From: Robert Pluim @ 2019-02-28  8:46 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 34469, Lukasz Pawelczyk, Nicholas Drozd

Paul Eggert <eggert@cs.ucla.edu> writes:

> Thanks for fixing that bug. However, replacing NUL with \0 sounds iffy.
> Even if we assume that a web page contains C-like code, the replacement
> would mishandle a NUL followed by an octal digit, since the replacement
> would look like \07 which would be interpreted as a BEL character, not
> as a NUL followed by the digit 7. And web pages do not typically contain
> C code, so the replacement \0 might cause other trouble.
>

In my sample of 1 website, 100% of them contained C code :-)

> Instead, it sounds better to replace NUL with the four-character
> sequence "&#0;", as this is a standard HTML way to represent a NUL
> character. I installed the attached patch to do this.
>

OK by me.

Robert






end of thread, other threads:[~2019-02-28  8:46 UTC | newest]

Thread overview: 15+ messages
     [not found] <CGME20190213122718eucas1p26156656a2376e5055452ac4d0385fc6d@eucas1p2.samsung.com>
2019-02-13 12:27 ` bug#34469: 26.1; EWW stops renderring web page on null byte Lukasz Pawelczyk
2019-02-14  4:44   ` Nicholas Drozd
2019-02-14 19:14     ` Eli Zaretskii
2019-02-16 18:13   ` Nicholas Drozd
2019-02-19  1:12     ` Glenn Morris
2019-02-19 10:06       ` Robert Pluim
2019-02-19 16:30         ` Eli Zaretskii
2019-02-19 17:37           ` Robert Pluim
2019-02-19 18:11             ` Eli Zaretskii
2019-02-20 18:48               ` Robert Pluim
2019-02-27 11:31                 ` Robert Pluim
2019-02-27 15:55                   ` Eli Zaretskii
2019-02-27 16:21                     ` Robert Pluim
2019-02-28  1:52   ` Paul Eggert
2019-02-28  8:46     ` Robert Pluim

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
