From mboxrd@z Thu Jan 1 00:00:00 1970
From: Robert Pluim
Newsgroups: gmane.emacs.bugs
Subject: bug#34469: 26.1; EWW stops renderring web page on null byte
Date: Tue, 19 Feb 2019 18:37:26 +0100
References: <02sgwk1sza.fsf@fencepost.gnu.org> <83mumrivuv.fsf@gnu.org>
In-Reply-To: <83mumrivuv.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 19 Feb 2019 18:30:48 +0200")
To: Eli Zaretskii
Cc: 34469@debbugs.gnu.org, nicholasdrozd@gmail.com

Eli Zaretskii writes:

>> From: Robert Pluim
>> Date: Tue, 19 Feb 2019 11:06:37 +0100
>> Cc: 34469@debbugs.gnu.org, Nicholas Drozd
>>
>> Glenn Morris writes:
>>
>> > Perhaps eww-display-html should replace null bytes (with whatever the
>> > html standard says is appropriate) before calling
>> > libxml-parse-html-region.
>> > It already replaces CRLF.
>>
>> Chrome at least just strips the null byte completely.
>>
>> There is apparently a class of attacks that uses the null character
>> for nefarious purposes, so how about something like this:
>>
>> diff --git a/lisp/net/eww.el b/lisp/net/eww.el
>> index 1cc4557ce1..9b57bc43e4 100644
>> --- a/lisp/net/eww.el
>> +++ b/lisp/net/eww.el
>> @@ -448,8 +448,8 @@ eww-display-html
>>           (decode-coding-region (point) (point-max) encode)
>>         (coding-system-error nil))
>>       (save-excursion
>> -       ;; Remove CRLF before parsing.
>> -       (while (re-search-forward "\r$" nil t)
>> +       ;; Remove CRLF and NULL before parsing.
>> +       (while (re-search-forward "\r$\\|\000" nil t)
>>           (replace-match "" t t)))
>
> It is un-Emacsy, IMO, to remove content without a trace.  (CR is
> different: we simply convert text to Unix LF-only EOL format.)  So I'd
> suggest to replace with "^@" or "\000" or "NUL" or something to that
> effect.  Even U+FFFD would be better than removing.
>

Since this is all due to a C-ism in the handling of content, Iʼd vote
for "\0", although this is inside Emacs, so perhaps "^@" is best.

> (We could get fancy and have a defcustom for those who do want the
> null bytes removed.)

I really donʼt think this is something that needs to be configurable.

Robert
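
For illustration only, here is a minimal sketch of the "replace rather than remove" variant discussed above. The function name `my-eww-show-nul-chars` is hypothetical and not part of eww.el, and the choice of "^@" as the replacement text is just one of the candidates mentioned in this thread, not a settled decision.

```elisp
;; Hypothetical helper, not part of eww.el: make NUL characters visible
;; instead of deleting them, so no content disappears without a trace.
(defun my-eww-show-nul-chars (start end)
  "Replace each NUL character between START and END with the text \"^@\".
Return the number of replacements made."
  (let ((count 0))
    (save-excursion
      (save-restriction
        ;; Narrowing keeps the search inside the region even though
        ;; each replacement makes the text one character longer.
        (narrow-to-region start end)
        (goto-char (point-min))
        (while (search-forward "\000" nil t)
          ;; FIXEDCASE and LITERAL are both t: insert "^@" verbatim.
          (replace-match "^@" t t)
          (setq count (1+ count)))))
    count))
```

In eww-display-html this would presumably run next to the existing CRLF cleanup, before libxml-parse-html-region is called; that placement is an assumption based on the patch quoted above.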