From: Robert Pluim
Newsgroups: gmane.emacs.bugs
Subject: bug#34469: 26.1; EWW stops renderring web page on null byte
Date: Wed, 20 Feb 2019 19:48:50 +0100
To: Eli Zaretskii
Cc: 34469@debbugs.gnu.org, nicholasdrozd@gmail.com

Eli Zaretskii writes:

>> From: Robert Pluim
>> Cc: 34469@debbugs.gnu.org, nicholasdrozd@gmail.com
>> Date: Tue, 19 Feb 2019 18:37:26 +0100
>>
>> Since this is all due to a C-ism in the handling of content, Iʼd vote
>> for "\0", although this is inside Emacs, so perhaps "^@" is best.
>
> Either is fine with me.

Since the web page that triggered this was showing C code, Iʼve gone
for the "\0" option.

2019-02-20  Robert Pluim

        * lisp/net/eww.el (eww-display-html): Replace NULL characters with
        "\0", as libxml can't handle embedded NULLs.

diff --git i/lisp/net/eww.el w/lisp/net/eww.el
index 555b3bd591..06075b1ebd 100644
--- i/lisp/net/eww.el
+++ w/lisp/net/eww.el
@@ -462,10 +462,12 @@ eww-display-html
             (condition-case nil
                 (decode-coding-region (point) (point-max) encode)
               (coding-system-error nil))
-            (save-excursion
-              ;; Remove CRLF before parsing.
-              (while (re-search-forward "\r$" nil t)
-                (replace-match "" t t)))
+            (save-excursion
+              ;; Remove CRLF and NULL before parsing.
+              (while (re-search-forward "\\(\r$\\)\\|\\(\000\\)" nil t)
+                (replace-match (if (match-beginning 1)
+                                   ""
+                                 "\\0") t t)))
             (libxml-parse-html-region (point) (point-max))))))
         (source (and (null document)
                      (buffer-substring (point) (point-max)))))
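
The replacement loop in the patch can be tried in isolation, outside
EWW. What follows is only a minimal sketch: `rp-demo-sanitize' is a
made-up name for illustration and is not part of eww.el, and it works
on a string in a temporary buffer rather than on the fetched page.

```elisp
;; Hypothetical helper for experimenting with the regexp from the patch.
;; Not part of eww.el; the name is an assumption made for this sketch.
(defun rp-demo-sanitize (string)
  "Return STRING with CR at end of line removed and NUL shown as \\0."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    ;; Group 1 matches a CR at end of line, group 2 a NUL character.
    (while (re-search-forward "\\(\r$\\)\\|\\(\000\\)" nil t)
      ;; LITERAL is t, so "\\0" is inserted as the two characters \ and 0
      ;; instead of being read as a back-reference to the whole match.
      (replace-match (if (match-beginning 1) "" "\\0") t t))
    (buffer-string)))

(rp-demo-sanitize "char c = '\000';\r\n")
;; => "char c = '\\0';\n"
```

Passing t as the LITERAL argument to `replace-match' is what makes
this work; without it, "\\0" would stand for the matched text and the
NUL would be put straight back.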