From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Kangas
Newsgroups: gmane.emacs.bugs
Subject: bug#48211: 28.0.50; eww strips whitespace between elements
Date: Mon, 3 May 2021 19:35:35 -0500
References: <87y2cvl6eg.fsf@tcd.ie>
In-Reply-To: <87y2cvl6eg.fsf@tcd.ie> (Basil L. Contovounesios's message of "Tue, 04 May 2021 00:55:03 +0100")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
To: "Basil L. Contovounesios"
Cc: Lars Ingebrigtsen, 48211@debbugs.gnu.org
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.io gmane.emacs.bugs:205567

"Basil L. Contovounesios" writes:

> I think this is because libxml-parse-html-region specifies
> HTML_PARSE_NOBLANKS:
>
>   Return CDATA sections (like ) as text nodes.
>   3c2317e891 2010-12-06 17:59:52 +0100
>   https://git.sv.gnu.org/cgit/emacs.git/commit/?id=3c2317e89100833812a7194c0d9d39ae0f52cb33

Hmm, okay.  For now, I'm seeing this issue with basically any tag that
libxml2 does not already know about, e.g. "" or "".

This is what I came up with before reading Basil's reply:

    (with-temp-buffer
      (insert "<p><tt>foo</tt> <tt>bar</tt></p>")
      (libxml-parse-html-region (point-min) (point-max)))
    => (html nil (body nil (p nil (tt nil "foo") " " (tt nil "bar"))))

    (with-temp-buffer
      (insert "<p><mark>foo</mark> <mark>bar</mark></p>")
      (libxml-parse-html-region (point-min) (point-max)))
    => (html nil (body nil (p nil (mark nil "foo") (mark nil "bar"))))

I guess this is a bug in libxml2, so I reported it here:

    https://gitlab.gnome.org/GNOME/libxml2/-/issues/247

FWIW, the below diff works around this bug for me.

diff --git a/lisp/net/shr.el b/lisp/net/shr.el
index cbdeb65ba8..3eb3a5bc49 100644
--- a/lisp/net/shr.el
+++ b/lisp/net/shr.el
@@ -1485,6 +1485,12 @@ shr-tag-tt
   ;; The `tt' tag is deprecated in favor of `code'.
   (shr-tag-code dom))
 
+(defun shr-tag-mark (dom)
+  (shr-generic dom)
+  ;; Hack to work around bug in libxml2 (Bug#48211):
+  ;; https://gitlab.gnome.org/GNOME/libxml2/-/issues/247
+  (insert " "))
+
 (defun shr-tag-ins (cont)
   (let* ((start (point))
          (color "green")
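The two parse results above differ only in whether a text node sits
between the sibling elements.  A minimal sketch for spotting that
symptom in an already-parsed DOM, assuming only dom.el;
`my-adjacent-elements-p' is a hypothetical helper, not part of Emacs or
shr.el.  It looks at direct children only and treats any string child as
separating text:

```elisp
(require 'dom)

;; Hypothetical helper for illustration; not part of Emacs or shr.el.
(defun my-adjacent-elements-p (node)
  "Return non-nil if NODE has two element children with no text between them.
Only direct children of NODE are inspected."
  (let ((prev-was-element nil)
        (found nil))
    (dolist (child (dom-children node))
      (cond
       ;; Any text node, including a lone space, separates elements.
       ((stringp child) (setq prev-was-element nil))
       ;; An element directly after another element: the symptom.
       (prev-was-element (setq found t))
       ;; First element in a run.
       (t (setq prev-was-element t))))
    found))

;; The `p' nodes from the two results shown in this message:
(my-adjacent-elements-p '(p nil (tt nil "foo") " " (tt nil "bar")))
;; => nil
(my-adjacent-elements-p '(p nil (mark nil "foo") (mark nil "bar")))
;; => t
```

This cannot tell stripped whitespace apart from markup that never had
any, so it is only useful when the input is known to contain a space at
that point.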