From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.ciao.gmane.io!not-for-mail From: "Basil L. Contovounesios" Newsgroups: gmane.emacs.bugs Subject: bug#40532: 28.0.50; eww/shr: Anchor link does not work Date: Fri, 08 May 2020 02:10:02 +0100 Message-ID: <87h7wrkz9x.fsf@tcd.ie> References: <87sghckn7p.fsf@milkypond.org> <87o8rk597f.fsf@milkypond.org> <87blnj92dy.fsf@tcd.ie> <87imhr65p2.fsf@tcd.ie> <878sijpqr0.fsf@milkypond.org> <87y2qjxqbu.fsf@tcd.ie> <87d07p4npz.fsf@gnus.org> <87zhatxojk.fsf@tcd.ie> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Injection-Info: ciao.gmane.io; posting-host="ciao.gmane.io:159.69.161.202"; logging-data="40597"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux) Cc: 40532@debbugs.gnu.org, Arnaud Fontaine To: Lars Ingebrigtsen Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Fri May 08 03:11:10 2020 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1jWrXi-000AS8-LW for geb-bug-gnu-emacs@m.gmane-mx.org; Fri, 08 May 2020 03:11:10 +0200 Original-Received: from localhost ([::1]:42350 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jWrXh-0005MT-O6 for geb-bug-gnu-emacs@m.gmane-mx.org; Thu, 07 May 2020 21:11:09 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:53286) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jWrXa-0005MI-Sa for bug-gnu-emacs@gnu.org; Thu, 07 May 2020 21:11:02 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]:60880) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1jWrXa-0000r0-Jc for bug-gnu-emacs@gnu.org; Thu, 07 May 2020 21:11:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1jWrXa-0005nZ-Ek for bug-gnu-emacs@gnu.org; Thu, 07 May 2020 21:11:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: "Basil L. Contovounesios" Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Fri, 08 May 2020 01:11:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 40532 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: patch Original-Received: via spool by 40532-submit@debbugs.gnu.org id=B40532.158890021322227 (code B ref 40532); Fri, 08 May 2020 01:11:02 +0000 Original-Received: (at 40532) by debbugs.gnu.org; 8 May 2020 01:10:13 +0000 Original-Received: from localhost ([127.0.0.1]:44193 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jWrWn-0005mQ-BW for submit@debbugs.gnu.org; Thu, 07 May 2020 21:10:13 -0400 Original-Received: from mail-wr1-f66.google.com ([209.85.221.66]:40478) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jWrWl-0005mC-VK for 40532@debbugs.gnu.org; Thu, 07 May 2020 21:10:12 -0400 Original-Received: by mail-wr1-f66.google.com with SMTP id e16so8622973wra.7 for <40532@debbugs.gnu.org>; Thu, 07 May 2020 18:10:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tcd-ie.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:references:date:in-reply-to:message-id :user-agent:mime-version; bh=5MrrHw7kyGpQJL6ADW2EH5AEzvr2sFiOO4x2VYckR2g=; b=v6uouM8GeHGmOifmQfSdpW87jhfkl8nsFMLX7y1K3+gOi9D1WfFuhoqXrMajT+r5Tr cWwxeK3a2mCF0oqSZNY2wVqTsIlJeHypyoEu1D+zpaWcp1DOq39F3ZSdFFziEB8+WnfG 5TLs6GrB7u3QmZNdtpuJAHwgu1RPY6UhllrTxSuU3FL6PW2T9oi+pN6CPo95FCoKX9UH a1DtmBBumfpJABa2yEwh+0CFKIG6NkT8c3WCxA6BT3pZF2uLALfK7EaJK7jnjtv6yCha ZBRwr53ysCpdli9HZdL1eBHt+Iun+Cuz4IzUI3OjLywApY6Z5GMbNojRpuvhjFmNhc96 E0Mg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:references:date:in-reply-to :message-id:user-agent:mime-version; bh=5MrrHw7kyGpQJL6ADW2EH5AEzvr2sFiOO4x2VYckR2g=; b=JeaQEwX8KwzRwBWh7ska8tXso4B33Ofc/iI6USDVw1YpII//48IdCssG8Ou6uX2Ct+ 0N7XBgnOlu/Yt+skovX0rcj4U9rbBJGRPvReAAQtTO+JwSQF2YqGbY7qTXxqwfQn1pjM VPd7mJaP3qi9paTcRLms2Jw63Mt8NkdMQDM2IakZFzu8Qnsdr2RoGuFcHAgziqxW+DaA 2xVEKOKxldOmw9FOl+6WZ9JFtd7lOLtNxKoBAKoK31WxYZu6/rVGDwoVV4AkVPjLSVgY xWW/4fBtxbhWE9o0gD/92WfHJnTUjKmvd0CYkZivYiL3C4en9dEqbDi4mF2Mbob2XZSn DxYg== X-Gm-Message-State: AGi0PuZVnrihDVs+s03N95ZNPuTEYklSMp/NfurcnaUbcS4InhG8zFjk Qe3fFkChkadBRfbV1wK3qinZig== X-Google-Smtp-Source: APiQypIJ837s+8YrMcFMcqTEryZjL9pIxtT/zjw+TRZN2d+JtpDwtO2JWdzIdGpSvcxOCDbIw/wmpA== X-Received: by 2002:a5d:548c:: with SMTP id h12mr1994634wrv.373.1588900206031; Thu, 07 May 2020 18:10:06 -0700 (PDT) Original-Received: from localhost ([2a02:8084:20e2:c380:1f68:7ff5:120d:64e]) by smtp.gmail.com with ESMTPSA id k17sm10555624wmi.10.2020.05.07.18.10.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 07 May 2020 18:10:04 -0700 (PDT) In-Reply-To: <87zhatxojk.fsf@tcd.ie> (Basil L. Contovounesios's message of "Thu, 30 Apr 2020 11:20:47 +0100") X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.io gmane.emacs.bugs:179904 Archived-At: --=-=-= Content-Type: text/plain "Basil L. Contovounesios" writes: > Lars Ingebrigtsen writes: > >> "Basil L. Contovounesios" writes: >> >>> Perhaps shr should unconditionally (without requiring shr-target-id to >>> be bound) mark all possible targets with text properties during >>> rendering. I'll see if I can get this or any other way working today or >>> tomorrow, but either way this won't be fixed in Emacs 27. >> >> Hm... marking all the targets sounds like overkill (if I understand you >> correctly, but it's likely that I don't). :-) > > Marking all the targets is what I meant, but I was only thinking aloud, > not seriously suggesting it. I'm still stepping through eww and shr > trying to figure out why shr-target-id doesn't work sometimes (my > current guess is shr modifies the DOM to avoid re-rendering tables, so > shr-target-id doesn't get set in such cases); progress is slow. :( Now that I understand what's happening better, I think it's necessary to mark all targets, and I don't think it's overkill. The problem is indeed that shr caches rendered tables. What this means is that, if eww first visits a URL with no #target, then its tables will be rendered and cached with no shr-target-id text property attached to any fragment identifiers contained within them. If eww-follow-link is subsequently invoked on a #target within the same page, then the cached table contents will be inserted and so there will be no shr-target-id property in the buffer. The recipe provided in [1] illustrates this. [1]: https://debbugs.gnu.org/40532#51 OTOH if the shr-target-id property is always attached to the relevant 'id' and (deprecated) 'name' attributes, then cached table contents will still be searchable. This shouldn't be overkill in terms of performance because yet another text property on a subet of the DOM should be comparatively cheap, right? So WDYT of the following fix? Can you think of any better solutions? --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename=0001-Propertize-all-shr-fragment-IDs-as-shr-target-id.patch >From ef0058cb4a70b1d78e55f6b61ff0e1e8ffad9169 Mon Sep 17 00:00:00 2001 From: "Basil L. Contovounesios" Date: Fri, 8 May 2020 00:25:38 +0100 Subject: [PATCH] Propertize all shr fragment IDs as shr-target-id * lisp/net/shr.el (shr-target-id): Add docstring. (shr-descend, shr-tag-a): Display dummy anchor characters as the empty string. Give all relevant 'id' or 'name' fragment identifier attributes the shr-target-id text property. This ensures that cached content, such as tables, retains the property across renders. (Bug#40532) * lisp/net/eww.el (eww-display-html): Adapt shr-target-id property search accordingly. --- lisp/net/eww.el | 19 ++++++++++--------- lisp/net/shr.el | 26 ++++++++++++++------------ 2 files changed, 24 insertions(+), 21 deletions(-) diff --git a/lisp/net/eww.el b/lisp/net/eww.el index a6c1abdbb1..acb7cc7e40 100644 --- a/lisp/net/eww.el +++ b/lisp/net/eww.el @@ -26,13 +26,14 @@ (require 'cl-lib) (require 'format-spec) -(require 'shr) -(require 'url) -(require 'url-queue) -(require 'thingatpt) (require 'mm-url) (require 'puny) -(eval-when-compile (require 'subr-x)) ;; for string-trim +(require 'shr) +(require 'text-property-search) +(require 'thingatpt) +(require 'url) +(require 'url-queue) +(eval-when-compile (require 'subr-x)) (defgroup eww nil "Emacs Web Wowser" @@ -543,10 +544,10 @@ eww-display-html (goto-char point)) (shr-target-id (goto-char (point-min)) - (let ((point (next-single-property-change - (point-min) 'shr-target-id))) - (when point - (goto-char point)))) + (let ((match (text-property-search-forward + 'shr-target-id shr-target-id t))) + (when match + (goto-char (prop-match-beginning match))))) (t (goto-char (point-min)) ;; Don't leave point inside forms, because the normal eww diff --git a/lisp/net/shr.el b/lisp/net/shr.el index 1f80ab74db..ea174e5d77 100644 --- a/lisp/net/shr.el +++ b/lisp/net/shr.el @@ -185,13 +185,15 @@ shr-base (defvar shr-depth 0) (defvar shr-warning nil) (defvar shr-ignore-cache nil) -(defvar shr-target-id nil) (defvar shr-table-separator-length 1) (defvar shr-table-separator-pixel-width 0) (defvar shr-table-id nil) (defvar shr-current-font nil) (defvar shr-internal-bullet nil) +(defvar shr-target-id nil + "Target fragment identifier anchor.") + (defvar shr-map (let ((map (make-sparse-keymap))) (define-key map "a" 'shr-show-alt-text) @@ -531,13 +533,13 @@ shr-descend (funcall function dom)) (t (shr-generic dom))) - (when (and shr-target-id - (equal (dom-attr dom 'id) shr-target-id)) + (when-let* ((id (dom-attr dom 'id))) ;; If the element was empty, we don't have anything to put the ;; anchor on. So just insert a dummy character. (when (= start (point)) - (insert "*")) - (put-text-property start (1+ start) 'shr-target-id shr-target-id)) + (insert ?*) + (put-text-property (1- (point)) (point) 'display "")) + (put-text-property start (1+ start) 'shr-target-id id)) ;; If style is set, then this node has set the color. (when style (shr-colorize-region @@ -1497,14 +1499,14 @@ shr-tag-a (start (point)) shr-start) (shr-generic dom) - (when (and shr-target-id - (equal (dom-attr dom 'name) shr-target-id)) - ;; We have a zero-length element, so just - ;; insert... something. + (when-let* ((id (or (dom-attr dom 'id) + ;; Obsolete since HTML5. + (dom-attr dom 'name)))) + ;; We have an empty element, so just insert... something. (when (= start (point)) - (shr-ensure-newline) - (insert " ")) - (put-text-property start (1+ start) 'shr-target-id shr-target-id)) + (insert ?\s) + (put-text-property (1- (point)) (point) 'display "")) + (put-text-property start (1+ start) 'shr-target-id id)) (when url (shr-urlify (or shr-start start) (shr-expand-url url) title)))) -- 2.26.2 --=-=-= Content-Type: text/plain Thanks, -- Basil --=-=-=--