From: Ivan Shmakov <ivan@siamics.net>
To: control@debbugs.gnu.org, 17958@debbugs.gnu.org
Subject: bug#17958: SHR: base handling broken (shr-parse-base, shr-expand-url)
Date: Thu, 14 Aug 2014 18:50:20 +0000
Message-ID: <871tsirjmb.fsf@violet.siamics.net>
In-Reply-To: <8738eeconi.fsf@violet.siamics.net> (Ivan Shmakov's message of "Sun, 06 Jul 2014 18:45:05 +0000")
[-- Attachment #1: Type: text/plain, Size: 2506 bytes --]
retitle 17958 SHR: base handling broken (shr-parse-base, shr-expand-url)
tag 17958 + patch
thanks
>>>>> Ivan Shmakov <ivan@siamics.net> writes:
[…]
> However, I believe that the real culprit is shr-expand-url, which
> mishandles the nil ‘uri’ case:
> (mapcar (lambda (x) (shr-expand-url x "http://example.com/welcome/"))
>         '("hello" "/world" nil))
> ;; ⇒
> ("http://example.com/welcome/hello"
>  "http://example.com/world"
>  "http://example.com")
> My expectation for the last result would be the ‘base’ argument
> unchanged (i.e., http://example.com/welcome/).
> Thus, I suggest changing shr-expand-url to return not element 0
> of the (parsed) ‘base’ (see below), but element 3, which holds
> the full base URL.
> (cond ((or (not url)
>            (not base)
>            (string-match "\\`[a-z]*:" url))
>        ;; Absolute URL.
>        (or url (car base)))
> [1] https://tools.wmflabs.org/guc/?user=2001:db8:1337::cafe
There appears to be one more issue with SHR “base” handling:
the <base href="" /> URI may itself be relative, and SHR fails
to resolve it against the document’s URI. As per [2]:
To set the frozen base URL, resolve the value of the element's href
content attribute relative to the Document's fallback base URL; if
this is successful, set the frozen base URL to the resulting
absolute URL, otherwise, set the frozen base URL to the fallback
base URL.
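The rule quoted above can be sketched roughly as follows. This is only an illustration, not code from shr.el: the helper name my-frozen-base-url is hypothetical, and it assumes that url-expand-file-name (from url-expand.el) is an acceptable resolver for the href value.

```elisp
(require 'url-expand)  ; provides `url-expand-file-name'

;; Hypothetical helper, not part of shr.el.
(defun my-frozen-base-url (href fallback)
  "Resolve HREF against FALLBACK; return FALLBACK if HREF is nil or empty.
FALLBACK is the document's fallback base URL, as a string."
  (if (and (stringp href) (not (string= href "")))
      ;; A relative HREF is resolved against FALLBACK; an absolute
      ;; HREF is expected to come back as is.
      (url-expand-file-name href fallback)
    ;; Nothing usable in the href attribute: keep the fallback.
    fallback))

;; Usage, mirroring the example in this message:
(my-frozen-base-url "/relative" "http://example.org/")
```

The fallback branch only covers a missing or empty href; an href that cannot be parsed at all is not handled in this sketch.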
The SHR behavior doesn’t match the above. Consider, e. g.:
(let ((shr-base (shr-parse-base "http://example.org/")))
  (shr-tag-base '((:href . "/relative")))
  shr-base)
;; ⇒
("" "/" nil "/relative")
With the attached patch (which also fixes the issue described in
my initial bug report), it instead gives what I believe to be the
correct result:
(let ((shr-base (shr-parse-base "http://example.org/")))
  (shr-tag-base '((:href . "/relative")))
  shr-base)
;; ⇒
("http://example.org" "/" "http" "http://example.org/relative")
For proper compliance with the specification, SHR should also
ignore all <base /> elements but the first one, but I guess
that may be fixed separately.
Relative <base /> URIs appear, e.g., on Internet Archive
Wayback Machine pages when the original page uses the
<base /> element.
[2] http://www.w3.org/TR/html5/document-metadata.html#the-base-element
--
FSF associate member #7257 http://boycottsystemd.org/ … 3013 B6A0 230E 334A
[-- Attachment #2: Type: text/x-diff, Size: 1039 bytes --]
--- a/lisp/net/shr.el
+++ b/lisp/net/shr.el
@@ -574,6 +574,8 @@ size, and full-buffer size."
   ;; Always chop off anchors.
   (when (string-match "#.*" url)
     (setq url (substring url 0 (match-beginning 0))))
+  ;; NB: <base href="" > URI may itself be relative to the document’s URI
+  (setq url (shr-expand-url url))
   (let* ((parsed (url-generic-parse-url url))
          (local (url-filename parsed)))
     (setf (url-filename parsed) "")
@@ -592,6 +594,7 @@ size, and full-buffer size."
 (defun shr-expand-url (url &optional base)
   (setq base
         (if base
+            ;; shr-parse-base should never call this with non-nil base!
             (shr-parse-base base)
           ;; Bound by the parser.
           shr-base))
@@ -600,8 +603,8 @@ size, and full-buffer size."
   (cond ((or (not url)
              (not base)
              (string-match "\\`[a-z]*:" url))
-         ;; Absolute URL.
-         (or url (car base)))
+         ;; Absolute or empty URI
+         (or url (nth 3 base)))
         ((eq (aref url 0) ?/)
          (if (and (> (length url) 1)
                   (eq (aref url 1) ?/))