From: Ivan Shmakov
Newsgroups: gmane.emacs.bugs
Subject: bug#17958: SHR: base handling broken (shr-parse-base, shr-expand-url)
Date: Thu, 14 Aug 2014 18:50:20 +0000
Message-ID: <871tsirjmb.fsf@violet.siamics.net>
References: <8738eeconi.fsf@violet.siamics.net>
In-Reply-To: <8738eeconi.fsf@violet.siamics.net> (Ivan Shmakov's message of "Sun, 06 Jul 2014 18:45:05 +0000")
To: control@debbugs.gnu.org, 17958@debbugs.gnu.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

retitle 17958 SHR: base handling broken (shr-parse-base, shr-expand-url)
tag 17958 + patch
thanks

>>>>> Ivan Shmakov writes:

[…]

> However, I believe that the real culprit is shr-expand-url, which
> mishandles the nil ‘uri’ case:

>   (mapcar (lambda (x) (shr-expand-url x "http://example.com/welcome/"))
>           '("hello" "/world" nil))
>   ;; ⇒
>   ("http://example.com/welcome/hello"
>    "http://example.com/world"
>    "http://example.com")

> My expectation for the last result would be the ‘base’ argument
> unchanged (i.e., http://example.com/welcome/).

> Thus, I suggest changing shr-expand-url to return not the 0th element
> of the (parsed) ‘base’ (see below), but the 3rd.

>   (cond ((or (not url)
>              (not base)
>              (string-match "\\`[a-z]*:" url))
>          ;; Absolute URL.
>          (or url (car base)))

As it seems, there’s one more issue with SHR’s “base” handling: the
URI may itself be relative, and SHR fails to handle that case properly.
As per [2]:

    To set the frozen base URL, resolve the value of the element's href
    content attribute relative to the Document's fallback base URL; if
    this is successful, set the frozen base URL to the resulting
    absolute URL, otherwise, set the frozen base URL to the fallback
    base URL.
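The frozen-base-URL rule quoted from [2] might be sketched in Emacs Lisp roughly as follows. This is a minimal illustration under stated assumptions, not SHR code: the helper name my-frozen-base-url is hypothetical, it assumes the URL library’s url-expand-file-name does the reference resolution, and it treats a signalled error as the specification’s “unsuccessful” case.

```elisp
(require 'url-expand)

;; Hypothetical helper, not part of SHR: a sketch of the HTML
;; "frozen base URL" rule for a <base href=...> element.
(defun my-frozen-base-url (href fallback)
  "Return the frozen base URL for a <base> element whose href is HREF.
HREF is resolved relative to FALLBACK, the document's fallback base
URL.  If HREF is nil or empty, or resolution signals an error,
FALLBACK is returned unchanged."
  (or (and (> (length href) 0)          ; (length nil) is 0
           (condition-case nil
               ;; url-expand-file-name returns HREF as-is when it is
               ;; absolute, and otherwise resolves it against FALLBACK.
               (url-expand-file-name href fallback)
             ;; Assumption: treat any error as "not successful".
             (error nil)))
      fallback))

;; Usage:
(my-frozen-base-url "/relative" "http://example.org/")
;; ⇒ "http://example.org/relative"
```

The resolved value matches the fourth (full URL) element of the patched shr-base result given in this message; an absent or empty href simply falls back to the document URL.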
SHR’s behavior doesn’t match the specification.  Consider, e.g.:

    (let ((shr-base (shr-parse-base "http://example.org/")))
      (shr-tag-base '((:href . "/relative")))
      shr-base)
    ;; ⇒ ("" "/" nil "/relative")

With the attached patch (which also fixes the issue described in my
initial bug report), it instead gives what I deem to be the correct
result:

    (let ((shr-base (shr-parse-base "http://example.org/")))
      (shr-tag-base '((:href . "/relative")))
      shr-base)
    ;; ⇒ ("http://example.org" "/" "http" "http://example.org/relative")

For full compliance with the specification, SHR should also ignore all
<base> elements but the first one, but I guess that may be fixed
separately.

Relative <base> URIs appear, e.g., on Internet Wayback Machine archive
pages when the original page uses the <base> element.

[2] http://www.w3.org/TR/html5/document-metadata.html#the-base-element

-- 
FSF associate member #7257  http://boycottsystemd.org/  … 3013 B6A0 230E 334A

--- a/lisp/net/shr.el
+++ b/lisp/net/shr.el
@@ -574,6 +574,8 @@ size, and full-buffer size."
   ;; Always chop off anchors.
   (when (string-match "#.*" url)
     (setq url (substring url 0 (match-beginning 0))))
+  ;; NB: URI may itself be relative to the document’s URI
+  (setq url (shr-expand-url url))
   (let* ((parsed (url-generic-parse-url url))
          (local (url-filename parsed)))
     (setf (url-filename parsed) "")
@@ -592,6 +594,7 @@ size, and full-buffer size."
 (defun shr-expand-url (url &optional base)
   (setq base
         (if base
+            ;; shr-parse-base should never call this with non-nil base!
             (shr-parse-base base)
           ;; Bound by the parser.
           shr-base))
@@ -600,8 +603,8 @@ size, and full-buffer size."
   (cond ((or (not url)
              (not base)
              (string-match "\\`[a-z]*:" url))
-         ;; Absolute URL.
-         (or url (car base)))
+         ;; Absolute or empty URI
+         (or url (nth 3 base)))
         ((eq (aref url 0) ?/)
          (if (and (> (length url) 1)
                   (eq (aref url 1) ?/))
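To make the effect of the shr-expand-url hunk easier to see, here is a simplified, standalone model of its dispatch. It is a sketch under stated assumptions, not the patched function: my-expand-url is a hypothetical name, BASE is assumed to be an already-parsed four-element list (HOST-PREFIX DIRECTORY SCHEME FULL-URL) like the shr-base values shown in this message, and the last clause merely concatenates, skipping the dot-segment normalization that url-expand-file-name would perform.

```elisp
;; Hypothetical, simplified model of the patched dispatch; not SHR code.
(defun my-expand-url (url base)
  "Expand URL against BASE, a list (HOST-PREFIX DIRECTORY SCHEME FULL-URL)."
  (cond ((or (null url) (string= url ""))
         ;; Missing or empty URI: return the full base URL (nth 3),
         ;; not just the host prefix (car) as before the patch.
         (nth 3 base))
        ((string-match-p "\\`[a-z]*:" url)
         ;; Absolute URI (has a scheme): return it unchanged.
         url)
        ((string-prefix-p "//" url)
         ;; Scheme-relative: borrow only the scheme.
         (concat (nth 2 base) ":" url))
        ((string-prefix-p "/" url)
         ;; Host-relative: borrow scheme and host.
         (concat (car base) url))
        (t
         ;; Path-relative: append to the base directory.
         ;; Simplification: no "./" or "../" handling here.
         (concat (car base) (nth 1 base) url))))

;; Usage, mirroring the example from the original bug report:
(let ((base '("http://example.com" "/welcome/" "http"
              "http://example.com/welcome/")))
  (mapcar (lambda (u) (my-expand-url u base)) '("hello" "/world" nil)))
;; ⇒ ("http://example.com/welcome/hello" "http://example.com/world"
;;    "http://example.com/welcome/")
```

The only behavioral change the model highlights is in the first clause: a nil URL now maps to the base argument unchanged, which is the result the original report expected.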