From mboxrd@z Thu Jan 1 00:00:00 1970
From: Magnus Henoch
Newsgroups: gmane.emacs.devel
Subject: url-generic-parse-url fails in certain cases
Date: Sun, 08 Oct 2006 15:41:42 +0200
Message-ID: <873b9y3ox5.fsf@freemail.hu>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Mail-Followup-To: emacs-devel@gnu.org
User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.0.50 (berkeley-unix)
Xref: news.gmane.org gmane.emacs.devel:60525

--=-=-=

I tried to track down the bug described in .  I found that while URLs
of the form "http://server/?query;arg=val" are handled correctly, a URL
with no path but a query string (e.g. "http://server?query;arg=val") is
not: "?query" is treated as part of the hostname.

I compared the code in url-generic-parse-url to RFC 3986 and tried to
minimize the differences.  I found that the function treated "?query"
in the first URL above as part of the path, while the RFC treats it as
part of the query string, along with the arguments.

After adapting the code to the RFC and fixing
url-recreate-url-attributes, everything that used to work seems to keep
working.  As noted in the patch comments, relative URLs are not parsed
"correctly", but as all applications using the library seem to cope
with that, it would probably be silly to try to fix it (and potentially
break things) this close to a release.

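To make the RFC's view of the two URLs above concrete, here is a minimal
sketch that splits a URL with the regular expression given in RFC 3986,
Appendix B.  The names my-url-rfc3986-regexp and my-url-split are
hypothetical helpers for illustration only; they are not part of
url-parse.el or of the patch, and the sketch does no further parsing of
the authority or the query string.

```elisp
;; Hypothetical illustration, not part of url-parse.el.
;; The regexp is the one from RFC 3986, Appendix B, in Emacs syntax.
(defconst my-url-rfc3986-regexp
  (concat "\\`"
          "\\(\\([^:/?#]+\\):\\)?"   ; group 2: scheme
          "\\(//\\([^/?#]*\\)\\)?"   ; group 4: authority
          "\\([^?#]*\\)"             ; group 5: path (may be empty)
          "\\(\\?\\([^#]*\\)\\)?"    ; group 7: query
          "\\(#\\(.*\\)\\)?")        ; group 9: fragment
  "Regexp splitting a URI reference into its five generic components.")

(defun my-url-split (url)
  "Split URL into a list (SCHEME AUTHORITY PATH QUERY FRAGMENT).
Components that are absent from URL are returned as nil."
  (when (string-match my-url-rfc3986-regexp url)
    (list (match-string 2 url)
          (match-string 4 url)
          (match-string 5 url)
          (match-string 7 url)
          (match-string 9 url))))

;; The authority ends at the first "/", "?" or "#", so the query is
;; never swallowed by the hostname, even when the path is empty:
(my-url-split "http://server/?query;arg=val")
;; => ("http" "server" "/" "query;arg=val" nil)
(my-url-split "http://server?query;arg=val")
;; => ("http" "server" "" "query;arg=val" nil)
```

In both cases "query;arg=val" comes out as the query component, which is
the behaviour the patch aims for.
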
2006-10-08  Magnus Henoch

	* url-parse.el (url-generic-parse-url): Handle URLs with empty
	path component and non-empty query component.  Untangle path,
	query and fragment parsing code.  Add references to RFC 3986 in
	comments.
	(url-recreate-url-attributes): Start query string with "?", not
	";".

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline; filename=url-parse.el.patch

*** orig/lisp/url/url-parse.el
--- mod/lisp/url/url-parse.el
***************
*** 108,114 ****
  (defun url-recreate-url-attributes (urlobj)
    "Recreate the attributes of an URL string from the parsed URLOBJ."
    (when (url-attributes urlobj)
!     (concat ";"
              (mapconcat (lambda (x)
                           (if (cdr x)
                               (concat (car x) "=" (cdr x))
--- 108,114 ----
  (defun url-recreate-url-attributes (urlobj)
    "Recreate the attributes of an URL string from the parsed URLOBJ."
    (when (url-attributes urlobj)
!     (concat "?"
              (mapconcat (lambda (x)
                           (if (cdr x)
                               (concat (car x) "=" (cdr x))
***************
*** 120,130 ****
--- 120,135 ----
    "Return a vector of the parts of URL.
  Format is:
  \[TYPE USER PASSWORD HOST PORT FILE TARGET ATTRIBUTES FULL\]"
+   ;; See RFC 3986.
    (cond
     ((null url)
      (make-vector 9 nil))
     ((or (not (string-match url-nonrelative-link url))
          (= ?/ (string-to-char url)))
+     ;; This isn't correct, as a relative URL can be a fragment link
+     ;; (e.g. "#foo") and many other things (see section 4.2).
+     ;; However, let's not fix something that isn't broken, especially
+     ;; when close to a release.
      (let ((retval (make-vector 9 nil)))
        (url-set-filename retval url)
        (url-set-full retval nil)
***************
*** 148,153 ****
--- 153,160 ----
          (insert url)
          (goto-char (point-min))
          (setq save-pos (point))
+ 
+         ;; 3.1. Scheme
          (if (not (looking-at "//"))
              (progn
                (skip-chars-forward "a-zA-Z+.\\-")
***************
*** 156,168 ****
                (skip-chars-forward ":")
                (setq save-pos (point))))

!         ;; We are doing a fully specified URL, with hostname and all
          (if (looking-at "//")
              (progn
                (setq full t)
                (forward-char 2)
                (setq save-pos (point))
!               (skip-chars-forward "^/")
                (setq host (buffer-substring save-pos (point)))
                (if (string-match "^\\([^@]+\\)@" host)
                    (setq user (match-string 1 host)
--- 163,175 ----
                (skip-chars-forward ":")
                (setq save-pos (point))))

!         ;; 3.2. Authority
          (if (looking-at "//")
              (progn
                (setq full t)
                (forward-char 2)
                (setq save-pos (point))
!               (skip-chars-forward "^/\\?#")
                (setq host (buffer-substring save-pos (point)))
                (if (string-match "^\\([^@]+\\)@" host)
                    (setq user (match-string 1 host)
***************
*** 170,175 ****
--- 177,183 ----
                (if (and user (string-match "\\([^:]+\\):\\(.*\\)" user))
                    (setq pass (match-string 2 user)
                          user (match-string 1 user)))
+               ;; This gives wrong results for IPv6 literal addresses.
                (if (string-match ":\\([0-9+]+\\)" host)
                    (setq port (string-to-number (match-string 1 host))
                          host (substring host 0 (match-beginning 0))))
***************
*** 181,209 ****
          (if (not port)
              (setq port (url-scheme-get-property prot 'default-port)))

!         ;; Gross hack to preserve ';' in data URLs
!
          (setq save-pos (point))

!         (if (string= "data" prot)
!             (goto-char (point-max))
!           ;; Now check for references
            (skip-chars-forward "^#")
!           (if (eobp)
!               nil
!             (delete-region
!              (point)
!              (progn
!                (skip-chars-forward "#")
!                (setq refs (buffer-substring (point) (point-max)))
!                (point-max))))
!           (goto-char save-pos)
!           (skip-chars-forward "^;")
!           (if (not (eobp))
!               (setq attr (url-parse-args (buffer-substring (point) (point-max)) t)
!                     attr (nreverse attr))))

-         (setq file (buffer-substring save-pos (point)))
          (if (and host (string-match "%[0-9][0-9]" host))
              (setq host (url-unhex-string host)))
          (vector prot user pass host port file refs attr full))))))
--- 189,214 ----
          (if (not port)
              (setq port (url-scheme-get-property prot 'default-port)))

!         ;; 3.3. Path
          (setq save-pos (point))
+         (skip-chars-forward "^#?")
+         (setq file (buffer-substring save-pos (point)))

!         ;; 3.4. Query
!         (when (looking-at "\\?")
!           (forward-char 1)
!           (setq save-pos (point))
            (skip-chars-forward "^#")
!           ;; RFC 3986 specifies no general way of parsing the query
!           ;; string, but `url-parse-args' seems universal enough.
!           (setq attr (url-parse-args (buffer-substring save-pos (point)) t)
!                 attr (nreverse attr)))
!
!         ;; 3.5. Fragment
!         (when (looking-at "#")
!           (forward-char 1)
!           (setq refs (buffer-substring (point) (point-max))))

          (if (and host (string-match "%[0-9][0-9]" host))
              (setq host (url-unhex-string host)))
          (vector prot user pass host port file refs attr full))))))

--=-=-=
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel

--=-=-=--