all messages for Emacs-related lists mirrored at yhetil.org
* url-generic-parse-url fails in certain cases
From: Magnus Henoch @ 2006-10-08 13:41 UTC


[-- Attachment #1: Type: text/plain, Size: 1273 bytes --]

I tried to track down the bug described in
<URL:http://lists.gnu.org/archive/html/w3-dev/2004-02/msg00003.html>.
I found that while URLs of the form "http://server/?query;arg=val" are
handled correctly, a URL with no path but with a query string
(e.g. "http://server?query;arg=val") is not: "?query" is treated as
part of the hostname.
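To make the failure concrete, here is a minimal Python sketch (not the Emacs Lisp code) of the host scan that follows "//". The helper `scan_authority` is hypothetical; it imitates `skip-chars-forward` with the old stop set (`/` only) and with the stop set used in the patch, which follows RFC 3986 section 3.2 in ending the authority at `/`, `?` or `#`.

```python
def scan_authority(url: str, stop_chars: str) -> str:
    """Return the text after the first "//" up to the first stop character.

    Hypothetical helper imitating (skip-chars-forward "^...") run just
    after the "//" that introduces the authority.
    """
    start = url.index("//") + 2
    end = start
    while end < len(url) and url[end] not in stop_chars:
        end += 1
    return url[start:end]


for url in ("http://server/?query;arg=val", "http://server?query;arg=val"):
    old = scan_authority(url, "/")    # old rule: stop only at "/"
    new = scan_authority(url, "/?#")  # RFC 3986 section 3.2: stop at "/", "?" or "#"
    print(f"{url!r}: old={old!r} new={new!r}")
```

With the old rule the second URL yields the host "server?query;arg=val", which is exactly the symptom described above; the first URL is unaffected because a "/" precedes the "?".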

I compared the code in url-generic-parse-url to RFC 3986 and tried to
minimize the differences.  I found that the function treated "?query"
in the first URL above as part of the path, while the RFC treats it as
part of the query string, together with the arguments that follow it.
After adapting the code to the RFC and fixing
url-recreate-url-attributes, everything that used to work still seems
to work.
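For comparison, RFC 3986 Appendix B gives a regular expression that splits any URI reference into scheme, authority, path, query and fragment. The Python sketch below (the name `split_uri` is illustrative only) applies it to both example URLs: in each case the query is "query;arg=val", and only the path differs ("/" versus empty).

```python
import re

# The component-splitting regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri: str) -> dict:
    """Split a URI reference into its five RFC 3986 components.

    Every group except the path is optional, so the match always
    succeeds; an absent scheme, authority, query or fragment is None,
    while the path is always a string (possibly empty).
    """
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


for uri in ("http://server/?query;arg=val", "http://server?query;arg=val"):
    print(uri, split_uri(uri))
```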

As noted in a comment in the patch, relative URLs are still not parsed
"correctly", but since all applications using the library seem to cope
with that, it would probably be unwise to try to fix it (and
potentially break things) this close to a release.
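To illustrate what "not parsed correctly" means, the following Python sketch contrasts the relative-URL branch of url-generic-parse-url, which (in the lines visible in the patch) stores the whole string as the file name, with the RFC 3986 decomposition of the same references. Both function names are hypothetical, and the sketch models only the path, query and fragment slots.

```python
import re

# The component-splitting regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def relative_as_filename(ref: str) -> tuple:
    """Model of the relative branch: the whole string becomes the file."""
    return (ref, None, None)  # (path, query, fragment)


def relative_per_rfc(ref: str) -> tuple:
    """RFC 3986 decomposition of the same relative reference (section 4.2)."""
    m = URI_RE.match(ref)
    return (m.group(5), m.group(7), m.group(9))  # (path, query, fragment)


for ref in ("#foo", "page.html?x=1#top", "/docs/index.html"):
    print(ref, relative_as_filename(ref), relative_per_rfc(ref))
```

For a plain path such as "/docs/index.html" the two agree, which may be why callers have not noticed; they diverge as soon as a fragment or query is present.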

2006-10-08  Magnus Henoch  <mange@freemail.hu>

	* url-parse.el (url-generic-parse-url): Handle URLs with empty
	path component and non-empty query component.  Untangle path,
	query and fragment parsing code.  Add references to RFC 3986 in
	comments.
	(url-recreate-url-attributes): Start query string with "?", not
	";".

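A Python sketch of what the patched url-recreate-url-attributes produces, assuming attributes are held as (name, value) pairs in the order they appeared. The `;` separator and the bare-name output for value-less attributes are assumptions: the diff context shows only the `?` prefix and the `name=value` branch. Under those assumptions, the query from the example URL round-trips.

```python
from typing import List, Optional, Tuple


def recreate_url_attributes(attrs: List[Tuple[str, Optional[str]]]) -> Optional[str]:
    """Rebuild the query part of a URL from (name, value) pairs.

    No attributes gives None (nil in Lisp); otherwise the result starts
    with "?" (the patch changes this from ";").
    ASSUMPTION: pairs are joined with ";" and a pair without a value is
    emitted as its bare name; the diff context does not show those parts.
    """
    if not attrs:
        return None
    parts = [name if value is None else f"{name}={value}" for name, value in attrs]
    return "?" + ";".join(parts)


print(recreate_url_attributes([("query", None), ("arg", "val")]))
```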

[-- Attachment #2: url-parse.el.patch --]
[-- Type: text/x-patch, Size: 4716 bytes --]

*** orig/lisp/url/url-parse.el
--- mod/lisp/url/url-parse.el
***************
*** 108,114 ****
  (defun url-recreate-url-attributes (urlobj)
    "Recreate the attributes of an URL string from the parsed URLOBJ."
    (when (url-attributes urlobj)
!     (concat ";"
  	    (mapconcat (lambda (x)
                           (if (cdr x)
                               (concat (car x) "=" (cdr x))
--- 108,114 ----
  (defun url-recreate-url-attributes (urlobj)
    "Recreate the attributes of an URL string from the parsed URLOBJ."
    (when (url-attributes urlobj)
!     (concat "?"
  	    (mapconcat (lambda (x)
                           (if (cdr x)
                               (concat (car x) "=" (cdr x))
***************
*** 120,130 ****
--- 120,135 ----
    "Return a vector of the parts of URL.
  Format is:
  \[TYPE USER PASSWORD HOST PORT FILE TARGET ATTRIBUTES FULL\]"
+   ;; See RFC 3986.
    (cond
     ((null url)
      (make-vector 9 nil))
     ((or (not (string-match url-nonrelative-link url))
  	(= ?/ (string-to-char url)))
+     ;; This isn't correct, as a relative URL can be a fragment link
+     ;; (e.g. "#foo") and many other things (see section 4.2).
+     ;; However, let's not fix something that isn't broken, especially
+     ;; when close to a release.
      (let ((retval (make-vector 9 nil)))
        (url-set-filename retval url)
        (url-set-full retval nil)
***************
*** 148,153 ****
--- 153,160 ----
  	(insert url)
  	(goto-char (point-min))
  	(setq save-pos (point))
+ 
+ 	;; 3.1. Scheme
  	(if (not (looking-at "//"))
  	    (progn
  	      (skip-chars-forward "a-zA-Z+.\\-")
***************
*** 156,168 ****
  	      (skip-chars-forward ":")
  	      (setq save-pos (point))))
  
! 	;; We are doing a fully specified URL, with hostname and all
  	(if (looking-at "//")
  	    (progn
  	      (setq full t)
  	      (forward-char 2)
  	      (setq save-pos (point))
! 	      (skip-chars-forward "^/")
  	      (setq host (buffer-substring save-pos (point)))
  	      (if (string-match "^\\([^@]+\\)@" host)
  		  (setq user (match-string 1 host)
--- 163,175 ----
  	      (skip-chars-forward ":")
  	      (setq save-pos (point))))
  
! 	;; 3.2. Authority
  	(if (looking-at "//")
  	    (progn
  	      (setq full t)
  	      (forward-char 2)
  	      (setq save-pos (point))
! 	      (skip-chars-forward "^/\\?#")
  	      (setq host (buffer-substring save-pos (point)))
  	      (if (string-match "^\\([^@]+\\)@" host)
  		  (setq user (match-string 1 host)
***************
*** 170,175 ****
--- 177,183 ----
  	      (if (and user (string-match "\\([^:]+\\):\\(.*\\)" user))
  		  (setq pass (match-string 2 user)
  			user (match-string 1 user)))
+ 	      ;; This gives wrong results for IPv6 literal addresses.
  	      (if (string-match ":\\([0-9+]+\\)" host)
  		  (setq port (string-to-number (match-string 1 host))
  			host (substring host 0 (match-beginning 0))))
***************
*** 181,209 ****
  	(if (not port)
  	    (setq port (url-scheme-get-property prot 'default-port)))
  
! 	;; Gross hack to preserve ';' in data URLs
! 
  	(setq save-pos (point))
  
! 	(if (string= "data" prot)
! 	    (goto-char (point-max))
! 	  ;; Now check for references
  	  (skip-chars-forward "^#")
! 	  (if (eobp)
! 	      nil
! 	    (delete-region
! 	     (point)
! 	     (progn
! 	       (skip-chars-forward "#")
! 	       (setq refs (buffer-substring (point) (point-max)))
! 	       (point-max))))
! 	  (goto-char save-pos)
! 	  (skip-chars-forward "^;")
! 	  (if (not (eobp))
! 	      (setq attr (url-parse-args (buffer-substring (point) (point-max)) t)
! 		    attr (nreverse attr))))
  
- 	(setq file (buffer-substring save-pos (point)))
  	(if (and host (string-match "%[0-9][0-9]" host))
  	    (setq host (url-unhex-string host)))
  	(vector prot user pass host port file refs attr full))))))
--- 189,214 ----
  	(if (not port)
  	    (setq port (url-scheme-get-property prot 'default-port)))
  
! 	;; 3.3. Path
  	(setq save-pos (point))
+ 	(skip-chars-forward "^#?")
+ 	(setq file (buffer-substring save-pos (point)))
  
! 	;; 3.4. Query
! 	(when (looking-at "\\?")
! 	  (forward-char 1)
! 	  (setq save-pos (point))
  	  (skip-chars-forward "^#")
! 	  ;; RFC 3986 specifies no general way of parsing the query
! 	  ;; string, but `url-parse-args' seems universal enough.
! 	  (setq attr (url-parse-args (buffer-substring save-pos (point)) t)
! 		attr (nreverse attr)))
! 
! 	;; 3.5. Fragment
! 	(when (looking-at "#")
! 	  (forward-char 1)
! 	  (setq refs (buffer-substring (point) (point-max))))
  
  	(if (and host (string-match "%[0-9][0-9]" host))
  	    (setq host (url-unhex-string host)))
  	(vector prot user pass host port file refs attr full))))))
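The new parsing order in the last hunk (path, then query, then fragment) can be sketched in Python as a single left-to-right scan over the text that follows the authority. The function name is hypothetical, and unlike the Lisp code it returns the raw query string instead of passing it to `url-parse-args`.

```python
from typing import Optional, Tuple


def split_after_authority(rest: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split what follows the authority into (file, query, refs)."""
    n = len(rest)
    pos = 0
    # 3.3. Path: everything up to the first "?" or "#".
    while pos < n and rest[pos] not in "?#":
        pos += 1
    file = rest[:pos]

    # 3.4. Query: after "?", up to the first "#".
    query = None
    if pos < n and rest[pos] == "?":
        pos += 1
        start = pos
        while pos < n and rest[pos] != "#":
            pos += 1
        query = rest[start:pos]

    # 3.5. Fragment: everything after "#", to the end.
    refs = None
    if pos < n and rest[pos] == "#":
        refs = rest[pos + 1:]
    return file, query, refs


for rest in ("/?query;arg=val", "?query;arg=val", "/a/b#sec?not-a-query"):
    print(repr(rest), split_after_authority(rest))
```

The last example shows a consequence of scanning in RFC order: a "?" that appears after "#" belongs to the fragment, not the query.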

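The comment the patch adds above the port regexp warns about IPv6 literals. The Python sketch below reproduces the pattern `:\\([0-9+]+\\)` as `:([0-9+]+)` (Python's `re.search`, like `string-match`, returns the leftmost match) and compares it with a hypothetical bracket-aware split based on RFC 3986 section 3.2.2, where IPv6 addresses appear in square brackets. Ports are returned as strings to sidestep `string-to-number` conversion details; both function names are illustrative.

```python
import re
from typing import Optional, Tuple


def split_port_as_patched(host: str) -> Tuple[str, Optional[str]]:
    """Same pattern as the patch: the first ":" followed by [0-9+]+ wins."""
    m = re.search(r":([0-9+]+)", host)
    if m is None:
        return host, None
    return host[:m.start()], m.group(1)


def split_port_bracket_aware(hostport: str) -> Tuple[str, Optional[str]]:
    """Hypothetical fix: skip over a bracketed IP literal before looking for a port."""
    if hostport.startswith("["):
        end = hostport.find("]")
        if end == -1:
            return hostport, None  # malformed literal: leave it alone
        host, rest = hostport[:end + 1], hostport[end + 1:]
    else:
        host, sep, port = hostport.partition(":")
        rest = sep + port
    m = re.fullmatch(r":([0-9]+)", rest)
    if m is None:
        return hostport, None  # no valid port: leave the text unchanged
    return host, m.group(1)


for hp in ("server:8080", "[::1]:8080", "[::1]"):
    print(hp, split_port_as_patched(hp), split_port_bracket_aware(hp))
```

For "[::1]:8080" the patched pattern matches ":1" inside the address, giving host "[:" and port "1". In both Emacs and Python regexps the `+` inside `[0-9+]` is a literal character, so that class also accepts plus signs.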
[-- Attachment #3: Type: text/plain, Size: 142 bytes --]

_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: url-generic-parse-url fails in certain cases
From: Richard Stallman @ 2006-10-08 22:26 UTC
  Cc: emacs-devel

Let's give you write access to the Emacs repository
so you can install these fixes.

Would someone please do that?

* Re: url-generic-parse-url fails in certain cases
From: Eli Zaretskii @ 2006-10-09 19:43 UTC
  Cc: mange, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> Date: Sun, 08 Oct 2006 18:26:53 -0400
> Cc: emacs-devel@gnu.org
> 
> Let's give you write access to the Emacs repository
> so you can install these fixes.
> 
> Would someone please do that?

Done.

Welcome on board, Magnus!

* Re: url-generic-parse-url fails in certain cases
From: Magnus Henoch @ 2006-10-09 20:10 UTC


Eli Zaretskii <eliz@gnu.org> writes:

> Done.
>
> Welcome on board, Magnus!

Thanks!  I just committed my changes.

Magnus

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.