unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#7017: Suggestion: (url-retrieve-internal) hexify multibyte URL string first
       [not found] <plhbhvu5zr.fsf@fencepost.gnu.org>
@ 2012-04-10 11:22 ` Lars Magne Ingebrigtsen
  2012-05-07 21:51 ` bug#7017: url-retrieve seems busted Seth Mason
  1 sibling, 0 replies; 5+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-04-10 11:22 UTC
  To: William Xu; +Cc: 7017

William Xu <william.xwl@gmail.com> writes:

> Feeding the same url to `wget', it would first hexify it, then download
> it successfully.  I suggest we do the same in url-retrieve, like this: 
>
> (url-retrieve-internal): Hexify multibyte URL string first when necessary.

Thanks; applied to the Emacs trunk.

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/






* bug#7017: url-retrieve seems busted
       [not found] <plhbhvu5zr.fsf@fencepost.gnu.org>
  2012-04-10 11:22 ` bug#7017: Suggestion: (url-retrieve-internal) hexify multibyte URL string first Lars Magne Ingebrigtsen
@ 2012-05-07 21:51 ` Seth Mason
  2012-05-08  4:52   ` Chong Yidong
  1 sibling, 1 reply; 5+ messages in thread
From: Seth Mason @ 2012-05-07 21:51 UTC
  To: 7017

If you put the following in a buffer and eval it, you'll get a 404:

    ;; http://httpbin.org/get?x=1
    ;; eval this buffer
    (url-retrieve (buffer-substring-no-properties 4 30) (lambda (&rest args) (switch-to-buffer (current-buffer))))

If you curl/wget the same URL, it'll work fine.

If you look at the request, it's going to "/get%3fx%3d1". It seems to me
that the URL is getting improperly encoded for multibyte strings.
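The symptom can be reproduced without any network access. The following is a minimal sketch, assuming a current Emacs in which `url-hexify-string` consults `url-unreserved-chars`; `my-hexify-whole-url` is a hypothetical helper that mimics hexifying the whole URL while exempting only `:` and `/`. Hex-digit case may differ between Emacs versions, so the check is case-insensitive.

```elisp
(require 'url-util)

(defun my-hexify-whole-url (url)
  "Hexify URL as one string, exempting only `:' and `/'.
Hypothetical helper that mimics hexifying the whole URL at once."
  (let ((url-unreserved-chars (append '(?: ?/) url-unreserved-chars)))
    (url-hexify-string url)))

;; The query delimiters "?" and "=" are percent-encoded like ordinary data,
;; so the server receives a path with no query part.
(let ((case-fold-search t))
  (string-match-p "/get%3fx%3d1\\'"
                  (my-hexify-whole-url "http://httpbin.org/get?x=1")))
```

The final form returns non-nil when the encoded URL ends in "/get%3fx%3d1", which matches the request path reported above.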






* bug#7017: url-retrieve seems busted
  2012-05-07 21:51 ` bug#7017: url-retrieve seems busted Seth Mason
@ 2012-05-08  4:52   ` Chong Yidong
  2012-05-08  5:25     ` Chong Yidong
  0 siblings, 1 reply; 5+ messages in thread
From: Chong Yidong @ 2012-05-08  4:52 UTC
  To: Seth Mason; +Cc: 7017

Seth Mason <seth@edgecast.com> writes:

> If you put the following in a buffer and eval it, you'll get a 404:
>
>     ;; http://httpbin.org/get?x=1
>     ;; eval this buffer
>     (url-retrieve (buffer-substring-no-properties 4 30) (lambda (&rest args) (switch-to-buffer (current-buffer))))
>
> If you curl/wget the same URL, it'll work fine.
>
> If you look at the request, it's going to "/get%3fx%3d1". It seems to me
> that the URL is getting improperly encoded for multibyte strings.

Thanks for pointing this out.

Applying url-hexify-string on the entire URL, as the previous patch did,
is wrong.  We mustn't hexify reserved characters that are being used in
their special role.  Unfortunately, figuring out when those characters
are being used in their special role requires an implementation of
RFC2396, which I don't think we currently have in Emacs.

Alternatively, the following not-strictly-correct hack exempts the
reserved characters from hexification.


=== modified file 'lisp/url/url.el'
*** lisp/url/url.el	2012-04-26 12:43:28 +0000
--- lisp/url/url.el	2012-05-08 04:46:45 +0000
***************
*** 180,188 ****
    (url-gc-dead-buffers)
    (if (stringp url)
         (set-text-properties 0 (length url) nil url))
    (when (multibyte-string-p url)
!     (let ((url-unreserved-chars (append '(?: ?/) url-unreserved-chars)))
        (setq url (url-hexify-string url))))
    (if (not (vectorp url))
        (setq url (url-generic-parse-url url)))
    (if (not (functionp callback))
--- 180,193 ----
    (url-gc-dead-buffers)
    (if (stringp url)
         (set-text-properties 0 (length url) nil url))
+ 
    (when (multibyte-string-p url)
!     (let* ((reserved-chars '(?! ?# ?$ ?& ?' ?( ?) ?* ?+ ?, ?/ ?: ?\;
! 			     ?= ?? ?@ ?[ ?]))
! 	   (url-unreserved-chars (append reserved-chars
! 					 url-unreserved-chars)))
        (setq url (url-hexify-string url))))
+ 
    (if (not (vectorp url))
        (setq url (url-generic-parse-url url)))
    (if (not (functionp callback))
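The effect of the hack can be tried outside `url-retrieve-internal`. This is a minimal sketch, assuming a current Emacs in which `url-hexify-string` consults `url-unreserved-chars`; `my-hexify-keeping-reserved` is a hypothetical name, and the character list is copied from the patch with the brackets and parentheses escaped.

```elisp
(require 'url-util)

(defun my-hexify-keeping-reserved (url)
  "Hexify URL but leave reserved characters untouched.
Hypothetical helper wrapping the let-binding from the patch."
  (let* ((reserved-chars '(?! ?# ?$ ?& ?' ?\( ?\) ?* ?+ ?, ?/ ?: ?\;
                           ?= ?? ?@ ?\[ ?\]))
         (url-unreserved-chars (append reserved-chars
                                       url-unreserved-chars)))
    (url-hexify-string url)))

;; An ASCII-only URL passes through unchanged: "?" and "=" keep their roles.
(my-hexify-keeping-reserved "http://httpbin.org/get?x=1")

;; A non-ASCII character is still encoded, as its UTF-8 bytes.
(my-hexify-keeping-reserved "http://httpbin.org/get?x=é")
```

The approach is not strictly correct because a reserved character that is meant as data, such as a literal "?" inside a query value, is passed through unencoded as well.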







* bug#7017: url-retrieve seems busted
  2012-05-08  4:52   ` Chong Yidong
@ 2012-05-08  5:25     ` Chong Yidong
  2012-05-09  8:34       ` Chong Yidong
  0 siblings, 1 reply; 5+ messages in thread
From: Chong Yidong @ 2012-05-08  5:25 UTC
  To: Seth Mason; +Cc: 7017

Chong Yidong <cyd@gnu.org> writes:

> Applying url-hexify-string on the entire URL, as the previous patch did,
> is wrong.  We mustn't hexify reserved characters that are being used in
> their special role.  Unfortunately, figuring out when those characters
> are being used in their special role requires an implementation of
> RFC2396, which I don't think we currently have in Emacs.

Actually, I think we could use url-generic-parse-url for this.
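As a rough illustration of why the parser helps: `url-generic-parse-url` already splits a URL at the reserved characters that act as delimiters, so each component could then be encoded separately. This is only a sketch of the parsing step, not the change that was eventually committed; `my-split-url` is a hypothetical helper.

```elisp
(require 'url-parse)

(defun my-split-url (url)
  "Return (SCHEME HOST FILENAME) for the URL string.
Hypothetical helper; FILENAME is the path together with any query."
  (let ((parsed (url-generic-parse-url url)))
    (list (url-type parsed)
          (url-host parsed)
          (url-filename parsed))))

(my-split-url "http://httpbin.org/get?x=1")
;; => ("http" "httpbin.org" "/get?x=1")
```

With the components separated like this, the "://" and the "/" after the host never need to be considered for hexification at all.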






* bug#7017: url-retrieve seems busted
  2012-05-08  5:25     ` Chong Yidong
@ 2012-05-09  8:34       ` Chong Yidong
  0 siblings, 0 replies; 5+ messages in thread
From: Chong Yidong @ 2012-05-09  8:34 UTC
  To: Seth Mason; +Cc: 7017

Chong Yidong <cyd@gnu.org> writes:

> Chong Yidong <cyd@gnu.org> writes:
>
>> Applying url-hexify-string on the entire URL, as the previous patch did,
>> is wrong.  We mustn't hexify reserved characters that are being used in
>> their special role.  Unfortunately, figuring out when those characters
>> are being used in their special role requires an implementation of
>> RFC2396, which I don't think we currently have in Emacs.
>
> Actually, I think we could use url-generic-parse-url for this.

Fixed in trunk (revision 108172).




