all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* URL library problem
@ 2005-10-02 18:48 Paul Pogonyshev
  2005-10-02 19:32 ` Mark A. Hershberger
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Paul Pogonyshev @ 2005-10-02 18:48 UTC (permalink / raw)


Hello,

I believe I have found a serious problem in the URL library.  If you
look at the very end of function `url-http', you can see that the
result of `url-http-create-request' is sent to the connection as-is.
But encoding of the connection is binary!  It means, that multibyte
strings are sent in Emacs internal coding, which nothing but Emacs
understands.

Form data sent as `multipart/form-data' is usually sent in the
encoding of the page, e.g. UTF-8.  With the current state of URL, it
seems to be impossible to send non-ASCII `multipart/form-data'.

Here is a test:


(let ((url-request-method "POST")
      (url-request-extra-headers '(("Content-Type" . "multipart/form-data; boundary=---")))
      (url-request-data (concat "-----\r\nContent-Disposition: form-data; name=\"wpTextbox1\"\r\n\r\n"
				"проверка\r\n"
				"-------\r\n")))
  (url-retrieve "http://en.wikipedia.org/w/index.php?title=Test_page&action=submit"
		(lambda () (pop-to-buffer (current-buffer)))))


Save the buffer it pops up as an HTML and open it in a browser.  It
should be a Wikipedia preview page with Russian word ``проверка''
(`test'), but it isn't.  Instead of UTF-8, the word got sent in Emacs
internal coding.

Note how explicit UTF-8 encoding helps nothing, because
`url-request-data' is later concatenated with some strings turning
multibyte again:


(let ((url-request-method "POST")
      (url-request-extra-headers '(("Content-Type" . "multipart/form-data; boundary=---")))
      (url-request-data (encode-coding-string
			 (concat "-----\r\nContent-Disposition: form-data; name=\"wpTextbox1\"\r\n\r\n"
				 "проверка\r\n"
				 "-------\r\n")
			 'utf-8)))
  (url-retrieve "http://en.wikipedia.org/w/index.php?title=Test_page&action=submit"
		(lambda () (pop-to-buffer (current-buffer)))))


However, this trivial (and not-for-production) patch makes the first
test work, because it encode the complete request, which is then sent
to Wikipedia server unmodified:


--- /home/paul/emacs/lisp/url/url-http.el	2005-09-28 16:56:02.000000000 +0300
+++ /tmp/buffer-content-2240ocC	2005-10-02 21:30:00.000000000 +0300
@@ -268,7 +268,7 @@ request.
 	   ;; Any data
 	   url-request-data))
     (url-http-debug "Request is: \n%s" request)
-    request))
+    (encode-coding-string request 'utf-8))
 
 ;; Parsing routines
 (defun url-http-clean-headers ()


Of course, uncoditional encoding in UTF-8 is not a right thing to do.
Actually, encoding of the complete request is not right.  A proper
patch would simply avoid concatenating `url-request-data' with
anything and send it to the connection verbatim, assuming that the
user of the library has already properly encoded it.  The reason for
this is that `multipart/form-data' can have different parts in
different encoding (even if it is hardly ever used.)

Are you interested in a patch?

Paul

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: URL library problem
  2005-10-02 18:48 URL library problem Paul Pogonyshev
@ 2005-10-02 19:32 ` Mark A. Hershberger
  2005-10-03  5:09 ` Richard M. Stallman
  2005-10-03 14:36 ` Stefan Monnier
  2 siblings, 0 replies; 6+ messages in thread
From: Mark A. Hershberger @ 2005-10-02 19:32 UTC (permalink / raw)



[-- Attachment #1.1: Type: text/plain, Size: 564 bytes --]

Paul Pogonyshev wrote:

> I believe I have found a serious problem in the URL library.

You aren't the first.  I raised this issue back in June 
(http://thread.gmane.org/gmane.emacs.devel/38531) and offered a patch. 
However, because I didn't (and, to some extent, still don't) understand 
all the encoding issues involved, I didn't apply it.

> Are you interested in a patch?

I am.

-- 
http://mah.everybody.org/weblog/
GPG Fingerprint: 7E15 362D A32C DFAB E4D2  B37A 735E F10A 2DFC BFF5

Someone will always sell you for 30 pieces of silver.
  -- Andrei Rublev

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]

[-- Attachment #2: Type: text/plain, Size: 142 bytes --]

_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: URL library problem
  2005-10-02 18:48 URL library problem Paul Pogonyshev
  2005-10-02 19:32 ` Mark A. Hershberger
@ 2005-10-03  5:09 ` Richard M. Stallman
  2005-10-03 14:36 ` Stefan Monnier
  2 siblings, 0 replies; 6+ messages in thread
From: Richard M. Stallman @ 2005-10-03  5:09 UTC (permalink / raw)
  Cc: emacs-devel

    Of course, uncoditional encoding in UTF-8 is not a right thing to do.
    Actually, encoding of the complete request is not right.  A proper
    patch would simply avoid concatenating `url-request-data' with
    anything and send it to the connection verbatim, assuming that the
    user of the library has already properly encoded it.  The reason for
    this is that `multipart/form-data' can have different parts in
    different encoding (even if it is hardly ever used.)

    Are you interested in a patch?

It would be a very good thing if you could write a clean and
fully correct fix which handles this right.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: URL library problem
  2005-10-02 18:48 URL library problem Paul Pogonyshev
  2005-10-02 19:32 ` Mark A. Hershberger
  2005-10-03  5:09 ` Richard M. Stallman
@ 2005-10-03 14:36 ` Stefan Monnier
  2005-10-03 15:26   ` Stefan Monnier
  2 siblings, 1 reply; 6+ messages in thread
From: Stefan Monnier @ 2005-10-03 14:36 UTC (permalink / raw)
  Cc: emacs-devel

> Note how explicit UTF-8 encoding helps nothing, because `url-request-data'
> is later concatenated with some strings turning multibyte again:

The problem here is not so much the use of string concatenation but the fact
that string concatenation uses string-make-multibyte rather than
string-to-multibyte.

Another way to look at it and to say that this concatenation should only be
done with unibyte strings, so those strings that are concatenated to your
string should all be made unibyte.

Would the patch below do the trick?


        Stefan


--- orig/lisp/url/url-http.el
+++ mod/lisp/url/url-http.el
@@ -198,7 +198,11 @@
     ;; allows us to elide null lines directly, at the cost of making
     ;; the layout less clear.
     (setq request
-	  (concat
+         (mapconcat
+          ;; We'd really want here `string-to-unibyte', so as to signal an
+          ;; error if one of the strings contains a multibyte char.
+          'string-as-unibyte
+          (list
 	   ;; The request
 	   (or url-request-method "GET") " "
 	   (if proxy-obj (url-recreate-url proxy-obj) real-fname)
@@ -266,7 +270,8 @@
 	   ;; End request
 	   "\r\n"
 	   ;; Any data
-	   url-request-data))
+	   url-request-data)
+          ""))
     (url-http-debug "Request is: \n%s" request)
     request))

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: URL library problem
  2005-10-03 14:36 ` Stefan Monnier
@ 2005-10-03 15:26   ` Stefan Monnier
  2005-10-03 17:53     ` Paul Pogonyshev
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Monnier @ 2005-10-03 15:26 UTC (permalink / raw)
  Cc: emacs-devel

> Would the patch below do the trick?

No, it clearly wouldn't.  Try this one instead.


        Stefan


--- url-http.el	25 aoû 2005 10:56:43 -0400	1.20
+++ url-http.el	03 oct 2005 11:26:27 -0400	
@@ -198,7 +198,12 @@
     ;; allows us to elide null lines directly, at the cost of making
     ;; the layout less clear.
     (setq request
-	  (concat
+         (mapconcat
+          ;; We'd really want here `string-to-unibyte', so as to signal an
+          ;; error if one of the strings contains a multibyte char.
+          'string-as-unibyte
+          (delq nil
+          (list
 	   ;; The request
 	   (or url-request-method "GET") " "
 	   (if proxy-obj (url-recreate-url proxy-obj) real-fname)
@@ -267,6 +272,7 @@
 	   "\r\n"
 	   ;; Any data
 	   url-request-data))
+          ""))
     (url-http-debug "Request is: \n%s" request)
     request))
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: URL library problem
  2005-10-03 15:26   ` Stefan Monnier
@ 2005-10-03 17:53     ` Paul Pogonyshev
  0 siblings, 0 replies; 6+ messages in thread
From: Paul Pogonyshev @ 2005-10-03 17:53 UTC (permalink / raw)
  Cc: emacs-devel

Stefan Monnier wrote:
> > Would the patch below do the trick?
>
> No, it clearly wouldn't.  Try this one instead.
>
>
>         Stefan
>
>
> --- url-http.el	25 aoû 2005 10:56:43 -0400	1.20
> +++ url-http.el	03 oct 2005 11:26:27 -0400
> @@ -198,7 +198,12 @@
>      ;; allows us to elide null lines directly, at the cost of making
>      ;; the layout less clear.
>      (setq request
> -	  (concat
> +         (mapconcat
> +          ;; We'd really want here `string-to-unibyte', so as to signal an
> +          ;; error if one of the strings contains a multibyte char.
> +          'string-as-unibyte
> +          (delq nil
> +          (list
>  	   ;; The request
>  	   (or url-request-method "GET") " "
>  	   (if proxy-obj (url-recreate-url proxy-obj) real-fname)
> @@ -267,6 +272,7 @@
>  	   "\r\n"
>  	   ;; Any data
>  	   url-request-data))
> +          ""))
>      (url-http-debug "Request is: \n%s" request)
>      request))

Yes, it works.  I was thinking more along the lines of this patch:

--- url-http.el	24 Aug 2005 22:29:10 +0300	1.20
+++ url-http.el	03 Oct 2005 20:37:50 +0300	
@@ -264,9 +264,10 @@ request.
 				    (length url-request-data))
 		"\r\n"))
 	   ;; End request
-	   "\r\n"
-	   ;; Any data
-	   url-request-data))
+	   "\r\n"))
+    (setq request (concat (encode-coding-string request 'binary t)
+			  ;; Any data
+			  url-request-data))
     (url-http-debug "Request is: \n%s" request)
     request))
 
@@ -1016,6 +1017,12 @@ CBARGS as the arguments."
 		    url-http-chunked-start
 		    url-http-chunked-counter
 		    url-http-process))
+
+  (when (and url-request-data (multibyte-string-p url-request-data))
+    (let ((ascii-string (encode-coding-string url-request-data 'iso-safe)))
+      (if (string= ascii-string url-request-data)
+	  (setq url-request-data ascii-string)
+	(error "`url-request-data' must be properly encoded or consist only of ASCII characters"))))
   (let ((connection (url-http-find-free-connection (url-host url)
 						   (url-port url)))
 	(buffer (generate-new-buffer (format " *http %s:%d*"

I assume you will commit something solving it to CVS.

Paul

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2005-10-03 17:53 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-10-02 18:48 URL library problem Paul Pogonyshev
2005-10-02 19:32 ` Mark A. Hershberger
2005-10-03  5:09 ` Richard M. Stallman
2005-10-03 14:36 ` Stefan Monnier
2005-10-03 15:26   ` Stefan Monnier
2005-10-03 17:53     ` Paul Pogonyshev

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.