From: Paul Pogonyshev <pogonyshev@gmx.net>
Subject: URL library problem
Date: Sun, 2 Oct 2005 21:48:06 +0300
Message-ID: <200510022132.54931.pogonyshev@gmx.net>

Hello,

I believe I have found a serious problem in the URL library.  If you
look at the very end of function `url-http', you can see that the
result of `url-http-create-request' is sent to the connection as-is.
But the encoding of the connection is binary!  This means that
multibyte strings are sent in the Emacs internal coding, which nothing
but Emacs understands.
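
To make the multibyte/unibyte distinction concrete, here is a minimal
sketch.  `my-describe-string' is a hypothetical helper introduced only
for illustration; it is not part of the URL library.

```elisp
;; Hypothetical helper: report whether STRING is multibyte and how long it is.
(defun my-describe-string (string)
  "Return (MULTIBYTE-P . LENGTH) for STRING."
  (cons (multibyte-string-p string) (length string)))

;; A Lisp string holding Cyrillic text is multibyte; its length is
;; counted in characters.
(my-describe-string "проверка")

;; After explicit encoding the string is unibyte; its length is counted
;; in bytes, and each Cyrillic letter takes two bytes in UTF-8.
(my-describe-string (encode-coding-string "проверка" 'utf-8))
```

A server expects the second form, the encoded octets, on the wire.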

Form data sent as `multipart/form-data' is usually sent in the
encoding of the page, e.g. UTF-8.  With the current state of URL, it
seems to be impossible to send non-ASCII `multipart/form-data'.
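
For reference, a `multipart/form-data' body with a single field could
be assembled roughly as follows.  This is only a sketch:
`my-multipart-field' is a hypothetical helper, and it assumes that the
boundary and the field name are plain ASCII.

```elisp
;; Hypothetical helper, not part of the URL library.
(defun my-multipart-field (boundary name value coding-system)
  "Return a unibyte multipart/form-data body holding one field.
BOUNDARY and NAME are assumed to be ASCII-only strings; VALUE is
encoded with CODING-SYSTEM."
  (concat "--" boundary "\r\n"                  ; part delimiter
          "Content-Disposition: form-data; name=\"" name "\"\r\n\r\n"
          (encode-coding-string value coding-system) "\r\n"
          "--" boundary "--\r\n"))              ; closing delimiter

;; With the boundary "---" the delimiter lines come out as "-----"
;; and "-------".
(my-multipart-field "---" "wpTextbox1" "проверка" 'utf-8)
```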

Here is a test:


(let ((url-request-method "POST")
      (url-request-extra-headers '(("Content-Type" . "multipart/form-data; boundary=---")))
      ;; The body is an ordinary multibyte Lisp string, not yet encoded.
      (url-request-data (concat "-----\r\nContent-Disposition: form-data; name=\"wpTextbox1\"\r\n\r\n"
                                "проверка\r\n"
                                "-------\r\n")))
  (url-retrieve "http://en.wikipedia.org/w/index.php?title=Test_page&action=submit"
                ;; Accept and ignore any arguments passed to the callback.
                (lambda (&rest _args) (pop-to-buffer (current-buffer)))))


Save the buffer that pops up as an HTML file and open it in a browser.
It should be a Wikipedia preview page with the Russian word
``проверка'' (`test'), but it isn't.  Instead of UTF-8, the word was
sent in the Emacs internal coding.

Note that explicit UTF-8 encoding does not help, because
`url-request-data' is later concatenated with other strings, which
turns the result multibyte again:


(let ((url-request-method "POST")
      (url-request-extra-headers '(("Content-Type" . "multipart/form-data; boundary=---")))
      ;; Same body as before, but explicitly encoded to UTF-8 octets.
      (url-request-data (encode-coding-string
                         (concat "-----\r\nContent-Disposition: form-data; name=\"wpTextbox1\"\r\n\r\n"
                                 "проверка\r\n"
                                 "-------\r\n")
                         'utf-8)))
  (url-retrieve "http://en.wikipedia.org/w/index.php?title=Test_page&action=submit"
                ;; Accept and ignore any arguments passed to the callback.
                (lambda (&rest _args) (pop-to-buffer (current-buffer)))))
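
The effect of such a concatenation can be reproduced in isolation.
This is a sketch under an assumption: the header string is forced to
be multibyte with `string-to-multibyte' to stand in for whatever
multibyte pieces `url-http-create-request' actually uses.
`my-request-multibyte-p' is a hypothetical helper.

```elisp
;; Hypothetical helper: does joining HEADERS and BODY give a multibyte string?
(defun my-request-multibyte-p (headers body)
  "Return non-nil if (concat HEADERS BODY) is a multibyte string."
  (multibyte-string-p (concat headers body)))

(let ((body (encode-coding-string "проверка\r\n" 'utf-8)) ; unibyte octets
      (ascii "Content-Length: 18\r\n\r\n"))               ; unibyte ASCII literal
  (list
   ;; Unibyte headers plus unibyte body: the result stays unibyte.
   (my-request-multibyte-p ascii body)
   ;; One multibyte argument is enough to make the whole result multibyte.
   (my-request-multibyte-p (string-to-multibyte ascii) body)))
```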


However, this trivial (and not-for-production) patch makes the first
test work, because it encodes the complete request, which is then sent
to the Wikipedia server unmodified:


--- /home/paul/emacs/lisp/url/url-http.el	2005-09-28 16:56:02.000000000 +0300
+++ /tmp/buffer-content-2240ocC	2005-10-02 21:30:00.000000000 +0300
@@ -268,7 +268,7 @@ request.
 	   ;; Any data
 	   url-request-data))
     (url-http-debug "Request is: \n%s" request)
-    request))
+    (encode-coding-string request 'utf-8))
 
 ;; Parsing routines
 (defun url-http-clean-headers ()


Of course, unconditional encoding in UTF-8 is not the right thing to
do.  Actually, encoding the complete request is not right either.  A
proper patch would simply avoid concatenating `url-request-data' with
anything and send it to the connection verbatim, assuming that the
user of the library has already properly encoded it.  The reason for
this is that `multipart/form-data' can have different parts in
different encodings (even if this is hardly ever used).
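
One possible shape of that approach, as a sketch only:
`my-request-chunks' is a hypothetical function, not existing
url-http.el code, and it assumes that the header text is ASCII-only.

```elisp
;; Hypothetical helper, not part of url-http.el.
(defun my-request-chunks (header-text body)
  "Return a list of unibyte strings to send in order.
HEADER-TEXT is assumed to be ASCII-only.  BODY must already be
encoded by the caller and is passed through untouched."
  (when (multibyte-string-p body)
    (error "BODY must be encoded by the caller before sending"))
  (list (encode-coding-string header-text 'us-ascii)
        body))

;; The headers and the body are never concatenated, so the body
;; cannot be turned multibyte again.
(my-request-chunks "POST /w/index.php HTTP/1.1\r\n\r\n"
                   (encode-coding-string "проверка" 'utf-8))
```

Each chunk could then be handed to the connection in turn, for example
with `process-send-string', where the connection process itself is
assumed to exist already.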

Are you interested in a patch?

Paul
