From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paul Pogonyshev
Newsgroups: gmane.emacs.devel
Subject: URL library problem
Date: Sun, 2 Oct 2005 21:48:06 +0300
Message-ID: <200510022132.54931.pogonyshev@gmx.net>
Original-To: emacs-devel@gnu.org

Hello,

I believe I have found a serious problem in the URL library.  If you
look at the very end of the function `url-http', you can see that the
result of `url-http-create-request' is sent to the connection as-is.
But the coding system of the connection is binary!  This means that
multibyte strings are sent in Emacs internal coding, which nothing but
Emacs understands.

Form data sent as `multipart/form-data' is usually sent in the encoding
of the page, e.g. UTF-8.  With the current state of URL, it seems to be
impossible to send non-ASCII `multipart/form-data'.

Here is a test:

(let ((url-request-method "POST")
      (url-request-extra-headers '(("Content-Type" . "multipart/form-data; boundary=---")))
      (url-request-data (concat "-----\r\nContent-Disposition: form-data; name=\"wpTextbox1\"\r\n\r\n"
                                "проверка\r\n"
                                "-------\r\n")))
  (url-retrieve "http://en.wikipedia.org/w/index.php?title=Test_page&action=submit"
                (lambda () (pop-to-buffer (current-buffer)))))

Save the buffer it pops up as an HTML file and open it in a browser.
It should be a Wikipedia preview page with the Russian word ``проверка''
(`test'), but it isn't.  Instead of UTF-8, the word got sent in Emacs
internal coding.

Note that explicit UTF-8 encoding does not help, because
`url-request-data' is later concatenated with some other strings, which
turns the result multibyte again:

(let ((url-request-method "POST")
      (url-request-extra-headers '(("Content-Type" . "multipart/form-data; boundary=---")))
      (url-request-data (encode-coding-string
                         (concat "-----\r\nContent-Disposition: form-data; name=\"wpTextbox1\"\r\n\r\n"
                                 "проверка\r\n"
                                 "-------\r\n")
                         'utf-8)))
  (url-retrieve "http://en.wikipedia.org/w/index.php?title=Test_page&action=submit"
                (lambda () (pop-to-buffer (current-buffer)))))

However, this trivial (and not-for-production) patch makes the first
test work, because it encodes the complete request, which is then sent
to the Wikipedia server unmodified:

--- /home/paul/emacs/lisp/url/url-http.el  2005-09-28 16:56:02.000000000 +0300
+++ /tmp/buffer-content-2240ocC  2005-10-02 21:30:00.000000000 +0300
@@ -268,7 +268,7 @@ request.
            ;; Any data
            url-request-data))
     (url-http-debug "Request is: \n%s" request)
-    request))
+    (encode-coding-string request 'utf-8))
 
 ;; Parsing routines
 (defun url-http-clean-headers ()

Of course, unconditional encoding in UTF-8 is not the right thing to
do.  Actually, encoding the complete request is not right either.  A
proper patch would simply avoid concatenating `url-request-data' with
anything and send it to the connection verbatim, assuming that the user
of the library has already properly encoded it.  The reason for this is
that `multipart/form-data' can have different parts in different
encodings (even if this is hardly ever used).

Are you interested in a patch?

Paul
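
The multibyte/unibyte distinction at the heart of this report can be
checked directly.  Below is a minimal Emacs Lisp sketch, meant only as
an illustration and not as code from url-http.el: the names `text',
`encoded', `header' and `mixed' are made up for this example, and the
results noted in the comments assume an Emacs with a Unicode-based
internal representation (Emacs 23 or later), which is not necessarily
how the 2005 development version behaved.

```elisp
;; Illustration only; none of these names come from url-http.el.
(let* ((text "проверка")                             ; multibyte, 8 characters
       (encoded (encode-coding-string text 'utf-8))  ; unibyte UTF-8 bytes
       ;; Hypothetical stand-in for request headers that are multibyte.
       (header (string-to-multibyte "Content-Length: 16\r\n\r\n"))
       ;; Concatenating with a multibyte string gives a multibyte result.
       (mixed (concat header encoded)))
  (list (multibyte-string-p text)      ; t
        (multibyte-string-p encoded)   ; nil
        (length encoded)               ; 16, two bytes per Cyrillic letter
        (multibyte-string-p mixed)))   ; t
```

Under those assumptions the form evaluates to (t nil 16 t).  The last
element is the relevant one: once the encoded data has been
concatenated with a multibyte string, the result is multibyte again,
which is why sending `url-request-data' to the connection on its own,
without concatenation, is the approach suggested in the message.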