From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: encoding and content-length for url-http.el
Date: Thu, 16 Jun 2005 16:05:50 +0900
To: "Mark A. Hershberger"
Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
In-reply-to: <1118895704.7936.19.camel@localhost.localdomain> (mah@everybody.org)

In article <1118895704.7936.19.camel@localhost.localdomain>, "Mark A. Hershberger" writes:

[...]
> I'm not using url-dav.el -- I'm using xml-rpc.el which I maintain.
> However, to eliminate the reliance on external code, I've pulled the bit
> from xml-rpc.el that makes the call to post to a weblog hosted on
> Blogger.com:

> (let ((url-debug t))
>   (setq url-request-data "blogger.newPost0123456789ABCDEF<param>9380140usrnamepasswrdIñtërnâtiônàlizætiøn from emacs with patch1")
[...]
>     result))))))

> Without the patch that I supplied, this results in a server error:
> "unexpected end of file found"

> With the patch, it works perfectly.
> The result can be seen at
> http://emacs-weblogger.blogspot.com/

In the code above, you set url-request-data to a multibyte
string.  All non-ASCII characters in "Iñtërnâtiônàlizætiøn"
are iso-8859-1, and Emacs internally represents each
iso-8859-1 character as 2 bytes.  That means string-bytes on
url-request-data returns, by chance, the same byte length as
the result of encoding it in utf-8.

  (string-bytes "Iñtërnâtiônàlizætiøn")
  == (length (encode-coding-string "Iñtërnâtiônàlizætiøn" 'utf-8))
  == 27

That's why your change to url-http.el works for the above
case.  But that is just a coincidence.  If the string
contains, for instance, an Ethiopic character, it doesn't
work.

What I still don't know is what value url-request-data
should have.  If it should be an already encoded string
(making it the caller's responsibility to pre-encode the
string), just using `length' as now is ok.  And you can use
this kind of code:

  (let ((url-debug t))
    (setq url-request-data
          (encode-coding-string "blogger.newPost0123456789ABCDEF<param>9380140usrnamepasswrdIñtërnâtiônàlizætiøn from emacs with patch1" 'utf-8))

Please try it after cancelling your change.

If it should be a multibyte string, the correct way to
calculate Content-length: is to use this code:

  (length (encode-coding-string "Iñtërnâtiônàlizætiøn"
                                url-request-coding-system))

with your patch for introducing url-request-coding-system.

Anyway, this change of yours:

> +        (set-process-coding-system connection
> +                                   (detect-coding-string url-request-data t)
> +                                   url-request-coding-system)

is bad, as Stefan wrote.  The second arg must be `binary',
and we should decode the received data according to its
contents (perhaps by parsing the header to detect which
charset is specified, and falling back to Emacs' code
detection routine).

---
Kenichi Handa
handa@m17n.org
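
To make the two cases above concrete, here is a minimal sketch of
how the byte count for Content-length: could be computed whichever
convention is chosen for url-request-data.  The helper name
`my-content-length' is hypothetical and is not part of url-http.el;
the sketch also assumes that a unibyte string has already been
encoded by the caller.

```elisp
;; Sketch only: `my-content-length' is a hypothetical helper,
;; not an existing function in url-http.el.
(defun my-content-length (data coding-system)
  "Return the number of bytes DATA occupies on the wire.
If DATA is a multibyte string, encode it with CODING-SYSTEM and
count the resulting bytes.  If DATA is unibyte, assume the caller
has already encoded it and return its length unchanged."
  (if (multibyte-string-p data)
      (length (encode-coding-string data coding-system))
    (length data)))

;; The same text gives different byte counts per coding system:
(my-content-length "Iñtërnâtiônàlizætiøn" 'utf-8)       ; 27 bytes
(my-content-length "Iñtërnâtiônàlizætiøn" 'iso-8859-1)  ; 20 bytes
;; `length' on the multibyte string counts characters, not bytes:
(length "Iñtërnâtiônàlizætiøn")                         ; 20
```

The count depends on the coding system actually used to send the
data, so it is taken from the encoded string rather than from the
internal representation.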