From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: encoding and content-length for url-http.el
Date: Fri, 10 Jun 2005 17:22:37 -0400
Message-ID: <87is0lap4t.fsf-monnier+emacs@gnu.org>
References: <1118418076.8854.41.camel@localhost.localdomain> <87u0k69fhn.fsf-monnier+emacs@gnu.org> <1118423681.8854.58.camel@localhost.localdomain>
In-Reply-To: <1118423681.8854.58.camel@localhost.localdomain> (Mark A. Hershberger's message of "Fri, 10 Jun 2005 13:14:41 -0400")
Original-To: "Mark A. Hershberger"
Cc: Emacs Development
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii

>> > - (length url-request-data))
>> > + (string-bytes url-request-data))
>>
>> I must say I haven't looked at the code, but it's anything but
>> a no-brainer. I'd rather say that it's obviously wrong. `string-bytes'
>> will give you the number of bytes used by Emacs for the internal
>> representation of the string, not the number of bytes that the string will
>> use on the wire.

> So I was wrong. But length is even more obviously wrong than
> string-bytes.

> The description for length says "If the string contains multibyte
> characters, this is not necessarily the number of bytes in the string;
> it is the number of characters. To get the number of bytes, use
> `string-bytes'."

> Which is why I thought this was a no-brainer. We want number of bytes,
> not number of characters. RFC2616 says "The Content-Length
> entity-header field indicates the size of the entity-body, in decimal
> number of OCTETs, sent to the recipient"

Problem is that the byte length depends on the encoding that will be used.
I.e. it's not just a property of the string itself.

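To make that concrete, here is a minimal sketch (not taken from
url-http.el; the helper name `my-octet-count' and the sample string are
made up for illustration, and utf-8 and iso-8859-1 are just two example
coding systems). It shows that the same four characters occupy a
different number of octets depending on the coding system, and that once
the data has been encoded explicitly, `length' counts octets:

```elisp
;; Hypothetical helper, for illustration only.
(defun my-octet-count (text coding-system)
  "Return the number of octets TEXT occupies once encoded with CODING-SYSTEM."
  ;; `encode-coding-string' returns a unibyte string, so `length'
  ;; on the result counts bytes rather than characters.
  (length (encode-coding-string text coding-system)))

(let ((text (string ?c ?a ?f #xe9)))    ; "cafe" with e-acute, 4 characters
  (list (length text)                        ; 4: number of characters
        (my-octet-count text 'utf-8)         ; 5: e-acute takes 2 octets
        (my-octet-count text 'iso-8859-1)))  ; 4: e-acute takes 1 octet

;; Encoding before binding the request data keeps `length' correct.
;; The choice of utf-8 here is an assumption, not something url-http.el
;; is known to require.
(let ((url-request-data
       (encode-coding-string (string ?c ?a ?f #xe9) 'utf-8)))
  (length url-request-data))                 ; 5 octets
```

`string-bytes' on the unencoded text is deliberately left out of the
list: its value reflects Emacs's internal representation, which need not
match either of the encoded sizes.
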
I think the code should keep `length' while making sure that
url-request-data is always a sequence of bytes rather than a sequence of
chars (i.e. its content has already been explicitly encoded in whichever
coding-system was deemed appropriate).

>> If the change from length to string-bytes solves your problem, it means that
>> url-request-data is not unibyte (i.e. not a seq of bytes, but a seq of
>> chars), in which case using `binary' when sending can't be right.

> I've been using the patch successfully for some time on unicode strings
> (seq of chars). It works for me and works where what is currently in CVS
> fails.

I believe you that your code works on your test cases, but if it does, it
seems to be by accident.

> I'm quite willing to concede that it's wrong, but I've had trouble
> finding documentation for this stuff. And, like I said, this works
> better for me than what is in CVS.

Could you describe much more precisely what you're doing (especially how
you use the URL package: which functions of it you call, etc...)? Are you
using WebDAV (i.e. url-dav.el)? I've found url-dav.el to be pretty buggy
and, looking through it, I see some places where a few more calls to
encode-coding-string wouldn't be amiss.


        Stefan