From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: encoding and content-length for url-http.el
Date: Fri, 10 Jun 2005 15:47:53 -0400
Message-ID: <87u0k69fhn.fsf-monnier+emacs@gnu.org>
References: <1118418076.8854.41.camel@localhost.localdomain>
Cc: emacs-devel@gnu.org
To: "Mark A. Hershberger"
In-Reply-To: <1118418076.8854.41.camel@localhost.localdomain> (Mark A. Hershberger's message of "Fri, 10 Jun 2005 11:41:16 -0400")

> Could I get input on the following patch before I apply it?  The first
> part (using string-bytes instead of length) seems like a no-brainer.
> The second part, I'm less sure about.

> --- url-http.el 4 Jun 2005 18:37:16 -0000 1.14
> +++ url-http.el 10 Jun 2005 18:36:06 -0000
> @@ -259,7 +259,7 @@
>             (if url-request-data
>                 (concat
>                  "Content-length: " (number-to-string
> -                                    (length url-request-data))
> +                                    (string-bytes url-request-data))

I must say I haven't looked at the code, but it's anything but a
no-brainer.  I'd rather say that it's obviously wrong.  `string-bytes'
gives you the number of bytes Emacs uses for the internal
representation of the string, not the number of bytes the string will
take on the wire.

Actually, the two are the same in only 2 cases:

1 - url-request-data is a unibyte string (in which case `length' also
    returns the same value and the patch makes no difference).
2 - the process's coding system is `emacs-mule'.

The first case is probably what we want.  The second is unlikely to
ever be right.
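The same point as a minimal Python sketch. This is an analogy, not url-http.el code: `str` stands in for a multibyte string (a sequence of characters), `bytes` for a unibyte one, and the helper `content_length` is hypothetical. What belongs in Content-length is the size of the data after encoding it with the coding system actually used to send it; the character count only matches that for byte data (case 1) or when every character happens to encode to one byte.

```python
def content_length(body, coding="utf-8"):
    """Hypothetical helper: the Content-length value for BODY.

    BODY is either bytes (like an Emacs unibyte string) or str
    (like a multibyte string, i.e. a sequence of characters).
    """
    if isinstance(body, bytes):
        # Already a sequence of bytes: its length is what goes on the wire.
        return len(body)
    # A sequence of characters: count the bytes it turns into under the
    # coding system that will actually be used to send it.
    return len(body.encode(coding))


text = "caf\u00e9"                         # 4 characters
print(len(text))                          # 4 (characters, like `length')
print(content_length(text, "utf-8"))      # 5 bytes on the wire
print(content_length(text, "latin-1"))    # 4 bytes on the wire
print(content_length(b"abc"))             # 3 (unibyte data: same as length)
```

Note that the same string gives different answers under different coding systems, so the header cannot be computed correctly without knowing how the data will be encoded.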
> + (defvar url-request-coding-system 'binary
> +   "The coding system to use for the request.")
> [...]
> +       (set-process-coding-system connection
> +                                  (detect-coding-string url-request-data t)
> +                                  url-request-coding-system)

This says "the data we send is binary (i.e. unibyte, thus case 1 above)
and the data we receive uses the coding system that we infer from
url-request-data".  Does that sound right to you?

Assuming the data we send is url-request-data, it doesn't sound right
to me.  Using binary when sending sounds right (assuming
url-request-data is unibyte, which is desirable).  But when receiving
we would then probably want to use binary as well.

Also, I'm not sure what the purpose of url-request-coding-system is.
Would it make sense to set it to something else?

If the change from length to string-bytes solves your problem, it means
that url-request-data is not unibyte (i.e. not a sequence of bytes, but
a sequence of chars), in which case using `binary' when sending can't be
right.


        Stefan
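For comparison, a minimal Python sketch of the direction suggested above, again an analogy rather than url-http.el code, with hypothetical function names and UTF-8 assumed for the request body: turn the request data into bytes once, with an explicitly chosen coding system, take Content-Length from those bytes, send them unchanged ("binary" out), and keep the response as raw bytes ("binary" in) instead of guessing its coding system from the request data.

```python
def build_request(path, host, body_text, coding="utf-8"):
    """Hypothetical: build a raw HTTP/1.0 POST request as bytes.

    The body is encoded exactly once, up front, so the length we put in
    the header is the length of exactly what we send.
    """
    body = body_text.encode(coding)
    head = (
        f"POST {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    # Header lines here are plain ASCII, so this encoding is byte-for-byte.
    return head.encode("ascii") + body


def split_response(raw):
    """Hypothetical: split raw response bytes into (headers, body).

    The body stays undecoded: its charset, if any, is only known after
    parsing the response headers, not from the request data.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1"), body


req = build_request("/submit", "example.org", "caf\u00e9")
print(req)
```

With the body already reduced to bytes before it reaches the connection, `length' and the on-the-wire size agree, which is case 1 from the message.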