From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: decode-coding-string gone awry?
Date: Tue, 15 Feb 2005 11:17:10 -0500
References: <200502140150.KAA29610@etlken.m17n.org>
Cc: emacs-devel@gnu.org, Kenichi Handa
Original-To: rms@gnu.org

> Is it reasonable to operate with decode-coding-string on a multibyte
> string?  If that is nonsense, maybe we should make it get an error,
> to help people debug such problems.

I think it would indeed make sense to signal errors when decoding
a multibyte string or when encoding a unibyte string.

> If there are some few cases where decode-coding-string makes sense on
> a multibyte string, maybe we can make it get an error except in those
> few cases.

The problem I suspect is that it's pretty common for ASCII-only strings
to be arbitrarily marked unibyte or multibyte depending on the
circumstance.  So we would have to check for the case where the string
is ASCII-only before signalling an error.

I'm actually running right now with an Emacs that does signal such
errors.  I've changed the notion of "multibyte/unibyte" string by saying:

- [same as now] if size_byte < 0, it's UNIBYTE.
- [same as now] if size_byte > size, it's MULTIBYTE.
- [changed] if size_byte == size, it's neither/both (ASCII-only).

Then I've changed several parts of the C code to try and set
size_byte==size whenever possible (instead of marking the string as
unibyte).


        Stefan


PS: As of now, the only place where Emacs has signalled a bad
encoding/decoding with the proposed error is in Gnus, though I haven't
checked any further whether this error really is a bug in Gnus.
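
For illustration, the three-way classification listed above could be
sketched in C roughly as follows.  This is only a sketch under stated
assumptions: the struct, the enum and every function name here are
hypothetical stand-ins, not the actual Emacs data structures or the
actual patch.  Only the comparisons between size_byte and size are
taken from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a string header; NOT the real Emacs struct. */
struct sketch_string {
    ptrdiff_t size;      /* length in characters */
    ptrdiff_t size_byte; /* length in bytes, or negative for unibyte */
};

enum sketch_kind {
    SKETCH_UNIBYTE,    /* size_byte < 0 */
    SKETCH_MULTIBYTE,  /* size_byte > size */
    SKETCH_ASCII_ONLY  /* size_byte == size: neither/both */
};

/* Classify a string using the three rules from the message. */
enum sketch_kind sketch_classify(const struct sketch_string *s)
{
    assert(s->size >= 0);
    if (s->size_byte < 0)
        return SKETCH_UNIBYTE;
    /* Every character takes at least one byte, so fewer bytes than
       characters would be an inconsistent header. */
    assert(s->size_byte >= s->size);
    if (s->size_byte > s->size)
        return SKETCH_MULTIBYTE;
    return SKETCH_ASCII_ONLY;
}

/* Decoding is rejected only for strings that are definitely multibyte. */
bool sketch_may_decode(const struct sketch_string *s)
{
    return sketch_classify(s) != SKETCH_MULTIBYTE;
}

/* Encoding is rejected only for strings that are definitely unibyte. */
bool sketch_may_encode(const struct sketch_string *s)
{
    return sketch_classify(s) != SKETCH_UNIBYTE;
}
```

With this shape, an ASCII-only string passes both checks, so the error
would be signalled only when the string's representation really does
contradict the requested operation.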