From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: Fwd: Re: Inadequate documentation of silly characters on screen.
Date: Thu, 19 Nov 2009 14:18:52 +0000
Message-ID: <20091119141852.GC1720@muc.de>
References: <20091118191258.GA2676@muc.de> <20091119082040.GA1720@muc.de> <874ooq8xay.fsf@wanchan.jasonrumney.net>
In-Reply-To: <874ooq8xay.fsf@wanchan.jasonrumney.net>
To: Jason Rumney
Cc: Andreas Schwab, Stefan Monnier, emacs-devel@gnu.org

On Thu, Nov 19, 2009 at 09:21:41PM +0800, Jason Rumney wrote:
> Andreas Schwab writes:

> > Nothing gets truncated.  In Emacs 23 ?ñ is simply the number 241,
> > whereas in Emacs 22 is it the number 2289.  You can put 2289 in a
> > string in Emacs 23, but there is no defined unicode character with
> > that value.

> The bug here is likely that setting a character in a unibyte string to
> a value between 160 and 255 does not result in an automatic conversion
> to multibyte.  That was correct in 22.3, since values in that range
> were raw binary bytes outside of any character set, but in 23.1 they
> correspond to valid Latin-1 codepoints.

Putting point over the \361 and doing C-x = shows the character is

    Char: \361 (4194289, #o17777761, #x3ffff1, raw-byte)

The actual character in the string is ñ (#x3f).

Going through all the motions, here is what I think is happening: the
\361 is put there by `insert'.

insert calls
  general_insert_function, which calls
  insert_from_string (via a function pointer), which calls
  insert_from_string_1, which calls
  copy_text.

At this stage, I'm assuming to_multibyte (the screen buffer, in some
form) is TRUE, and from_multibyte (a string holding the single character
#xf1) is FALSE.  We thus execute this code in copy_text:

      else
        {
          unsigned char *initial_to_addr = to_addr;

          /* Convert single-byte to multibyte.  */
          while (nbytes > 0)
            {
              int c = *from_addr++;    <==============================

              if (c >= 0200)
                {
                  c = unibyte_char_to_multibyte (c);
                  to_addr += CHAR_STRING (c, to_addr);
                  nbytes--;
                }
              else
                /* Special case for speed.  */
                *to_addr++ = c, nbytes--;
            }
          return to_addr - initial_to_addr;
        }

At the indicated line, c is a SIGNED integer, and will therefore get the
value 0xfffffff1, not 0xf1.  copy_text then invokes the macro
unibyte_char_to_multibyte (-15), at which point there's no point going
any further.

At least, that's my guess as to what's happening.

A fix would be to change the declaration of "int c" to "unsigned int c".
I'm going to try that now.

-- 
Alan Mackenzie (Nuremberg, Germany).