From: Jürgen Hartmann
Newsgroups: gmane.emacs.help
Subject: RE: [Solved] RE: Differences between identical strings in Emacs lisp
Date: Thu, 9 Apr 2015 12:38:43 +0200
To: "help-gnu-emacs@gnu.org"

Thank you for the clarification, Stefan Monnier:

>>>> the use cases you tried -- Emacs will sometimes silently convert
>>>> unibyte characters to their locale-dependent multibyte equivalents.
>
> Nowadays this should happen extremely rarely, or never.
>
>>> On which occasion such a conversion is done?
>> One example that comes to mind is (insert 160), i.e. when inserting
>> text into a buffer.
>
> This doesn't do any conversion (although it did, in Emacs<23).
> 160 is simply taken as the code of the corresponding character in
> Emacs's character space (which is basically Unicode), hence regardless
> of locale.
>
> If this `insert' is performed inside a unibyte buffer, then this 160 is
> instead taken to be the code of a byte.  Again, regardless of the locale.

So this is comparable to the output of \xA0 in a unibyte string
(e.g. in "\xA0\ A") in contrast to the same in a multibyte string (e.g.
in "\xA0 Ä"): The former yields the raw byte \240, the latter a no-break space.

> AFAIR, the only "dwimish" conversion that still takes place on occasion
> is between things like #x3FFFBA and #xBA (i.e. between a byte and
> a character representing that same byte).

(*Broad grin*) I think I will appoint this one my favorite trap. (See my
previous post.)

>>> It seems that all my related observations that puzzled me before can be well
>>> explained by the strict distinction between characters and raw bytes and the
>>> mapping between the latter's integer representations in the range
>>> [0x80..0xFF] in an unibyte context and in the range [0x3FFF80..0x3FFFFF] in a
>>> multibyte context.
>> Pretty much, yes.
>
> Yes, distinguishing bytes (and byte strings/buffers) from chars (and
> char strings/buffers) is key.  Sadly, Emacs doesn't make it easy because
> the terms used evolved from a time where byte=char and where people were
> focused too much on the underlying/internal representation (hence the
> terms "multibyte" vs "unibyte"), plus the fact that too much code relied
> on byte=char to be able to make a clean design.  So when Emacs-20
> appeared, it included all kinds of dwimish (and locale-dependent)
> conversions to try and accommodate incorrect byte=char assumptions.
> Over time, the design has been significantly cleaned up, but the
> terminology is still problematic.

I can imagine that the step from the equivalence char=byte to
char=Unicode code point (a long(er) integer) is not so difficult. But in
addition we have the UTF-8 representation.
To which of the latter two, the Unicode code point (an integer, several
bytes long) or its UTF-8 representation (a sequence of several bytes),
does the term "multibyte" refer?

Thank you for the insight into the historical background.

Juergen
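
For reference, the integer mapping discussed in this thread can be modelled
in a few lines of Python. This is only a sketch of the arithmetic, not Emacs
code: the offset 0x3FFF00 is an assumption read off the two ranges
[0x80..0xFF] and [0x3FFF80..0x3FFFFF], and the function names are made up
for illustration.

```python
# Minimal model of the raw-byte mapping discussed in this thread.
# Assumption: a raw byte b in [0x80..0xFF] corresponds to the character
# code b + 0x3FFF00, as implied by the two ranges quoted above.
# The names below are hypothetical and are not Emacs functions.

RAW_BYTE_OFFSET = 0x3FFF00


def byte_to_raw_char(byte: int) -> int:
    """Map a raw byte (0x80..0xFF) to its code in a multibyte context."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError(f"not a raw 8-bit byte: {byte:#x}")
    return byte + RAW_BYTE_OFFSET


def raw_char_to_byte(code: int) -> int:
    """Map a raw-byte character code (0x3FFF80..0x3FFFFF) back to its byte."""
    if not 0x3FFF80 <= code <= 0x3FFFFF:
        raise ValueError(f"not a raw-byte character code: {code:#x}")
    return code - RAW_BYTE_OFFSET


def utf8_bytes(code_point: int) -> bytes:
    """Return the UTF-8 byte sequence that encodes a Unicode code point."""
    return chr(code_point).encode("utf-8")


if __name__ == "__main__":
    print(hex(byte_to_raw_char(0xBA)))      # the #xBA <-> #x3FFFBA pair
    print(hex(raw_char_to_byte(0x3FFFA0)))  # back from the multibyte range
    print(utf8_bytes(0xA0))                 # code point 160 vs. its encoding
```

In this model three different things share the number 160: the code point
U+00A0 (one integer), its UTF-8 encoding (the two bytes C2 A0), and the raw
byte 0xA0, which is kept apart from the character by the offset.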