From: Jürgen Hartmann
Newsgroups: gmane.emacs.help
Subject: RE: [Solved] RE: Differences between identical strings in Emacs lisp
Date: Wed, 8 Apr 2015 13:01:16 +0200
To: help-gnu-emacs@gnu.org

Thank you, Eli Zaretskii, for your explanations:

>> [About mapping between unibyte and multibyte strings]
>>
>> First I thought that some hidden decoding based on some charsets or
>> coding systems occurs.
>
> Actually, some sort of "decoding" does occur, albeit perhaps not in
> the use cases you tried -- Emacs will sometimes silently convert
> unibyte characters to their locale-dependent multibyte equivalents.

On which occasions is such a conversion done? Does it have anything to do
with the charset that is defined individually in language-info-alist for
nearly every language environment?

> This whole area of unibyte strings is replete with dwim-ish hacks and
> kludges, all in an attempt to do what the user expects.
> Thus the confusion and the advice to stay away of that gray area.

Sounds like the well-known design conflict between "behaving smart" and
"being straight".

>> [About "\x3FFFBA\x3FFFBB\x3FFFBC"]
>
> It's a "unibyte string", which, by definition, contains raw bytes.
>
> But it is actually better to say that the raw bytes there are \272 and
> not \x3FFFBC. The latter is just the representation Emacs uses for
> the former, Emacs goes out of its way not to show that internal
> representation to the user.
>
>> ...
>>
>> Ah, that seems to be the key: raw bytes are not characters.
>
> Exactly.

Great! Lesson learned.

>> [About raw bytes]
>
> They _are_ a special "character set", but only in the very technical
> sense of "character set" in Emacs. By their nature and their
> properties in Emacs, they are not characters.
>
>> [About characters and raw bytes in unibyte context]
>
> Raw bytes are only those whose value is above 127, so A is a
> character.
>
> For subtle technical reasons (or maybe by some historical accident), a
> pure-ASCII string is a unibyte string, although it contains
> characters, not raw bytes. So having a unibyte string does not yet
> mean you have raw bytes in it.

It seems that all the observations that puzzled me before can be explained
by the strict distinction between characters and raw bytes, together with
the mapping of a raw byte's integer representation between the range
[0x80..0xFF] in a unibyte context and the range [0x3FFF80..0x3FFFFF] in a
multibyte context.

> By far the only valid use case where you need to manipulate unibyte
> strings of raw bytes is if you need to encode or decode strings by
> calling encode-coding-region and its ilk.
> E.g., an application that needs to send base64-encoded text needs first
> to encode it using whatever coding-system is appropriate, which produces
> unibyte text containing raw bytes, and then call base64-encode-region to
> produce the final result. And similarly for decoding such stuff. You will
> see examples of this in Gnus and Rmail, for example.
>
>> So, thank you very much for your enlightening answers.
>
> You are welcome.

Thank you very much.

Juergen
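The following small Python sketch makes two points from this thread concrete. The first is the mapping of a raw byte between [0x80..0xFF] in a unibyte context and [0x3FFF80..0x3FFFFF] in a multibyte context. The second is the "encode first, then base64" sequence.

It is a model, not Emacs code, and rests on a few assumptions:
- The constant offset 0x3FFF00 is inferred from those two ranges.
- The helper names are hypothetical and only loosely mirror Emacs's unibyte-char-to-multibyte.
- Python's str.encode and base64 module stand in for encode-coding-region and base64-encode-region.
- It ignores the locale-dependent conversions mentioned above.

```python
import base64

# Assumption: a raw byte B (0x80..0xFF) in a unibyte string corresponds to
# the code B + 0x3FFF00 (0x3FFF80..0x3FFFFF) in a multibyte string.
RAW_BYTE_OFFSET = 0x3FFF00


def is_raw_byte(byte):
    """In a unibyte context, only values above 127 are raw bytes; 0..127 are ASCII characters."""
    if not 0 <= byte <= 0xFF:
        raise ValueError(f"not a byte: {byte:#x}")
    return byte > 0x7F


def to_multibyte_code(byte):
    """Hypothetical helper: ASCII keeps its value, raw bytes move to 0x3FFF80..0x3FFFFF."""
    return byte + RAW_BYTE_OFFSET if is_raw_byte(byte) else byte


def to_unibyte_code(code):
    """Hypothetical inverse; returns None for codes that are neither ASCII nor raw bytes."""
    if 0 <= code <= 0x7F:
        return code
    if 0x3FFF80 <= code <= 0x3FFFFF:
        return code - RAW_BYTE_OFFSET
    return None


def encode_then_base64(text, coding="utf-8"):
    """Characters -> raw bytes (cf. encode-coding-region) -> base64 (cf. base64-encode-region)."""
    raw = text.encode(coding)  # unibyte data: raw bytes, no longer characters
    return base64.b64encode(raw).decode("ascii")


def base64_then_decode(data, coding="utf-8"):
    """Reverse direction: base64 -> raw bytes -> characters."""
    return base64.b64decode(data).decode(coding)


if __name__ == "__main__":
    print(hex(to_multibyte_code(0o272)))  # 0x3fffba: octal \272 is the raw byte 0xBA
    print(to_unibyte_code(0x3FFFBA) == 0xBA)  # True
    print(is_raw_byte(ord("A")))  # False: "A" is a character, not a raw byte
    print(any(is_raw_byte(b) for b in b"pure ASCII"))  # False: unibyte, yet no raw bytes
    print(encode_then_base64("A"))  # QQ==
    print(base64_then_decode(encode_then_base64("Jürgen")) == "Jürgen")  # True
```

Returning None for codes outside ASCII and the raw-byte range is a deliberate simplification. This sketch refuses to guess where Emacs might apply one of its dwim-ish conversions.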