From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: [Solved] RE: Differences between identical strings in Emacs lisp
Date: Thu, 09 Apr 2015 15:45:06 +0300
Message-ID: <837ftlfp2l.fsf@gnu.org>
To: help-gnu-emacs@gnu.org

> From: Jürgen Hartmann
> Date: Thu, 9 Apr 2015 12:38:43 +0200
>
> > If this `insert' is performed inside a unibyte buffer, then this 160 is
> > instead taken to be the code of a byte.  Again, regardless of the locale.
>
> So this is comparable to the output of \xA0 in a unibyte string
> (e.g. in "\xA0\ A") in contrast to the same in a multibyte string (e.g. in
> "\xA0 Ä"): The former yields the raw byte \240, the latter a no-break space.

Yes, Emacs tries to treat buffers and strings alike.

> I could imagine that the step from the equivalence char=byte to
> char=unicode code point (long(er) integer) is not so difficult.

The problem with this is that an encoded character could span several
bytes, and then what do you call each byte of such a multibyte
sequence?  You cannot call it a character.

> But we have in addition the UTF-8 representation.

If you mean the internal representation, then it's a superset of
UTF-8, not UTF-8.  If you mean the external encoding of text, then
UTF-8 is not the only representation, not even the only multibyte
representation.
There are others, mostly used in the Far East, but not only there.
Even UTF-16, used natively by MS-Windows, is technically a multibyte
representation.

> To which of the two latter--unicode code point (integer, several
> bytes long) or its UTF-8 representation (sequence of several bytes)--
> does the term "multibyte" refer?

In the context of Emacs, it refers to the internal representation of
characters, which is a superset of UTF-8.
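
The difference between a character (one code point) and the bytes that encode it can be illustrated with a minimal sketch. It is written in Python rather than Emacs Lisp and is not part of the original exchange; it only shows external encodings (UTF-8, UTF-16, and two Japanese encodings chosen as examples), not Emacs's internal representation. The helper `byte_lengths` and all variable names are hypothetical, introduced only for this illustration.

```python
# Illustration only: Python's str/bytes split, NOT Emacs's internal representation.

def byte_lengths(ch, encodings=("utf-8", "utf-16-le")):
    """Map each encoding name to the number of bytes `ch` occupies in it."""
    return {enc: len(ch.encode(enc)) for enc in encodings}

NBSP = "\u00a0"        # NO-BREAK SPACE, code point 160
G_CLEF = "\U0001d11e"  # a character outside the Basic Multilingual Plane
HIRAGANA_A = "\u3042"  # used to show two encodings common in Japan

# A lone byte 0xA0 is just the integer 160 until a decoder gives it meaning.
raw = b"\xa0"
as_latin1 = raw.decode("latin-1")  # Latin-1 maps byte 160 to U+00A0

# One character is one code point, but may be several bytes once encoded.
nbsp_utf8 = NBSP.encode("utf-8")
nbsp_lengths = byte_lengths(NBSP)
clef_lengths = byte_lengths(G_CLEF)
japanese_lengths = byte_lengths(HIRAGANA_A, ("shift_jis", "euc_jp"))

if __name__ == "__main__":
    print("raw byte value:", raw[0])
    print("NBSP as UTF-8 bytes:", nbsp_utf8)
    print("NBSP byte lengths:", nbsp_lengths)
    print("G clef byte lengths:", clef_lengths)
    print("Hiragana A byte lengths:", japanese_lengths)
```

In this sketch the same code point 160 is one byte in Latin-1 and two bytes in UTF-8, and a character outside the Basic Multilingual Plane needs two 16-bit units in UTF-16. In that sense UTF-16 is also a variable-length, multibyte encoding.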