From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Monnier
Newsgroups: gmane.emacs.help
Subject: Re: [Solved] RE: Differences between identical strings in Emacs lisp
Date: Wed, 08 Apr 2015 08:37:17 -0400
References: <87pp7gu7by.fsf@kuiper.lan.informatimago.com> <83mw2khvc1.fsf@gnu.org> <834morj19g.fsf@gnu.org> <83egnuhlu0.fsf@gnu.org>
To: help-gnu-emacs@gnu.org
List-Id: Users list for the GNU Emacs text editor

>> > the use cases you tried -- Emacs will sometimes silently convert
>> > unibyte characters to their locale-dependent multibyte equivalents.

Nowadays this should happen extremely rarely, or never.

>> On which occasion such a conversion is done?
> One example that comes to mind is (insert 160), i.e. when inserting
> text into a buffer.

This doesn't do any conversion (although it did, in Emacs<23).
160 is simply taken as the code of the corresponding character in
Emacs's character space (which is basically Unicode), hence regardless
of locale.  If this `insert' is performed inside a unibyte buffer, then
this 160 is instead taken to be the code of a byte.  Again, regardless
of the locale.

AFAIR, the only "dwimish" conversion that still takes place on occasion
is between things like #x3FFFBA and #xBA (i.e.
between a byte and a character representing that same byte).

>> It seems that all my related observations that puzzled me before can be well
>> explained by the strict distinction between characters and raw bytes and the
>> mapping between the latter's integer representations in the range
>> [0x80..0xFF] in an unibyte context and in the range [0x3FFF80..0x3FFFFF] in a
>> multibyte context.
> Pretty much, yes.

Yes, distinguishing bytes (and byte strings/buffers) from chars (and
char strings/buffers) is key.  Sadly, Emacs doesn't make it easy because
the terms used evolved from a time when byte=char and when people were
focused too much on the underlying/internal representation (hence the
terms "multibyte" vs "unibyte"), and because too much code relied on
byte=char for a clean design to be possible.

So when Emacs-20 appeared, it included all kinds of dwimish (and
locale-dependent) conversions to try and accommodate incorrect byte=char
assumptions.  Over time, the design has been significantly cleaned up,
but the terminology is still problematic.


        Stefan
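
The behaviour described above can be tried out with a few forms in a
scratch buffer.  What follows is only a minimal sketch, assuming Emacs 23
or later; `my-first-char-after-insert' is a hypothetical helper introduced
for illustration, while the other functions are standard Emacs Lisp.

```elisp
;; Sketch only: assumes Emacs 23 or later.  `my-first-char-after-insert'
;; is a hypothetical helper, not an Emacs function.
(defun my-first-char-after-insert (code multibyte)
  "Insert CODE into a temp buffer and return what `char-after' reads back.
MULTIBYTE non-nil means use a multibyte buffer, nil a unibyte one."
  (with-temp-buffer
    (set-buffer-multibyte multibyte)
    (insert code)
    (char-after (point-min))))

;; In a multibyte buffer, 160 is taken as the character U+00A0.
(my-first-char-after-insert 160 t)      ; => 160
;; In a unibyte buffer, 160 is taken as the byte #xA0.
(my-first-char-after-insert 160 nil)    ; => 160

;; The byte <-> character mapping: a raw byte in [#x80..#xFF]
;; corresponds to a character in [#x3FFF80..#x3FFFFF].
(unibyte-char-to-multibyte #xBA)        ; => 4194234, i.e. #x3FFFBA
(multibyte-char-to-unibyte #x3FFFBA)    ; => 186, i.e. #xBA

;; The same mapping applied to a whole string.
(let ((bytes (unibyte-string #xBA)))
  (list (multibyte-string-p bytes)               ; nil: a byte string
        (aref (string-to-multibyte bytes) 0)))   ; 4194234, i.e. #x3FFFBA
```

Both `insert' calls read back 160, so the number alone does not say
whether it denotes a character or a byte; only the buffer's (or string's)
representation does.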