From: David Kastrup
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings, and buffers
Date: Fri, 28 Mar 2014 11:58:27 +0100
Message-ID: <87ob0qsi7g.fsf@fencepost.gnu.org>
To: emacs-devel@gnu.org

"Stephen J. Turnbull" writes:

> I agree that having a way to represent "undecodable bytes" in a string
> or buffer is extremely convenient.  XEmacs's lack of this capability
> is surely a deficiency (Hi, David K!)

Doing this in a UTF-8 based internal coding is somewhat doable by
employing non-UTF-8 sequences: either by using code points above the
Unicode code range (2^20 + something, requiring 4 bytes), or by using
non-minimal encodings (since the minimal ones are two bytes, requiring
3 bytes).  Either way, the size increases significantly.

> But this is a completely different issue from unibyte buffers.
> Emacs doesn't need unibyte buffers to perform its work, and if they
> are desirable on the grounds of space or time efficiency, they should
> be opaque to Lisp.

Well, Emacs follows more of a non-opaque philosophy (XEmacs, in
contrast, even has an opaque character type and several other ones).
That has the advantage that you can use all sorts of available tools as
long as they don't break.  It has the disadvantage that the question
"what is the right behavior for x?" needs to be answered considerably
more often, since you can't take the "x does not apply to y anyway"
route out as often.

> > We cannot [...]
>
> No, I still disagree.

Sure, everything is actually "We cannot efficiently" rather than "We
cannot".  But we still changed buffer positions from byte counts (as in
early Emacs 20) to character counts.  Efficiency took a dive, but the
alternatives were just too horrible API-wise.

-- 
David Kastrup
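
To make the size argument about undecodable bytes concrete, here is a rough sketch in Python (not Emacs code) of the two options mentioned above.  The helper encode_utf8_style and the base value 0x110000 are assumptions chosen purely for illustration; the message only says "2^20 + something".  The helper applies the usual UTF-8 bit layout but deliberately skips the checks a conforming encoder performs, so it can emit both out-of-range and non-minimal sequences.

```python
def encode_utf8_style(code_point: int, length: int) -> bytes:
    """Encode code_point with the UTF-8 bit layout in exactly `length` bytes.

    Hypothetical helper: unlike a conforming encoder it rejects neither
    non-minimal (overlong) forms nor values above U+10FFFF.
    """
    payload_bits = {2: 11, 3: 16, 4: 21}
    lead_marker = {2: 0xC0, 3: 0xE0, 4: 0xF0}
    if length not in payload_bits:
        raise ValueError("length must be 2, 3 or 4")
    if not 0 <= code_point < (1 << payload_bits[length]):
        raise ValueError("code point does not fit in that many bytes")
    out = []
    for _ in range(length - 1):
        # Continuation bytes carry six payload bits each, lowest bits last.
        out.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    out.append(lead_marker[length] | code_point)
    return bytes(reversed(out))


def is_valid_utf8(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts the byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


RAW_BYTE = 0xE9               # some undecodable byte found in the input
HYPOTHETICAL_BASE = 0x110000  # assumed base just above the Unicode range

# Option 1: a code point above the Unicode range, in the 4-byte form.
above_range = encode_utf8_style(HYPOTHETICAL_BASE + RAW_BYTE, 4)
# Option 2: a non-minimal 3-byte form of a value that would fit in 2 bytes.
overlong = encode_utf8_style(RAW_BYTE, 3)

for name, seq in (("above range", above_range), ("overlong", overlong)):
    print(name, seq.hex(), len(seq), "bytes, valid UTF-8:", is_valid_utf8(seq))
```

Both sequences are rejected by a strict decoder, which is what lets them stand for raw bytes without colliding with real text, and each stored byte costs three or four bytes instead of one.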