From: David Kastrup
Newsgroups: gmane.emacs.devel
To: emacs-devel@gnu.org
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 16:55:52 +0100
Organization: Organization?!?
Message-ID: <87bnwpov7b.fsf@fencepost.gnu.org>
References: <831txozsqa.fsf@gnu.org> <83ppl7y30l.fsf@gnu.org>
 <87r45nouvx.fsf@uwakimon.sk.tsukuba.ac.jp> <8361myyac6.fsf@gnu.org>
 <87a9capqfr.fsf@uwakimon.sk.tsukuba.ac.jp> <83eh1mfd09.fsf@gnu.org>
 <87ob0pnyt6.fsf@uwakimon.sk.tsukuba.ac.jp>
 <87ioqxnhhk.fsf@uwakimon.sk.tsukuba.ac.jp>

"Stephen J. Turnbull" writes:

> Andreas Schwab writes:
>  > "Stephen J. Turnbull" writes:
>  >
>  > > *sigh* No, it's about unibyte being a premature pessimization.
>  >
>  > Unibyte is a pure space optimisation.
>
> It may be a space optimization, but it's hardly pure.  Else this
> discussion wouldn't be happening.  And `string-as-unibyte' exposes the
> internal representation of strings to Lisp.
>
>  > Everything else should work as if all bytes in the range 128-255
>  > are decoded in the eight-bit charset.
>
> There seem to be conflicting opinions about that, and I would
> certainly disagree as there are scads of European charsets that
> happily fit into bytes.

That's not what unibyte buffers are for.  They are for byte streams, not
characters.  You would not want to edit a unibyte buffer, for example,
by inserting text and stuff.

Now for byte stream manipulation, code points other than 0..255 are a
nuisance.  Certainly a larger nuisance than having to clear
case-fold-search if you really want to do a byte search.

> I see no reason why character operations (such as case conversion)
> shouldn't work transparently on bytes in GR interpreted as the
> corresponding Latin-1 (or any ISO Latin) charset -- with a little
> extra metadata in (internal unibyte) buffers and strings to indicate
> the charset implied.  (This charset is independent of the various
> coding systems associated with buffers; it only says how to interpret
> a byte as a character in operations on characters in buffers.)

We have that "extra metadata", it is the unibyte flag.  But I consider
it a mistake to use it for anything but "character codes in this buffer
happen to range from 0..255 rather than 0..1000000 or whatever".

And since Unicode 128..255 happens to be the latin-1 plane where the
latin-1 plane is defined as all, this will mean that the result will
behave like the latin-1 plane.  Exactly because Emacs has _one_
underlying character set which happens to be Unicode.

Which does not mean that it would be a good idea to use unibyte
buffers/strings for actual text that happens to be Latin-1 only.

-- 
David Kastrup