From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings, and buffers
Date: Sun, 30 Mar 2014 00:37:27 +0900
Message-ID: <87ioqxnhhk.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <831txozsqa.fsf@gnu.org> <83ppl7y30l.fsf@gnu.org> <87r45nouvx.fsf@uwakimon.sk.tsukuba.ac.jp> <8361myyac6.fsf@gnu.org> <87a9capqfr.fsf@uwakimon.sk.tsukuba.ac.jp> <83eh1mfd09.fsf@gnu.org> <87ob0pnyt6.fsf@uwakimon.sk.tsukuba.ac.jp>
Cc: Eli Zaretskii, monnier@IRO.UMontreal.CA, emacs-devel@gnu.org
To: Andreas Schwab

Andreas Schwab writes:

> "Stephen J. Turnbull" writes:
>
> > *sigh* No, it's about unibyte being a premature pessimization.
>
> Unibyte is a pure space optimisation.

It may be a space optimization, but it's hardly pure. Else this
discussion wouldn't be happening. And `string-as-unibyte' exposes the
internal representation of strings to Lisp.

> Everything else should work as if all bytes in the range 128-255
> are decoded in the eight-bit charset.

There seem to be conflicting opinions about that, and I would
certainly disagree as there are scads of European charsets that
happily fit into bytes. I see no reason why character operations
(such as case conversion) shouldn't work transparently on bytes in GR
interpreted as the corresponding Latin-1 (or any ISO Latin) charset --
with a little extra metadata in (internal unibyte) buffers and strings
to indicate the charset implied. (This charset is independent of the
various coding systems associated with buffers; it only says how to
interpret a byte as a character in operations on characters in
buffers.)
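
To make the last paragraph concrete, here is a rough sketch of bytes tagged with an implied charset, written in Python for brevity rather than Emacs Lisp or C. It does not describe how Emacs or XEmacs store anything; the names `TaggedBytes` and `_upcase_table` are hypothetical, and the rule of leaving a byte alone when its uppercase form does not fit in the same charset is an assumption made for the illustration.

```python
from dataclasses import dataclass


def _upcase_table(charset: str) -> bytes:
    """Build a 256-entry byte-to-byte uppercase table for a single-byte charset."""
    table = bytearray(range(256))
    for b in range(256):
        try:
            upper = bytes([b]).decode(charset).upper()
            encoded = upper.encode(charset)
        except (UnicodeDecodeError, UnicodeEncodeError):
            # Byte is undefined in this charset, or its uppercase form
            # is not representable there: leave the byte unchanged.
            continue
        if len(encoded) == 1:  # skip expansions such as sharp s -> "SS"
            table[b] = encoded[0]
    return bytes(table)


@dataclass(frozen=True)
class TaggedBytes:
    """One byte per character, plus metadata naming the implied charset."""
    data: bytes
    charset: str = "latin-1"

    def upcase(self) -> "TaggedBytes":
        # Storage stays one byte per character; only the interpretation
        # of each byte depends on the charset tag.
        table = _upcase_table(self.charset)
        return TaggedBytes(self.data.translate(table), self.charset)


# The same GR byte is case-converted differently under different tags:
# 0xB8 is a cedilla in Latin-1 but a small z with caron in ISO 8859-15.
latin1 = TaggedBytes(b"caf\xe9 \xb8", "latin-1").upcase()
latin9 = TaggedBytes(b"caf\xe9 \xb8", "iso8859-15").upcase()
```

The point of the sketch is only that the tag is separate from any coding system used for I/O: it changes what a character operation does to a byte, not how the bytes are stored.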