From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings, and buffers
Date: Sun, 30 Mar 2014 01:28:21 +0900
Message-ID: <87eh1lnf4q.fsf@uwakimon.sk.tsukuba.ac.jp>
In-Reply-To: <87bnwpov7b.fsf@fencepost.gnu.org>
To: David Kastrup
Cc: emacs-devel@gnu.org

David Kastrup writes:

> That's not what unibyte buffers are for. They are for byte
> streams, not characters. You would not want to edit a unibyte
> buffer, for example, by inserting text and stuff.

I beg to differ. I would like to edit RFC 822 headers for HTTP, SMTP,
and other such wire protocols. This is precisely the use case that
convinced van Rossum to restore %-formatting for bytes in Python 3.5
(to be released in about 18 months).

> We have that "extra metadata", it is the unibyte flag.

Yes, I know, but my point is that it should be purely for use of the
internal implementation, and probably restricted to the C level.

> But I consider it a mistake to use it for anything but "character
> codes in this buffer happen to range from 0..255 rather than
> 0..1000000 or whatever".

I sympathize, though I think it's overkill for Emacs to have separate
bytes and text types visible at the Lisp level.
FWIW, that's a big step toward the design approach taken by Python 3,
which has both bytes and text, but you can't mix them without an
explicit encoding or decoding step, and the internal encoding of text
is not exposed to Python functions at all.

> And since Unicode 128..255 happens to be the latin-1 plane where the
> latin-1 plane is defined as all, this will mean that the result will
> behave like the latin-1 plane.

That's not necessarily true. It just requires a slightly more complex
design, which would be appropriate for Emacsen (as compared to Python).
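
To make the Python side of the comparison concrete, here is a minimal
sketch of editing wire-protocol headers as bytes. It assumes Python 3.5
or later, where %-formatting for bytes is available. The helper name
header_line and the header names and values are invented for
illustration, and none of this is Emacs code.

```python
# Sketch only: header_line and the sample headers are hypothetical.
# Requires Python 3.5+ for %-formatting on bytes.

def header_line(name: bytes, value: bytes) -> bytes:
    """Format one RFC 822-style header line as bytes, CRLF-terminated."""
    return b"%s: %s\r\n" % (name, value)

# Build a small header block without ever leaving the bytes type.
wire = header_line(b"Host", b"example.org")
wire += header_line(b"Content-Length", b"%d" % 42)

# Bytes and text do not mix implicitly: concatenation raises TypeError.
try:
    wire + "X-Note: caf\u00e9\r\n"
except TypeError:
    mixed_ok = False
else:
    mixed_ok = True

# Crossing the boundary takes an explicit decoding or encoding step.
text = wire.decode("ascii")                         # bytes -> text
note = "X-Note: caf\u00e9\r\n".encode("latin-1")    # text -> bytes
```

The point of the sketch is only that the bytes object is edited
directly, and that the one place text appears is fenced off by an
explicit encode or decode call.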