From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: "Stephen J. Turnbull" <stephen@xemacs.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Buffer-local variables affect general-purpose functions
Date: Fri, 28 Mar 2014 12:38:10 +0900
Message-ID: <87r45nouvx.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <831txozsqa.fsf@gnu.org> <jwv4n2j2141.fsf-monnier+emacs@gnu.org>
	<83ppl7y30l.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
X-Trace: ger.gmane.org 1395977904 24508 80.91.229.3 (28 Mar 2014 03:38:24 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Fri, 28 Mar 2014 03:38:24 +0000 (UTC)
Cc: Stefan Monnier <monnier@IRO.UMontreal.CA>, emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Fri Mar 28 04:38:34 2014
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WTNcn-00066R-Qy
	for ged-emacs-devel@m.gmane.org; Fri, 28 Mar 2014 04:38:33 +0100
Original-Received: from localhost ([::1]:56995 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WTNcn-0005Dc-Eg
	for ged-emacs-devel@m.gmane.org; Thu, 27 Mar 2014 23:38:33 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:46933)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stephen@xemacs.org>) id 1WTNcf-0005DE-Hc
	for emacs-devel@gnu.org; Thu, 27 Mar 2014 23:38:31 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <stephen@xemacs.org>) id 1WTNcZ-00008q-NM
	for emacs-devel@gnu.org; Thu, 27 Mar 2014 23:38:25 -0400
Original-Received: from mgmt2.sk.tsukuba.ac.jp ([130.158.97.224]:34879)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stephen@xemacs.org>)
	id 1WTNcT-00007S-2q; Thu, 27 Mar 2014 23:38:13 -0400
Original-Received: from uwakimon.sk.tsukuba.ac.jp (uwakimon.sk.tsukuba.ac.jp
	[130.158.99.156])
	by mgmt2.sk.tsukuba.ac.jp (Postfix) with ESMTP id 7C7EA970A21;
	Fri, 28 Mar 2014 12:38:10 +0900 (JST)
Original-Received: by uwakimon.sk.tsukuba.ac.jp (Postfix, from userid 1000)
	id 6B3611A28DC; Fri, 28 Mar 2014 12:38:10 +0900 (JST)
In-Reply-To: <83ppl7y30l.fsf@gnu.org>
X-Mailer: VM undefined under 21.5  (beta34) "kale" 2a0f42961ed4 XEmacs Lucid
	(x86_64-unknown-linux)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 130.158.97.224
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:171060
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/171060>

Eli Zaretskii writes:

 > Paul seemed to say something more broad: that _all_ behaviors specific
 > to unibyte buffers should go away.  Do you agree?

Yes, please.  XEmacs has never had the unibyte hack with Mule, and
has never had much trouble with that.  It also has never had an
instance of the \201 bug since Mule was declared stable -- where Emacs
has had *many* regressions.  It's arguable that there are performance
implications, but simply aliasing the binary codec to latin1-unix has
*never* caused a bug in handling binary files -- all bugs are due to
autodetection errors, not the buffer representation.  I don't recall a
case where a programmer "did something stupid" with a character
function that technically is inappropriate for true binary (eg,
upcase) -- invariably they were doing something like upcasing all the
HTML tags as they came off the wire.  Ie, the stream was a binary
protocol where all of the syntax was represented with ASCII bytes, and
therefore "readable words".

If the performance implications bother you, then a buffer
representation like http://www.python.org/dev/peps/pep-0393/ may be
useful.  You could do that halfway, as well (ie, buffers containing
pure Latin1 text or binary text would be represented as a flat buffer
of bytes, buffers containing scalars >= 256 would be represented as
UTF-8b, or whatever the hack for representing undecodable bytes
currently is).
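
A rough sketch of that halfway scheme, in Python rather than Emacs C,
purely to illustrate the idea.  The names choose_representation and
restore_text are made up for this example, and Python's
"surrogateescape" error handler stands in for whatever Emacs actually
does with undecodable bytes:

```python
from typing import Tuple

def choose_representation(text: str) -> Tuple[str, bytes]:
    """Pick a storage form for TEXT (hypothetical helper, not Emacs code).

    Text whose scalars are all below 256 is stored as one byte per
    character; anything else falls back to UTF-8, with undecodable
    bytes carried through by Python's surrogateescape handler.
    """
    if all(ord(ch) < 256 for ch in text):
        # Pure Latin-1 or binary content: flat buffer of bytes.
        return ("flat", text.encode("latin-1"))
    # At least one scalar >= 256: variable-width representation.
    return ("utf-8", text.encode("utf-8", "surrogateescape"))

def restore_text(kind: str, data: bytes) -> str:
    """Inverse of choose_representation."""
    if kind == "flat":
        return data.decode("latin-1")
    return data.decode("utf-8", "surrogateescape")
```

The point is only that the choice can be made per buffer from the
content, without a user-visible unibyte/multibyte toggle.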

 > Anyway, what should replace those hacks?  Arbitrarily interpreting raw
 > bytes as Latin characters is not TRT, IMO.

Python has a bytes/character distinction, but the two types have
completely separate implementations.  Emacs doesn't need that, unless
you want to compete with the P-languages as a web framework platform.
OTOH Emacs' unibyte buffer toggle is a design bug, pure and simple, and
it should be backed up against a wall and immersed in insecticide.

If you stick to the interpretation that bytes contain non-negative
integers less than 256, you won't have a problem in practice if you
think of them as the first 256 Unicode characters, but choose not to use
functions that make sense only with characters.  Python actually
implements many polymorphic functions (ie, they can be interpreted as
bytes->bytes or characters->characters, etc) by converting bytes to
characters as Latin-1, then using the character implementation of the
function.
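
To make the Latin-1 trick concrete, here is a minimal sketch.  It is
not CPython's actual code; lift_to_bytes and upcase_tags are names
invented for the example.  It lifts a characters->characters function
to bytes->bytes by decoding as Latin-1, and uses the "upcase the HTML
tags as they come off the wire" case from above:

```python
import re

def lift_to_bytes(char_func):
    """Turn a str -> str function into a bytes -> bytes one.

    Latin-1 maps byte N to code point N for all 256 values, so the
    decode/encode round trip never loses or alters a byte by itself.
    """
    def wrapper(data: bytes) -> bytes:
        text = data.decode("latin-1")
        # Raises UnicodeEncodeError if char_func produces a scalar
        # >= 256, i.e. if it only makes sense for real characters.
        return char_func(text).encode("latin-1")
    return wrapper

# ASCII-only pattern, so only the protocol syntax is touched.
_TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9]*")

def upcase_tags(text: str) -> str:
    """Upcase the tag names; leave everything else alone."""
    return _TAG.sub(lambda m: m.group(0).upper(), text)

upcase_tags_bytes = lift_to_bytes(upcase_tags)
```

Applied to a binary stream, the non-ASCII bytes pass through
untouched because the function only ever looks at the ASCII syntax.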