From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Re: Buffer-local variables affect general-purpose functions
Date: Fri, 28 Mar 2014 12:38:10 +0900
Message-ID: <87r45nouvx.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <831txozsqa.fsf@gnu.org> <83ppl7y30l.fsf@gnu.org>
To: Eli Zaretskii
Cc: Stefan Monnier, emacs-devel@gnu.org

Eli Zaretskii writes:

 > Paul seemed to say something more broad: that _all_ behaviors specific
 > to unibyte buffers should go away.  Do you agree?

Yes, please.  XEmacs has never had the unibyte hack with Mule, and
never has had much trouble with that.  It also has never had an
instance of the \201 bug since Mule was declared stable -- where Emacs
has had *many* regressions.

It's arguable that there are performance implications, but simply
aliasing the binary codec to latin1-unix has *never* caused a bug in
handling binary files -- all bugs are due to autodetection errors, not
the buffer representation.  I don't recall a case where a programmer
"did something stupid" with a character function that technically is
inappropriate for true binary (eg, upcase) -- invariably they were
doing something like upcasing all the HTML tags as they came off the
wire.  Ie, the stream was a binary protocol where all of the syntax
was represented with ASCII bytes, and therefore "readable words".

If the performance implications bother you, then a buffer
representation like http://www.python.org/dev/peps/pep-0393/ may be
useful.
You could do that halfway, as well (ie, buffers containing pure Latin1
text or binary text would be represented as a flat buffer of bytes,
buffers containing scalars >= 256 would be represented as UTF-8b, or
whatever the hack for representing undecodable bytes currently is).

 > Anyway, what should replace those hacks?  Arbitrarily interpreting raw
 > bytes as Latin characters is not TRT, IMO.

Python has a bytes/character distinction, but the two types have
completely separate implementations.  Emacs doesn't need that, unless
you want to compete with the P-languages as a web framework platform.
OTOH Emacs' unibyte buffer toggle is a design bug, pure and simple,
and it should be backed up against a wall and immersed in insecticide.

If you stick to the interpretation that bytes contain non-negative
integers less than 256, you won't have a problem in practice if you
think of them as the first 256 Unicode characters, but choose not to
use functions that make sense only with characters.  Python actually
implements many polymorphic functions (ie, they can be interpreted as
bytes->bytes or characters->characters, etc) by converting bytes to
characters as Latin-1, then using the character implementation of the
function.
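
A minimal sketch of the Latin-1 trick described above, written in
Python.  It is only an illustration: the names latin1_lift,
upcase_tags and TAG are hypothetical and come from no Emacs, XEmacs
or Python library.  It treats each byte as one of the first 256
Unicode characters, runs a character-level function over the result,
and converts back.

```python
import re

# Hypothetical pattern: the start of an HTML tag, ASCII letters only.
TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9]*")


def upcase_tags(text: str) -> str:
    """Character-level function: upcase tag names, leave the rest alone."""
    return TAG.sub(lambda m: m.group(0).upper(), text)


def latin1_lift(func):
    """Turn a str -> str function into a bytes -> bytes function.

    Latin-1 maps bytes 0..255 one-to-one onto U+0000..U+00FF, so
    decoding never fails and re-encoding restores untouched bytes.
    Assumption: func only ever returns characters below U+0100;
    otherwise the final encode raises UnicodeEncodeError.
    """
    def wrapper(data: bytes) -> bytes:
        return func(data.decode("latin-1")).encode("latin-1")
    return wrapper


upcase_tags_bytes = latin1_lift(upcase_tags)

# A "binary" payload: ASCII syntax around arbitrary non-ASCII bytes.
payload = b"<p>caf\xe9 \xff\x00\x81</p>"
result = upcase_tags_bytes(payload)
```

Only the ASCII tag syntax changes; the bytes \xe9, \xff, \x00 and \x81
pass through untouched.  Lifting a general str.upper the same way
would not be safe, since the uppercase of U+00FF lies outside Latin-1.
That is the "choose not to use functions that make sense only with
characters" caveat in practice.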