From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Miles Bader <miles@lsi.nec.co.jp>
Newsgroups: gmane.emacs.devel
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: 04 Mar 2003 11:48:57 +0900
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <buod6l73kyu.fsf@mcspd15.ucom.lsi.nec.co.jp>
References: <m3lm05ciwr.fsf@loiso.podval.org>
	<200302250634.PAA27478@etlken.m17n.org>
	<buo65r8ua9i.fsf@mcspd15.ucom.lsi.nec.co.jp>
	<200302260058.JAA28973@etlken.m17n.org>
	<200302260211.h1Q2BJl08373@rum.cs.yale.edu>
	<200302260234.LAA29082@etlken.m17n.org>
	<200302260252.h1Q2qIK08490@rum.cs.yale.edu>
	<200302260532.OAA29294@etlken.m17n.org>
	<E18oAwB-0000ZD-00@fencepost.gnu.org> <20030227000638.GA5470@gnu.org>
	<E18pv9d-0004nD-00@fencepost.gnu.org>
Reply-To: Miles Bader <miles@gnu.org>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: main.gmane.org 1046746334 20548 80.91.224.249 (4 Mar 2003 02:52:14 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Tue, 4 Mar 2003 02:52:14 +0000 (UTC)
Cc: handa@m17n.org
Original-X-From: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org Tue Mar 04 03:52:12 2003
Return-path: <emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org>
Original-Received: from quimby.gnus.org ([80.91.224.244])
	by main.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 18q2XM-0005L7-00
	for <emacs-devel@main.gmane.org>; Tue, 04 Mar 2003 03:52:12 +0100
Original-Received: from monty-python.gnu.org ([199.232.76.173])
	by quimby.gnus.org with esmtp (Exim 3.12 #1 (Debian))
	id 18q2qU-0003MO-00
	for <emacs-devel@quimby.gnus.org>; Tue, 04 Mar 2003 04:11:58 +0100
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org)
	by monty-python.gnu.org with esmtp (Exim 4.10.13)
	id 18q2Wo-00052x-02
	for emacs-devel@quimby.gnus.org; Mon, 03 Mar 2003 21:51:38 -0500
Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.10.13)
	id 18q2WA-0004jM-00
	for emacs-devel@gnu.org; Mon, 03 Mar 2003 21:50:58 -0500
Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.10.13)
	id 18q2Vz-0004DB-00
	for emacs-devel@gnu.org; Mon, 03 Mar 2003 21:50:51 -0500
Original-Received: from tyo201.gate.nec.co.jp ([210.143.35.51])
	by monty-python.gnu.org with esmtp (Exim 4.10.13)
	id 18q2Vs-0003kq-00; Mon, 03 Mar 2003 21:50:40 -0500
Original-Received: from mailgate4.nec.co.jp ([10.7.69.195])h242oGw06747;
	Tue, 4 Mar 2003 11:50:16 +0900 (JST)
Original-Received: from mailsv.nec.co.jp (mailgate51.nec.co.jp [10.7.69.196]) by
	mailgate4.nec.co.jp (8.11.6/3.7W-MAILGATE-NEC) with ESMTP
	id h242oGN13425; Tue, 4 Mar 2003 11:50:16 +0900 (JST)
Original-Received: from mcsss2.ucom.lsi.nec.co.jp ([10.30.114.133]) by mailsv.nec.co.jp
	(8.11.6/3.7W-MAILSV-NEC) with ESMTP
	id h242mwq01043; Tue, 4 Mar 2003 11:50:14 +0900 (JST)
Original-Received: from mcspd15.ucom.lsi.nec.co.jp (mcspd15 [10.30.114.174])
	id h242mwB09323;	Tue, 4 Mar 2003 11:48:58 +0900 (JST)
Original-Received: by mcspd15.ucom.lsi.nec.co.jp (Postfix, from userid 31295)
	id EC8D3370F; Tue,  4 Mar 2003 11:48:57 +0900 (JST)
Original-To: rms@gnu.org
System-Type: i686-pc-linux-gnu
Blat: Foop
In-Reply-To: <E18pv9d-0004nD-00@fencepost.gnu.org>
Original-Lines: 65
Original-cc: d.love@dl.ac.uk
Original-cc: sds@gnu.org
Original-cc: emacs-devel@gnu.org
Original-cc: monnier+gnu/emacs@rum.cs.yale.edu
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1b5
Precedence: list
List-Id: Emacs development discussions. <emacs-devel.gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Post: <mailto:emacs-devel@gnu.org>
List-Subscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
List-Archive: <http://mail.gnu.org/pipermail/emacs-devel>
List-Unsubscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:12088
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:12088

Richard Stallman <rms@gnu.org> writes:
>     a buffer/string should have an associated `unibyte encoding'
>     attribute, which would allow it to be encoded using the
>     straightforward and efficient `unibyte representation' but appear
>     to lisp/whoever as being a multibyte buffer/string (all of whose
>     characters happen to have the same charset).
> 
> This is more or less what a unibyte buffer is now, except that there
> is only one possibility for which character sets can be stored in it:
> it holds the character codes from 0 to 0377.

Yeah, but I'm saying that emacs should be able to use this efficient
representation for other character sets as well -- I think it's far more
common to have buffers storing non-raw 8-bit characters than raw
characters, so why is the uncommon case optimized?

> If we wanted to hide from the user the distinction between unibyte and
> multibyte buffers, we would have to change the buffer's representation
> automatically when inserting characters that don't fit unibyte.  That
> seems like a bad idea.

Well, I agree that it would be annoying if your 10-megabyte raw-bytes
buffer suddenly got converted because you accidentally inserted a
Chinese character. :-)

However I think that in many cases such a conversion would be OK, and
since 99% of the time, people _don't_ mix character sets, it would
probably be a win on average.

Maybe there could be a buffer-local variable that `locks' the buffer's
character set, and would cause an error to be signalled if some code
attempts to insert non-compatible text (instead of converting the
buffer)?  This might catch coding errors better than the current
`just insert the raw-codes' unibyte buffers (if you _really_ want to
insert the raw-codes, you can of course do so explicitly).
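
To make the idea a bit more concrete, here's a rough sketch.  It's in
Python rather than lisp, purely for illustration: the class and method
names are made up, nothing here is actual emacs code, and it assumes
the buffer's charset is a single-byte one such as latin-1.

```python
class CompactBuffer:
    """Hypothetical buffer: one byte per character while text fits one charset."""

    def __init__(self, charset="latin-1", locked=False):
        self.charset = charset     # the `unibyte encoding' attribute
        self.locked = locked       # stands in for the buffer-local `lock'
        self._bytes = bytearray()  # compact one-byte-per-character storage
        self._wide = None          # general storage, used only after conversion

    def insert(self, text):
        if self._wide is not None:
            # Already converted: any character is acceptable.
            self._wide.extend(text)
            return
        try:
            encoded = text.encode(self.charset)
        except UnicodeEncodeError:
            if self.locked:
                # Locked buffers signal an error instead of converting.
                raise ValueError(
                    f"text does not fit charset {self.charset!r}") from None
            # Unlocked: convert the whole buffer, then insert.
            self._wide = list(self._bytes.decode(self.charset))
            self._wide.extend(text)
            self._bytes = None
            return
        self._bytes.extend(encoded)

    def is_compact(self):
        return self._wide is None

    def text(self):
        # Callers always see characters, whatever the storage is.
        if self._wide is not None:
            return "".join(self._wide)
        return self._bytes.decode(self.charset)


buf = CompactBuffer("latin-1")
buf.insert("caf\u00e9")      # fits latin-1, stays compact
buf.insert("\u4e2d")         # doesn't fit, buffer is converted

locked = CompactBuffer("latin-1", locked=True)
locked.insert("abc")
try:
    locked.insert("\u4e2d")  # doesn't fit, error instead of conversion
except ValueError as err:
    print("rejected:", err)
```

The point is that callers only ever use insert and text; whether the
storage is compact or not is an internal detail, and the lock turns a
silent conversion into an explicit error.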

> The advantage of unibyte mode for some European Latin-N users is that
> they don't have to deal with encoding and decoding, so they never have
> to specify a coding system.  It is possible that today we could get
> the same results using multibyte buffers and forcing use of a specific
> Latin-N coding system.  People could try experimenting with this and
> seeing if it provides results that are just like what European users
> now get with unibyte mode.

Perhaps the same advantages could be had, without making a special case,
by having an `uninterpreted' character set, which the display code would
effectively treat as `just send whatever code raw to the terminal.'

> As for the idea that efficiency should never be a factor in deciding
> what to do here, I am skeptical of that.

I'm not saying that efficiency isn't an issue, I'm saying that lisp
programmers shouldn't have to worry about it as much.  They should be
able to just use `normal' coding methods (which currently means
multibyte by default), and expect that emacs will optimize this in
certain common cases; currently, if a lisp programmer wants extra
efficiency, he's got to use special and more dangerous operations.

I realize that what I'm suggesting is a bit much, at least for the near
future, but I also think the current design is somewhat broken, and
makes it too easy for programmers to do the wrong thing.

-Miles
-- 
I am a virus. Join in and copy me into your .signature.