From: Miles Bader
Newsgroups: gmane.emacs.devel
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: 04 Mar 2003 11:48:57 +0900
Reply-To: Miles Bader
To: rms@gnu.org
Cc: handa@m17n.org, d.love@dl.ac.uk, sds@gnu.org, emacs-devel@gnu.org,
    monnier+gnu/emacs@rum.cs.yale.edu

Richard Stallman writes:
> a buffer/string's should have an associated `unibyte encoding'
> attribute, which would allow it to be encoded using the
> straightforward and efficient `unibyte representation' but appear
> to lisp/whoweve as being a multibyte buffer/string (all of who's
> characters happen to have the same charset).
>
> This is more or less what a unibyte buffer is now, except that there
> is only one possibility for which character sets can be stored in it:
> it holds the character codes from 0 to 0377.

Yeah, but I'm saying that Emacs should be able to use this efficient
representation for other character sets as well -- I think it's far
more common to have buffers storing non-raw 8-bit characters than raw
characters, so why is the uncommon case optimized?

> If we wanted to hide from the user the distinction between unibyte and
> multibyte buffers, we would have to change the buffer's representation
> automatically when inserting characters that don't fit unibyte. That
> seems like a bad idea.

Well, I agree that it would be annoying if your 10-megabyte raw-bytes
buffer suddenly got converted because you accidentally inserted a
Chinese character.  :-)

However, I think that in many cases such a conversion would be OK, and
since people _don't_ mix character sets 99% of the time, it would
probably be a win on average.

Maybe there could be a buffer-local variable that `locks' the buffer's
character set, and would cause an error to be signalled if some code
attempts to insert non-compatible text (instead of converting the
buffer)?  This might catch coding errors better than the current `just
insert the raw-codes' unibyte buffers (if you _really_ want to insert
the raw-codes, you can of course do so explicitly).

> The advantage of unibyte mode for some European Latin-N users is that
> they don't have to deal with encoding and decoding, so they never have
> to specify a coding system. It is possible that today we could get
> the same results using multibyte buffers and forcing use of a specific
> Latin-N coding system. People could try experimenting with this and
> seeing if it provides results that are just like what European users
> now get with unibyte mode.

Perhaps the same advantages could be had, without making a special
case, by having an `uninterpreted' character set, which would
effectively be treated by the display code as `just send whatever code
raw to the terminal.'
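
To make the `unibyte encoding' attribute and the charset `lock' a bit
more concrete, here is a toy model in Python.  It is only a sketch of
the idea, not Emacs code and not how Emacs buffers are implemented:
the class `CharsetBuffer' and its `charset' and `locked' attributes
are hypothetical names invented for this illustration, and text is
only ever appended at the end to keep it short.

```python
class CharsetBuffer:
    """Toy buffer that stores one byte per character while all text
    fits a single 8-bit charset (hypothetical model, not Emacs)."""

    def __init__(self, charset="latin-1", locked=False):
        self.charset = charset     # assumed `unibyte encoding' attribute
        self.locked = locked       # assumed buffer-local charset lock
        self._bytes = bytearray()  # compact one-byte-per-char storage
        self._text = None          # general storage, used after conversion

    @property
    def compact(self):
        """True while the buffer still uses the compact representation."""
        return self._text is None

    def insert(self, s):
        """Append S, converting or signalling an error if it does not fit."""
        if not self.compact:
            self._text += s
            return
        try:
            encoded = s.encode(self.charset)
        except UnicodeEncodeError:
            if self.locked:
                # Locked: refuse the text instead of converting the buffer.
                raise ValueError(
                    f"text does not fit locked charset {self.charset!r}")
            # Unlocked: convert the whole buffer to the general form.
            self._text = self._bytes.decode(self.charset) + s
            self._bytes = bytearray()
            return
        self._bytes += encoded

    def text(self):
        """Return the contents as ordinary characters in either form."""
        if self.compact:
            return self._bytes.decode(self.charset)
        return self._text
```

Callers always see characters through `text()'; whether the storage is
compact is an internal detail, and only a locked buffer turns a
non-fitting insertion into an error.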

> As for the idea that efficiency should never be a factor in deciding
> what to do here, I am skeptical of that.

I'm not saying that efficiency isn't an issue; I'm saying that Lisp
programmers shouldn't have to worry about it as much.  They should be
able to just use `normal' coding methods (which currently means
multibyte by default), and expect that Emacs would optimize this in
certain common cases; currently, if a Lisp programmer wants extra
efficiency, he's got to use special and more dangerous operations.

I realize that what I'm suggesting is a bit much, at least for the
near future, but I also think the current design is somewhat broken,
and makes it too easy for programmers to do the wrong thing.

-Miles
-- 
I am a virus.  Join in and copy me into your .signature.