From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: Wed, 26 Feb 2003 16:49:15 +0900 (JST)
Message-ID: <200302260749.QAA29494@etlken.m17n.org>
References: <200302250634.PAA27478@etlken.m17n.org> <200302260058.JAA28973@etlken.m17n.org> <200302260211.h1Q2BJl08373@rum.cs.yale.edu> <200302260234.LAA29082@etlken.m17n.org> <200302260252.h1Q2qIK08490@rum.cs.yale.edu> <200302260532.OAA29294@etlken.m17n.org> <200302260550.h1Q5oSc08967@rum.cs.yale.edu>
In-reply-to: <200302260550.h1Q5oSc08967@rum.cs.yale.edu> (monnier+gnu/emacs@rum.cs.yale.edu)
Original-To: monnier+gnu/emacs@rum.cs.yale.edu
Cc: miles@gnu.org
Original-cc: d.love@dl.ac.uk
Original-cc: sds@gnu.org
Original-cc: emacs-devel@gnu.org
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.2.92 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
Xref: main.gmane.org gmane.emacs.devel:11963

In article <200302260550.h1Q5oSc08967@rum.cs.yale.edu>, "Stefan Monnier" writes:

>> Why is it not needed?  Strings and buffers are not that
>> different, both are containers of characters.

> They are used differently.  Operations on strings generally apply to the
> whole string: you can only encode/decode a whole string at a time.

That's because of the limitation of the current
implementation, not because of the nature of strings.
There's no reason for keeping that limitation.  Actually, as
we have changed the type Lisp_String in 21.1, it's not
difficult to make strings change length.

>> If we get a unibyte string from a unibyte buffer by buffer-substring,
>> how should we treat that string?

> Like any other unibyte string: as a sequence of raw bytes.
> If you want to treat it as a sequence of characters, then
> you need to pass it through `string-as-multibyte'.

If we regard that limitation as the nature of strings, your
idea is worth considering.  It seems that we can at least
construct a consistent explanation of its behaviour based
on your idea too.

------------------------------------------------------------
What a character in a unibyte buffer represents depends on
the context.  It may be a character represented by a single
byte, a raw byte not yet decoded, or a byte constituting
the multibyte form of a different character.

On the other hand, a character in a unibyte string always
represents a raw byte.  Emacs coerces it into the character
represented by that single byte when the unibyte string is
concatenated with a multibyte string or inserted into a
multibyte buffer.
------------------------------------------------------------

But I'm not sure such a change is really necessary.  Are you
sure that the change doesn't break the current usage of
unibyte strings?

>> The latter yields multibyte, but I think it's a bug.  I found
>> that "(format "%s" 1)" is implemented by using
>> prin1-to-string, and prin1-to-string prints an object to a
>> temporary buffer and gets that buffer string.  So, in a
>> multibyte session "(format "%s" 1)" yields a multibyte
>> string.  :-(

> I know: I bumped into it yesterday while playing around with tar-mode.
> How about the attached patch ?

Please see the comments below.

>> So, do you mean that you want this?
>>
>> If a unibyte buffer has \201\300 in the region FROM and TO,
>>
>> (encode-coding-string (buffer-substring FROM TO) 'iso-latin-1)
>>   => "\201\300"
>>
>> (encode-coding-region FROM TO 'iso-latin-1) changes the
>> region to \300.

> Yes, I guess I'd be happy with it.

>> Isn't it more confusing?

> Not to me.

What do the other people think about it?
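For anyone who wants to try the two calls side by side, here
is a rough sketch.  The helper name
my-compare-latin-1-encoding is made up for illustration and
is not part of Emacs.  No results are shown, because they
depend on the Emacs version and on which of the behaviours
discussed in this thread is in effect.

```elisp
;; Hypothetical helper, for illustration only; not part of Emacs.
(defun my-compare-latin-1-encoding ()
  "Return a list (STRING-RESULT REGION-RESULT) for a two-byte unibyte buffer."
  (with-temp-buffer
    (set-buffer-multibyte nil)            ; make the temporary buffer unibyte
    (insert "\201\300")                   ; the two raw bytes from the example
    (let* ((from (point-min))
           (to (point-max))
           ;; String form: encode a copy taken out of the unibyte buffer.
           (string-result
            (encode-coding-string (buffer-substring from to) 'iso-latin-1)))
      ;; Region form: encode the same bytes in place.
      (encode-coding-region from to 'iso-latin-1)
      (list string-result (buffer-string)))))

;; Print both results with %S so that raw bytes stay visible.
(message "%S" (my-compare-latin-1-encoding))
```

If the two elements of the returned list differ, the string
and region forms disagree in the way described above.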
> PS: I wish there was a way to swap two buffers' content so that
>     tar-mode could swap the (potentially very large) data to
>     a helper buffer (without needing to copy this large data)
>     and then use multibyte for the display and unibyte for
>     the helper buffer.

I don't understand what you mean, especially the usage of
the helper buffer.

I think tar-mode should use multiple buffers: one unibyte
buffer for the tar file itself, one multibyte buffer for the
table of contents, and other multibyte buffers (created on
demand) for viewing/editing files contained in the tar file.
Then tar-mode works almost the same way as dired.  We can
see multibyte files in different buffers.  We can use the
same method in arc-mode and also in RMAIL.

Is that different from what you mean?

---
Ken'ichi HANDA
handa@m17n.org