From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: Tue, 2 Dec 2003 22:07:43 +0900 (JST)
Message-ID: <200312021307.WAA01163@etlken.m17n.org>
To: monnier@IRO.UMontreal.CA
Cc: jas@extundo.com, emacs-devel@gnu.org
In-reply-to: (message from Stefan Monnier on 01 Dec 2003 11:15:47 -0500)

Stefan Monnier writes:
>> It is used so as not to lose information about text even if
>> you kill text in a multibyte buffer and paste it into a
>> unibyte buffer.

> That's the kind of concrete case I needed, thank you.

I'm very glad that now we can start to argue on the same
wavelength.

> Now I'll have to go back and reread the thread to understand things
> better.

Please.

> Are there other cases like that?

For instance, when searching for a multibyte string in a
unibyte buffer.  But if we are searching for a regular
expression that contains a character range (e.g. [a-z]), the
current simple multibyte->unibyte conversion doesn't work in
many cases.  I fixed that in the unicode branch.

> Also, should we really allow such a thing?

I myself tend to agree with dropping that kind of unibyte
support, but that should be decided by Richard.

> I mean, it's a dangerous operation since it only works if the user
> is lucky enough to use just the right subset of
> characters.

But we can expect such luck in many situations, where people
mostly use only characters belonging to their primary
charset.

> So we should at least signal an error if the conversion is
> unsafe (in that make-string-multibyte will not recover the
> original string).

Shall we test it with HEAD to check how often such an error
occurs?

> BTW, in which kind of circumstances is the user presented with both
> a multibyte buffer and a unibyte buffer?

Even if one starts Emacs with --unibyte, Emacs sometimes
makes a multibyte buffer (e.g. C-h h).  And even if one
starts Emacs with --multibyte, he may have a file that
contains, for instance, latin-1 characters and raw-byte
data, and he may want to read such a file with the coding
system raw-text (then C-x = always shows \000..\377).

>> Are you talking about the actual Emacs Lisp code that
>> explicitly calls make-string-unibyte?  I've been talking
>> about the functionality of make-string-unibyte itself,
>> especially about the implicit call to the C function
>> copy_text that does the same thing as make-string-unibyte.
>> Is that the reason why it seems that we are talking at cross
>> purposes?

> I'm talking about both.

> I agree on the signalling, of course, I just want to push it further
> and signal even when pasting latin-2 multibyte text into a unibyte buffer.
> After all, why should Slovak users be able to do that but Japanese users
> not?  In my view, every time we use this kind of thing, we're taking
> a temporary shortcut that is "good enough for 8bit users" but not for the
> rest of the world.

The fact that something doesn't work for double-byte charset
users can't be a strong enough reason for dropping it for
single-byte charset users.

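The round-trip concern discussed above (signal an error when
converting back would not recover the original string) can be
sketched outside Emacs.  The Python below is only an analogy, not
Emacs code: Python codecs stand in for the multibyte->unibyte
conversion, and the names is_roundtrip_safe and to_unibyte_or_signal
are hypothetical, chosen for illustration.

```python
def is_roundtrip_safe(text: str, charset: str = "latin-1") -> bool:
    """Return True if text survives encoding to charset and back unchanged.

    Analogy only: a Python codec plays the role of the
    multibyte->unibyte conversion; Emacs does not work this way internally.
    """
    try:
        raw = text.encode(charset)  # "multibyte" -> "unibyte"
    except UnicodeEncodeError:
        return False  # some character has no single-byte form in charset
    return raw.decode(charset) == text  # "unibyte" -> "multibyte"


def to_unibyte_or_signal(text: str, charset: str = "latin-1") -> bytes:
    """Convert text to bytes, raising an error instead of losing information."""
    if not is_roundtrip_safe(text, charset):
        raise ValueError(f"unsafe conversion to {charset}: {text!r}")
    return text.encode(charset)


if __name__ == "__main__":
    # Latin-1 text converts safely under the latin-1 charset.
    print(is_roundtrip_safe("caf\u00e9"))
    # Latin-2 text (c with caron) is unsafe under latin-1 ...
    print(is_roundtrip_safe("\u010daj"))
    # ... but safe under the latin-2 charset.
    print(is_roundtrip_safe("\u010daj", "iso8859-2"))
    # Japanese text has no single-byte form in either charset.
    print(is_roundtrip_safe("\u65e5\u672c\u8a9e"))
```

In this sketch the answer depends on which single-byte charset
is assumed, which is the asymmetry between latin-2 and Japanese
users raised in the quoted text.
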
> AFAIK, unibyte buffers should only be used internally and never presented
> to the user.  This is because unibyte buffers contain bytes (in my view)
> whereas the user wants to see characters.

I agree that is a very clean view, and I myself have
expressed the same thing several times.  But it seems that
Richard doesn't want to drop the current way of unibyte
support.

---
Ken'ichi HANDA
handa@m17n.org