From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: Mon, 1 Dec 2003 09:43:23 +0900 (JST)
Message-ID: <200312010043.JAA04933@etlken.m17n.org>
References: <200311250107.KAA24646@etlken.m17n.org> <200311260007.JAA26617@etlken.m17n.org> <200311270134.KAA28664@etlken.m17n.org>
Cc: jas@extundo.com, emacs-devel@gnu.org
To: monnier@IRO.UMontreal.CA
In-reply-to: (message from Stefan Monnier on 27 Nov 2003 09:23:00 -0500)

In article , Stefan Monnier writes:

>>> I can't answer this question without knowing the answer to my question:
>>> what is string-make-unibyte used for.

>> It is used for converting a multibyte string to unibyte
>> before it is inserted in a unibyte buffer.

> I meant `what is "converting from multibyte to unibyte" used for'.
> I.e. it can be used for different things in different contexts and I can't
> answer in general, so I need a concrete case.

It is used so that no information about the text is lost even if
you kill text in a multibyte buffer and paste it into a unibyte
buffer.  When you then kill the just-pasted text in the unibyte
buffer and paste it back into the original multibyte buffer, you
recover the same character sequence.

Anyway, I already showed you this example:

  In Latin-2 environment but the default encoding is CTEXT.

In that case also, inserting a multibyte Latin-2 string into a
unibyte buffer works the same way as in this case:

  In Latin-2 environment and the default encoding is iso-latin-2.

That's because string-make-unibyte doesn't have to know about
coding systems.  All it has to know is which character set to use.

If you can't answer in general, please answer this concrete
question.

In a Latin-2 environment where one's primary character set is
latin-iso8859-2 but the default encoding is CTEXT, how do you
make insertion of a multibyte string (containing only
latin-iso8859-2 characters) into a unibyte buffer work with your
method?  Such an insertion may happen when a user kills text in a
multibyte buffer and yanks it into a unibyte buffer.

>> It's an ambiguous statement.  Which are you saying?
>> Replace string-make-unibyte by:
>> (1) encode-coding-string or make-string-unibyte.
>> (2) a code that applies encode-coding-string or
>>     make-string-unibyte to the whole string depending on
>>     something (perhaps on the input string?).
>> (3) a code that applies encode-coding-string to substrings
>>     where that is appropriate, and applies make-string-unibyte
>>     to the remaining substrings.
>> (4) something that I still don't understand.

> I'm saying that each *call* to string-make-unibyte can be replaced
> by a call to either encode-coding-string or make-string-unibyte.
> But the decision of which to use and which coding-system to use
> depends on the context.

Are you talking about the actual Emacs Lisp code that explicitly
calls make-string-unibyte?  I've been talking about the
functionality of make-string-unibyte itself, especially about the
implicit call to the C function copy_text, which does the same
thing as make-string-unibyte.  Is that the reason why we seem to
be talking at cross purposes?

> Now why would we want to do the work of changing all those calls?
> Because all those that would use encode-coding-string are incorrect
> in using string-make-unibyte because they won't do the right thing
> in some language environments.

What is the right thing to do when a multibyte Japanese text is
being pasted into a unibyte buffer?  I think signalling an error
is the only right thing, and I've never objected to making
copy_text and Fstring_make_unibyte signal an error in such a
case.

---
Ken'ichi HANDA
handa@m17n.org