From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: Sun, 23 Nov 2003 16:30:49 +0900 (JST)
Message-ID: <200311230730.QAA21903@etlken.m17n.org>
References: <200311130153.KAA04615@etlken.m17n.org>
	<200311130610.PAA04983@etlken.m17n.org>
	<200311130901.SAA05204@etlken.m17n.org>
	<200311140047.JAA06414@etlken.m17n.org>
	<200311180733.QAA13703@etlken.m17n.org>
	<200311190006.JAA14847@etlken.m17n.org>
	<200311210041.JAA18324@etlken.m17n.org>
	<200311210627.PAA18757@etlken.m17n.org>
	<200311220125.KAA20128@etlken.m17n.org>
Cc: jas@extundo.com, emacs-devel@gnu.org
Original-To: monnier@IRO.UMontreal.CA
In-reply-to: (message from Stefan Monnier on 22 Nov 2003 18:53:05 -0500)
Xref: main.gmane.org gmane.emacs.devel:18047

In article, Stefan Monnier writes:

>>>> It is perfectly possible to live in such an environment
>>>> where only the charset iso-8859-1 is used but only the
>>>> coding system utf-8 is used. In this environment, the
>>>> results of encode-coding-string and string-make-unibyte are
>>>> of course not the same, but still both operations are
>>>> meaningful.

>>> I see that encode-coding-string does the utf-8 encoding, but what
>>> does string-make-unibyte do in such a case and what is it used for ?
>> It gets iso-8859-1 code-points of all characters in a
>> multibyte string and concatenates them (the same as what it
>> does in latin-1 lang. env.).

> You mean it does the same as (encode-coding-string str 'latin-1) ?

Not exactly the same when STR contains, for instance, Cyrillic
characters. The two operations differ in how they deal with
unsupported characters. Encode-coding-string may behave
leniently so that the result can still be decoded back correctly
(perhaps by adding some escape sequence). But
string-make-unibyte should never change the number of
characters.

And,

> Then why use string-make-unibyte ?

There's no way to know that we should use the coding-system
latin-1 in this situation. All we know is that the default
coding-system is utf-8 and the default character set is
iso-8859-1.

>> Please try C-x C-m L utf-8 RET and see how
>> string-make-unibyte and string-make-multibyte work.

> I'll try that, but I'd like to understand the motivation for making it work
> the way it works. I've always understood those two as "trying to DTRT" in
> a very ad-hoc way such that people that used to work in an 8bit non-ASCII
> environment don't need to worry about coding-systems and still have
> things working mostly correctly.

Doing unibyte<->multibyte conversion automatically may be
ad-hoc, and so may the way these functions treat unsupported
characters. But the concept of unibyte<->multibyte conversion
itself is not ad-hoc. Don't you think their meaning is very
clear once you understand them the way I describe? Do you see
any inconsistency in my explanation of them?

---
Ken'ichi HANDA
handa@m17n.org
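The distinction drawn in this thread (string-make-unibyte keeps one byte per character, while encode-coding-string produces whatever the coding system dictates) can be modelled outside Emacs. What follows is a minimal sketch in Python, not Emacs Lisp, so that it runs standalone. It assumes the environment described above (default charset ISO-8859-1, default coding system UTF-8). The function names are hypothetical. The low-8-bits fallback for unsupported characters is an assumption based on how later Emacs docstrings describe string-make-unibyte. Python's `backslashreplace` handler serves only as an example of a lenient encoder adding escape sequences; it is not what Emacs emits.

```python
# A Python model (NOT Emacs code) of the conversions discussed in this
# thread. Assumed setting: default character set ISO-8859-1, default
# coding system UTF-8. All function names here are hypothetical.


def make_unibyte(text: str) -> bytes:
    """Model of string-make-unibyte: exactly one byte per character.

    A character in ISO-8859-1 maps to its code point. For any other
    character we assume the low 8 bits are kept, as later Emacs
    docstrings describe; the byte may be meaningless, but the result's
    length always equals len(text).
    """
    return bytes(ord(ch) & 0xFF for ch in text)


def make_multibyte(data: bytes) -> str:
    """Model of string-make-multibyte: each byte becomes the ISO-8859-1
    character with the same code point, again one-for-one."""
    return data.decode("latin-1")


def encode_utf8(text: str) -> bytes:
    """Stand-in for (encode-coding-string text 'utf-8): length may grow."""
    return text.encode("utf-8")


def encode_latin1_lenient(text: str) -> bytes:
    """Stand-in for a lenient latin-1 encoder. Python's 'backslashreplace'
    handler is only an example of escaping unsupported characters; it is
    not what Emacs produces."""
    return text.encode("latin-1", errors="backslashreplace")


if __name__ == "__main__":
    for s in ["caf\u00e9", "\u0416"]:  # "café" and Cyrillic "Ж"
        print(f"{s!r}: {len(s)} char(s)")
        print("  make_unibyte  :", make_unibyte(s))
        print("  utf-8         :", encode_utf8(s))
        print("  latin-1 (esc) :", encode_latin1_lenient(s))
```

For "café", the unibyte form has 4 bytes and the UTF-8 encoding has 5. For "Ж" (U+0416), the unibyte form is the single byte 0x16. The character count is preserved, but make_multibyte cannot turn that byte back into "Ж", which is the ad-hoc treatment of unsupported characters that the thread concedes. The lenient encoder instead lengthens the output with an escape sequence.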