From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: Sun, 23 Nov 2003 16:30:49 +0900 (JST)
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <200311230730.QAA21903@etlken.m17n.org>
References: <ilubrrha7oc.fsf@latte.josefsson.org>	<200311130153.KAA04615@etlken.m17n.org>	<ilur80c50uj.fsf@latte.josefsson.org>	<200311130610.PAA04983@etlken.m17n.org>	<iluekwcwyl8.fsf@latte.josefsson.org>	<200311130901.SAA05204@etlken.m17n.org>	<ilun0b08by1.fsf@latte.josefsson.org>	<200311140047.JAA06414@etlken.m17n.org>	<jwvhe12emr3.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>	<200311180733.QAA13703@etlken.m17n.org>	<jwvn0atd38w.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>	<200311190006.JAA14847@etlken.m17n.org>	<jwvptfp139w.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>	<200311210041.JAA18324@etlken.m17n.org>	<jwvzneqwbo3.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>	<200311210627.PAA18757@etlken.m17n.org>	<jwvvfpdsrab.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>	<200311220125.KAA20128@etlken.m17n.org>
	<jwvoev4ufqd.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
X-Trace: sea.gmane.org 1069573204 4723 80.91.224.253 (23 Nov 2003 07:40:04 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Sun, 23 Nov 2003 07:40:04 +0000 (UTC)
Cc: jas@extundo.com, emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org Sun Nov 23 08:39:59 2003
Return-path: <emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org>
Original-Received: from quimby.gnus.org ([80.91.224.244])
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 1ANoqd-0003HZ-00
	for <emacs-devel@deer.gmane.org>; Sun, 23 Nov 2003 08:39:59 +0100
Original-Received: from monty-python.gnu.org ([199.232.76.173])
	by quimby.gnus.org with esmtp (Exim 3.35 #1 (Debian))
	id 1ANoqd-0003wA-00
	for <emacs-devel@quimby.gnus.org>; Sun, 23 Nov 2003 08:39:59 +0100
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org)
	by monty-python.gnu.org with esmtp (Exim 4.24)
	id 1ANpg9-0007nE-Cu
	for emacs-devel@quimby.gnus.org; Sun, 23 Nov 2003 03:33:13 -0500
Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.24)
	id 1ANpg3-0007n6-N8
	for emacs-devel@gnu.org; Sun, 23 Nov 2003 03:33:07 -0500
Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.24)
	id 1ANpfX-0007hZ-A8
	for emacs-devel@gnu.org; Sun, 23 Nov 2003 03:33:06 -0500
Original-Received: from [192.47.44.130] (helo=tsukuba.m17n.org)
	by monty-python.gnu.org with esmtp (Exim 4.24) id 1ANpfW-0007Zc-CA
	for emacs-devel@gnu.org; Sun, 23 Nov 2003 03:32:34 -0500
Original-Received: from fs.m17n.org (fs.m17n.org [192.47.44.2])
	by tsukuba.m17n.org (8.11.6p2/3.7W-20010518204228) with ESMTP id
	hAN7Ush15533; Sun, 23 Nov 2003 16:30:54 +0900 (JST)
	(envelope-from handa@m17n.org)
Original-Received: from etlken.m17n.org (etlken.m17n.org [192.47.44.125])
	by fs.m17n.org (8.11.6/3.7W-20010823150639) with ESMTP id hAN7Uos28703; 
	Sun, 23 Nov 2003 16:30:50 +0900 (JST)
Original-Received: (from handa@localhost)
	by etlken.m17n.org (8.8.8+Sun/3.7W-2001040620) id QAA21903;
	Sun, 23 Nov 2003 16:30:49 +0900 (JST)
Original-To: monnier@IRO.UMontreal.CA
In-reply-to: <jwvoev4ufqd.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
	(message from Stefan Monnier on 22 Nov 2003 18:53:05 -0500)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2
	Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.2
Precedence: list
List-Id: Emacs development discussions.  <emacs-devel.gnu.org>
List-Unsubscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://mail.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:18047
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:18047

In article <jwvoev4ufqd.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>>>>  It is perfectly possible to live in an environment
>>>>  where the only charset used is iso-8859-1 and the only
>>>>  coding system used is utf-8.  In this environment, the
>>>>  results of encode-coding-string and string-make-unibyte are
>>>>  of course not the same, but both operations are still
>>>>  meaningful.

>>>  I see that encode-coding-string does the utf-8 encoding, but what
>>>  does string-make-unibyte do in such a case and what is it used for ?

>>  It takes the iso-8859-1 code points of all characters in a
>>  multibyte string and concatenates them (the same as what it
>>  does in the latin-1 language environment).
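To make the difference concrete, here is a minimal Python model of the environment described above (default charset iso-8859-1, default coding system utf-8). It is only an illustration, not Emacs's implementation: `make_unibyte` and `encode_coding_string` are hypothetical stand-ins for string-make-unibyte and encode-coding-string, and Python's "latin-1" codec is assumed to stand in for iso-8859-1. This sketch handles only characters that iso-8859-1 supports.

```python
def make_unibyte(s: str) -> bytes:
    # Hypothetical model of string-make-unibyte: concatenate the
    # iso-8859-1 code point of each character, one byte per character.
    # bytes() raises ValueError for code points above 255, so this
    # sketch only covers characters in iso-8859-1.
    return bytes(ord(c) for c in s)


def encode_coding_string(s: str, coding: str = "utf-8") -> bytes:
    # Hypothetical model of encode-coding-string with the default
    # coding system (utf-8 in this environment).
    return s.encode(coding)


text = "café"
print(make_unibyte(text))          # one byte per character
print(encode_coding_string(text))  # utf-8 uses two bytes for "é"
```

Both results are meaningful byte sequences; they just answer different questions (charset code points versus the encoded form in the coding system).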

> You mean it does the same as (encode-coding-string str 'latin-1) ?

Not exactly the same when STR contains, for instance,
Cyrillic characters.  The two operations handle unsupported
characters differently.  encode-coding-string may behave
leniently so that the result can be decoded back correctly
(perhaps by adding some escape sequence), but
string-make-unibyte should never change the number of
characters.  And,
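The contrast for unsupported characters can be sketched in Python as follows. This is an analogy under stated assumptions, not how Emacs does it: the `?` placeholder in `make_unibyte` is a hypothetical choice (the point is only that exactly one byte is produced per character), and Python's `backslashreplace`/`unicode_escape` pair is used as a stand-in for "adding some escape sequence" so that the result can be decoded back.

```python
PLACEHOLDER = ord("?")  # hypothetical substitute byte for unsupported characters


def make_unibyte(s: str) -> bytes:
    # Hypothetical model of string-make-unibyte: always exactly one byte
    # per character, even when iso-8859-1 cannot represent the character.
    return bytes(ord(c) if ord(c) < 256 else PLACEHOLDER for c in s)


def encode_lenient(s: str) -> bytes:
    # Lenient encoding (analogy): unsupported characters are replaced by
    # an escape sequence such as \u0416, so information is preserved.
    return s.encode("latin-1", errors="backslashreplace")


def decode_lenient(b: bytes) -> str:
    # Inverse of encode_lenient; assumes the original text contained
    # no literal backslashes.
    return b.decode("unicode_escape")


text = "café Ж"  # Latin-1 letters plus one Cyrillic character
print(len(text), len(make_unibyte(text)), len(encode_lenient(text)))
```

The unibyte conversion keeps the character count but loses the Cyrillic letter; the lenient encoding grows the string but can be decoded back.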

> Then why use string-make-unibyte ?

There's no way to know that we should use the coding-system
latin-1 in this situation.  All we know is that the default
coding-system is utf-8, and the default character set is
iso-8859-1.

>>  Please try C-x C-m L utf-8 RET and see how
>>  string-make-unibyte and string-make-multibyte work.
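As a rough model of the opposite direction, the sketch below treats string-make-multibyte as mapping each byte to the iso-8859-1 character with that code point, while decoding with the default coding system (utf-8) interprets the bytes as encoded text. `make_multibyte` is a hypothetical name for illustration; it is not Emacs code.

```python
def make_multibyte(b: bytes) -> str:
    # Hypothetical model of string-make-multibyte: each byte becomes the
    # iso-8859-1 character with the same code point (same as latin-1 decoding).
    return "".join(chr(x) for x in b)


raw = b"caf\xc3\xa9"           # the utf-8 encoding of "café"
print(make_multibyte(raw))     # byte-by-byte: 5 characters, "cafÃ©"
print(raw.decode("utf-8"))     # decoded with the coding system: "café"
```

For bytes produced by string-make-unibyte, such as `b"caf\xe9"`, this model restores the original characters exactly.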

> I'll try that, but I'd like to understand the motivation for making it work
> the way it works.  I've always understood those two as "trying to DTRT" in
> a very ad-hoc way such that people that used to work in an 8bit non-ASCII
> environment don't need to worry about coding-systems and still have
> things working mostly correctly.

Doing unibyte<->multibyte conversion automatically
may be ad hoc.  The way the conversions handle unsupported
characters may also be ad hoc.

But the concept of unibyte<->multibyte conversion itself is
not ad hoc.  Don't you think its meaning is very clear
once you understand the conversions the way I described?  Do
you see any inconsistency in my explanation of them?

---
Ken'ichi HANDA
handa@m17n.org