From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: Mon, 1 Dec 2003 09:43:23 +0900 (JST)
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <200312010043.JAA04933@etlken.m17n.org>
References: <200311250107.KAA24646@etlken.m17n.org>	<jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>	<200311260007.JAA26617@etlken.m17n.org>	<jwvhe0rp6ml.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>	<200311270134.KAA28664@etlken.m17n.org>
	<jwvad6hlwu1.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
X-Trace: sea.gmane.org 1070239507 17606 80.91.224.253 (1 Dec 2003 00:45:07 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Mon, 1 Dec 2003 00:45:07 +0000 (UTC)
Cc: jas@extundo.com, emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org Mon Dec 01 01:45:04 2003
Return-path: <emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org>
Original-Received: from quimby.gnus.org ([80.91.224.244])
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 1AQcBU-00054R-00
	for <emacs-devel@deer.gmane.org>; Mon, 01 Dec 2003 01:45:04 +0100
Original-Received: from monty-python.gnu.org ([199.232.76.173])
	by quimby.gnus.org with esmtp (Exim 3.35 #1 (Debian))
	id 1AQcBT-0005jx-00
	for <emacs-devel@quimby.gnus.org>; Mon, 01 Dec 2003 01:45:04 +0100
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org)
	by monty-python.gnu.org with esmtp (Exim 4.24)
	id 1AQd8j-0004Dp-LY
	for emacs-devel@quimby.gnus.org; Sun, 30 Nov 2003 20:46:17 -0500
Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.24)
	id 1AQd8I-0004Bq-AB
	for emacs-devel@gnu.org; Sun, 30 Nov 2003 20:45:50 -0500
Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.24)
	id 1AQd7i-0003Pi-B5
	for emacs-devel@gnu.org; Sun, 30 Nov 2003 20:45:45 -0500
Original-Received: from [192.47.44.130] (helo=tsukuba.m17n.org)
	by monty-python.gnu.org with esmtp (Exim 4.24) id 1AQd7c-000381-ES
	for emacs-devel@gnu.org; Sun, 30 Nov 2003 20:45:08 -0500
Original-Received: from fs.m17n.org (fs.m17n.org [192.47.44.2])
	by tsukuba.m17n.org (8.11.6p2/3.7W-20010518204228) with ESMTP id
	hB10hPh18441; Mon, 1 Dec 2003 09:43:25 +0900 (JST)
	(envelope-from handa@m17n.org)
Original-Received: from etlken.m17n.org (etlken.m17n.org [192.47.44.125])
	by fs.m17n.org (8.11.6/3.7W-20010823150639) with ESMTP id hB10hNs04795; 
	Mon, 1 Dec 2003 09:43:24 +0900 (JST)
Original-Received: (from handa@localhost)
	by etlken.m17n.org (8.8.8+Sun/3.7W-2001040620) id JAA04933;
	Mon, 1 Dec 2003 09:43:23 +0900 (JST)
Original-To: monnier@IRO.UMontreal.CA
In-reply-to: <jwvad6hlwu1.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
	(message from Stefan Monnier on 27 Nov 2003 09:23:00 -0500)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2
	Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.2
Precedence: list
List-Id: Emacs development discussions.  <emacs-devel.gnu.org>
List-Unsubscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://mail.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:18234
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:18234

In article <jwvad6hlwu1.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>>>  I can't answer this question without knowing the answer to my question:
>>>  what is string-make-unibyte used for.

>>  It is used for converting a multibyte string to unibyte
>>  before it is inserted in a unibyte buffer.

> I meant `what is "converting from multibyte to unibyte" used for'.
> I.e. it can be used for different things in different contexts and I can't
> answer in general, so I need a concrete case.

It is used so that no information about the text is lost
even if you kill text in a multibyte buffer and paste it
into a unibyte buffer.  When you then kill the just-pasted
text in the unibyte buffer and paste it back into the
original multibyte buffer, you recover the same character
sequence.

Anyway, I already showed you this example:

  In a Latin-2 environment where the default encoding is CTEXT.

In that case too, inserting a multibyte Latin-2 string into
a unibyte buffer works the same way as in this case:

  In a Latin-2 environment where the default encoding is iso-latin-2.

That is because string-make-unibyte doesn't have to know
about the coding system.  All it has to know is which
character set to use.
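
To make that concrete, here is a minimal sketch in Python,
not Emacs code.  The helper names to_unibyte and
to_multibyte are hypothetical, and Python's iso8859_2 codec
merely stands in for the latin-iso8859-2 character set.  It
only models the idea that the conversion depends on the
character set alone, so the kill-and-yank round trip gives
the same result whatever the default encoding is.

```python
# Hypothetical model of charset-based unibyte conversion; not Emacs code.
PRIMARY_CHARSET = "iso8859_2"  # assumption: stands in for latin-iso8859-2


def to_unibyte(text: str, charset: str = PRIMARY_CHARSET) -> bytes:
    """Map each character to its single-byte code in CHARSET."""
    return text.encode(charset)


def to_multibyte(data: bytes, charset: str = PRIMARY_CHARSET) -> str:
    """Map each byte back to the character it denotes in CHARSET."""
    return data.decode(charset)


# Kill in a multibyte buffer, yank into a unibyte buffer, then yank back.
original = "\u0141\u00f3d\u017a"  # a Latin-2 word, written with ASCII escapes
unibyte = to_unibyte(original)
restored = to_multibyte(unibyte)
```

Nothing in the sketch refers to CTEXT or to any other
default encoding; only the character set is consulted.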

If you can't answer in general, please answer this concrete
question.

  In a Latin-2 environment where one's primary character set
  is latin-iso8859-2 but the default encoding is CTEXT, how
  would your method make insertion of a multibyte string
  (containing only latin-iso8859-2 characters) into a
  unibyte buffer work?  Such an insertion may happen when a
  user kills text in a multibyte buffer and yanks it into a
  unibyte buffer.

>>  It's an ambiguous statement.  Which are you saying?

>>  Replace string-make-unibyte by:
>>  (1) encode-coding-string or make-string-unibyte.

>>  (2) a code that applies encode-coding-string or
>>  make-string-unibyte to the whole string depending on
>>  something (perhaps on the input string?).

>>  (3) a code that applies encode-coding-string to substrings
>>  where that is appropriate, and applies make-string-unibyte
>>  to the remaining substrings.

>>  (4) something that I still don't understand.

> I'm saying that each *call* to string-make-unibyte can be replaced
> by a call to either encode-coding-string or make-string-unibyte.

> But the decision of which to use and which coding-system to use
> depends on the context.

Are you talking about the actual Emacs Lisp code that
explicitly calls make-string-unibyte?  I've been talking
about the functionality of make-string-unibyte itself,
especially about the implicit call to the C function
copy_text that does the same thing as make-string-unibyte.
Is that the reason why we seem to be talking at cross
purposes?

> Now why would we want to do the work of changing all those calls?
> Because all those that would use encode-coding-string are incorrect
> in using string-make-unibyte because they won't do the right thing
> in some language environments.

What is the right thing to do when a multibyte Japanese text
is being pasted into a unibyte buffer?

I think signalling an error is the only right thing, and
I've never objected to making copy_text and
Fstring_make_unibyte signal an error in such a case.
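
As a rough illustration of that behavior, here is a minimal
sketch in Python, not the actual copy_text or
Fstring_make_unibyte.  The names to_unibyte_or_error and
try_paste are hypothetical, and Python's iso8859_2 codec is
assumed as a stand-in for the primary character set.
Japanese characters have no code in that character set, so
the conversion raises an error instead of silently losing
text.

```python
# Hypothetical model; not the actual copy_text or Fstring_make_unibyte.
PRIMARY_CHARSET = "iso8859_2"  # assumption: stands in for latin-iso8859-2


def to_unibyte_or_error(text: str, charset: str = PRIMARY_CHARSET) -> bytes:
    """Convert TEXT to single bytes; raise if a character has no code in CHARSET."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError as err:
        # err.start is the index of the first character that cannot be encoded.
        raise ValueError(
            f"cannot represent {text[err.start]!r} in {charset}"
        ) from err


def try_paste(text: str) -> str:
    """Report whether pasting TEXT into a unibyte buffer would succeed."""
    try:
        to_unibyte_or_error(text)
        return "ok"
    except ValueError:
        return "error"
```

Calling try_paste on Japanese text goes down the error
path, while ASCII or Latin-2 text is converted normally.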

---
Ken'ichi HANDA
handa@m17n.org