From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Stefan Monnier <monnier@IRO.UMontreal.CA>
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: 26 Nov 2003 09:14:03 -0500
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <jwvhe0rp6ml.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
References: <200311250107.KAA24646@etlken.m17n.org>
	<jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
	<200311260007.JAA26617@etlken.m17n.org>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: sea.gmane.org 1069858573 10696 80.91.224.253 (26 Nov 2003 14:56:13 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Wed, 26 Nov 2003 14:56:13 +0000 (UTC)
Cc: jas@extundo.com, emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org Wed Nov 26 15:56:09 2003
Return-path: <emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org>
Original-Received: from quimby.gnus.org ([80.91.224.244])
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 1AP15N-0006PL-00
	for <emacs-devel@deer.gmane.org>; Wed, 26 Nov 2003 15:56:09 +0100
Original-Received: from monty-python.gnu.org ([199.232.76.173])
	by quimby.gnus.org with esmtp (Exim 3.35 #1 (Debian))
	id 1AP15M-0000wM-00
	for <emacs-devel@quimby.gnus.org>; Wed, 26 Nov 2003 15:56:08 +0100
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org)
	by monty-python.gnu.org with esmtp (Exim 4.24)
	id 1AP1zO-0000Fc-6V
	for emacs-devel@quimby.gnus.org; Wed, 26 Nov 2003 10:54:02 -0500
Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.24)
	id 1AP1yA-0008U2-2r
	for emacs-devel@gnu.org; Wed, 26 Nov 2003 10:52:46 -0500
Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.24)
	id 1AP1xZ-0008IZ-9J
	for emacs-devel@gnu.org; Wed, 26 Nov 2003 10:52:40 -0500
Original-Received: from [199.232.41.8] (helo=mx20.gnu.org)
	by monty-python.gnu.org with esmtp (TLSv1:DES-CBC3-SHA:168)
	(Exim 4.24) id 1AP1sP-0006w8-Ru
	for emacs-devel@gnu.org; Wed, 26 Nov 2003 10:46:49 -0500
Original-Received: from [132.204.24.67] (helo=mercure.iro.umontreal.ca)
	by mx20.gnu.org with esmtp (Exim 4.24) id 1AP0Qs-0002ua-Qu
	for emacs-devel@gnu.org; Wed, 26 Nov 2003 09:14:18 -0500
Original-Received: from vor.iro.umontreal.ca (vor.iro.umontreal.ca [132.204.24.42])
	by mercure.iro.umontreal.ca (8.12.9/8.12.9) with ESMTP id
	hAQEE3bj006812; Wed, 26 Nov 2003 09:14:04 -0500
Original-Received: by vor.iro.umontreal.ca (Postfix, from userid 20848)
	id 451053C63E; Wed, 26 Nov 2003 09:14:03 -0500 (EST)
Original-To: Kenichi Handa <handa@m17n.org>
In-Reply-To: <200311260007.JAA26617@etlken.m17n.org>
Original-Lines: 58
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50
X-DIRO-MailScanner: Found to be clean
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.2
Precedence: list
List-Id: Emacs development discussions.  <emacs-devel.gnu.org>
List-Unsubscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://mail.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:18142
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:18142

>> but since the output is a unibyte string,
>> that restricts it to cases where the code-points can be encoded in 8 bits,
>> thus it doesn't sound very generic
> Yes.  But I thought generic or not is not a point here.

Except that if it's not generic (in the sense that it does not behave
meaningfully in all language environments), then it can't be used in generic
elisp code, right?

>> and I don't see any application for it
>> (nor do I see any practical difference with using encode-coding-string
>> since the output AFAIK would be the same).

> My examples show that we can't use encode-coding-string.
> How can we use encode-coding-string without knowing what
> coding system to use?  I haven't heard your answer yet.

I can't answer this question without knowing the answer to my question:
what is string-make-unibyte used for?  I'm not saying that we can do
something like:

  (defun string-make-unibyte (s) (encode-coding-string s <blabla>))

but I'm saying that everywhere the current string-make-unibyte is
used, we should be able to easily replace it by a call to
encode-coding-string or a call to my make-string-unibyte (which does
not pay attention to the language environment and only accepts multibyte
chars that correspond to bytes, i.e. eight-bit-control or
eight-bit-graphic, or ASCII, and multibyte chars whose internal code point
is 128-255).
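
A minimal sketch of such a function, for concreteness.  It is not an
actual implementation from any Emacs branch: the name
`sketch-make-string-unibyte' and the error message are hypothetical,
and it assumes the behaviour documented for recent Emacs releases,
where `multibyte-char-to-unibyte' returns -1 for a character that does
not represent a byte and `unibyte-string' builds a unibyte string from
its byte arguments.

```elisp
;; Hypothetical sketch: the function name and error text are made up.
(defun sketch-make-string-unibyte (s)
  "Return a unibyte string with one byte per character of S.
Signal an error if a character of S does not correspond to a byte.
The language environment is never consulted."
  (apply #'unibyte-string
         (mapcar (lambda (c)
                   ;; Assumed behaviour: -1 means "not a byte".
                   (let ((byte (multibyte-char-to-unibyte c)))
                     (if (< byte 0)
                         (error "Character #x%X does not correspond to a byte" c)
                       byte)))
                 s)))
```

Under these assumptions,
(sketch-make-string-unibyte (string-to-multibyte "a\200")) yields a
two-byte unibyte string, while a string containing a character such as
U+03BB signals an error.  The result always has as many elements as the
input, which is the point of rejecting instead of encoding.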

> But, my understanding is that
> string-make-unibyte/multibyte are designed not to change the
> number of characters to make the difference of
> unibyte/multibyte transparent in Lisp.

That is indeed an absolute requirement.
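
To illustrate why this requirement rules out a plain encoding step in
general, here is a small sketch, assuming a recent Emacs.  The helper
name `sketch-lengths' is hypothetical, and utf-8 is picked only because
its byte counts are easy to check by hand.

```elisp
;; Hypothetical helper: compare the length of S before and after
;; encoding it with CODING.
(defun sketch-lengths (s coding)
  "Return a list (CHARS BYTES) for S and its encoding under CODING."
  (list (length s)
        (length (encode-coding-string s coding))))

;; One character (U+00E9) becomes two bytes in utf-8.
(sketch-lengths (string #xe9) 'utf-8)   ; => (1 2)
```

A conversion that must keep the number of elements unchanged therefore
cannot be an arbitrary call to encode-coding-string.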

>> Of course: that's pretty much what I suggested: make-string-unibyte only
>> accepts multibyte chars that correspond to "bytes".

> I agree with that.  But, it just changes the behaviour of
> the function on error case.  It doesn't change the concept
> of what it does.

Except that I said "byte" not "code point", which makes a difference
in non-latin-1 locales.

>> I don't see any use of string-make-unibyte in your two examples.
> Again, I'd like to ask how to use encode-coding-string
> without knowing the proper coding-system in each case.

How could I know the coding-system to use when replacing
`string-make-unibyte' if I don't have any actual call to
string-make-unibyte to work with?


        Stefan