From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Stefan Monnier <monnier@IRO.UMontreal.CA>
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: 02 Dec 2003 11:06:30 -0500
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <jwvekvnp5y8.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
References: <200311250107.KAA24646@etlken.m17n.org>
	<jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
	<200311260007.JAA26617@etlken.m17n.org>
	<jwvhe0rp6ml.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
	<200311270134.KAA28664@etlken.m17n.org>
	<jwvad6hlwu1.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
	<200312010043.JAA04933@etlken.m17n.org>
	<jwvd6b8ttfj.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
	<200312021307.WAA01163@etlken.m17n.org>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: sea.gmane.org 1070385104 4761 80.91.224.253 (2 Dec 2003 17:11:44 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Tue, 2 Dec 2003 17:11:44 +0000 (UTC)
Cc: jas@extundo.com, emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org Tue Dec 02 18:11:40 2003
Return-path: <emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org>
Original-Received: from quimby.gnus.org ([80.91.224.244])
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 1ARE3n-00081y-00
	for <emacs-devel@deer.gmane.org>; Tue, 02 Dec 2003 18:11:39 +0100
Original-Received: from monty-python.gnu.org ([199.232.76.173])
	by quimby.gnus.org with esmtp (Exim 3.35 #1 (Debian))
	id 1ARE3n-0008St-00
	for <emacs-devel@quimby.gnus.org>; Tue, 02 Dec 2003 18:11:39 +0100
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org)
	by monty-python.gnu.org with esmtp (Exim 4.24)
	id 1ARElD-0003lo-Oq
	for emacs-devel@quimby.gnus.org; Tue, 02 Dec 2003 12:56:31 -0500
Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.24)
	id 1ARE12-0003vK-2d
	for emacs-devel@gnu.org; Tue, 02 Dec 2003 12:08:48 -0500
Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.24)
	id 1ARE0V-0003j2-LA
	for emacs-devel@gnu.org; Tue, 02 Dec 2003 12:08:46 -0500
Original-Received: from [132.204.24.67] (helo=mercure.iro.umontreal.ca)
	by monty-python.gnu.org with esmtp (Exim 4.24) id 1ARE0T-0003hu-VY
	for emacs-devel@gnu.org; Tue, 02 Dec 2003 12:08:14 -0500
Original-Received: from vor.iro.umontreal.ca (vor.iro.umontreal.ca [132.204.24.42])
	by mercure.iro.umontreal.ca (8.12.9/8.12.9) with ESMTP id
	hB2G6Ubj004558; Tue, 2 Dec 2003 11:06:31 -0500
Original-Received: by vor.iro.umontreal.ca (Postfix, from userid 20848)
	id 59A9B3C63E; Tue,  2 Dec 2003 11:06:30 -0500 (EST)
Original-To: Kenichi Handa <handa@m17n.org>
In-Reply-To: <200312021307.WAA01163@etlken.m17n.org>
Original-Lines: 41
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50
X-DIRO-MailScanner: Found to be clean
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.2
Precedence: list
List-Id: Emacs development discussions.  <emacs-devel.gnu.org>
List-Unsubscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://mail.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:18284
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:18284

>> So we should at least signal an error if the conversion is
>> unsafe (in that make-string-multibyte will not recover the
>> original string).

> Shall we test it with HEAD to check how often such an error
> occurs?

That would be great.
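To make the proposed check concrete, here is a rough Python model. It is not Emacs Lisp and not how Emacs implements the conversion. It assumes, for illustration only, that the multibyte-to-unibyte direction keeps the low 8 bits of each character code and that make-string-multibyte maps each byte back to the code point with the same value. The helper names are hypothetical. The point is the round-trip test: convert, convert back, and signal an error if the result differs from the original string.

```python
# Hypothetical, simplified model of unibyte/multibyte conversion.
# Assumption: to_unibyte_lossy keeps only the low 8 bits of each char,
# and to_multibyte maps byte b to code point b (Latin-1 style).
# Real Emacs behaviour depends on more settings than this.

class UnsafeConversionError(ValueError):
    """Raised when a unibyte conversion would lose information."""


def to_unibyte_lossy(text: str) -> bytes:
    # Silently lossy: U+0416 (Cyrillic ZHE) becomes byte 0x16.
    return bytes(ord(ch) & 0xFF for ch in text)


def to_multibyte(data: bytes) -> str:
    # Map each byte 0..255 to the code point with the same value.
    return "".join(chr(b) for b in data)


def to_unibyte_checked(text: str) -> bytes:
    """Convert TEXT, but signal an error if to_multibyte cannot recover it."""
    data = to_unibyte_lossy(text)
    if to_multibyte(data) != text:
        raise UnsafeConversionError(f"lossy unibyte conversion of {text!r}")
    return data
```

With this model, to_unibyte_checked("caf\u00e9") returns b"caf\xe9", while to_unibyte_checked("\u0416") signals an error instead of silently producing b"\x16". Running the real check on HEAD would show how often the second case actually happens.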

>> BTW, in which kind of circumstances is the user presented with both
>> a multibyte buffer and a unibyte buffer ?

> Even if one starts Emacs with --unibyte, Emacs sometimes
> makes a multibyte buffer (e.g. C-h h).

I guess that makes sense in a unibyte session, because in that case
unibyte buffers really do contain characters and the user has explicitly
told us "don't bother me about multiple charsets, just pretend everything
fits within 8 bits".

> And, even if one starts Emacs with --multibyte, he may have a file that
> contains, for instance, latin-1 characters and raw-byte data, and he may
> want to read such a file with the coding system raw-text (in which case
> C-x = always shows \000..\377).

Is such a buffer necessarily unibyte?  Why not multibyte?
Or is it for performance reasons?
And what should happen if we paste text containing ISO 8859-5 or Big5
characters into such a buffer?

> The fact that something doesn't work for double-byte charset
> users can't be a reason strong enough for dropping it for
> single-byte charset users.

Agreed.  But we should encourage people to "do it right" by calling
the appropriate encoding/decoding functions, so that it works in all cases.
I believe a good way to do that is to discourage the use of
string-make-unibyte (and of other code that uses copy_text the same way).
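As a rough illustration of "doing it right", here is a Python sketch (not Emacs Lisp) in which every conversion between text and bytes names its coding system explicitly. The codec names are Python's, not Emacs coding systems, and the helper names are hypothetical. With an explicit coding system, Latin-1, ISO 8859-5 and Big5 text all round-trip, and a character the chosen coding system cannot represent raises an error instead of being silently truncated.

```python
# Hypothetical helpers: every text<->bytes conversion names its coding
# system. Codec names are Python's ("latin-1", "iso8859-5", "big5").

def encode_for_output(text: str, coding: str) -> bytes:
    # Raises UnicodeEncodeError for characters CODING cannot represent,
    # rather than silently keeping only their low 8 bits.
    return text.encode(coding)


def decode_from_input(data: bytes, coding: str) -> str:
    # Interpret raw bytes according to the named coding system.
    return data.decode(coding)


samples = {
    "latin-1": "caf\u00e9",              # Western European
    "iso8859-5": "\u0416\u0443\u043a",   # Cyrillic
    "big5": "\u4e2d\u6587",              # Chinese
}

if __name__ == "__main__":
    for coding, text in samples.items():
        data = encode_for_output(text, coding)
        assert decode_from_input(data, coding) == text
        print(coding, data)
```

The design choice is simply that the caller, who knows where the bytes come from or go to, picks the coding system; no implicit fallback guesses it.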


        Stefan