From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: 02 Dec 2003 11:06:30 -0500
References: <200311250107.KAA24646@etlken.m17n.org> <200311260007.JAA26617@etlken.m17n.org> <200311270134.KAA28664@etlken.m17n.org> <200312010043.JAA04933@etlken.m17n.org> <200312021307.WAA01163@etlken.m17n.org>
Cc: jas@extundo.com, emacs-devel@gnu.org
To: Kenichi Handa
In-Reply-To: <200312021307.WAA01163@etlken.m17n.org>

>> So we should at least signal an error if the conversion is
>> unsafe (in that make-string-multibyte will not recover the
>> original string).

> Shall we test it with HEAD to check how often such an error
> occurs?

That would be great.

>> BTW, in which kind of circumstances is the user presented with both
>> a multibyte buffer and a unibyte buffer?

> Even if one starts Emacs with --unibyte, emacs sometimes
> make a multibyte buffer (e.g. C-h h).

I guess in a unibyte session it makes sense, because in such a case
unibyte buffers do contain characters and the user explicitly tells us
"don't bother me about multiple charsets, just pretend all fits within 8 bits".

> And, even if one starts Emacs with --multibyte, he may have a file that
> contains, for instance, latin-1 characters and raw-byte data, and he may
> want to read such a file with the coding system raw-text (then C-x =
> always shows \000..\377).

Is such a buffer necessarily unibyte?  Why not multibyte?
Or is it for performance reasons?
And what should happen if we paste text containing 8859-5 or BIG5 text
into such a buffer?
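To make the "unsafe conversion" condition concrete, here is a minimal Python sketch. It rests on a deliberately simplified model that is an assumption and not Emacs's actual C implementation. Multibyte text is a Python `str` and unibyte text is `bytes`. The naive unibyte conversion keeps only the low 8 bits of each character code, and the naive multibyte conversion reads each byte as Latin-1. The names `naive_make_unibyte`, `naive_make_multibyte` and `checked_make_unibyte` are hypothetical stand-ins for string-make-unibyte, make-string-multibyte and the proposed error-signalling variant.

```python
# Simplified model (assumption): multibyte = str, unibyte = bytes.
# This is NOT how Emacs implements string-make-unibyte internally.


def naive_make_unibyte(s: str) -> bytes:
    # Keep only the low 8 bits of each character code; any character
    # above U+00FF silently loses information here.
    return bytes(ord(ch) & 0xFF for ch in s)


def naive_make_multibyte(b: bytes) -> str:
    # Map each byte back to the character with the same code
    # (a Latin-1 interpretation, assumed for this model).
    return b.decode("latin-1")


def checked_make_unibyte(s: str) -> bytes:
    """Hypothetical checked conversion: signal an error when the
    reverse conversion would not recover the original string."""
    b = naive_make_unibyte(s)
    if naive_make_multibyte(b) != s:
        raise ValueError(f"unsafe unibyte conversion of {s!r}")
    return b


if __name__ == "__main__":
    # Latin-1 text round-trips, so the check passes.
    print(checked_make_unibyte("caf\u00e9"))
    # Cyrillic (8859-5) text does not round-trip, so the check fails.
    try:
        checked_make_unibyte("\u041f\u0440\u0438\u0432\u0435\u0442")
    except ValueError as err:
        print("error:", err)
    # Encoding explicitly with a coding system is lossless.
    word = "\u041f\u0440\u0438\u0432\u0435\u0442"
    print(word.encode("iso-8859-5").decode("iso-8859-5") == word)
```

In this model, pasting 8859-5 text into a buffer that is converted to unibyte is exactly the case the check catches. Going through an explicit coding system instead of a raw byte-truncating conversion keeps the text intact.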
> The fact that something doesn't work for double-byte charset
> users can't be a reason strong enough for dropping it for
> single-byte charset users.

Agreed.  But we should encourage people to "do it right" by calling
the appropriate encoding/decoding functions, so it works for all cases.
I believe that a good way to encourage people is by discouraging
the use of string-make-unibyte (and of other code that uses copy_text
in the same way).


        Stefan