From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: Fri, 21 Nov 2003 09:41:47 +0900 (JST)
Message-ID: <200311210041.JAA18324@etlken.m17n.org>
To: monnier@IRO.UMontreal.CA
Cc: jas@extundo.com, emacs-devel@gnu.org

Stefan Monnier writes:
>> I see.  Apart from the design itself, I agree that it's difficult to
>> introduce a new type.  But, when I discussed with Richard about the
>> Character type object a few years ago, he was not that negative provided
>> that it gives sure improvement.

> Sounds about right to me: we have one free tag that we could use for chars

Yes, and as that is the last free tag, I still hesitate to
consume it for the Character object.

>> Then, we can't use string-make-unibyte for the current case
>> because, in emacs-unicode, (concat '(?a 192)) returns a
>> multibyte string whose second element is A-grave, not an
>> eight-bit-char.  Am I missing something?

> Well, obviously we need to make it accept this case (i.e. accept both the
> latin-1 192 and the eight-bit-char 192).

Then, I see your intention.  But aren't the semantics of
such a function very weird?

>>> To do what your string-make-unibyte does you should use
>>> `encode-coding-string' where the coding system is passed explicitly.

>> Those are conceptually different things (I remember the
>> similar discussion we had a while ago).

>> encode-coding-string does:
>>   char-sequence --CCS-set--> (CCS/codepoint-pair)-sequence
>>                 --CES--> encoded-byte-sequence
>> string-make-unibyte does:
>>   char-sequence --CCS--> code-point-sequence
>>                 --concat--> code-point-sequence

>> These two yield the same result only when the CCS supports all
>> chars in "char-sequence" and the CES is stateless
>> (e.g. iso-latin-1) and .

> You lost me here (I'm a poor soul who doesn't know much outside of the
> latin-1 world).

CCS: Coded Character Set
CES: Character Encoding Scheme
coding-system of Emacs: a set of CCSs and a CES.
iso-latin-1: CCSs are ascii and latin-iso8859-1,
             CES is the 8-bit version of ISO-2022
iso-2022-jp: CCSs are ascii, japanese-jisx0208, ...
             CES is the 7-bit version of ISO-2022

> I thought that string-make-unibyte only behaves meaningfully for
> "normal 8bit coding-systems" such as latin-1.

Yes, but that doesn't mean it is conceptually the same as
encode-coding-string.  The result of string-make-unibyte
should still be regarded as a sequence of characters, but the
result of encode-coding-string is a sequence of bytes.

This is where the ambiguity of a unibyte string lies.  The
number 192 can be regarded as:
  (1) just a number, a byte
  (2) a code point of some character set
  (3) a character code
A unibyte string can contain (1) and (2) without
distinguishing them, but a multibyte string can contain (1)
and (3) while distinguishing them.

---
Ken'ichi HANDA
handa@m17n.org
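
A rough illustration of the CCS/CES distinction discussed above.  It is
written in Python rather than Emacs Lisp, so it only mirrors the idea and
says nothing about how Emacs implements string-make-unibyte or
encode-coding-string.  The variable names are made up for this sketch, and
Python's "latin-1", "iso2022_jp" and "koi8_r" codecs stand in for the
corresponding coding systems.

```python
# Illustrative sketch only: Python codecs stand in for Emacs coding systems.

text = "a\u00c0"  # 'a' followed by A-grave

# CCS step alone: map each character to its code point.  For ISO 8859-1
# the code point equals the Unicode scalar value, so ord() is enough here.
latin1_code_points = [ord(ch) for ch in text]

# CCS + CES: encode to a byte sequence.  The latin-1 CES is stateless and
# emits one byte per code point, so the two results coincide.
latin1_bytes = list(text.encode("latin-1"))

# A stateful CES behaves differently: ISO-2022-JP must first emit an
# escape sequence (ESC $ B) to designate JIS X 0208, so the byte sequence
# is no longer a plain concatenation of code points.
jp_bytes = "\u3042".encode("iso2022_jp")  # HIRAGANA LETTER A
has_escape = jp_bytes.startswith(b"\x1b$B")

# The ambiguity of 192: the same byte names different characters
# depending on which character set it is read as a code point of.
as_latin1 = bytes([192]).decode("latin-1")
as_koi8r = bytes([192]).decode("koi8_r")
```

In the latin-1 case the code-point list and the byte list are the same
numbers, which is why the two operations are easy to confuse; the
ISO-2022-JP case shows where they diverge.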