From: Benjamin Riefenstahl
Newsgroups: gmane.emacs.devel
Subject: Re: utf-8 cut/paste
Date: Wed, 26 May 2004 14:30:37 +0200
References: <9003-Tue25May2004080243+0300-eliz@gnu.org> <9743-Tue25May2004143607+0300-eliz@gnu.org>
Original-To: Sam Steingold
Cc: emacs-devel@gnu.org
In-Reply-To: (Sam Steingold's message of "Tue, 25 May 2004 11:41:09 -0400")
User-Agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3.50 (gnu/linux)

Hi Sam,

Note that your original problem with Cyrillic is not actually related
to MULE.  MULE may make things a bit more complicated, but the problem
is that Emacs doesn't use the Unicode APIs of Windows, which it can do
fine (and probably will at some point), with or without MULE.  At
least on NT/W2K/XP; I don't know whether the Unicode clipboard works
on 9x/Me.

Sam Steingold writes:
> each character comes equipped with its integer encoding, and 2
> characters which are identical elements of CHARACTER, but appear in
> two different encodings (e.g., #\Ц encoded in koi8 and in alt) are
> different characters in MULE.  This is so absurd that I can hardly
> believe that anyone could ever conceive of this, let alone implement
> it.

You are presupposing that you know which "2 characters [...] are
identical elements of CHARACTER, but appear in two different
encodings."  While this knowledge seems obvious in theory, in practice
it involves quite a lot of work to formalize this unification for all
relevant charsets (i.e. for the charsets that are actually in use).
Most of that work has since been done in Unicode, and this kind of
information is actually one of the major benefits of that standard.
So today we have a well-defined reference for things like:

> #\C is a "LATIN CAPITAL LETTER C", or #\С is a "CYRILLIC CAPITAL
> LETTER ES" (even though they might look similar in your font).
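
To make the unification point concrete, here is a small sketch in
Python (not anything Emacs or MULE does; the variable names are made
up for illustration, and I am assuming that "alt" means the DOS
"alternative" code page 866).  It shows the same letter stored as
different bytes in the two charsets, yet coming out as one character
once both are mapped through Unicode, while the Latin and Cyrillic
look-alikes stay distinct:

```python
import unicodedata

# One abstract character, stored in two legacy Cyrillic charsets.
tse = "\u0426"  # CYRILLIC CAPITAL LETTER TSE
koi8_bytes = tse.encode("koi8-r")
alt_bytes = tse.encode("cp866")  # assumption: "alt" = code page 866

# The byte values differ between the two charsets ...
print(koi8_bytes.hex(), alt_bytes.hex())

# ... but decoding each with its own charset yields the same character.
same_char = koi8_bytes.decode("koi8-r") == alt_bytes.decode("cp866")
print(same_char)

# Look-alikes are kept apart: each has its own code point and name.
latin_c = "C"
cyrillic_es = "\u0421"
print(unicodedata.name(latin_c))
print(unicodedata.name(cyrillic_es))
```

The mapping tables behind those two codecs are exactly the kind of
formalized knowledge I mean.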

But when MULE was first implemented, Unicode was in its infancy, if I
see this right.  So at that time this knowledge wasn't available in
formal terms and in the necessary breadth.  IOW, MULE (building on
ISO-2022) was a solution at the time, while Unicode was still in the
design phase with much work to go.

benny