From: Jason Rumney
Organization: Integra SP Ltd
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode support for the MS Windows clipboard
Date: Thu, 27 May 2004 16:43:29 +0100
To: Benjamin Riefenstahl
Cc: Sam Steingold, emacs-devel@gnu.org
Message-ID: <40B60CA1.9020301@gnu.org>

Benjamin Riefenstahl wrote:

> Jason Rumney writes:
>>If that is the case, it might be better to get rid of
>>w32-clipboard-type as a user variable, and determine the type
>>automatically from selection-coding-system instead. cp<900 should
>>map to OEM, utf16 to unicode, and others to ANSI.
>
> Can we just assume this? Does "cp<900" really guarantee OEM?

Possibly Thai is an exception, and maybe Vietnamese; we can make exceptions where necessary. But basically all the OEM codepages that are not also used as ANSI codepages are in the sub-900 range. The DBCS codepages in the 900-1000 range are used as both ANSI and OEM codepages, so either CF_TEXT or CF_OEMTEXT would be valid for them, though CF_TEXT is probably more widely recognized.

> How do we know that some exotic private trick coding system isn't useful for
> CF_UNICODETEXT

The encoding of CF_UNICODETEXT does not vary, so utf-16-le (or maybe -be) is the only appropriate coding system. As mentioned, we could map other utf coding systems onto it automatically, so the user doesn't have to know too many details.
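To make the proposed automatic mapping concrete, here is a minimal sketch in C of how the clipboard format might be derived from the name of selection-coding-system. The function clip_format_for_coding and the CLIP_* constants are hypothetical (their values match CF_TEXT, CF_OEMTEXT and CF_UNICODETEXT in <windows.h>). It assumes the caller has already stripped any -dos/-unix suffix, and only cp874 (Thai) is treated as an exception; Vietnamese or any other exceptions are not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Values match CF_TEXT, CF_OEMTEXT and CF_UNICODETEXT in <windows.h>;
   they are redefined here so the sketch compiles on any platform. */
enum clip_format
{
  CLIP_TEXT = 1,        /* ANSI codepage text */
  CLIP_OEMTEXT = 7,     /* OEM (console) codepage text */
  CLIP_UNICODETEXT = 13 /* UTF-16LE text */
};

/* Hypothetical helper: choose a clipboard format from the base name of
   selection-coding-system, e.g. "cp850" or "mule-utf-8".  The caller is
   assumed to have stripped any EOL suffix such as "-dos". */
static enum clip_format
clip_format_for_coding (const char *coding)
{
  assert (coding != NULL);

  /* The encoding of CF_UNICODETEXT never varies, so every utf coding
     system maps to it; the text must then be encoded as utf-16-le. */
  if (strstr (coding, "utf-") != NULL)
    return CLIP_UNICODETEXT;

  /* "cpNNN": OEM-only codepages lie below 900.  cp874 (Thai) is also an
     ANSI codepage, so it is treated as an exception.  The DBCS codepages
     (932, 936, 949, 950) fall through to CF_TEXT. */
  if (strncmp (coding, "cp", 2) == 0)
    {
      char *end;
      long cp = strtol (coding + 2, &end, 10);

      if (end != coding + 2 && *end == '\0' && cp < 900 && cp != 874)
        return CLIP_OEMTEXT;
    }

  /* Everything else is assumed to be an ANSI codepage. */
  return CLIP_TEXT;
}
```

With this, cp850 selects CF_OEMTEXT, cp1252 and cp932 select CF_TEXT, and mule-utf-8 selects CF_UNICODETEXT (the text then still has to be encoded as utf-16-le before it goes on the clipboard).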
> or for CF_OEMTEXT

CF_OEMTEXT is text in the default console (OEM) codepage for that version of Windows. Although it is theoretically possible for the user to have customized it beyond the limited set that comes out of the box with the different localised versions of Windows, that isn't very interesting to us, because other applications probably wouldn't support those non-default encodings either. I doubt there are many (if any) applications that support CF_OEMTEXT but not CF_TEXT, so it is probably better to just ignore it until someone comes up with a reason why we should support it.

>>Also, we should set (and read) CF_LOCALE when we are using CF_TEXT,
>>to indicate the coding we have used.
>
> I'll have to look that up, I'm not familiar with CF_LOCALE.

I think the problem I had with that was finding a locale given an ANSI codepage. Where CF_LOCALE would be the default system locale, we don't need to set it, and in other cases we would be better off using CF_UNICODETEXT, so maybe this is not worth pursuing.

>>When reading from the clipboard, if CF_UNICODETEXT is present, it might
>>be better to use that (ignoring selection-coding-system).
>
> Could we get into trouble with the MULE problem here? Or does
> unify-8859-on-{en,de}coding solve this for all cases?
>
>>On the other hand, some Chinese characters are still not covered by
>>Emacs' Unicode support (even with utf-translate-cjk-mode), [...]
>>Big5 is definitely not entirely covered).
>
> If those characters are not supported by Unicode, how does Windows
> support them, which is based on Unicode after all? Does it support
> them at all? Or does it use the private characters for this?

Maybe I am imagining a problem that is not there. Having checked again, it seems that the character I saw not being displayed, which I thought was unsupported, was actually a character in Traditional Chinese text being decoded as japanese-jisx0212, for which I don't have a font.
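A sketch of the read-side suggestion (prefer CF_UNICODETEXT when it is present), assuming the formats currently on the clipboard have already been collected into an array; on Windows, GetPriorityClipboardFormat or repeated IsClipboardFormatAvailable calls serve that purpose. preferred_read_format and the FMT_* names are hypothetical, and the CF_TEXT-before-CF_OEMTEXT fallback order is an assumption, not existing Emacs code.

```c
#include <assert.h>
#include <stddef.h>

/* Same numeric values as CF_TEXT, CF_OEMTEXT and CF_UNICODETEXT in <windows.h>. */
enum { FMT_TEXT = 1, FMT_OEMTEXT = 7, FMT_UNICODETEXT = 13 };

/* Hypothetical helper: given the formats currently on the clipboard, return
   the text format to decode, or 0 if no text format is present.
   CF_UNICODETEXT comes first because its encoding is fixed (UTF-16LE), so
   selection-coding-system can be ignored for it; the order of the remaining
   formats is an assumption. */
static unsigned
preferred_read_format (const unsigned *available, size_t n)
{
  static const unsigned priority[] = { FMT_UNICODETEXT, FMT_TEXT, FMT_OEMTEXT };
  size_t i, j;

  assert (available != NULL || n == 0);
  for (i = 0; i < sizeof priority / sizeof priority[0]; i++)
    for (j = 0; j < n; j++)
      if (available[j] == priority[i])
        return priority[i];
  return 0;
}
```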
I can still see this being a major problem for Chinese users, though.

    character: [] (0254137, 88159, 0x1585f, U+4F60)
    charset: japanese-jisx0212 (JISX0212 Japanese supplement: ISO-IR-159.)
    code point: 48 95
    syntax: w which means: word
    category: C:Chinese (Han) characters of 2-byte character sets
              j:Japanese
              |:While filling, we can break a line at this character.
    buffer code: 0x94 0xB0 0xDF
    file code: not encodable by coding system mule-utf-8-dos
    display: no font available

> PS: The default for selection-coding-system should be cpXXXX-dos, not
> just cpXXXX. Otherwise I get LF as line ends instead of
> CRLF when I copy non-ASCII text, which then doesn't work well with
> Notepad, of course.

Thanks. The code that made sure selection-coding-system used -dos line ends seems to have been removed in my previous changes.
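For reference, this is roughly what the -dos EOL variant does to text encoded for the clipboard: each bare LF becomes CR LF, which is what Notepad expects. lf_to_crlf is a hypothetical stand-alone helper for illustration; in Emacs the conversion is done by the coding system itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly malloc'd copy of SRC in which every
   bare LF is preceded by CR, i.e. the line-end conversion a "-dos" coding
   system performs.  Existing CRLF pairs are left untouched.  Returns NULL
   on allocation failure; the caller must free the result. */
static char *
lf_to_crlf (const char *src)
{
  size_t len, extra = 0, i;
  char *dst, *p;

  assert (src != NULL);
  len = strlen (src);

  /* First pass: count the LFs that are not already preceded by CR. */
  for (i = 0; i < len; i++)
    if (src[i] == '\n' && (i == 0 || src[i - 1] != '\r'))
      extra++;

  dst = malloc (len + extra + 1);
  if (dst == NULL)
    return NULL;

  /* Second pass: copy, inserting CR before each bare LF. */
  p = dst;
  for (i = 0; i < len; i++)
    {
      if (src[i] == '\n' && (i == 0 || src[i - 1] != '\r'))
        *p++ = '\r';
      *p++ = src[i];
    }
  *p = '\0';
  return dst;
}
```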