From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Jason Rumney <jasonr@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode support for the MS Windows clipboard
Date: Thu, 27 May 2004 16:43:29 +0100
Organization: Integra SP Ltd
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <40B60CA1.9020301@gnu.org>
References: <m3smdnm5jh.fsf@seneca.benny.turtle-trading.net>	<uzn7us3n8.fsf@jasonrumney.net>
	<m3y8nefaap.fsf@seneca.benny.turtle-trading.net>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: sea.gmane.org 1085675389 19196 80.91.224.253 (27 May 2004 16:29:49 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Thu, 27 May 2004 16:29:49 +0000 (UTC)
Cc: Sam Steingold <sds@gnu.org>, emacs-devel@gnu.org
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
	rv:1.7) Gecko/20040514
X-Accept-Language: en-gb, en, ja
Original-To: Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de>
In-Reply-To: <m3y8nefaap.fsf@seneca.benny.turtle-trading.net>
Xref: main.gmane.org gmane.emacs.devel:24026
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:24026

Benjamin Riefenstahl wrote:

> Jason Rumney <jasonr@gnu.org> writes:

>>If that is the case, it might be better to get rid of
>>w32-clipboard-type as a user variable, and determine the type
>>automatically from selection-coding-system instead. cp<900 should
>>map to OEM, utf16 to unicode, and others to ANSI.
> 
> Can we just assume this?  Does "cp<900" really guarantee OEM?

Thai is possibly an exception, and maybe Vietnamese. We can make 
exceptions where necessary, but basically all the OEM codepages that are 
not also used as ANSI codepages are in the sub-900 range. The DBCS 
codepages in the 900-1000 range serve as both ANSI and OEM 
codepages, so either CF_TEXT or CF_OEMTEXT would be valid for them, 
though CF_TEXT is probably more widely recognized.
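
To make the proposed mapping concrete, here is a rough sketch in Python 
(purely illustrative; the real change would live in the Emacs sources). 
The function name, the suffix handling and the exception set are my own 
assumptions. Only the numeric CF_* values are the standard ones from 
winuser.h.

```python
import re

# Win32 standard clipboard format identifiers (values from winuser.h).
CF_TEXT = 1
CF_OEMTEXT = 7
CF_UNICODETEXT = 13

# Hypothetical exception list: codepages below 900 that are also used as
# ANSI codepages. Thai (cp874) is the one suspected above.
ANSI_BELOW_900 = {874}


def clipboard_format_for(coding_system: str) -> int:
    """Guess a clipboard format from a coding-system name (assumed helper)."""
    name = coding_system.lower()
    # Drop an end-of-line suffix such as -dos, -unix or -mac.
    name = re.sub(r"-(dos|unix|mac)$", "", name)
    if name.startswith(("utf-16", "utf16")):
        return CF_UNICODETEXT
    match = re.fullmatch(r"cp(\d+)", name)
    if match:
        codepage = int(match.group(1))
        if codepage < 900 and codepage not in ANSI_BELOW_900:
            return CF_OEMTEXT
    # Everything else, including the DBCS codepages in the 900s, is ANSI.
    return CF_TEXT
```

The exception set is the part most likely to need adjusting once the 
Thai and Vietnamese cases have been checked.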

> How do we know that some exotic private trick coding system isn't
> useful for CF_UNICODETEXT

The encoding of CF_UNICODETEXT does not vary, so utf-16-le (or maybe 
-be) is the only coding-system that is appropriate. As mentioned, we 
could map other utf coding systems automatically onto the right one, to 
avoid the user having to know too many details.
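
As a sketch of that automatic mapping (Python for illustration only): 
both helper names are hypothetical, the substring test for "utf" is a 
deliberately crude assumption, and little-endian is assumed rather than 
confirmed. The terminating NUL reflects how clipboard text formats mark 
the end of the data.

```python
def normalize_unicode_coding_system(name: str) -> str:
    """Map any UTF-style coding-system name onto utf-16-le.

    Non-UTF names are passed through unchanged. The substring test is a
    simplification for this sketch, not a proposal for the real check.
    """
    return "utf-16-le" if "utf" in name.lower() else name


def to_cf_unicodetext(text: str) -> bytes:
    """Encode text as UTF-16LE and append the terminating NUL character."""
    return text.encode("utf-16-le") + b"\x00\x00"
```

With this, a user who has set a utf-8 flavoured coding system would 
still get correctly encoded clipboard data without knowing the details.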

> or for CF_OEMTEXT

CF_OEMTEXT is defined as text in the default console (OEM) codepage for 
that version of Windows. Although it is theoretically possible for the 
user to have customized it beyond the limited set that comes out of the 
box with different localised versions of Windows, that really isn't 
interesting to us, because other applications probably wouldn't support 
those non-default encodings either. I doubt there are many (if any) 
applications that support CF_OEMTEXT but not CF_TEXT, so it is probably 
better to just ignore it until someone comes up with a reason why we 
should support it.

>>Also, we should set (and read) CF_LOCALE when we are using CF_TEXT,
>>to indicate the coding we have used.
> 
> I'll have to look that up, I'm not familiar with CF_LOCALE.

I think the problem I had with that was finding a locale given an ANSI 
codepage. In the case where CF_LOCALE is the default system locale, we 
don't need to set it, and in other cases we would be better off using 
CF_UNICODETEXT, so maybe this is not worth pursuing.
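
The decision that falls out of this can be sketched as follows (Python 
for illustration only). The function name and the cp1252 default are 
assumptions. The idea is simply: if the text fits the system ANSI 
codepage, use CF_TEXT and leave CF_LOCALE alone, otherwise go straight 
to CF_UNICODETEXT.

```python
def choose_text_format(text: str, ansi_codec: str = "cp1252") -> str:
    """Pick a clipboard format without ever needing to set CF_LOCALE.

    ansi_codec stands in for the system default ANSI codepage; cp1252 is
    only an example default.
    """
    try:
        text.encode(ansi_codec)
    except UnicodeEncodeError:
        # Not representable in the system codepage: use Unicode instead
        # of CF_TEXT with a non-default CF_LOCALE.
        return "CF_UNICODETEXT"
    return "CF_TEXT"
```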

>>When reading from the clipboard, if CF_UNICODETEXT is present, it might
>>be better to use that (ignoring selection-coding-system).
> 
> Could we get into trouble with the MULE problem here?  Or does
> unify-8859-on-{en,de}coding solve this for all cases?
> 
>>On the other hand, some Chinese characters are still not covered by
>>Emacs' unicode support (even with utf-translate-cjk-mode), [...]
>>Big5 is definitely not entirely covered).
> 
> If those characters are not supported by Unicode, how does Windows
> support them, which is based on Unicode after all?  Does it support
> them at all?  Or does it use the private characters for this?

Maybe I am imagining a problem that is not there. Having checked again, 
the character I saw not being displayed was not unsupported, as I had 
thought. It was a character (in Chinese Traditional text) being decoded 
as japanese-jisx0212, which I don't have a font for. I can still see 
this being a major problem for Chinese users though.


   character: [] (0254137, 88159, 0x1585f, U+4F60)
     charset: japanese-jisx0212 (JISX0212 Japanese supplement: ISO-IR-159.)
  code point: 48 95
      syntax: w 	which means: word
    category: C:Chinese (Han) characters of 2-byte character sets
              j:Japanese
              |:While filling, we can break a line at this character.
 buffer code: 0x94 0xB0 0xDF
   file code: not encodable by coding system mule-utf-8-dos
     display: no font available


> PS: The default for selection-coding-system should be cpXXXX-dos, not
> just cpXXXX.  Otherwise I get <LF> as line ends instead of <CR><LF>
> when I copy non-ASCII text.  Which than doesn't work well with
> Notepad, of course.

Thanks, the code that made sure selection-coding-system used DOS line 
endings seems to have been removed by my previous changes.
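
For reference, the effect of the -dos variant can be sketched like this 
(Python for illustration only; the function name is hypothetical and 
the codec argument stands in for whichever cpXXXX applies):

```python
import re


def encode_with_dos_eol(text: str, codec: str) -> bytes:
    """Encode text with <CR><LF> line ends, as a -dos coding system would."""
    # Convert bare LF to CRLF, leaving any existing CRLF pairs untouched.
    crlf_text = re.sub(r"(?<!\r)\n", "\r\n", text)
    return crlf_text.encode(codec)
```

Without the conversion, the bare <LF> reaches the clipboard, which is 
what Notepad trips over.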