From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: Copying and pasting Cyrillic text between Emacs and other apps
Date: 29 Jan 2004 08:04:54 +0200

> From: paulgor@compuserve.com (Paul Gorodyansky)
> Newsgroups: gnu.emacs.help
> Date: 28 Jan 2004 11:40:13 -0800
>
> To see Windows code page I use 2 things:
> a) go to Console and type
>      chcp
>    it returns OEM code page, say 850 and thus I know that
>    Windows code page is 1252 :)
>    MS has all that listed:
>    http://www.microsoft.com/globaldev/reference/cphome.mspx
>
> b) have my own 2-line C program that calls GetACP()
>    and puts it on screen :) so I can see
>    "System Code Page: 1252"

It turns out my wording was inaccurate and thus misleading.  What I
wanted to see was which codepage was used to encode the characters.
You seem to be assuming that this codepage is always identical to the
system codepage, but that is not really true, at least not on Windows
XP.  Try copying Cyrillic characters into the clipboard from the
Explorer on a non-Cyrillic Windows machine, and you will see that
CF_TEXT is encoded in cp1251 even though the system codepage is
something different.

> As for characters and their Unicode codepoints:
> a) Start/Run - charmap - and I can see a Unicode # for
>    each symbol
> b) http://www.unicode.org/unicode/reports/tr24/charts/index.html

Sure, there are lots of places where the Unicode codepoints of the
characters are listed, but what I wanted to know is how Windows
encodes them in the clipboard.  It turns out they use the 16-bit
Unicode codepoints, at least for the BMP.

(Out of curiosity: do you or anyone else know how Windows encodes
characters outside the BMP?  Is it UTF-16 or something else?)
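
To make the two encoding points above concrete, here is a minimal Python sketch.  It does not touch the Windows clipboard at all; it only shows what happens to the bytes.  The helper name `utf16_code_units` and the sample strings are invented for this illustration.  The second part assumes, as the question above suggests, that the 16-bit clipboard text is UTF-16; under that assumption a character outside the BMP takes two 16-bit units instead of one.

```python
# Illustration only: no clipboard access, just the byte-level behaviour.

def utf16_code_units(text):
    """Return the 16-bit code units of TEXT when encoded as UTF-16.

    Hypothetical helper for this example, not part of any Windows
    or Emacs API.
    """
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


# 1. The same CF_TEXT-style bytes read with two different codepages.
cyrillic = "Привет"
raw = cyrillic.encode("cp1251")      # what a cp1251-encoded CF_TEXT would hold
as_cp1251 = raw.decode("cp1251")     # correct codepage: text survives
as_cp1252 = raw.decode("cp1252")     # wrong guess (system codepage 1252)

print(as_cp1251)   # Привет
print(as_cp1252)   # Ïðèâåò

# 2. 16-bit units for a BMP character and for one outside the BMP,
#    assuming UTF-16.
bmp_units = utf16_code_units("П")              # U+041F, inside the BMP
non_bmp_units = utf16_code_units("\U0001D11E")  # U+1D11E, outside the BMP

print([hex(u) for u in bmp_units])
print([hex(u) for u in non_bmp_units])
```

The first part shows why assuming "clipboard codepage = system codepage" goes wrong: the bytes are valid in both cp1251 and cp1252, so nothing fails, the text just comes out as Latin letters with accents.  The second part shows that a BMP character maps to a single 16-bit value equal to its codepoint, while a non-BMP character needs a surrogate pair if UTF-16 is what is being used.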