unofficial mirror of emacs-devel@gnu.org 
From: Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de>
Cc: sds@gnu.org, Jason Rumney <jasonr@gnu.org>,
	Stefan Monnier <monnier@iro.umontreal.ca>,
	emacs-devel@gnu.org
Subject: Re: Unicode support for the MS Windows clipboard
Date: Fri, 28 May 2004 15:26:10 +0200
Message-ID: <m3brk8fzt9.fsf@seneca.benny.turtle-trading.net>
In-Reply-To: <m33c5mgq49.fsf@seneca.benny.turtle-trading.net> (Benjamin Riefenstahl's message of "Thu, 27 May 2004 11:45:42 +0200")

Hi all,


I have been doing some testing, research and thinking.  First here is
a quick response on a few points.  Next I'll do a new patch based on
my current ideas.


> "Eli Zaretskii" <eliz@gnu.org> writes:
>> Couldn't this be done without introducing Windows-specific options?

Jason Rumney <jasonr@gnu.org> writes:
> Also, we should set (and read) CF_LOCALE when we are using CF_TEXT,
> to indicate the coding we have used.

Thanks for the pointer, researching locales actually led me to a
solution for deriving codepage properties (OEM vs "ANSI") via locales.

I think I have an algorithm that works.  It makes a few assumptions
about the coding system names and based on that it derives the
requested clipboard type automatically.

How does this sound:

- If selection-coding-system has the form /(.*-)?utf-16.*/, I assume
  CF_UNICODETEXT is wanted.

- If selection-coding-system has the form /cp[0-9]+.*/ or
  /windows-[0-9]+.*/, I derive the codepage from that.

    - Check if the codepage is identical to GetACP() or GetOEMCP().
      If it is, use CF_TEXT or CF_OEMTEXT accordingly. 

    - Else get a corresponding LCID (reverse mapping via
      EnumSystemLocales()) which has the codepage as OEM or "ANSI".  In
      this case we also need to set CF_LOCALE accordingly.

The last step takes a small performance hit, but the results can
easily be cached.
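
The steps above can be sketched roughly as follows.  This is only an
illustration in Python, not the patch itself: classify(),
find_locale_for_codepage() and SAMPLE_LOCALES are hypothetical names,
the table is a small hand-written stand-in for what enumerating the
system locales would return, and acp/oemcp are passed in where real
code would call GetACP() and GetOEMCP().

```python
import re
from functools import lru_cache

# Clipboard format names as plain strings; real code would use the
# Win32 constants of the same names.
CF_TEXT, CF_OEMTEXT, CF_UNICODETEXT = "CF_TEXT", "CF_OEMTEXT", "CF_UNICODETEXT"

UTF16_RE = re.compile(r"(.*-)?utf-16.*")
CODEPAGE_RE = re.compile(r"(?:cp|windows-)([0-9]+).*")

# Assumption: sample data only.  Maps LCID -> (ANSI codepage, OEM codepage).
SAMPLE_LOCALES = {
    0x0407: (1252, 850),  # German (Germany)
    0x0409: (1252, 437),  # English (United States)
    0x0411: (932, 932),   # Japanese
    0x0419: (1251, 866),  # Russian
}


@lru_cache(maxsize=None)  # the reverse lookup is the slow step, so cache it
def find_locale_for_codepage(codepage):
    """Return (lcid, format) for the first locale using codepage, or None."""
    for lcid, (ansi, oem) in sorted(SAMPLE_LOCALES.items()):
        if codepage == ansi:
            return lcid, CF_TEXT
        if codepage == oem:
            return lcid, CF_OEMTEXT
    return None


def classify(coding_system, acp, oemcp):
    """Return (clipboard_format, lcid_or_None) for a coding system name."""
    name = coding_system.lower()
    if UTF16_RE.fullmatch(name):
        return CF_UNICODETEXT, None
    match = CODEPAGE_RE.fullmatch(name)
    if match is None:
        raise ValueError("cannot derive a clipboard format from %r" % name)
    codepage = int(match.group(1))
    if codepage == acp:
        return CF_TEXT, None
    if codepage == oemcp:
        return CF_OEMTEXT, None
    found = find_locale_for_codepage(codepage)
    if found is None:
        raise ValueError("no locale found for codepage %d" % codepage)
    lcid, fmt = found
    return fmt, lcid  # a non-None LCID means CF_LOCALE must be set as well
```

With acp=1252 and oemcp=850, classify("cp1252-dos", 1252, 850) needs
no locale, while classify("windows-1251", 1252, 850) falls through to
the reverse lookup and returns an LCID to put into CF_LOCALE.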

I am also thinking of custom coding systems, like e.g. for doing
automatic remapping of private characters or locale specific
pre-/postprocessing.  This is why I am not completely comfortable with
hardcoding coding systems or using heuristics based on the coding
system symbol names.  If such concerns are completely misplaced,
please just tell me ;-).

Anyway, I have no problem with dictating the above naming conventions
for selection-coding-system for now.


Jason Rumney <jasonr@gnu.org> writes:
> Andrew Innes always had the intention to make the clipboard work
> on-demand, the same way it does on X. So the memory would only be
> used if the clipboard text was actually pasted (and then only for
> the format the client wanted).

We could do that using WM_RENDERFORMAT.  But then we absolutely need a
valid HWND to get a target for that message.  I don't know anything
about the Emacs message loop and the windows that are available.  It
would probably be best to allocate a custom hidden window for this.
I'll postpone that idea for now and just assume that we don't use
Unicode on 9x/Me.
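
To make the on-demand idea concrete, here is a toy model in Python.
It does not touch the Win32 API at all: ToyClipboard and its methods
are hypothetical, and the renderer callback merely plays the role that
handling WM_RENDERFORMAT in the owner window would play.

```python
class ToyClipboard:
    """Toy model of delayed rendering: formats are promised, not stored."""

    def __init__(self):
        self._data = {}
        self._renderer = None

    def promise(self, formats, renderer):
        # Announce the formats without producing any data yet.
        self._data = {fmt: None for fmt in formats}
        self._renderer = renderer

    def get(self, fmt):
        if fmt not in self._data:
            return None
        if self._data[fmt] is None:
            # First request for this format: ask the owner to render it.
            self._data[fmt] = self._renderer(fmt)
        return self._data[fmt]


rendered = []  # records which formats were actually produced


def render(fmt):
    rendered.append(fmt)
    codec = "utf-16-le" if fmt == "CF_UNICODETEXT" else "cp1252"
    return "hello".encode(codec)


clip = ToyClipboard()
clip.promise(["CF_UNICODETEXT", "CF_TEXT"], render)
clip.get("CF_TEXT")
print(rendered)
```

Only the format that a client actually asks for gets encoded, which is
the memory saving described in the quoted paragraph.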

Jason Rumney <jasonr@gnu.org> writes:
> Another thing worth considering, if we are making major changes to the
> clipboard code, is that Kenichi Handa pointed out some time ago that
> the encoding part of the X clipboard support is now done in Lisp
> (xselect.el). Windows could do this too.

At the moment this is done via {de,en}code_coding() and a couple of
friends.  Is that the same thing?


Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes:
>> Anyway, what happens to the MULE problem in this unified scenario?
>> Do all problems go away with unify-8859-on-{de,en}coding?

Jason Rumney <jasonr@gnu.org> writes:
> What MULE problem?

Disjoint charsets leading to the introduction of unwanted characters
(similar to the SHIFT-JIS <-> Chinese confusion that you just
mentioned).  One of the last times this discussion came up, somebody
mentioned that this could still be a serious problem.


Jason Rumney <jasonr@gnu.org> writes:
> The encoding of CF_UNICODETEXT does not vary, so utf-16-le (or maybe
> -be) is the only coding-system that is appropriate.

Actually at the moment that would be utf-16le-dos, not utf-16-le-dos.
The latter includes a BOM, which we really don't want here.  The
non-intuitive naming difference makes me wonder, though, whether this
is just some unintended confusion.  There are also currently
utf-16-le-with-signature-* and mule-utf-16-*.
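
For illustration, this is the byte layout in question, sketched in
Python.  Note that the Python codec name "utf-16-le" is unrelated to
the Emacs coding system names discussed here: the Python codec never
writes a BOM.  encode_for_cf_unicodetext() is a hypothetical helper
name; it assumes the usual CF_UNICODETEXT conventions of CRLF line
ends and a terminating NUL character.

```python
import codecs


def encode_for_cf_unicodetext(text):
    """Encode text as UTF-16LE with CRLF line ends, NUL-terminated, no BOM."""
    crlf = text.replace("\r\n", "\n").replace("\n", "\r\n")
    return crlf.encode("utf-16-le") + b"\x00\x00"


payload = encode_for_cf_unicodetext("a\nb")

# What a "with signature" variant would produce instead: the same bytes
# preceded by the two-byte BOM FF FE.
with_bom = codecs.BOM_UTF16_LE + payload

print(payload.hex(" "))
print(with_bom.hex(" "))
```

A receiving application that does not strip the BOM would see it as
an extra U+FEFF character at the start of the pasted text.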


> "Eli Zaretskii" <eliz@gnu.org> writes:
>> Also, AFAIK CF_UNICODETEXT _can_ be used on Windows 9x, as any
>> program like clipbrd.exe or ClipConvert will show you.

I tested Win95 and Win98SE.  On both systems, the clipboard viewer and
Notepad couldn't make use of CF_UNICODETEXT.  Cut-and-paste between
two Emacs instances via CF_UNICODETEXT works, so I assume other
applications that support CF_UNICODETEXT would work, too.  No
automatic conversion by Windows, though.


Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes:
>>> - Drop optimizations for ASCII-only text.

> "Eli Zaretskii" <eliz@gnu.org> writes:
>> Is that optimization indeed an optimization?

Getting data from the clipboard is indeed quite a bit faster with this
optimization.  Putting something on the clipboard doesn't benefit, but
that's probably because the detection of this case is inefficient: it
uses find_charset_in_text(), although the result is not really used.
So that can probably be made better, too.  I'll try to get this
integrated in the next version of the patch.
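
The shape of the ASCII-only fast path on the reading side, as a rough
Python sketch.  decode_clipboard_text() is a hypothetical name, and
the shortcut assumes that the codec is an ASCII superset, which holds
for the Windows "ANSI" codepages I have in mind but not for every
coding system.

```python
def decode_clipboard_text(raw, codec):
    """Decode clipboard bytes, skipping the full decoder for pure ASCII."""
    if raw.isascii():
        # Every byte is below 0x80, so no charset conversion is needed.
        return raw.decode("ascii")
    return raw.decode(codec)


print(decode_clipboard_text(b"hello", "cp1252"))
print(decode_clipboard_text(b"gr\xfc\xdf", "cp1252"))
```

The cheap byte scan stands in for the detection step; the point of the
optimization is that this check costs much less than running the
general decoder.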


benny
