unofficial mirror of bug-gnu-emacs@gnu.org 
From: "Joakim Hårsman" <joakim.harsman@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 10299@debbugs.gnu.org
Subject: bug#10299: Emacs doesn't handle Unicode characters in keyboard layout on MS Windows
Date: Thu, 15 Dec 2011 08:53:15 +0100
Message-ID: <CAFJF9wUCd+oo=28UwH0E+XH2QJFKjFiQuGgn7NQ+Xwmdgkqo2A@mail.gmail.com>
In-Reply-To: <E1Rb4hn-0002C6-5C@fencepost.gnu.org>

On 15 December 2011 07:22, Eli Zaretskii <eliz@gnu.org> wrote:
>> Date: Wed, 14 Dec 2011 21:39:28 +0100
>> From: Joakim Hårsman <joakim.harsman@gmail.com>
>>
>> However, Emacs doesn't seem to handle the case when the keyboard
>> layout contains characters not available in the ANSI code page, and
>> just prints a question mark character instead.
>
> Yes, Emacs on Windows uses the ANSI codepage to read the keyboard
> input.  Does it help to play with the value of keyboard-coding-system?

No, changing keyboard-coding-system doesn't help, and utf-16le-dos
isn't a valid setting for keyboard-coding-system anyway.

>> For certain characters,
>> a character that is visually similar to the actual character is
>> printed instead of a question mark. For example, if I use a layout
>> where AltGr+O produces U+2218 RING OPERATOR, Emacs prints U+00B0
>> DEGREE SIGN instead. The degree sign is available in Windows 1252,
>> the default ANSI code page on my system, but the ring operator
>> isn't.
>
> I'm guessing that this is Windows trying to translate the characters
> to the ANSI codepage behind the scenes.
>
>> However, if the layout maps AltGr+R to U+0220A SMALL ELEMENT OF, Emacs
>> just prints a question mark, presumably because Windows 1252 doesn't
>> contain a reasonable replacement for that character.
>
> Will inputting these characters with "C-x 8 RET 0220a RET" or "C-x 8
> RET SMALL ELEMENT OF RET" be a good enough solution for you?  You can
> input any Unicode character by its name or codepoint using "C-x 8 RET".

Using C-x 8 is too cumbersome. I guess I could write my own custom
Emacs input method, but since Emacs now has good support for Unicode,
it would seem easier if it handled Unicode key events from the OS
correctly.

>> I'd be happy to help debug this but I have no idea where to even
>> start. Is there an easy way to find out if it's the C code that
>> clobbers the character or if it happens in lisp for example?
>
> I don't think there is any "clobbering".  Emacs deliberately converts the
> Unicode characters to the current locale's ANSI codepage.  I think
> (but I'm not sure) the reason is that Emacs cannot use UTF-16 for
> keyboard input.  Perhaps Jason and Handa-san could comment on this.

I really don't know my way around the Emacs source, but a quick look
at w32_kbd_patch_key in w32inevt.c seems to indicate that Emacs really
is decoding the Unicode character event correctly: both
uChar.UnicodeChar and uChar.AsciiChar seem to be set correctly.

  /* On NT, call ToUnicode instead and then convert to the current
     locale's default codepage.  */
  if (os_subtype == OS_NT)
    {
      WCHAR buf[128];

      isdead = ToUnicode (event->wVirtualKeyCode, event->wVirtualScanCode,
			  keystate, buf, 128, 0);
      if (isdead > 0)
	{
	  char cp[20];
	  int cpId;

	  event->uChar.UnicodeChar = buf[isdead - 1];

	  GetLocaleInfo (GetThreadLocale (),
			 LOCALE_IDEFAULTANSICODEPAGE, cp, 20);
	  cpId = atoi (cp);
	  isdead = WideCharToMultiByte (cpId, 0, buf, isdead,
					ansi_code, 4, NULL, NULL);
	}
      else
	isdead = 0;
    }
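
To make the substitution behaviour concrete, here is a toy model of
the code-page conversion step. It is only a sketch, not the Windows
API and not Emacs code: toy_to_cp1252 is a made-up name, it models
only the cases discussed in this report, and the U+2218 to 0xB0
mapping is simply what I observed, not something I looked up.

```c
#include <stdint.h>

/* Toy model of converting one UTF-16 code unit to Windows-1252.
   Not the Windows API: only the cases discussed in this report are
   modelled, and the 0x80-0x9F block of the code page is left out.  */
static unsigned char
toy_to_cp1252 (uint16_t unit, int *exact)
{
  *exact = 1;
  /* ASCII and U+00A0..U+00FF have the same value in Windows-1252.  */
  if (unit < 0x80 || (unit >= 0xA0 && unit <= 0xFF))
    return (unsigned char) unit;

  *exact = 0;
  /* Best-fit substitution as observed in this report:
     U+2218 RING OPERATOR comes out as U+00B0 DEGREE SIGN.  */
  if (unit == 0x2218)
    return 0xB0;

  /* No reasonable replacement: the default character.  */
  return '?';
}
```

With this model, U+00E5 converts exactly, U+2218 comes back as 0xB0
with exact cleared, and U+220A comes back as '?' with exact cleared.
In the last two cases the original code point cannot be recovered
from the byte alone.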

However, this bit from w32_wnd_proc in w32fns.c looks suspicious to me:

		  else
		    {
		      /* Try to handle other keystrokes by determining the
			 base character (ie. translating the base key plus
			 shift modifier).  */
		      int add;
		      KEY_EVENT_RECORD key;

		      key.bKeyDown = TRUE;
		      key.wRepeatCount = 1;
		      key.wVirtualKeyCode = wParam;
		      key.wVirtualScanCode = (lParam & 0xFF0000) >> 16;
		      key.uChar.AsciiChar = 0;
		      key.dwControlKeyState = modifiers;

		      add = w32_kbd_patch_key (&key);
		      /* 0 means an unrecognized keycode, negative means
			 dead key.  Ignore both.  */
		      while (--add >= 0)
			{
			  /* Forward asciified character sequence.  */
			  post_character_message
			    (hwnd, WM_CHAR,
                             (unsigned char) key.uChar.AsciiChar, lParam,
			     w32_get_key_modifiers (wParam, lParam));
			  w32_kbd_patch_key (&key);
			}
		      return 0;
		    }

It looks like it's re-posting the event with just the ASCII character
code, clobbering the Unicode info that's originally in wParam. Or
maybe the idea is to translate characters that require multiple bytes
into multiple events?
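
As a rough illustration of the difference I mean, here is a small
portable sketch. It is not Emacs code: post_char, forward_bytes and
forward_unit are hypothetical names, the array stands in for the
window's message queue, and I am assuming the code-page conversion of
U+220A has already produced a single '?' byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EVENTS 8

/* Hypothetical stand-in for the window's message queue.  */
static uint32_t events[MAX_EVENTS];
static size_t n_events;

static void
post_char (uint32_t code)
{
  assert (n_events < MAX_EVENTS);
  events[n_events++] = code;
}

/* One event per byte of the code-page encoding, in the manner of the
   loop quoted above.  */
static void
forward_bytes (const unsigned char *bytes, size_t n)
{
  for (size_t i = 0; i < n; i++)
    post_char (bytes[i]);
}

/* Alternative: one event carrying the UTF-16 code unit unchanged.  */
static void
forward_unit (uint16_t unit)
{
  post_char (unit);
}
```

Forwarding the single byte '?' leaves the receiver with 0x3F and no
way back to U+220A, while forwarding the code unit delivers 0x220A
intact. Whether the rest of the Emacs input path could accept such an
event is a separate question that this sketch does not answer.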




