From: Joakim Hårsman
Newsgroups: gmane.emacs.bugs
Subject: bug#10299: Emacs doesn't handle Unicode characters in keyboard layout on MS Windows
Date: Thu, 15 Dec 2011 08:53:15 +0100
To: Eli Zaretskii
Cc: 10299@debbugs.gnu.org
On 15 December 2011 07:22, Eli Zaretskii wrote:
>> Date: Wed, 14 Dec 2011 21:39:28 +0100
>> From: Joakim Hårsman
>>
>> However, Emacs doesn't seem to handle the case when the keyboard
>> layout contains characters not available in the ANSI code page, and
>> just prints a question mark character instead.
>
> Yes, Emacs on Windows uses the ANSI codepage to read the keyboard
> input.  Does it help to play with the value of keyboard-coding-system?

No, changing keyboard-coding-system doesn't help, and utf-16le-dos
isn't a valid setting for keyboard-coding-system anyway.

>> For certain characters,
>> a character that is visually similar to the actual character is
>> printed instead of a question mark. For example, if I use a layout
>> where AltGr+O produces U+2218 RING OPERATOR, Emacs prints U+00B0
>> DEGREE SYMBOL instead. The degree symbol is available in Windows 1252,
>> the default ANSI code page on my system, but the ring operator
>> isn't.
>
> I'm guessing that this is Windows trying to translate the characters
> to the ANSI codepage behind the scenes.
>
>> However, if the layout maps AltGr+R to U+0220A SMALL ELEMENT OF, Emacs
>> just prints a question mark, presumably because Windows 1252 doesn't
>> contain a reasonable replacement for that character.
>
> Will inputting these characters with "C-x 8 RET 0220a RET" or "C-x 8
> RET SMALL ELEMENT OF RET" be a good enough solution for you?  You can
> input any Unicode character by its name or codepoint using "C-x 8 RET".

Using C-x 8 is too cumbersome.
I guess I could write my own custom Emacs input method, but since
Emacs now has good support for Unicode, it would seem easier if it
handled Unicode key events from the OS correctly.

>> I'd be happy to help debug this but I have no idea where to even
>> start. Is there an easy way to find out if it's the C code that
>> clobbers the character or if it happens in Lisp for example?
>
> I don't think there is any "clobbering".  Emacs deliberately converts the
> Unicode characters to the current locale's ANSI codepage.  I think
> (but I'm not sure) the reason is that Emacs cannot use UTF-16 for
> keyboard input.  Perhaps Jason and Handa-san could comment on this.

I really don't know my way around the Emacs source, but a quick look
at w32_kbd_patch_key in w32inevt.c suggests that Emacs does decode the
Unicode character event correctly: both uChar.UnicodeChar and
uChar.AsciiChar seem to be set correctly.

  /* On NT, call ToUnicode instead and then convert to the current
     locale's default codepage.  */
  if (os_subtype == OS_NT)
    {
      WCHAR buf[128];

      isdead = ToUnicode (event->wVirtualKeyCode, event->wVirtualScanCode,
                          keystate, buf, 128, 0);
      if (isdead > 0)
        {
          char cp[20];
          int cpId;

          event->uChar.UnicodeChar = buf[isdead - 1];

          GetLocaleInfo (GetThreadLocale (),
                         LOCALE_IDEFAULTANSICODEPAGE, cp, 20);
          cpId = atoi (cp);
          isdead = WideCharToMultiByte (cpId, 0, buf, isdead,
                                        ansi_code, 4, NULL, NULL);
        }
      else
        isdead = 0;
    }

However, this bit from w32_wnd_proc in w32fns.c looks suspicious to me:

  else
    {
      /* Try to handle other keystrokes by determining the
         base character (ie. translating the base key plus shift
         modifier).  */
      int add;
      KEY_EVENT_RECORD key;

      key.bKeyDown = TRUE;
      key.wRepeatCount = 1;
      key.wVirtualKeyCode = wParam;
      key.wVirtualScanCode = (lParam & 0xFF0000) >> 16;
      key.uChar.AsciiChar = 0;
      key.dwControlKeyState = modifiers;

      add = w32_kbd_patch_key (&key);
      /* 0 means an unrecognized keycode, negative means
         dead key.  Ignore both.  */
      while (--add >= 0)
        {
          /* Forward asciified character sequence.  */
          post_character_message
            (hwnd, WM_CHAR,
             (unsigned char) key.uChar.AsciiChar, lParam,
             w32_get_key_modifiers (wParam, lParam));
          w32_kbd_patch_key (&key);
        }
      return 0;
    }

It looks like it's re-posting the event with only the single byte in
key.uChar.AsciiChar, clobbering the Unicode information that was
originally in wParam. Or maybe the idea is to turn characters that
need several bytes in the ANSI code page into several WM_CHAR events?
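To make the lossy conversion step concrete, here is a minimal, portable C sketch of what a UTF-16 to Windows-1252 conversion does with the characters from this report. It is not the Windows implementation: `cp1252_from_utf16` and the `best_fit` table are hypothetical, and the single best-fit entry (U+2218 to 0xB0) is taken from the behaviour reported above, not from Microsoft's actual table. For context, WideCharToMultiByte applies best-fit mappings unless the caller passes WC_NO_BEST_FIT_CHARS, and the call quoted above passes 0 for its flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Windows-1252 bytes 0x80-0x9F; 0 marks the five undefined positions.
   Bytes 0x00-0x7F and 0xA0-0xFF map to the same Unicode code points. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Hypothetical best-fit table: only the one mapping observed in this
   bug report, not Microsoft's real best-fit data. */
static const struct { uint16_t wc; unsigned char byte; } best_fit[] = {
    { 0x2218, 0xB0 },   /* RING OPERATOR -> DEGREE SIGN */
};

/* Convert one UTF-16 code unit to a Windows-1252 byte.
   Returns 1 for an exact mapping, 0 when a best-fit or '?' substitute
   was written instead (the character is lost either way). */
static int cp1252_from_utf16(uint16_t wc, unsigned char *out)
{
    if (wc < 0x80 || (wc >= 0xA0 && wc <= 0xFF)) {
        *out = (unsigned char) wc;          /* identical code point */
        return 1;
    }
    for (int i = 0; i < 32; i++) {
        if (cp1252_high[i] == wc) {         /* wc >= 0x80, never 0 here */
            *out = (unsigned char) (0x80 + i);
            return 1;
        }
    }
    for (size_t i = 0; i < sizeof best_fit / sizeof best_fit[0]; i++) {
        if (best_fit[i].wc == wc) {
            *out = best_fit[i].byte;        /* visually similar substitute */
            return 0;
        }
    }
    *out = '?';   /* the code page's default character, usually '?' */
    return 0;
}
```

With this sketch, cp1252_from_utf16(0x2218, &b) stores 0xB0 (the degree sign) and returns 0, while cp1252_from_utf16(0x220A, &b) stores '?', which matches the two symptoms described for AltGr+O and AltGr+R.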
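The suspicion about the w32_wnd_proc loop can be illustrated the same way. This is a hypothetical sketch, not Emacs code: `post_char` stands in for post_character_message, and `to_ansi_byte` is a simplified stand-in for the code-page conversion that treats the ANSI code page as Latin-1. Forwarding only the narrowed byte turns U+220A into '?', whereas forwarding the UTF-16 code unit keeps it. Note that in the real KEY_EVENT_RECORD, uChar is a union of UnicodeChar and AsciiChar, so the two members share storage; the sketch sidesteps this by passing the code unit around directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the WM_CHAR queue: records posted values. */
#define MAX_POSTED 8
static unsigned posted[MAX_POSTED];
static size_t n_posted;

static void post_char(unsigned value)
{
    if (n_posted < MAX_POSTED)
        posted[n_posted++] = value;
}

/* Simplified lossy conversion (assumption): treats the ANSI code page
   as Latin-1 and substitutes '?' for anything above U+00FF. Real
   Windows-1252 also maps some characters into 0x80-0x9F. */
static unsigned char to_ansi_byte(uint16_t wc)
{
    return wc <= 0xFF ? (unsigned char) wc : (unsigned char) '?';
}

/* Mirrors the quoted loop: only the narrowed byte is forwarded. */
static void forward_ansi(uint16_t wc)
{
    post_char(to_ansi_byte(wc));
}

/* Illustrative alternative: forward the UTF-16 code unit unchanged. */
static void forward_unicode(uint16_t wc)
{
    post_char(wc);
}
```

Calling forward_ansi(0x220A) followed by forward_unicode(0x220A) leaves posted[0] == '?' and posted[1] == 0x220A, which is the difference the question above is about.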