From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ilya Zakharevich
Newsgroups: gmane.emacs.bugs
Subject: bug#19994: 25.0.50; Unicode keyboard input on Windows
Date: Tue, 3 Mar 2015 15:09:49 -0800
Message-ID: <20150303230949.GA29784@math.berkeley.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
To: 19994@debbugs.gnu.org

I’m working on a patch to make Unicode keyboard input work properly on
Windows (in graphic mode).  The problems with the current implementation
stem from two facts:

  • on Windows, it IS possible to implement a bullet-proof system of
    Unicode input (at least for GUI applications);

  • however, how to do it is completely undocumented.

[See
 http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Keyboard_input_on_Windows:_interaction_of_applications_and_the_kernel ]

So, essentially, all application developers design their own sets of
heuristics, which

  • cover the several keyboard layouts they can get their hands on;

  • more or less follow the design goals of their applications.

The approach taken by Emacs is to break the keyboard keys (VKs) into
several groups and to treat different groups differently.  Only the keys
on the main island of the keyboard may input characters.  Moreover, only
the most common combinations of modifiers can be used for character
input.  (In addition, there are plain bugs, such as treating UTF-16 as
if it were UTF-32.)
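To illustrate the UTF-16 vs. UTF-32 point: a WM_CHAR message carries a single UTF-16 code unit, so a character outside the BMP arrives as two messages (a high and a low surrogate) which have to be combined into one code point.  A minimal portable sketch follows; the name `combine_surrogates` is made up for this illustration and is not an Emacs function.

```c
#include <stdint.h>

/* Hypothetical helper (not part of Emacs): combine a UTF-16 surrogate
   pair into one Unicode code point.  Returns 0 if HI and LO do not
   form a valid high/low surrogate pair.  */
uint32_t
combine_surrogates (uint16_t hi, uint16_t lo)
{
  if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
    return 0;
  return 0x10000
         + (((uint32_t) hi - 0xD800) << 10)
         + ((uint32_t) lo - 0xDC00);
}
```

For instance, the code units 0xD83D 0xDE00 combine to U+1F600; passing each unit on as if it were a character of its own yields two values which are not characters at all.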
[I gave a very terse description of these problems on
 https://groups.google.com/forum/?hl=en#!search/emacs$20keyboard$20windows$20ilya/gnu.emacs.help/ZHpZK2YfFuo/aAyZFUxrFeEJ ]

The “correct” approach should proceed in exactly the opposite direction:
if a keypress produces a character, it should be treated as a character,
no matter where on the physical keyboard the key resides and which
modifiers were pressed.

The patch below

  • Implements this “primacy of characters” doctrine;

  • As far as I could see, is compatible with the current behavior of
    Emacs on “simple keyboard layouts”;

  • Worked at some moment (before I started a massive addition of
    comments ;-] and maybe it still works; I have not touched it for a
    month);

  • (Currently) ignores the indentation coding rules;

  • Passes all the tests thrown at it by my
    super-puper-all-bells-and-whistles layouts; see e.g.
       http://k.ilyaz.org/windows/izKeys-visual-maps.html#examples

  • Is not bullet-proof:

     ∘ I use one heuristic to detect which modifiers are “consumed” by
       the character input, and which are “on top” of character input;

     ∘ It does not (same as the current Emacs) support
       Unicode-entered-by-Alt-numbers.

  • Does not fix a bug with UTF-16 in stand-alone (pumped to us)
    WM_CHARs.

If I ever find more time to work on it, I plan to:

  1) Add yet more documentation;

  2) Change the logic for detecting consumed/extra modifiers a little.
     This change may be cosmetic only; or maybe, with some extremely
     devilish layouts, it may be beneficial.  (I have not seen layouts
     where this change would matter, though!  And I looked through the
     source code of hundred(s).)

  3) Bring it in sync with the Emacs coding style.

Meanwhile, I would greatly appreciate all input related to the current
state of the patch.  (I *HOPE* that I did not break (many!) special
cases in the current implementation, but such things are hard to be
sure of!)

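The heuristic for “consumed” vs. “on top” modifiers can be looked at in isolation.  The sketch below assumes that SCAN is the value which the Win32 function VkKeyScanW() returns for the produced character (low byte: virtual key; high byte: shift state, with bit 0 = Shift, bit 1 = Ctrl, bit 2 = Alt), and that VK is the key which was actually pressed.  The function name `alt_is_on_top` is hypothetical; only the bit test itself is taken from the patch.

```c
/* Hypothetical stand-alone version of the test used in the patch.
   Returns nonzero when the character is reachable on the same key
   with at most Shift held, i.e. when Alt did not take part in
   producing the character and may be reported as a modifier.  */
int
alt_is_on_top (short scan, unsigned vk)
{
  int same_key = (unsigned) (scan & 0xFF) == vk;
  int only_shift = !(scan & ~0x1FF);    /* no Ctrl/Alt/... required */

  return same_key && only_shift;
}
```

With SCAN = 0x0141 (Shift + VK 0x41) and VK = 0x41 the result is nonzero, so Alt would be kept; with SCAN = 0x0641 (Ctrl + Alt + VK 0x41) the result is 0, so Alt would be treated as consumed and stripped.  The failure value -1 of VkKeyScanW() also gives 0.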
Thanks for the parts of Emacs which ARE working great,
Ilya

=======================================================
--- w32fns.c-ini	2015-01-30 15:33:23.505201400 -0800
+++ w32fns.c	2015-02-15 02:46:12.070091800 -0800
@@ -2832,6 +2832,126 @@ post_character_message (HWND hwnd, UINT
   my_post_msg (&wmsg, hwnd, msg, wParam, lParam);
 }
 
+static int
+get_wm_chars (HWND aWnd, int *buf, int buflen, int ignore_ctrl, int ctrl, int *ctrl_cnt, int *is_dead, int vk, int exp)
+{
+  MSG msg;
+  int i = buflen, doubled = 0, code_unit; /* If doubled is at the end, ignore it */
+  if (ctrl_cnt)
+    *ctrl_cnt = 0;
+  if (is_dead)
+    *is_dead = -1;
+  while (buflen && /* Should be called only when w32_unicode_gui */
+         PeekMessageW(&msg, aWnd, WM_KEYFIRST, WM_KEYLAST, PM_NOREMOVE | PM_NOYIELD) &&
+         (msg.message == WM_CHAR || msg.message == WM_SYSCHAR ||
+          msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR || msg.message == WM_UNICHAR)) { /* Not contiguous */
+    int dead;
+
+    GetMessageW(&msg, aWnd, msg.message, msg.message);
+    dead = (msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR);
+    if (is_dead)
+      *is_dead = (dead ? msg.wParam : -1);
+    if (dead)
+      continue;
+    code_unit = msg.wParam;
+    if (doubled) { /* had surrogate */
+      if (msg.message == WM_UNICHAR || code_unit < 0xDC00 || code_unit > 0xDFFF) {
+        /* Mismatched first surrogate.  Pass both code units as if they were two characters. */
+        *buf++ = doubled;
+        if (!--buflen) // Drop the second char if at the end of the buffer
+          return i;
+      } else {
+        code_unit = (doubled << 10) + code_unit - 0x35FDC00;
+      }
+      doubled = 0;
+    } else if (code_unit >= 0xD800 && code_unit <= 0xDBFF) {
+      doubled = code_unit;
+      continue;
+    } /* We handle mismatched second surrogate the same as a normal character. */
+    /* The only "fake" characters delivered by ToUnicode() or TranslateMessage() are:
+       0x01 .. 0x1a for Control-chars,
+       0x00 and 0x1b .. 0x1f for Control- []\@^_
+       0x7f for Control-BackSpace
+       0x20 for Control-Space */
+    if (ignore_ctrl && (code_unit < 0x20 || code_unit == 0x7f || (code_unit == 0x20 && ctrl))) {
+      /* Non-character payload in a WM_CHAR (Ctrl-something pressed).  Ignore. */
+      if (ctrl_cnt)
+        (*ctrl_cnt)++;
+      continue;
+    }
+    if (code_unit < 0x7f &&
+        ((vk >= VK_NUMPAD0 && vk <= VK_DIVIDE) ||
+         (exp && ((vk >= VK_PRIOR && vk <= VK_DOWN) ||
+                  vk == VK_INSERT || vk == VK_DELETE || vk == VK_CLEAR))) &&
+        strchr("0123456789/*-+.,", code_unit)) /* Traditionally, Emacs translates these to characters later, in `self-insert-command' */
+      continue;
+    *buf++ = code_unit;
+    buflen--;
+  }
+  return i - buflen;
+}
+
+int
+deliver_wm_chars (int do_translate, HWND hwnd, UINT msg, UINT wParam, UINT lParam)
+{
+  /* An "old style" keyboard description may assign up to 125 UTF-16 code points to a keypress.
+     (However, the "old style" TranslateMessage() would deliver at most 16 of them.)  Be on the
+     safe side, and prepare to treat many more. */
+  int ctrl_cnt, buf[1024], count, is_dead;
+
+  if (do_translate) {
+    MSG windows_msg = { hwnd, msg, wParam, lParam, 0, {0,0} };
+
+    windows_msg.time = GetMessageTime ();
+    TranslateMessage (&windows_msg);
+  }
+  count = get_wm_chars (hwnd, buf, sizeof(buf)/sizeof(*buf), 1,
+                        /* The message may have been synthesized by who knows what; be conservative. */
+                        modifier_set (VK_LCONTROL) || modifier_set (VK_RCONTROL) || modifier_set (VK_CONTROL),
+                        &ctrl_cnt, &is_dead, wParam, (lParam & 0x1000000L) != 0);
+  if (count) {
+    W32Msg wmsg;
+    int *b = buf, strip_Alt = 1;
+
+    /* wParam is checked when converting CapsLock to Shift */
+    wmsg.dwModifiers = do_translate ? w32_get_key_modifiers (wParam, lParam) : 0;
+
+    /* What follows is just heuristics; the correct treatment requires non-destructive ToUnicode(). */
+    if (wmsg.dwModifiers & ctrl_modifier) /* If ctrl-something delivers chars, ctrl and the rest should be hidden */
+      wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier;
+    /* In many keyboard layouts, (left) Alt is not changing the character.  Unless we are in this situation, strip Alt/Meta. */
+    if (wmsg.dwModifiers & (alt_modifier | meta_modifier) && /* If alt-something delivers non-ASCII chars, alt should be hidden */
+        count == 1 && *b < 0x10000) {
+      SHORT r = VkKeyScanW( *b );
+
+      fprintf(stderr, "VkKeyScanW %#06x %#04x\n", (int)r, wParam);
+      if ((r & 0xFF) == wParam && !(r & ~0x1FF)) { /* Char available without Alt modifier, so Alt is "on top" */
+        if (*b > 0x7f && ('A' <= wParam && wParam <= 'Z'))
+          return 0; /* Another branch below would convert it to Alt-Latin char via wParam */
+        strip_Alt = 0;
+      }
+    }
+    if (strip_Alt)
+      wmsg.dwModifiers = wmsg.dwModifiers & ~(alt_modifier | meta_modifier);
+
+    signal_user_input ();
+    while (count--)
+      {
+        fprintf(stderr, "unichar %#06x\n", *b);
+        my_post_msg (&wmsg, hwnd, WM_UNICHAR, *b++, lParam);
+      }
+    if (!ctrl_cnt) /* Process ALSO as ctrl */
+      return 1;
+    else
+      fprintf(stderr, "extra ctrl char\n");
+    return -1;
+  } else if (is_dead >= 0) {
+    fprintf(stderr, "dead %#06x\n", is_dead);
+    return 1;
+  }
+  return 0;
+}
+
 /* Main window procedure */
 
 static LRESULT CALLBACK
@@ -3007,7 +3127,6 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
       /* Synchronize modifiers with current keystroke.  */
       sync_modifiers ();
       record_keydown (wParam, lParam);
-      wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
 
       windows_translate = 0;
 
@@ -3117,6 +3236,45 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
           wParam = VK_NUMLOCK;
           break;
         default:
+          if (w32_unicode_gui) {
+            /* If this event generates characters or deadkeys, do not interpret
+               it as a "raw combination of modifiers and keysym".  Hide
+               deadkeys, and use the generated character(s) instead of the
+               keysym.   (Backward compatibility: exceptions for numpad keys
+               generating 0-9 . , / * - +, and for extra-Alt combined with a
+               non-Latin char.)
+
+               Try to not report modifiers which have effect on which
+               character or deadkey is generated.
+
+               Example (contrived): if rightAlt-? generates f (on a Cyrillic
+               keyboard layout), and Ctrl, leftAlt do not affect the generated
+               character, one wants to report Ctrl-leftAlt-f if the user
+               presses Ctrl-leftAlt-rightAlt-?.  */
+            int res;
+#if 0
+            /* Some of WM_CHAR may be fed to us directly, some are results of
+               TranslateMessage().  Using 0 as the first argument (in a
+               separate call) might help us distinguish these two cases.
+
+               However, the keypress feeders would most probably expect the
+               "standard" message pump, when TranslateMessage() is called on
+               EVERY KeyDown/Keyup event.  So they may feed us Down-Ctrl
+               Down-FAKE Char-o and expect us to recognize it as Ctrl-o.
+               Using 0 as the first argument would interfere with this.   */
+            deliver_wm_chars (0, hwnd, msg, wParam, lParam);
+#endif
+            /* Processing the generated WM_CHAR messages *WHILE* we handle
+               KEYDOWN/UP event is the best choice, since without any fuss,
+               we know all 3 of: scancode, virtual keycode, and expansion.
+               (Additionally, one knows boundaries of expansion of different
+               keypresses.)  */
+            res = deliver_wm_chars (1, hwnd, msg, wParam, lParam);
+            windows_translate = -( res != 0 );
+            if (res > 0) /* Bound to character(s) or a deadkey */
+              break;
+          } /* Some branches after this one may be not needed */
+          wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
           /* If not defined as a function key, change it to a WM_CHAR message.  */
           if (wParam > 255 || !lispy_function_keys[wParam])
             {
@@ -3184,6 +3342,8 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
             }
         }
 
+      if (windows_translate == -1)
+        break;
     translate:
       if (windows_translate)
         {
=======================================================

In GNU Emacs 25.0.50.20 (i686-pc-mingw32)
 of 2015-02-08 on BUCEFAL
Repository revision: d5e3922e08587e7eb9e5aec2e9f84cbda405f857
Windowing system distributor `Microsoft Corp.', version 6.1.7601
Configured using:
 `configure --prefix=/k/test'

Configured features:
SOUND NOTIFY ACL

Important settings:
  value of $LANG: ENU
  locale-coding-system: cp1252

Major mode: Fundamental

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr emacsbug message dired format-spec
rfc822 mml easymenu mml-sec mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045
ietf-drums mm-util help-fns mail-prsvr mail-utils time-date tooltip
eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel
dos-w32 ls-lisp disp-table w32-win w32-vars tool-bar dnd fontset image
regexp-opt fringe tabulated-list newcomment elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow timer select scroll-bar
mouse jit-lock font-lock syntax facemenu font-core frame cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev
minibuffer cl-preloaded nadvice loaddefs button faces cus-face macroexp
files text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote make-network-process
w32notify w32 multi-tty emacs)

Memory information:
((conses 8 80324 9864)
 (symbols 32 17968 0)
 (miscs 32 85 128)
 (strings 16 12688 4007)
 (string-bytes 1 324435)
 (vectors 8 9470)
 (vector-slots 4 390690 6074)
 (floats 8 65 62)
 (intervals 28 243 45)
 (buffers 516 13))
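
For readers who want to experiment with the control-character filtering outside of Emacs, the test from get_wm_chars() in the patch can be written as a stand-alone predicate.  This is only a sketch: the name `is_ctrl_payload` is hypothetical, and the ranges are taken from the comment in the patch (0x00 .. 0x1f for Control-combinations, 0x7f for Control-BackSpace, 0x20 for Control-Space).

```c
/* Hypothetical helper: returns nonzero if CODE_UNIT, as delivered in
   a WM_CHAR message, is the "fake" payload of a Ctrl-combination
   rather than real text.  CTRL_DOWN says whether a Control key is
   currently pressed; a plain space counts as text unless it is.  */
int
is_ctrl_payload (unsigned code_unit, int ctrl_down)
{
  return code_unit < 0x20
         || code_unit == 0x7f
         || (code_unit == 0x20 && ctrl_down);
}
```

So 0x0f (what Ctrl-o delivers) is classified as a control payload, while 'o' itself is not; 0x20 is a control payload only when CTRL_DOWN is nonzero.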