From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Ilya Zakharevich Newsgroups: gmane.emacs.bugs Subject: bug#19994: 25.0.50; Unicode keyboard input on Windows Date: Wed, 8 Jul 2015 17:02:59 -0700 Message-ID: <20150709000259.GA7163@math.berkeley.edu> References: <20150303230949.GA29784@math.berkeley.edu> <83bnk8prqa.fsf@gnu.org> <20150701100712.GA24175@math.berkeley.edu> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="Q68bSM7Ycu6FN28Q" Content-Transfer-Encoding: 8bit X-Trace: ger.gmane.org 1436400270 21999 80.91.229.3 (9 Jul 2015 00:04:30 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Thu, 9 Jul 2015 00:04:30 +0000 (UTC) Cc: 19994@debbugs.gnu.org To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Thu Jul 09 02:04:17 2015 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1ZCzK3-0002r5-Tl for geb-bug-gnu-emacs@m.gmane.org; Thu, 09 Jul 2015 02:04:16 +0200 Original-Received: from localhost ([::1]:37222 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZCzK2-0003Cd-VR for geb-bug-gnu-emacs@m.gmane.org; Wed, 08 Jul 2015 20:04:14 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:44763) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZCzJu-0003CX-Pi for bug-gnu-emacs@gnu.org; Wed, 08 Jul 2015 20:04:11 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZCzJq-00067T-Hm for bug-gnu-emacs@gnu.org; Wed, 08 Jul 2015 20:04:06 -0400 Original-Received: from debbugs.gnu.org ([140.186.70.43]:43124) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZCzJq-00067B-CX for bug-gnu-emacs@gnu.org; Wed, 08 Jul 2015 20:04:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.80) 
(envelope-from ) id 1ZCzJp-0005mJ-QE for bug-gnu-emacs@gnu.org; Wed, 08 Jul 2015 20:04:01 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Ilya Zakharevich Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Thu, 09 Jul 2015 00:04:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 19994 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: Original-Received: via spool by 19994-submit@debbugs.gnu.org id=B19994.143640019722160 (code B ref 19994); Thu, 09 Jul 2015 00:04:01 +0000 Original-Received: (at 19994) by debbugs.gnu.org; 9 Jul 2015 00:03:17 +0000 Original-Received: from localhost ([127.0.0.1]:44570 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1ZCzJ4-0005lJ-I6 for submit@debbugs.gnu.org; Wed, 08 Jul 2015 20:03:17 -0400 Original-Received: from nm12-vm7.bullet.mail.gq1.yahoo.com ([98.136.218.206]:51838) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1ZCzIz-0005l3-9u for 19994@debbugs.gnu.org; Wed, 08 Jul 2015 20:03:12 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1436400183; bh=DxzVPgUsJPiR72x2TfhbUTqOKsPmabVRe2B5y5vrjy8=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From:Subject; b=uifTgYcVQuQmro7nz2nb2zTaezB3Lh0h/pBC/1u/vbFCIVJgtpmTGyVY87IZxtMiUXd2btI3MW9CA+ULfECATdpU2XgRU296E+OfsO4eX3rosKcO5kIqx7Af5Zi/8WWpQj0fefDELjGtANPxKzAIfIRRKToO2gBXx0KU31LafXAXdKM0FjmgenjPaXCpZ4RMnh632kIHpnHeTE4VjBVHPTsQpdO7c67XMV88/mkmesrq9fYv7aJBkhhoBDY5v/AGKQCSilUUKDLwxXyKrLE0pRHJfe6R/nkYCuSJihgbK0IkUBbnG05+Htt23WPy1LIMJ0ZFhXDwiMRFMA2ftVzOXw== Original-Received: from [98.137.12.62] by nm12.bullet.mail.gq1.yahoo.com with NNFMP; 09 Jul 2015 00:03:03 -0000 Original-Received: from [208.71.42.199] by tm7.bullet.mail.gq1.yahoo.com with NNFMP; 09 Jul 2015 00:03:03 -0000 Original-Received: from [127.0.0.1] by smtp210.mail.gq1.yahoo.com with NNFMP; 09 Jul 2015 00:03:03 -0000 X-Yahoo-Newman-Id: 303987.68193.bm@smtp210.mail.gq1.yahoo.com 
X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: azpXjMsVM1mnSmZ2QmGIqfSxsnVJ1UhbI1tef.aZbbjiAZ_ QrZeU0bZt925AhhRFMEcH9S52RpA.PH60x1RNB1aDbCEJrU.OjemLzBB8nV3 t1yR.nsH.Bw7_angeoRqmTpkikryxCYlybjMTE85Kjen4WfoZg.ywzw8Tmyp 0fCs9HI2QSAnjorTS0.Z5QRFl6nyQPGBi_KMlVi7DZK69SlK8bDuWp9Tuett dDU9JYcS_e12vkEniqxVM_PAIh7HzX7cRe3AX9BfgnsAXWcQi9k1xDgiVEvt mc3P6PnlKbkCYFeNpwg7mQ2nkZduhCvBiq.rbJEAn2ATXCc0dI8OY0BieQku MLF5WhoJzip9wjerfEH.1FOQxNo_v6H4RzQnwPiITUirrFXOAij1aLBZ.ZKt bojDIAiHdMdZpVDRCXQEYCinkY6BtKVV7h0j3ZaNVJT0d7Le0S2lLkxRw8v0 iqND5Es3U6MJr4pMlFmxUWsSJpKWjBpTSihJA6v0ig8qsb0nCRgd1ukDJqOF tu2fcd4b.BZS3MGpFNFi_zWoxOJHkiRdmR83e2eS_dRPR X-Yahoo-SMTP: oLSY3dWswBBqoBVzCkLl_RIsw6heKMxu8wpEbARv1SU- Content-Disposition: inline In-Reply-To: <20150701100712.GA24175@math.berkeley.edu> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 140.186.70.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.bugs:104841 Archived-At: --Q68bSM7Ycu6FN28Q Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit On Wed, Jul 01, 2015 at 03:07:12AM -0700, Ilya Zakharevich wrote: > On Wed, Mar 04, 2015 at 08:01:01PM +0200, Eli Zaretskii wrote: > > I suggest, indeed, to clean up the code so we could commit it to the > > master branch. That way, it will get wider testing, and we can fix > I had no time to work on the code itself, but > • I fixed the formatting, > • I pumped up the docs, > • I put in the suggested eassert(). The variant I sent was too primitive — it was not covering a (common?) 
usage case when (with AltGr-layouts) leftCtrl+rightCtrl was behaving differently than pressing AltGr:
  • leftCtrl+rightCtrl would trigger C-M-key;
  • altGr would enter the character payload.

This update
  (0) fixes two formatting-style omissions;
  (A) adds A LOAD of new comments;
  (B) treats such important cases (as above) separately;
  (z) Marks a piece of old code which does not make any sense.
      (see the last chunk in the relative patch)

Notes:
  • In (B), there are some decisions to make. I encapsulate these decisions into two strings. For best result, these strings should be user-customizable. However, currently they are just put into C #defines. When I sit on this more, and if these customizations turn out to be useful, one can make them into Lisp variables.
  • There is a bug in the (old) Emacs code which prevents some cases treated in (B) from being really useful. I did not fix it yet. To see the bug:
      ∘ switch to layout with AltGr;
      ∘ assume that AltGr-s produces ß (as with US International);
      ∘ pressing AltGr-rightControl-s produces Meta-ß;
      ∘ pressing rightControl-AltGr-s produces C-M-s.
    (I do not think this effect is intentional.)
  • And, BTW, is it documented anywhere that leftControl-rightControl-key produces C-M-key?

I include two patches:
  □ absolute (ignore the previous patches)
  □ relative (with whitespace ignored), for reading.
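For readers who want the idea of the "two strings" from the first note without opening the patch, here is a minimal standalone sketch. The two macro names and their default values are copied from the attached patch; the enum, its member names, and the function `classify_payload` are hypothetical names introduced only for this illustration, and the real code makes the same choice inline in `deliver_wm_chars`.

```c
#include <string.h>

/* The two policy strings, with the defaults used in the attached patch.
   Lowercase letter = "simple" case, uppercase letter = "hairy" case.  */
#define S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD "aldb"
#define S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS ""

/* Hypothetical names for the three possible outcomes.  */
enum payload_action
{
  IGNORE_PAYLOAD,               /* hand over to the legacy processing code */
  KEEP_PAYLOAD_AND_MODIFIERS,   /* character + modifiers go to key bindings */
  KEEP_PAYLOAD_STRIP_MODIFIERS  /* character is self-inserted */
};

/* TYPE_CTRLALT is a two-character string such as "gG" or "lL";
   HAIRY selects its second character.  */
static enum payload_action
classify_payload (const char *type_CtrlAlt, int hairy)
{
  char tag = type_CtrlAlt[hairy != 0];

  if (strchr (S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD, tag))
    return IGNORE_PAYLOAD;
  if (strchr (S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS, tag))
    return KEEP_PAYLOAD_AND_MODIFIERS;
  return KEEP_PAYLOAD_STRIP_MODIFIERS;
}
```

With the defaults above, a keypress classified as "gG" (looks like AltGr) keeps its character in both the simple and the hairy case, while "lL" (Ctrl-Alt on the left) keeps the character only in the hairy case. Moving a letter between the two strings changes the policy for exactly one of these cases, which is why they are candidates for Lisp variables.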
Enjoy, Ilya --Q68bSM7Ycu6FN28Q Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="w32fns.c-diff-v2" --- w32fns.c-ini 2015-01-30 15:33:23.505201400 -0800 +++ w32fns.c 2015-07-08 16:32:11.187197700 -0700 @@ -2832,6 +2832,413 @@ post_character_message (HWND hwnd, UINT my_post_msg (&wmsg, hwnd, msg, wParam, lParam); } +static int +get_wm_chars (HWND aWnd, int *buf, int buflen, int ignore_ctrl, int ctrl, + int *ctrl_cnt, int *is_dead, int vk, int exp) +{ + MSG msg; + /* If doubled is at the end, ignore it */ + int i = buflen, doubled = 0, code_unit; + + if (ctrl_cnt) + *ctrl_cnt = 0; + if (is_dead) + *is_dead = -1; + eassert(w32_unicode_gui); + while (buflen + /* Should be called only when w32_unicode_gui: */ + && PeekMessageW(&msg, aWnd, WM_KEYFIRST, WM_KEYLAST, + PM_NOREMOVE | PM_NOYIELD) + && (msg.message == WM_CHAR || msg.message == WM_SYSCHAR + || msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR + || msg.message == WM_UNICHAR)) + { + /* We extract character payload, but in this call we handle only the + characters which comes BEFORE the next keyup/keydown message. */ + int dead; + + GetMessageW(&msg, aWnd, msg.message, msg.message); + dead = (msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR); + if (is_dead) + *is_dead = (dead ? msg.wParam : -1); + if (dead) + continue; + code_unit = msg.wParam; + if (doubled) + { + /* had surrogate */ + if (msg.message == WM_UNICHAR + || code_unit < 0xDC00 || code_unit > 0xDFFF) + { /* Mismatched first surrogate. + Pass both code units as if they were two characters. */ + *buf++ = doubled; + if (!--buflen) + return i; /* Drop the 2nd char if at the end of the buffer. */ + } + else /* see https://en.wikipedia.org/wiki/UTF-16 */ + { + code_unit = (doubled << 10) + code_unit - 0x35FDC00; + } + doubled = 0; + } + else if (code_unit >= 0xD800 && code_unit <= 0xDBFF) + { + /* Handle mismatched 2nd surrogate the same as a normal character. 
*/ + doubled = code_unit; + continue; + } + + /* The only "fake" characters delivered by ToUnicode() or + TranslateMessage() are: + 0x01 .. 0x1a for Ctrl-letter, Enter, Tab, Ctrl-Break, Esc, Backspace + 0x00 and 0x1b .. 0x1f for Control- []\@^_ + 0x7f for Control-BackSpace + 0x20 for Control-Space */ + if (ignore_ctrl + && (code_unit < 0x20 || code_unit == 0x7f + || (code_unit == 0x20 && ctrl))) + { + /* Non-character payload in a WM_CHAR + (Ctrl-something pressed, see above). Ignore, and report. */ + if (ctrl_cnt) + (*ctrl_cnt)++; + continue; + } + /* Traditionally, Emacs would ignore the character payload of VK_NUMPAD* + keys, and would treat them later via `function-key-map'. In addition + to usual 102-key NUMPAD keys, this map also treats `kp-'-variants of + space, tab, enter, separator, equal. TAB and EQUAL, apparently, + cannot be generated on Win-GUI branch. ENTER is already handled + by the code above. According to `lispy_function_keys', kp_space is + generated by not-extended VK_CLEAR. (kp-tab != VK_OEM_NEC_EQUAL!). + + We do similarly for backward-compatibility, but ignore only the + characters restorable later by `function-key-map'. */ + if (code_unit < 0x7f + && ((vk >= VK_NUMPAD0 && vk <= VK_DIVIDE) + || (exp && ((vk >= VK_PRIOR && vk <= VK_DOWN) || + vk == VK_INSERT || vk == VK_DELETE || vk == VK_CLEAR))) + && strchr("0123456789/*-+.,", code_unit)) + continue; + *buf++ = code_unit; + buflen--; + } + return i - buflen; +} + +#ifdef DBG_WM_CHARS +# define FPRINTF_WM_CHARS(ARG) fprintf ARG +#else +# define FPRINTF_WM_CHARS(ARG) 0 +#endif + +/* This is a heuristic only. This is supposed to track the state of the + finite automaton in the language environment of Windows. + + However, separate windows (if with different language + environments!) should have different values. Moreover, switching to a + non-Emacs window with the same language environment, and using (dead)keys + there would change the value stored in the kernel, but not this value. 
*/ +static int after_deadkey = 0; + +int +deliver_wm_chars (int do_translate, HWND hwnd, UINT msg, UINT wParam, + UINT lParam, int legacy_alt_meta) +{ + /* An "old style" keyboard description may assign up to 125 UTF-16 code + points to a keypress. + (However, the "old style" TranslateMessage() would deliver at most 16 of + them.) Be on a safe side, and prepare to treat many more. */ + int ctrl_cnt, buf[1024], count, is_dead, after_dead = (after_deadkey != -1); + + /* Since the keypress processing logic of Windows has a lot of state, it + is important to call TranslateMessage() for every keyup/keydown, AND + do it exactly once. (The actual change of state is done by + ToUnicode[Ex](), which is called by TranslateMessage(). So one can + call ToUnicode[Ex]() instead.) + + The "usual" message pump calls TranslateMessage() for EVERY event. + Emacs calls TranslateMessage() very selectively (is it needed for doing + some tricky stuff with Win95??? With newer Windows, selectiveness is, + most probably, not needed - and harms a lot). + + So, with the usual message pump, the following call to TranslateMessage() + is not needed (and is going to be VERY harmful). With Emacs' message + pump, the call is needed. */ + if (do_translate) + { + MSG windows_msg = { hwnd, msg, wParam, lParam, 0, {0,0} }; + + windows_msg.time = GetMessageTime (); + TranslateMessage (&windows_msg); + } + count = get_wm_chars (hwnd, buf, sizeof(buf)/sizeof(*buf), 1, + /* The message may have been synthesized by + who knows what; be conservative. 
*/ + modifier_set (VK_LCONTROL) + || modifier_set (VK_RCONTROL) + || modifier_set (VK_CONTROL), + &ctrl_cnt, &is_dead, wParam, + (lParam & 0x1000000L) != 0); + if (count) + { + W32Msg wmsg; + DWORD console_modifiers = construct_console_modifiers (); + int *b = buf, strip_Alt = 1, strip_ExtraMods = 1, hairy = 0; + char *type_CtrlAlt = NULL; + + /* XXXX In fact, there may be another case when we need to do the same: + What happens if the string defined in the LIGATURES has length + 0? Probably, we will get count==0, but the state of the finite + automaton would reset to 0??? */ + after_deadkey = -1; + + /* wParam is checked when converting CapsLock to Shift; this is a clone + of w32_get_key_modifiers (). */ + wmsg.dwModifiers = w32_kbd_mods_to_emacs (console_modifiers, wParam); + + /* What follows is just heuristics; the correct treatement requires + non-destructive ToUnicode(): + http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Can_an_application_on_Windows_accept_keyboard_events?_Part_IV:_application-specific_modifiers + + What one needs to find is: + * which of the present modifiers AFFECT the resulting char(s) + (so should be stripped, since their EFFECT is "already + taken into account" in the string in buf), and + * which modifiers are not affecting buf, so should be reported to + the application for further treatment. + + Example: assume that we know: + (A) lCtrl+rCtrl+rAlt modifiers with VK_A key produce a Latin "f" + ("may be logical" in JCUKEN-flavored Russian keyboard flavors); + (B) removing any of lCtrl, rCtrl, rAlt changes the produced char; + (C) Win-modifier is not affecting the produced character + (this is the common case: happens with all "standard" layouts). + + Suppose the user presses Win+lCtrl+rCtrl+rAlt modifiers with VK_A. + What is the intent of the user? We need to guess the intent to decide + which event to deliver to the application. 
+ + This looks like a reasonable logic: since Win- modifier doesn't affect + the output string, the user was pressing Win for SOME OTHER purpose. + So the user wanted to generate Win-SOMETHING event. Now, what is + something? If one takes the mantra that "character payload is more + important than the combination of keypresses which resulted in this + payload", then one should ignore lCtrl+rCtrl+rAlt, ignore VK_A, and + assume that the user wanted to generate Win-f. + + Unfortunately, without non-destructive ToUnicode(), checking (B),(C) + is out of question. So we use heuristics (hopefully, covering + 99.9999% of cases). + */ + + /* Another thing to watch for is a possibility to use AltGr-* and + Ctrl-Alt-* with different semantic. + + Background: the layout defining the KLLF_ALTGR bit are treated + specially by the kernel: when VK_RMENU (=rightAlt, =AltGr) is pressed + (released), a press (release) of VK_LCONTROL is emulated (unless Ctrl + is already down). As a result, any press/release of AltGr is seen + by applications as a press/release of lCtrl AND rAlt. This is + applicable, in particular, to ToUnicode[Ex](). (Keyrepeat is covered + the same way!) + + NOTE: it IS possible to see bare rAlt even with KLLF_ALTGR; but this + requires a good finger coordination: doing (physically) + Down-lCtrl Down-rAlt Up-lCtrl Down-a + (doing quick enough, so that key repeat of rAlt [which would + generate new "fake" Down-lCtrl events] does not happens before 'a' + is down) results in no "fake" events, so the application will see + only rAlt down when 'a' is pressed. (However, fake Up-lCtrl WILL + be generated when rAlt goes UP.) + + In fact, note also that KLLF_ALTGR does not prohibit construction of + rCtrl-rAlt (just press them in this order!). + + Moreover: "traditional" layouts do not define distinct modifier-masks + for VK_LMENU and VK_RMENU (same for VK_L/RCONTROL). Instead, they + rely on the KLLF_ALTGR bit to make the behaviour of VK_LMENU and + VK_RMENU distinct. 
As a corollary, for such layouts, the produced + character is the same for AltGr-* (=rAlt-*) and Ctrl-Alt-* (in any + combination of handedness). For description of masks, see + + http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Keyboard_input_on_Windows,_Part_I:_what_is_the_kernel_doing? + + By default, Emacs was using these coincidences via the following + heuristics: it was treating: + (*) keypresses with lCtrl-rAlt modifiers as if they are carrying + ONLY the character payload (no matter what the actual keyboard + was defining: if lCtrl-rAlt-b was delivering U+03b2=beta, then + Emacs saw [beta]; if lCtrl-rAlt-b was undefined in the layout, + the keypress was completely ignored), and + (*) keypresses with the other combinations of handedness of Ctrl-Alt + modifiers (e.g., lCtrl-lAlt) as if they NEVER carry a character + payload (so they were reported "raw": if lCtrl-lAlt-b was + delivering beta, then Emacs saw event [C-A-b], and not [beta]). + This worked well for "traditional" layouts: users could type both + AltGr-x and Ctrl-Alt-x, and one was a character, another a bindable + event. + + However, for layouts which deliver different characters for AltGr-x + and lCtrl-lAlt-x, this scheme makes the latter character inaccessible + in Emacs. While it is easy to access functionality of [C-M-x] in + Emacs by other means (for example, by the `controlify' prefix, or + using lCtrl-rCtrl-x, or rCtrl-rAlt-x [in this order]), missing + characters cannot be reconstructed without tedious manual work. */ + + /* These two cases are often going to be distinguishable, since at most + one of these characters is defined with KBDCTRL | KBDMENU modifier + bitmap. (This heuristic breaks if both lCtrl-lAlt- AND lCtrl-rAlt- + are translated to modifier bitmaps distinct from KBDCTRL | KBDMENU, + or in the cases when lCtrl-lAlt-* and lCtrl-rAlt-* are generally + different, but lCtrl-lAlt-x and lCtrl-rAlt-x happen to deliver the + same character.) 
+ + So we have 2 chunks of info: + (A) is it lCtrl-rAlt-, or lCtrl-lAlt, or some other combination? + (B) is the delivered character defined with KBDCTRL | KBDMENU bits? + Based on (A) and (B), we should decide whether to ignore the + delivered character. (Before, Emacs was completely ignoring (B), and + was treating the 3-state of (A) as a bit.) This means that we have 6 + bits of customization. + + Additionally, a presence of two Ctrl down may be AltGr-rCtrl-.*/ + + /* Strip all non-Shift modifiers if: + - more than one UTF-16 code point delivered (can't call VkKeyScanW ()) + - or the character is a result of combining with a prefix key. */ + if (!after_dead && count == 1 && *b < 0x10000) + { + if (console_modifiers & (RIGHT_ALT_PRESSED | LEFT_ALT_PRESSED) + && console_modifiers & (RIGHT_CTRL_PRESSED | LEFT_CTRL_PRESSED)) + { + type_CtrlAlt = "bB"; /* generic bindable Ctrl-Alt- modifiers */ + if ((console_modifiers & (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED)) + == (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED)) + /* double-Ctrl: + e.g. AltGr-rCtrl on some layouts (in this order!) 
*/ + type_CtrlAlt = "dD"; + else if ((console_modifiers + & (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED)) + == (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED)) + type_CtrlAlt = "lL"; /* Ctrl-Alt- modifiers on the left */ + else if (!NILP (Vw32_recognize_altgr) + && (console_modifiers + & (RIGHT_ALT_PRESSED | LEFT_CTRL_PRESSED)) + == (RIGHT_ALT_PRESSED | LEFT_CTRL_PRESSED)) + type_CtrlAlt = "gG"; /* modifiers as in AltGr */ + } + else if (wmsg.dwModifiers & (alt_modifier | meta_modifier) + || (console_modifiers + & (LEFT_WIN_PRESSED | RIGHT_WIN_PRESSED + | APPS_PRESSED | SCROLLLOCK_ON))) + { + /* pure Alt (or combination of Alt, Win, APPS, scrolllock) */ + type_CtrlAlt = "aA"; + } + if (type_CtrlAlt) + { + /* Out of bound bitmap: */ + SHORT r = VkKeyScanW( *b ), bitmap = 0x1FF; + + FPRINTF_WM_CHARS((stderr, "VkKeyScanW %#06x %#04x\n", (int)r, + wParam)); + if ((r & 0xFF) == wParam) + bitmap = r>>8; /* *b is reachable via simple interface */ + if (*type_CtrlAlt == 'a') /* Simple Alt seen */ + { + if ((bitmap & ~1) == 0) /* 1: KBDSHIFT */ + { + /* In "traditional" layouts, Alt without Ctrl does not + change the delivered character. This detects this + situation; it is safe to report this as Alt-something + - as opposed to delivering the reported character + without modifiers. */ + if (legacy_alt_meta + && *b > 0x7f && ('A' <= wParam && wParam <= 'Z')) + /* For backward-compatibility with older Emacsen, let + this be processed by another branch below (which + would convert it to Alt-Latin char via wParam). */ + return 0; + } + else + { + hairy = 1; + } + } + /* Check whether the delivered character(s) is accessible via + KBDCTRL | KBDALT ( | KBDSHIFT ) modifier mask (which is 7). */ + else if ((bitmap & ~1) != 6) + { + /* The character is not accessible via plain Ctrl-Alt(-Shift) + (which is, probably, same as AltGr) modifiers. 
+ Either it was after a prefix key, or is combined with + modifier keys which we don't see, or there is an asymmetry + between left-hand and right-hand modifiers, or other hairy + stuff. */ + hairy = 1; + } + /* The best solution is to delegate these tough (but rarely + needed) choices to the user. Temporarily (???), it is + implemented as C macros. + + Essentially, there are 3 things to do: return 0 (handle to the + legacy processing code [ignoring the character payload]; keep + some modifiers (so that they will be processed by the binding + system [on top of the character payload]; strip modifiers [so + that `self-insert' is going to be triggered with the character + payload]). + + The default below should cover 99.9999% of cases: + (a) strip Alt- in the hairy case only; + (stripping = not ignoring) + (l) for lAlt-lCtrl, ignore the char in simple cases only; + (g) for what looks like AltGr, ignore the modifiers; + (d) for what looks like lCtrl-rCtrl-Alt (probably + AltGr-rCtrl), ignore the character in simple cases only; + (b) for other cases of Ctrl-Alt, ignore the character in + simple cases only. + + Essentially, in all hairy cases, and in looks-like-AltGr case, + we keep the character, ignoring the modifiers. In all the + other cases, we ignore the delivered character. 
+ */ +#define S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD "aldb" +#define S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS "" + if (strchr(S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD, + type_CtrlAlt[hairy])) + return 0; + /* if in neither list, report all the modifiers we see COMBINED + WITH the reported character */ + if (strchr(S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS, + type_CtrlAlt[hairy])) + strip_ExtraMods = 0; + } + } + if (strip_ExtraMods) + wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier; + + signal_user_input (); + while (count--) + { + FPRINTF_WM_CHARS((stderr, "unichar %#06x\n", *b)); + my_post_msg (&wmsg, hwnd, WM_UNICHAR, *b++, lParam); + } + if (!ctrl_cnt) /* Process ALSO as ctrl */ + return 1; + else + FPRINTF_WM_CHARS((stderr, "extra ctrl char\n")); + return -1; + } + else if (is_dead >= 0) + { + FPRINTF_WM_CHARS((stderr, "dead %#06x\n", is_dead)); + after_deadkey = is_dead; + return 1; + } + return 0; +} + /* Main window procedure */ static LRESULT CALLBACK @@ -2948,6 +3355,15 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA /* Inform lisp thread of keyboard layout changes. */ my_post_msg (&wmsg, hwnd, msg, wParam, lParam); + /* The state of the finite automaton is separate per every input + language environment (so it does not change when one switches + to a different window with the same environment). Moreover, + the experiments show that the state is not remembered when + one switches back to the pre-previous environment. */ + after_deadkey = -1; + + /* XXXX??? What follows is a COMPLETE misunderstanding of Windows! */ + /* Clear dead keys in the keyboard state; for simplicity only preserve modifier key states. */ { @@ -3007,7 +3423,6 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA /* Synchronize modifiers with current keystroke. 
*/ sync_modifiers (); record_keydown (wParam, lParam); - wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0); windows_translate = 0; @@ -3117,6 +3532,46 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA wParam = VK_NUMLOCK; break; default: + if (w32_unicode_gui) { + /* If this event generates characters or deadkeys, do not interpret + it as a "raw combination of modifiers and keysym". Hide + deadkeys, and use the generated character(s) instead of the + keysym. (Backward compatibility: exceptions for numpad keys + generating 0-9 . , / * - +, and for extra-Alt combined with a + non-Latin char.) + + Try to not report modifiers which have effect on which + character or deadkey is generated. + + Example (contrived): if rightAlt-? generates f (on a Cyrillic + keyboard layout), and Ctrl, leftAlt do not affect the generated + character, one wants to report Ctrl-leftAlt-f if the user + presses Ctrl-leftAlt-rightAlt-?. */ + int res; +#if 0 + /* Some of WM_CHAR may be fed to us directly, some are results of + TranslateMessage(). Using 0 as the first argument (in a + separate call) might help us distinguish these two cases. + + However, the keypress feeders would most probably expect the + "standard" message pump, when TranslateMessage() is called on + EVERY KeyDown/Keyup event. So they may feed us Down-Ctrl + Down-FAKE Char-o and expect us to recognize it as Ctrl-o. + Using 0 as the first argument would interfere with this. */ + deliver_wm_chars (0, hwnd, msg, wParam, lParam, 1); +#endif + /* Processing the generated WM_CHAR messages *WHILE* we handle + KEYDOWN/UP event is the best choice, since without any fuss, + we know all 3 of: scancode, virtual keycode, and expansion. + (Additionally, one knows boundaries of expansion of different + keypresses.) 
*/ + res = deliver_wm_chars (1, hwnd, msg, wParam, lParam, 1); + windows_translate = -( res != 0 ); + if (res > 0) /* Bound to character(s) or a deadkey */ + break; + /* deliver_wm_chars() may make some branches after this vestigial */ + } + wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0); /* If not defined as a function key, change it to a WM_CHAR message. */ if (wParam > 255 || !lispy_function_keys[wParam]) { @@ -3184,6 +3639,8 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA } } + if (windows_translate == -1) + break; translate: if (windows_translate) { --Q68bSM7Ycu6FN28Q Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="w32fns.c-diff-v2-relative" --- w32fns.c-sent2 2015-07-01 02:56:30.787672000 -0700 +++ w32fns.c 2015-07-08 16:32:11.187197700 -0700 @@ -2932,6 +2932,15 @@ get_wm_chars (HWND aWnd, int *buf, int b # define FPRINTF_WM_CHARS(ARG) 0 #endif +/* This is a heuristic only. This is supposed to track the state of the + finite automaton in the language environment of Windows. + + However, separate windows (if with different language + environments!) should have different values. Moreover, switching to a + non-Emacs window with the same language environment, and using (dead)keys + there would change the value stored in the kernel, but not this value. */ +static int after_deadkey = 0; + int deliver_wm_chars (int do_translate, HWND hwnd, UINT msg, UINT wParam, UINT lParam, int legacy_alt_meta) @@ -2940,7 +2949,7 @@ deliver_wm_chars (int do_translate, HWND points to a keypress. (However, the "old style" TranslateMessage() would deliver at most 16 of them.) Be on a safe side, and prepare to treat many more. 
*/ - int ctrl_cnt, buf[1024], count, is_dead; + int ctrl_cnt, buf[1024], count, is_dead, after_dead = (after_deadkey != -1); /* Since the keypress processing logic of Windows has a lot of state, it is important to call TranslateMessage() for every keyup/keydown, AND @@ -2956,7 +2965,8 @@ deliver_wm_chars (int do_translate, HWND So, with the usual message pump, the following call to TranslateMessage() is not needed (and is going to be VERY harmful). With Emacs' message pump, the call is needed. */ - if (do_translate) { + if (do_translate) + { MSG windows_msg = { hwnd, msg, wParam, lParam, 0, {0,0} }; windows_msg.time = GetMessageTime (); @@ -2970,13 +2980,22 @@ deliver_wm_chars (int do_translate, HWND || modifier_set (VK_CONTROL), &ctrl_cnt, &is_dead, wParam, (lParam & 0x1000000L) != 0); - if (count) { + if (count) + { W32Msg wmsg; - int *b = buf, strip_Alt = 1; - - /* wParam is checked when converting CapsLock to Shift */ - wmsg.dwModifiers = do_translate - ? w32_get_key_modifiers (wParam, lParam) : 0; + DWORD console_modifiers = construct_console_modifiers (); + int *b = buf, strip_Alt = 1, strip_ExtraMods = 1, hairy = 0; + char *type_CtrlAlt = NULL; + + /* XXXX In fact, there may be another case when we need to do the same: + What happens if the string defined in the LIGATURES has length + 0? Probably, we will get count==0, but the state of the finite + automaton would reset to 0??? */ + after_deadkey = -1; + + /* wParam is checked when converting CapsLock to Shift; this is a clone + of w32_get_key_modifiers (). 
*/ + wmsg.dwModifiers = w32_kbd_mods_to_emacs (console_modifiers, wParam); /* What follows is just heuristics; the correct treatement requires non-destructive ToUnicode(): @@ -2991,8 +3010,8 @@ deliver_wm_chars (int do_translate, HWND Example: assume that we know: (A) lCtrl+rCtrl+rAlt modifiers with VK_A key produce a Latin "f" - ("may be logical" with a JCUKEN-flavored Russian keyboard flavor); - (B) removing any one of lCtrl, rCtrl, rAlt changes the produced char; + ("may be logical" in JCUKEN-flavored Russian keyboard flavors); + (B) removing any of lCtrl, rCtrl, rAlt changes the produced char; (C) Win-modifier is not affecting the produced character (this is the common case: happens with all "standard" layouts). @@ -3000,7 +3019,7 @@ deliver_wm_chars (int do_translate, HWND What is the intent of the user? We need to guess the intent to decide which event to deliver to the application. - This looks like a reasonable logic: wince Win- modifier does not affect + This looks like a reasonable logic: since Win- modifier doesn't affect the output string, the user was pressing Win for SOME OTHER purpose. So the user wanted to generate Win-SOMETHING event. Now, what is something? If one takes the mantra that "character payload is more @@ -3008,38 +3027,196 @@ deliver_wm_chars (int do_translate, HWND payload", then one should ignore lCtrl+rCtrl+rAlt, ignore VK_A, and assume that the user wanted to generate Win-f. - Unfortunately, without non-destructive ToUnicode(), checking (B) and (C) - is out of question. So we use heuristics (hopefully, covering 99.9999% - of cases). + Unfortunately, without non-destructive ToUnicode(), checking (B),(C) + is out of question. So we use heuristics (hopefully, covering + 99.9999% of cases). */ - /* If ctrl-something delivers chars, ctrl and the rest should be hidden; - so the consumer of key-event won't interpret it as an accelerator. 
*/ - if (wmsg.dwModifiers & ctrl_modifier) - wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier; - /* In many keyboard layouts, (left) Alt is not changing the character. - Unless we are in this situation, strip Alt/Meta. */ - if (wmsg.dwModifiers & (alt_modifier | meta_modifier) - /* If alt-something delivers non-ASCIIchars, alt should be hidden */ - && count == 1 && *b < 0x10000) - { - SHORT r = VkKeyScanW( *b ); + /* Another thing to watch for is a possibility to use AltGr-* and + Ctrl-Alt-* with different semantic. - FPRINTF_WM_CHARS((stderr, "VkKeyScanW %#06x %#04x\n", (int)r, wParam)); - if ((r & 0xFF) == wParam && !(r & ~0x1FF)) - { - /* Char available without Alt modifier, so Alt is "on top" */ + Background: the layout defining the KLLF_ALTGR bit are treated + specially by the kernel: when VK_RMENU (=rightAlt, =AltGr) is pressed + (released), a press (release) of VK_LCONTROL is emulated (unless Ctrl + is already down). As a result, any press/release of AltGr is seen + by applications as a press/release of lCtrl AND rAlt. This is + applicable, in particular, to ToUnicode[Ex](). (Keyrepeat is covered + the same way!) + + NOTE: it IS possible to see bare rAlt even with KLLF_ALTGR; but this + requires a good finger coordination: doing (physically) + Down-lCtrl Down-rAlt Up-lCtrl Down-a + (doing quick enough, so that key repeat of rAlt [which would + generate new "fake" Down-lCtrl events] does not happens before 'a' + is down) results in no "fake" events, so the application will see + only rAlt down when 'a' is pressed. (However, fake Up-lCtrl WILL + be generated when rAlt goes UP.) + + In fact, note also that KLLF_ALTGR does not prohibit construction of + rCtrl-rAlt (just press them in this order!). + + Moreover: "traditional" layouts do not define distinct modifier-masks + for VK_LMENU and VK_RMENU (same for VK_L/RCONTROL). Instead, they + rely on the KLLF_ALTGR bit to make the behaviour of VK_LMENU and + VK_RMENU distinct. 
As a corollary, for such layouts, the produced + character is the same for AltGr-* (=rAlt-*) and Ctrl-Alt-* (in any + combination of handedness). For a description of the masks, see + + http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Keyboard_input_on_Windows,_Part_I:_what_is_the_kernel_doing? + + By default, Emacs was using these coincidences via the following + heuristics: it was treating: + (*) keypresses with lCtrl-rAlt modifiers as if they are carrying + ONLY the character payload (no matter what the actual keyboard + was defining: if lCtrl-lAlt-b was delivering U+03b2=beta, then + Emacs saw [beta]; if lCtrl-lAlt-b was undefined in the layout, + the keypress was completely ignored), and + (*) keypresses with the other combinations of handedness of Ctrl-Alt + modifiers (e.g., lCtrl-lAlt) as if they NEVER carry a character + payload (so they were reported "raw": if lCtrl-lAlt-b was + delivering beta, then Emacs saw event [C-A-b], and not [beta]). + This worked well for "traditional" layouts: users could type both + AltGr-x and Ctrl-Alt-x, and one was a character, the other a bindable + event. + + However, for layouts which deliver different characters for AltGr-x + and lCtrl-lAlt-x, this scheme makes the latter character inaccessible + in Emacs. While it is easy to access functionality of [C-M-x] in + Emacs by other means (for example, by the `controlify' prefix, or + using lCtrl-rCtrl-x, or rCtrl-rAlt-x [in this order]), missing + characters cannot be reconstructed without tedious manual work. */ + + /* These two cases are often going to be distinguishable, since at most + one of these characters is defined with KBDCTRL | KBDALT modifier + bitmap. (This heuristic breaks if both lCtrl-lAlt- AND lCtrl-rAlt- + are translated to modifier bitmaps distinct from KBDCTRL | KBDALT, + or in the cases when lCtrl-lAlt-* and lCtrl-rAlt-* are generally + different, but lCtrl-lAlt-x and lCtrl-rAlt-x happen to deliver the + same character.) 
+ + So we have 2 chunks of info: + (A) is it lCtrl-rAlt-, or lCtrl-lAlt-, or some other combination? + (B) is the delivered character defined with KBDCTRL | KBDALT bits? + Based on (A) and (B), we should decide whether to ignore the + delivered character. (Before, Emacs was completely ignoring (B), and + was treating the 3-state of (A) as a bit.) This means that we have 6 + bits of customization. + + Additionally, the presence of two Ctrl keys down may mean AltGr-rCtrl-. */ + + /* Strip all non-Shift modifiers if: + - more than one UTF-16 code unit delivered (can't call VkKeyScanW ()) + - or the character is a result of combining with a prefix key. */ + if (!after_dead && count == 1 && *b < 0x10000) + { + if (console_modifiers & (RIGHT_ALT_PRESSED | LEFT_ALT_PRESSED) + && console_modifiers & (RIGHT_CTRL_PRESSED | LEFT_CTRL_PRESSED)) + { + type_CtrlAlt = "bB"; /* generic bindable Ctrl-Alt- modifiers */ + if ((console_modifiers & (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED)) + == (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED)) + /* double-Ctrl: + e.g. AltGr-rCtrl on some layouts (in this order!) 
*/ + type_CtrlAlt = "dD"; + else if ((console_modifiers + & (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED)) + == (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED)) + type_CtrlAlt = "lL"; /* Ctrl-Alt- modifiers on the left */ + else if (!NILP (Vw32_recognize_altgr) + && (console_modifiers + & (RIGHT_ALT_PRESSED | LEFT_CTRL_PRESSED)) + == (RIGHT_ALT_PRESSED | LEFT_CTRL_PRESSED)) + type_CtrlAlt = "gG"; /* modifiers as in AltGr */ + } + else if (wmsg.dwModifiers & (alt_modifier | meta_modifier) + || (console_modifiers + & (LEFT_WIN_PRESSED | RIGHT_WIN_PRESSED + | APPS_PRESSED | SCROLLLOCK_ON))) + { + /* pure Alt (or combination of Alt, Win, APPS, scrolllock) */ + type_CtrlAlt = "aA"; + } + if (type_CtrlAlt) + { + /* Out-of-bounds bitmap: */ + SHORT r = VkKeyScanW( *b ), bitmap = 0x1FF; + + FPRINTF_WM_CHARS((stderr, "VkKeyScanW %#06x %#04x\n", (int)r, + wParam)); + if ((r & 0xFF) == wParam) + bitmap = r>>8; /* *b is reachable via simple interface */ + if (*type_CtrlAlt == 'a') /* Simple Alt seen */ + { + if ((bitmap & ~1) == 0) /* 1: KBDSHIFT */ + { + /* In "traditional" layouts, Alt without Ctrl does not + change the delivered character. This detects this + situation; it is safe to report this as Alt-something + - as opposed to delivering the reported character + without modifiers. */ if (legacy_alt_meta && *b > 0x7f && ('A' <= wParam && wParam <= 'Z')) /* For backward-compatibility with older Emacsen, let - this be processed by another branch below (which would convert - it to Alt-Latin char via wParam). */ + this be processed by another branch below (which + would convert it to Alt-Latin char via wParam). */ + return 0; + } + else + { + hairy = 1; + } + } + /* Check whether the delivered character(s) is accessible via + KBDCTRL | KBDALT ( | KBDSHIFT ) modifier mask (which is 7). */ + else if ((bitmap & ~1) != 6) + { + /* The character is not accessible via plain Ctrl-Alt(-Shift) + (which is probably the same as AltGr) modifiers. 
+ Either it came after a prefix key, or it is combined with + modifier keys which we don't see, or there is an asymmetry + between left-hand and right-hand modifiers, or other hairy + stuff. */ + hairy = 1; + } + /* The best solution is to delegate these tough (but rarely + needed) choices to the user. Temporarily (???), it is + implemented as C macros. + + Essentially, there are 3 things to do: return 0 (hand over to the + legacy processing code, ignoring the character payload); keep + some modifiers (so that they will be processed by the binding + system on top of the character payload); strip modifiers (so + that `self-insert' is going to be triggered with the character + payload). + + The default below should cover 99.9999% of cases: + (a) strip Alt- in the hairy case only; + (stripping = not ignoring) + (l) for lAlt-lCtrl, ignore the char in simple cases only; + (g) for what looks like AltGr, ignore the modifiers; + (d) for what looks like lCtrl-rCtrl-Alt (probably + AltGr-rCtrl), ignore the character in simple cases only; + (b) for other cases of Ctrl-Alt, ignore the character in + simple cases only. + + Essentially, in all hairy cases, and in the looks-like-AltGr case, + we keep the character, ignoring the modifiers. In all the + other cases, we ignore the delivered character. 
+ */ +#define S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD "aldb" +#define S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS "" + if (strchr(S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD, + type_CtrlAlt[hairy])) return 0; - strip_Alt = 0; + /* if in neither list, report all the modifiers we see COMBINED + WITH the reported character */ + if (strchr(S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS, + type_CtrlAlt[hairy])) + strip_ExtraMods = 0; } } - if (strip_Alt) - wmsg.dwModifiers = wmsg.dwModifiers & ~(alt_modifier | meta_modifier); + if (strip_ExtraMods) + wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier; signal_user_input (); while (count--) @@ -3052,8 +3229,11 @@ deliver_wm_chars (int do_translate, HWND else FPRINTF_WM_CHARS((stderr, "extra ctrl char\n")); return -1; - } else if (is_dead >= 0) { + } + else if (is_dead >= 0) + { FPRINTF_WM_CHARS((stderr, "dead %#06x\n", is_dead)); + after_deadkey = is_dead; return 1; } return 0; @@ -3175,6 +3355,15 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA /* Inform lisp thread of keyboard layout changes. */ my_post_msg (&wmsg, hwnd, msg, wParam, lParam); + /* The state of the finite automaton is separate per every input + language environment (so it does not change when one switches + to a different window with the same environment). Moreover, + the experiments show that the state is not remembered when + one switches back to the pre-previous environment. */ + after_deadkey = -1; + + /* XXXX??? What follows is a COMPLETE misunderstanding of Windows! */ + /* Clear dead keys in the keyboard state; for simplicity only preserve modifier key states. */ { --Q68bSM7Ycu6FN28Q--
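For readers who want to experiment with the Ctrl-Alt classification outside of Emacs, here is a minimal standalone sketch of the logic the patch describes. It is an illustration only, not the patch itself: the helpers classify_ctrl_alt, ignore_payload and check_examples are hypothetical names, and the modifier constants are redefined locally (assumed to mirror the console flags in the Windows header wincon.h) so that the sketch builds without the Windows headers. Note that in C `==' binds tighter than `&', so each masked value has to be parenthesized before it is compared; without the parentheses, `mods & (A | B) == (A | B)' reduces to `mods & 1'.

```c
#include <assert.h>
#include <string.h>

/* Console modifier bits.  Assumption: these values mirror wincon.h;
   they are repeated here so the sketch builds on any platform.  */
#define RIGHT_ALT_PRESSED  0x0001
#define LEFT_ALT_PRESSED   0x0002
#define RIGHT_CTRL_PRESSED 0x0004
#define LEFT_CTRL_PRESSED  0x0008

/* Hypothetical helper (not part of the patch): classify a Ctrl+Alt
   combination into the two-letter type strings used in the patch.
   Returns NULL unless at least one Ctrl and one Alt are down.  */
static const char *
classify_ctrl_alt (unsigned mods, int recognize_altgr)
{
  const unsigned any_alt = LEFT_ALT_PRESSED | RIGHT_ALT_PRESSED;
  const unsigned both_ctrl = LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED;
  const unsigned left_pair = LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED;
  const unsigned altgr_pair = LEFT_CTRL_PRESSED | RIGHT_ALT_PRESSED;

  if (!(mods & any_alt) || !(mods & both_ctrl))
    return NULL;
  /* Parenthesize the masked value: `==' binds tighter than `&'.  */
  if ((mods & both_ctrl) == both_ctrl)
    return "dD";		/* double Ctrl, e.g. AltGr-rCtrl */
  if ((mods & left_pair) == left_pair)
    return "lL";		/* Ctrl-Alt on the left */
  if (recognize_altgr && (mods & altgr_pair) == altgr_pair)
    return "gG";		/* modifiers as in AltGr */
  return "bB";			/* generic bindable Ctrl-Alt */
}

/* Hypothetical helper: non-zero if the character payload should be
   ignored, using the patch's default list "aldb".  The lower-case
   letter stands for the simple case, the upper-case one for the
   hairy case.  */
static int
ignore_payload (const char *type, int hairy)
{
  return strchr ("aldb", type[hairy]) != NULL;
}

/* A few examples of the intended behaviour.  */
static void
check_examples (void)
{
  const char *t = classify_ctrl_alt (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED, 1);

  assert (strcmp (t, "lL") == 0);
  assert (ignore_payload (t, 0));	/* simple case: report raw keypress */
  assert (!ignore_payload (t, 1));	/* hairy case: keep the character */
  t = classify_ctrl_alt (LEFT_CTRL_PRESSED | RIGHT_ALT_PRESSED, 1);
  assert (strcmp (t, "gG") == 0);
  assert (!ignore_payload (t, 0));	/* AltGr: always keep the character */
}
```

Calling check_examples () from a test driver exercises the left-hand Ctrl-Alt and the AltGr cases. Keeping the two-letter type strings from the patch means the simple/hairy distinction stays a plain index into the string, which is what lets the customization live in a single list of letters.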