From: Ilya Zakharevich <ilya@math.berkeley.edu>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 19994@debbugs.gnu.org
Subject: bug#19994: 25.0.50; Unicode keyboard input on Windows
Date: Wed, 8 Jul 2015 17:02:59 -0700
Message-ID: <20150709000259.GA7163@math.berkeley.edu>
In-Reply-To: <20150701100712.GA24175@math.berkeley.edu>
[-- Attachment #1: Type: text/plain, Size: 2011 bytes --]
On Wed, Jul 01, 2015 at 03:07:12AM -0700, Ilya Zakharevich wrote:
> On Wed, Mar 04, 2015 at 08:01:01PM +0200, Eli Zaretskii wrote:
> > I suggest, indeed, to clean up the code so we could commit it to the
> > master branch. That way, it will get wider testing, and we can fix
> I had no time to work on the code itself, but
> • I fixed the formatting,
> • I pumped up the docs,
> • I put in the suggested eassert().
The variant I sent was too primitive: it did not cover a (common?)
usage case where, with AltGr layouts, leftCtrl+rightCtrl behaves
differently from pressing AltGr:
• leftCtrl+rightCtrl would trigger C-M-key;
• AltGr would enter the character payload.
This update
(0) fixes two formatting-style omissions;
(A) adds A LOAD of new comments;
(B) treats such important cases (as above) separately;
(z) marks a piece of old code which does not make any sense
(see the last chunk in the relative patch).
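To make (B) concrete, here is a minimal standalone sketch (not part of the patch) of the Ctrl/Alt classification it performs. The flag values are the ones Windows documents for console key state (wincon.h), redefined so the sketch compiles anywhere; the function name classify_ctrl_alt and its "0 or a letter" return convention are illustrative assumptions, with the letters taken from the type_CtrlAlt strings in the patch. Each mask is parenthesized before the == comparison, since in C == binds tighter than &.

```c
#include <assert.h>

/* Console key-state flags, with the values documented for wincon.h;
   redefined here so the sketch compiles without Windows headers.  */
#define RIGHT_ALT_PRESSED  0x0001
#define LEFT_ALT_PRESSED   0x0002
#define RIGHT_CTRL_PRESSED 0x0004
#define LEFT_CTRL_PRESSED  0x0008

#define ANY_ALT  (RIGHT_ALT_PRESSED | LEFT_ALT_PRESSED)
#define ANY_CTRL (RIGHT_CTRL_PRESSED | LEFT_CTRL_PRESSED)

/* Hypothetical helper mirroring the type_CtrlAlt choice in the patch.
   Returns 0 unless some Ctrl AND some Alt are down; otherwise:
     'd'  both Ctrl keys down (e.g. AltGr-rCtrl, pressed in this order)
     'l'  lCtrl-lAlt
     'g'  lCtrl-rAlt, i.e. what AltGr looks like (only if RECOGNIZE_ALTGR)
     'b'  any other Ctrl-Alt combination  */
static char
classify_ctrl_alt (unsigned mods, int recognize_altgr)
{
  assert (recognize_altgr == 0 || recognize_altgr == 1);
  if (!(mods & ANY_ALT) || !(mods & ANY_CTRL))
    return 0;
  /* Parenthesize each mask: '==' binds tighter than '&' in C.  */
  if ((mods & ANY_CTRL) == ANY_CTRL)
    return 'd';
  if ((mods & (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED))
      == (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED))
    return 'l';
  if (recognize_altgr
      && (mods & (LEFT_CTRL_PRESSED | RIGHT_ALT_PRESSED))
         == (LEFT_CTRL_PRESSED | RIGHT_ALT_PRESSED))
    return 'g';
  return 'b';
}
```

For example, lCtrl+rAlt (which is how AltGr looks to applications) classifies as 'g' when AltGr recognition is on and as 'b' when it is off, while lCtrl+lAlt classifies as 'l'.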
Notes:
• In (B), there are some decisions to make. I encapsulate these
decisions in two strings. For best results, these strings should
be user-customizable. However, currently they are just C #defines.
After I think about this more, and if these customizations turn
out to be useful, they can be made into Lisp variables.
• There is a bug in the (old) Emacs code which prevents some cases
treated in (B) from being really useful. I have not fixed it yet.
To see the bug:
∘ switch to layout with AltGr;
∘ assume that AltGr-s produces ß (as with US International);
∘ pressing AltGr-rightControl-s produces Meta-ß;
∘ pressing rightControl-AltGr-s produces C-M-s.
(I do not think this effect is intentional.)
• And, BTW, is it documented anywhere that
leftControl-rightControl-key produces C-M-key?
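The decision encoded in the two strings can be sketched as a small standalone function. The macro values are copied from the patch; the enum, its member names, and the function name decide_payload are hypothetical, chosen only for illustration. The first letter of each type string is used in simple cases and the second in hairy ones, mirroring type_CtrlAlt[hairy] in the patch.

```c
#include <assert.h>
#include <string.h>

/* Default values, copied from the patch.  */
#define S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD "aldb"
#define S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS ""

/* Hypothetical names for the three possible outcomes.  */
enum payload_action
{
  IGNORE_PAYLOAD,     /* return 0: legacy code treats the raw keypress */
  PAYLOAD_WITH_MODS,  /* deliver the character together with modifiers */
  PAYLOAD_STRIP_MODS  /* deliver the character, keeping only Shift */
};

/* TYPE is one of "aA", "bB", "dD", "gG", "lL"; HAIRY (0 or 1) selects
   the simple (lowercase) or hairy (uppercase) letter.  */
static enum payload_action
decide_payload (const char *type, int hairy)
{
  char c;

  assert (type != NULL && (hairy == 0 || hairy == 1));
  c = type[hairy];
  /* strchr () would match the terminating NUL, so rule it out.  */
  assert (c != '\0');
  if (strchr (S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD, c))
    return IGNORE_PAYLOAD;
  if (strchr (S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS, c))
    return PAYLOAD_WITH_MODS;
  return PAYLOAD_STRIP_MODS;
}
```

With these defaults, AltGr-like input ("gG") always keeps the character and drops the modifiers, while the other Ctrl-Alt types and plain Alt ignore the character unless the case is hairy.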
I include two patches:
□ absolute (ignore the previous patches);
□ relative (with whitespace ignored), for reading.
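In the absolute patch, get_wm_chars () combines a UTF-16 surrogate pair with a single constant: (hi << 10) + lo - 0x35FDC00. This is the usual decoding formula ((hi - 0xD800) << 10) + (lo - 0xDC00) + 0x10000 with the constants folded together, since (0xD800 << 10) + 0xDC00 - 0x10000 = 0x35FDC00. A short sketch comparing the two forms (the function names are hypothetical):

```c
#include <assert.h>

/* Textbook UTF-16 decoding of a high/low surrogate pair.  */
static int
decode_pair_textbook (int hi, int lo)
{
  assert (hi >= 0xD800 && hi <= 0xDBFF);
  assert (lo >= 0xDC00 && lo <= 0xDFFF);
  return ((hi - 0xD800) << 10) + (lo - 0xDC00) + 0x10000;
}

/* The folded form used in get_wm_chars ():
   0x35FDC00 == (0xD800 << 10) + 0xDC00 - 0x10000.  */
static int
decode_pair_folded (int hi, int lo)
{
  return (hi << 10) + lo - 0x35FDC00;
}
```

Both forms agree on every valid pair; for instance, the pair 0xD83D 0xDE00 decodes to U+1F600.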
Enjoy,
Ilya
[-- Attachment #2: w32fns.c-diff-v2 --]
[-- Type: text/plain, Size: 24021 bytes --]
--- w32fns.c-ini 2015-01-30 15:33:23.505201400 -0800
+++ w32fns.c 2015-07-08 16:32:11.187197700 -0700
@@ -2832,6 +2832,413 @@ post_character_message (HWND hwnd, UINT
my_post_msg (&wmsg, hwnd, msg, wParam, lParam);
}
+static int
+get_wm_chars (HWND aWnd, int *buf, int buflen, int ignore_ctrl, int ctrl,
+ int *ctrl_cnt, int *is_dead, int vk, int exp)
+{
+ MSG msg;
+ /* If doubled is at the end, ignore it */
+ int i = buflen, doubled = 0, code_unit;
+
+ if (ctrl_cnt)
+ *ctrl_cnt = 0;
+ if (is_dead)
+ *is_dead = -1;
+ eassert(w32_unicode_gui);
+ while (buflen
+ /* Should be called only when w32_unicode_gui: */
+ && PeekMessageW(&msg, aWnd, WM_KEYFIRST, WM_KEYLAST,
+ PM_NOREMOVE | PM_NOYIELD)
+ && (msg.message == WM_CHAR || msg.message == WM_SYSCHAR
+ || msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR
+ || msg.message == WM_UNICHAR))
+ {
+ /* We extract character payload, but in this call we handle only the
+ characters which come BEFORE the next keyup/keydown message. */
+ int dead;
+
+ GetMessageW(&msg, aWnd, msg.message, msg.message);
+ dead = (msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR);
+ if (is_dead)
+ *is_dead = (dead ? msg.wParam : -1);
+ if (dead)
+ continue;
+ code_unit = msg.wParam;
+ if (doubled)
+ {
+ /* had surrogate */
+ if (msg.message == WM_UNICHAR
+ || code_unit < 0xDC00 || code_unit > 0xDFFF)
+ { /* Mismatched first surrogate.
+ Pass both code units as if they were two characters. */
+ *buf++ = doubled;
+ if (!--buflen)
+ return i; /* Drop the 2nd char if at the end of the buffer. */
+ }
+ else /* see https://en.wikipedia.org/wiki/UTF-16 */
+ {
+ code_unit = (doubled << 10) + code_unit - 0x35FDC00;
+ }
+ doubled = 0;
+ }
+ else if (code_unit >= 0xD800 && code_unit <= 0xDBFF)
+ {
+ /* Handle mismatched 2nd surrogate the same as a normal character. */
+ doubled = code_unit;
+ continue;
+ }
+
+ /* The only "fake" characters delivered by ToUnicode() or
+ TranslateMessage() are:
+ 0x01 .. 0x1a for Ctrl-letter, Enter, Tab, Ctrl-Break, Esc, Backspace
+ 0x00 and 0x1b .. 0x1f for Control- []\@^_
+ 0x7f for Control-BackSpace
+ 0x20 for Control-Space */
+ if (ignore_ctrl
+ && (code_unit < 0x20 || code_unit == 0x7f
+ || (code_unit == 0x20 && ctrl)))
+ {
+ /* Non-character payload in a WM_CHAR
+ (Ctrl-something pressed, see above). Ignore, and report. */
+ if (ctrl_cnt)
+ (*ctrl_cnt)++;
+ continue;
+ }
+ /* Traditionally, Emacs would ignore the character payload of VK_NUMPAD*
+ keys, and would treat them later via `function-key-map'. In addition
+ to usual 102-key NUMPAD keys, this map also treats `kp-'-variants of
+ space, tab, enter, separator, equal. TAB and EQUAL, apparently,
+ cannot be generated on Win-GUI branch. ENTER is already handled
+ by the code above. According to `lispy_function_keys', kp_space is
+ generated by not-extended VK_CLEAR. (kp-tab != VK_OEM_NEC_EQUAL!).
+
+ We do similarly for backward-compatibility, but ignore only the
+ characters restorable later by `function-key-map'. */
+ if (code_unit < 0x7f
+ && ((vk >= VK_NUMPAD0 && vk <= VK_DIVIDE)
+ || (exp && ((vk >= VK_PRIOR && vk <= VK_DOWN) ||
+ vk == VK_INSERT || vk == VK_DELETE || vk == VK_CLEAR)))
+ && strchr("0123456789/*-+.,", code_unit))
+ continue;
+ *buf++ = code_unit;
+ buflen--;
+ }
+ return i - buflen;
+}
+
+#ifdef DBG_WM_CHARS
+# define FPRINTF_WM_CHARS(ARG) fprintf ARG
+#else
+# define FPRINTF_WM_CHARS(ARG) 0
+#endif
+
+/* This is a heuristic only. This is supposed to track the state of the
+ finite automaton in the language environment of Windows.
+
+ However, separate windows (if they have different language
+ environments!) should have different values. Moreover, switching to a
+ non-Emacs window with the same language environment, and using (dead)keys
+ there would change the value stored in the kernel, but not this value. */
+static int after_deadkey = 0;
+
+int
+deliver_wm_chars (int do_translate, HWND hwnd, UINT msg, UINT wParam,
+ UINT lParam, int legacy_alt_meta)
+{
+ /* An "old style" keyboard description may assign up to 125 UTF-16 code
+ points to a keypress.
+ (However, the "old style" TranslateMessage() would deliver at most 16 of
+ them.) Be on the safe side, and prepare to handle many more. */
+ int ctrl_cnt, buf[1024], count, is_dead, after_dead = (after_deadkey != -1);
+
+ /* Since the keypress processing logic of Windows has a lot of state, it
+ is important to call TranslateMessage() for every keyup/keydown, AND
+ do it exactly once. (The actual change of state is done by
+ ToUnicode[Ex](), which is called by TranslateMessage(). So one can
+ call ToUnicode[Ex]() instead.)
+
+ The "usual" message pump calls TranslateMessage() for EVERY event.
+ Emacs calls TranslateMessage() very selectively (is it needed for doing
+ some tricky stuff with Win95??? With newer Windows, selectiveness is,
+ most probably, not needed - and harms a lot).
+
+ So, with the usual message pump, the following call to TranslateMessage()
+ is not needed (and is going to be VERY harmful). With Emacs' message
+ pump, the call is needed. */
+ if (do_translate)
+ {
+ MSG windows_msg = { hwnd, msg, wParam, lParam, 0, {0,0} };
+
+ windows_msg.time = GetMessageTime ();
+ TranslateMessage (&windows_msg);
+ }
+ count = get_wm_chars (hwnd, buf, sizeof(buf)/sizeof(*buf), 1,
+ /* The message may have been synthesized by
+ who knows what; be conservative. */
+ modifier_set (VK_LCONTROL)
+ || modifier_set (VK_RCONTROL)
+ || modifier_set (VK_CONTROL),
+ &ctrl_cnt, &is_dead, wParam,
+ (lParam & 0x1000000L) != 0);
+ if (count)
+ {
+ W32Msg wmsg;
+ DWORD console_modifiers = construct_console_modifiers ();
+ int *b = buf, strip_Alt = 1, strip_ExtraMods = 1, hairy = 0;
+ char *type_CtrlAlt = NULL;
+
+ /* XXXX In fact, there may be another case when we need to do the same:
+ What happens if the string defined in the LIGATURES has length
+ 0? Probably, we will get count==0, but the state of the finite
+ automaton would reset to 0??? */
+ after_deadkey = -1;
+
+ /* wParam is checked when converting CapsLock to Shift; this is a clone
+ of w32_get_key_modifiers (). */
+ wmsg.dwModifiers = w32_kbd_mods_to_emacs (console_modifiers, wParam);
+
+ /* What follows is just heuristics; the correct treatment requires
+ non-destructive ToUnicode():
+ http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Can_an_application_on_Windows_accept_keyboard_events?_Part_IV:_application-specific_modifiers
+
+ What one needs to find is:
+ * which of the present modifiers AFFECT the resulting char(s)
+ (so should be stripped, since their EFFECT is "already
+ taken into account" in the string in buf), and
+ * which modifiers are not affecting buf, so should be reported to
+ the application for further treatment.
+
+ Example: assume that we know:
+ (A) lCtrl+rCtrl+rAlt modifiers with VK_A key produce a Latin "f"
+ ("may be logical" in JCUKEN-flavored Russian keyboard flavors);
+ (B) removing any of lCtrl, rCtrl, rAlt changes the produced char;
+ (C) Win-modifier is not affecting the produced character
+ (this is the common case: happens with all "standard" layouts).
+
+ Suppose the user presses Win+lCtrl+rCtrl+rAlt modifiers with VK_A.
+ What is the intent of the user? We need to guess the intent to decide
+ which event to deliver to the application.
+
+ This looks like a reasonable logic: since Win- modifier doesn't affect
+ the output string, the user was pressing Win for SOME OTHER purpose.
+ So the user wanted to generate Win-SOMETHING event. Now, what is
+ something? If one takes the mantra that "character payload is more
+ important than the combination of keypresses which resulted in this
+ payload", then one should ignore lCtrl+rCtrl+rAlt, ignore VK_A, and
+ assume that the user wanted to generate Win-f.
+
+ Unfortunately, without non-destructive ToUnicode(), checking (B),(C)
+ is out of question. So we use heuristics (hopefully, covering
+ 99.9999% of cases).
+ */
+
+ /* Another thing to watch for is a possibility to use AltGr-* and
+ Ctrl-Alt-* with different semantics.
+
+ Background: layouts defining the KLLF_ALTGR bit are treated
+ specially by the kernel: when VK_RMENU (=rightAlt, =AltGr) is pressed
+ (released), a press (release) of VK_LCONTROL is emulated (unless Ctrl
+ is already down). As a result, any press/release of AltGr is seen
+ by applications as a press/release of lCtrl AND rAlt. This is
+ applicable, in particular, to ToUnicode[Ex](). (Keyrepeat is covered
+ the same way!)
+
+ NOTE: it IS possible to see bare rAlt even with KLLF_ALTGR; but this
+ requires good finger coordination: doing (physically)
+ Down-lCtrl Down-rAlt Up-lCtrl Down-a
+ (quickly enough, so that key repeat of rAlt [which would
+ generate new "fake" Down-lCtrl events] does not happen before 'a'
+ is down) results in no "fake" events, so the application will see
+ only rAlt down when 'a' is pressed. (However, fake Up-lCtrl WILL
+ be generated when rAlt goes UP.)
+
+ In fact, note also that KLLF_ALTGR does not prohibit construction of
+ rCtrl-rAlt (just press them in this order!).
+
+ Moreover: "traditional" layouts do not define distinct modifier-masks
+ for VK_LMENU and VK_RMENU (same for VK_L/RCONTROL). Instead, they
+ rely on the KLLF_ALTGR bit to make the behaviour of VK_LMENU and
+ VK_RMENU distinct. As a corollary, for such layouts, the produced
+ character is the same for AltGr-* (=rAlt-*) and Ctrl-Alt-* (in any
+ combination of handedness). For description of masks, see
+
+ http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Keyboard_input_on_Windows,_Part_I:_what_is_the_kernel_doing?
+
+ By default, Emacs was using these coincidences via the following
+ heuristics: it was treating:
+ (*) keypresses with lCtrl-rAlt modifiers as if they are carrying
+ ONLY the character payload (no matter what the actual keyboard
+ was defining: if lCtrl-rAlt-b was delivering U+03B2=beta, then
+ Emacs saw [beta]; if lCtrl-rAlt-b was undefined in the layout,
+ the keypress was completely ignored), and
+ (*) keypresses with the other combinations of handedness of Ctrl-Alt
+ modifiers (e.g., lCtrl-lAlt) as if they NEVER carry a character
+ payload (so they were reported "raw": if lCtrl-lAlt-b was
+ delivering beta, then Emacs saw event [C-A-b], and not [beta]).
+ This worked well for "traditional" layouts: users could type both
+ AltGr-x and Ctrl-Alt-x, and one was a character, another a bindable
+ event.
+
+ However, for layouts which deliver different characters for AltGr-x
+ and lCtrl-lAlt-x, this scheme makes the latter character inaccessible
+ in Emacs. While it is easy to access functionality of [C-M-x] in
+ Emacs by other means (for example, by the `controlify' prefix, or
+ using lCtrl-rCtrl-x, or rCtrl-rAlt-x [in this order]), missing
+ characters cannot be reconstructed without tedious manual work. */
+
+ /* These two cases are often going to be distinguishable, since at most
+ one of these characters is defined with KBDCTRL | KBDMENU modifier
+ bitmap. (This heuristic breaks if both lCtrl-lAlt- AND lCtrl-rAlt-
+ are translated to modifier bitmaps distinct from KBDCTRL | KBDMENU,
+ or in the cases when lCtrl-lAlt-* and lCtrl-rAlt-* are generally
+ different, but lCtrl-lAlt-x and lCtrl-rAlt-x happen to deliver the
+ same character.)
+
+ So we have 2 chunks of info:
+ (A) is it lCtrl-rAlt-, or lCtrl-lAlt, or some other combination?
+ (B) is the delivered character defined with KBDCTRL | KBDMENU bits?
+ Based on (A) and (B), we should decide whether to ignore the
+ delivered character. (Before, Emacs was completely ignoring (B), and
+ was treating the 3-state of (A) as a bit.) This means that we have 6
+ bits of customization.
+
+ Additionally, two Ctrl keys down may indicate AltGr-rCtrl-. */
+
+ /* Strip all non-Shift modifiers if:
+ - more than one UTF-16 code point delivered (can't call VkKeyScanW ())
+ - or the character is a result of combining with a prefix key. */
+ if (!after_dead && count == 1 && *b < 0x10000)
+ {
+ if (console_modifiers & (RIGHT_ALT_PRESSED | LEFT_ALT_PRESSED)
+ && console_modifiers & (RIGHT_CTRL_PRESSED | LEFT_CTRL_PRESSED))
+ {
+ type_CtrlAlt = "bB"; /* generic bindable Ctrl-Alt- modifiers */
+ if ((console_modifiers & (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED))
+ == (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED))
+ /* double-Ctrl:
+ e.g. AltGr-rCtrl on some layouts (in this order!) */
+ type_CtrlAlt = "dD";
+ else if ((console_modifiers
+ & (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED))
+ == (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED))
+ type_CtrlAlt = "lL"; /* Ctrl-Alt- modifiers on the left */
+ else if (!NILP (Vw32_recognize_altgr)
+ && (console_modifiers
+ & (RIGHT_ALT_PRESSED | LEFT_CTRL_PRESSED))
+ == (RIGHT_ALT_PRESSED | LEFT_CTRL_PRESSED))
+ type_CtrlAlt = "gG"; /* modifiers as in AltGr */
+ }
+ else if (wmsg.dwModifiers & (alt_modifier | meta_modifier)
+ || (console_modifiers
+ & (LEFT_WIN_PRESSED | RIGHT_WIN_PRESSED
+ | APPS_PRESSED | SCROLLLOCK_ON)))
+ {
+ /* pure Alt (or a combination of Alt, Win, APPS, ScrollLock) */
+ type_CtrlAlt = "aA";
+ }
+ if (type_CtrlAlt)
+ {
+ /* Out of bound bitmap: */
+ SHORT r = VkKeyScanW( *b ), bitmap = 0x1FF;
+
+ FPRINTF_WM_CHARS((stderr, "VkKeyScanW %#06x %#04x\n", (int)r,
+ wParam));
+ if ((r & 0xFF) == wParam)
+ bitmap = r>>8; /* *b is reachable via simple interface */
+ if (*type_CtrlAlt == 'a') /* Simple Alt seen */
+ {
+ if ((bitmap & ~1) == 0) /* 1: KBDSHIFT */
+ {
+ /* In "traditional" layouts, Alt without Ctrl does not
+ change the delivered character. This detects this
+ situation; it is safe to report this as Alt-something
+ - as opposed to delivering the reported character
+ without modifiers. */
+ if (legacy_alt_meta
+ && *b > 0x7f && ('A' <= wParam && wParam <= 'Z'))
+ /* For backward-compatibility with older Emacsen, let
+ this be processed by another branch below (which
+ would convert it to Alt-Latin char via wParam). */
+ return 0;
+ }
+ else
+ {
+ hairy = 1;
+ }
+ }
+ /* Check whether the delivered character(s) is accessible via
+ KBDCTRL | KBDALT ( | KBDSHIFT ) modifier mask (which is 7). */
+ else if ((bitmap & ~1) != 6)
+ {
+ /* The character is not accessible via plain Ctrl-Alt(-Shift)
+ (which is, probably, same as AltGr) modifiers.
+ Either it was after a prefix key, or is combined with
+ modifier keys which we don't see, or there is an asymmetry
+ between left-hand and right-hand modifiers, or other hairy
+ stuff. */
+ hairy = 1;
+ }
+ /* The best solution is to delegate these tough (but rarely
+ needed) choices to the user. Temporarily (???), it is
+ implemented as C macros.
+
+ Essentially, there are 3 things to do: return 0 (hand over to the
+ legacy processing code [ignoring the character payload]); keep
+ some modifiers (so that they will be processed by the binding
+ system [on top of the character payload]); strip modifiers [so
+ that `self-insert' is going to be triggered with the character
+ payload].
+
+ The default below should cover 99.9999% of cases:
+ (a) strip Alt- in the hairy case only;
+ (stripping = not ignoring)
+ (l) for lAlt-lCtrl, ignore the char in simple cases only;
+ (g) for what looks like AltGr, ignore the modifiers;
+ (d) for what looks like lCtrl-rCtrl-Alt (probably
+ AltGr-rCtrl), ignore the character in simple cases only;
+ (b) for other cases of Ctrl-Alt, ignore the character in
+ simple cases only.
+
+ Essentially, in all hairy cases, and in looks-like-AltGr case,
+ we keep the character, ignoring the modifiers. In all the
+ other cases, we ignore the delivered character.
+ */
+#define S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD "aldb"
+#define S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS ""
+ if (strchr(S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD,
+ type_CtrlAlt[hairy]))
+ return 0;
+ /* If in the second list, report all the modifiers we see COMBINED
+ WITH the reported character; if in neither list, keep only Shift. */
+ if (strchr(S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS,
+ type_CtrlAlt[hairy]))
+ strip_ExtraMods = 0;
+ }
+ }
+ if (strip_ExtraMods)
+ wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier;
+
+ signal_user_input ();
+ while (count--)
+ {
+ FPRINTF_WM_CHARS((stderr, "unichar %#06x\n", *b));
+ my_post_msg (&wmsg, hwnd, WM_UNICHAR, *b++, lParam);
+ }
+ if (!ctrl_cnt) /* Process ALSO as ctrl */
+ return 1;
+ else
+ FPRINTF_WM_CHARS((stderr, "extra ctrl char\n"));
+ return -1;
+ }
+ else if (is_dead >= 0)
+ {
+ FPRINTF_WM_CHARS((stderr, "dead %#06x\n", is_dead));
+ after_deadkey = is_dead;
+ return 1;
+ }
+ return 0;
+}
+
/* Main window procedure */
static LRESULT CALLBACK
@@ -2948,6 +3355,15 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
/* Inform lisp thread of keyboard layout changes. */
my_post_msg (&wmsg, hwnd, msg, wParam, lParam);
+ /* The state of the finite automaton is separate per every input
+ language environment (so it does not change when one switches
+ to a different window with the same environment). Moreover,
+ the experiments show that the state is not remembered when
+ one switches back to the pre-previous environment. */
+ after_deadkey = -1;
+
+ /* XXXX??? What follows is a COMPLETE misunderstanding of Windows! */
+
/* Clear dead keys in the keyboard state; for simplicity only
preserve modifier key states. */
{
@@ -3007,7 +3423,6 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
/* Synchronize modifiers with current keystroke. */
sync_modifiers ();
record_keydown (wParam, lParam);
- wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
windows_translate = 0;
@@ -3117,6 +3532,46 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
wParam = VK_NUMLOCK;
break;
default:
+ if (w32_unicode_gui) {
+ /* If this event generates characters or deadkeys, do not interpret
+ it as a "raw combination of modifiers and keysym". Hide
+ deadkeys, and use the generated character(s) instead of the
+ keysym. (Backward compatibility: exceptions for numpad keys
+ generating 0-9 . , / * - +, and for extra-Alt combined with a
+ non-Latin char.)
+
+ Try to not report modifiers which have effect on which
+ character or deadkey is generated.
+
+ Example (contrived): if rightAlt-? generates f (on a Cyrillic
+ keyboard layout), and Ctrl, leftAlt do not affect the generated
+ character, one wants to report Ctrl-leftAlt-f if the user
+ presses Ctrl-leftAlt-rightAlt-?. */
+ int res;
+#if 0
+ /* Some WM_CHAR messages may be fed to us directly, some are results of
+ TranslateMessage(). Using 0 as the first argument (in a
+ separate call) might help us distinguish these two cases.
+
+ However, the keypress feeders would most probably expect the
+ "standard" message pump, when TranslateMessage() is called on
+ EVERY KeyDown/Keyup event. So they may feed us Down-Ctrl
+ Down-FAKE Char-o and expect us to recognize it as Ctrl-o.
+ Using 0 as the first argument would interfere with this. */
+ deliver_wm_chars (0, hwnd, msg, wParam, lParam, 1);
+#endif
+ /* Processing the generated WM_CHAR messages *WHILE* we handle
+ KEYDOWN/UP event is the best choice, since without any fuss,
+ we know all 3 of: scancode, virtual keycode, and expansion.
+ (Additionally, one knows boundaries of expansion of different
+ keypresses.) */
+ res = deliver_wm_chars (1, hwnd, msg, wParam, lParam, 1);
+ windows_translate = -( res != 0 );
+ if (res > 0) /* Bound to character(s) or a deadkey */
+ break;
+ /* deliver_wm_chars() may make some branches after this vestigial */
+ }
+ wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
/* If not defined as a function key, change it to a WM_CHAR message. */
if (wParam > 255 || !lispy_function_keys[wParam])
{
@@ -3184,6 +3639,8 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
}
}
+ if (windows_translate == -1)
+ break;
translate:
if (windows_translate)
{
[-- Attachment #3: w32fns.c-diff-v2-relative --]
[-- Type: text/plain, Size: 17600 bytes --]
--- w32fns.c-sent2 2015-07-01 02:56:30.787672000 -0700
+++ w32fns.c 2015-07-08 16:32:11.187197700 -0700
@@ -2932,6 +2932,15 @@ get_wm_chars (HWND aWnd, int *buf, int b
# define FPRINTF_WM_CHARS(ARG) 0
#endif
+/* This is a heuristic only. This is supposed to track the state of the
+ finite automaton in the language environment of Windows.
+
+ However, separate windows (if they have different language
+ environments!) should have different values. Moreover, switching to a
+ non-Emacs window with the same language environment, and using (dead)keys
+ there would change the value stored in the kernel, but not this value. */
+static int after_deadkey = 0;
+
int
deliver_wm_chars (int do_translate, HWND hwnd, UINT msg, UINT wParam,
UINT lParam, int legacy_alt_meta)
@@ -2940,7 +2949,7 @@ deliver_wm_chars (int do_translate, HWND
points to a keypress.
(However, the "old style" TranslateMessage() would deliver at most 16 of
them.) Be on the safe side, and prepare to handle many more. */
- int ctrl_cnt, buf[1024], count, is_dead;
+ int ctrl_cnt, buf[1024], count, is_dead, after_dead = (after_deadkey != -1);
/* Since the keypress processing logic of Windows has a lot of state, it
is important to call TranslateMessage() for every keyup/keydown, AND
@@ -2956,7 +2965,8 @@ deliver_wm_chars (int do_translate, HWND
So, with the usual message pump, the following call to TranslateMessage()
is not needed (and is going to be VERY harmful). With Emacs' message
pump, the call is needed. */
- if (do_translate) {
+ if (do_translate)
+ {
MSG windows_msg = { hwnd, msg, wParam, lParam, 0, {0,0} };
windows_msg.time = GetMessageTime ();
@@ -2970,13 +2980,22 @@ deliver_wm_chars (int do_translate, HWND
|| modifier_set (VK_CONTROL),
&ctrl_cnt, &is_dead, wParam,
(lParam & 0x1000000L) != 0);
- if (count) {
+ if (count)
+ {
W32Msg wmsg;
- int *b = buf, strip_Alt = 1;
-
- /* wParam is checked when converting CapsLock to Shift */
- wmsg.dwModifiers = do_translate
- ? w32_get_key_modifiers (wParam, lParam) : 0;
+ DWORD console_modifiers = construct_console_modifiers ();
+ int *b = buf, strip_Alt = 1, strip_ExtraMods = 1, hairy = 0;
+ char *type_CtrlAlt = NULL;
+
+ /* XXXX In fact, there may be another case when we need to do the same:
+ What happens if the string defined in the LIGATURES has length
+ 0? Probably, we will get count==0, but the state of the finite
+ automaton would reset to 0??? */
+ after_deadkey = -1;
+
+ /* wParam is checked when converting CapsLock to Shift; this is a clone
+ of w32_get_key_modifiers (). */
+ wmsg.dwModifiers = w32_kbd_mods_to_emacs (console_modifiers, wParam);
/* What follows is just heuristics; the correct treatment requires
non-destructive ToUnicode():
@@ -2991,8 +3010,8 @@ deliver_wm_chars (int do_translate, HWND
Example: assume that we know:
(A) lCtrl+rCtrl+rAlt modifiers with VK_A key produce a Latin "f"
- ("may be logical" with a JCUKEN-flavored Russian keyboard flavor);
- (B) removing any one of lCtrl, rCtrl, rAlt changes the produced char;
+ ("may be logical" in JCUKEN-flavored Russian keyboard flavors);
+ (B) removing any of lCtrl, rCtrl, rAlt changes the produced char;
(C) Win-modifier is not affecting the produced character
(this is the common case: happens with all "standard" layouts).
@@ -3000,7 +3019,7 @@ deliver_wm_chars (int do_translate, HWND
What is the intent of the user? We need to guess the intent to decide
which event to deliver to the application.
- This looks like a reasonable logic: wince Win- modifier does not affect
+ This looks like a reasonable logic: since Win- modifier doesn't affect
the output string, the user was pressing Win for SOME OTHER purpose.
So the user wanted to generate Win-SOMETHING event. Now, what is
something? If one takes the mantra that "character payload is more
@@ -3008,38 +3027,196 @@ deliver_wm_chars (int do_translate, HWND
payload", then one should ignore lCtrl+rCtrl+rAlt, ignore VK_A, and
assume that the user wanted to generate Win-f.
- Unfortunately, without non-destructive ToUnicode(), checking (B) and (C)
- is out of question. So we use heuristics (hopefully, covering 99.9999%
- of cases).
+ Unfortunately, without non-destructive ToUnicode(), checking (B),(C)
+ is out of question. So we use heuristics (hopefully, covering
+ 99.9999% of cases).
*/
- /* If ctrl-something delivers chars, ctrl and the rest should be hidden;
- so the consumer of key-event won't interpret it as an accelerator. */
- if (wmsg.dwModifiers & ctrl_modifier)
- wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier;
- /* In many keyboard layouts, (left) Alt is not changing the character.
- Unless we are in this situation, strip Alt/Meta. */
- if (wmsg.dwModifiers & (alt_modifier | meta_modifier)
- /* If alt-something delivers non-ASCIIchars, alt should be hidden */
- && count == 1 && *b < 0x10000)
- {
- SHORT r = VkKeyScanW( *b );
+ /* Another thing to watch for is a possibility to use AltGr-* and
+ Ctrl-Alt-* with different semantics.
- FPRINTF_WM_CHARS((stderr, "VkKeyScanW %#06x %#04x\n", (int)r, wParam));
- if ((r & 0xFF) == wParam && !(r & ~0x1FF))
- {
- /* Char available without Alt modifier, so Alt is "on top" */
+ Background: layouts defining the KLLF_ALTGR bit are treated
+ specially by the kernel: when VK_RMENU (=rightAlt, =AltGr) is pressed
+ (released), a press (release) of VK_LCONTROL is emulated (unless Ctrl
+ is already down). As a result, any press/release of AltGr is seen
+ by applications as a press/release of lCtrl AND rAlt. This is
+ applicable, in particular, to ToUnicode[Ex](). (Keyrepeat is covered
+ the same way!)
+
+ NOTE: it IS possible to see bare rAlt even with KLLF_ALTGR; but this
+ requires good finger coordination: doing (physically)
+ Down-lCtrl Down-rAlt Up-lCtrl Down-a
+ (quickly enough, so that key repeat of rAlt [which would
+ generate new "fake" Down-lCtrl events] does not happen before 'a'
+ is down) results in no "fake" events, so the application will see
+ only rAlt down when 'a' is pressed. (However, fake Up-lCtrl WILL
+ be generated when rAlt goes UP.)
+
+ In fact, note also that KLLF_ALTGR does not prohibit construction of
+ rCtrl-rAlt (just press them in this order!).
+
+ Moreover: "traditional" layouts do not define distinct modifier-masks
+ for VK_LMENU and VK_RMENU (same for VK_L/RCONTROL). Instead, they
+ rely on the KLLF_ALTGR bit to make the behaviour of VK_LMENU and
+ VK_RMENU distinct. As a corollary, for such layouts, the produced
+ character is the same for AltGr-* (=rAlt-*) and Ctrl-Alt-* (in any
+ combination of handedness). For description of masks, see
+
+ http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Keyboard_input_on_Windows,_Part_I:_what_is_the_kernel_doing?
+
+ By default, Emacs was using these coincidences via the following
+ heuristics: it was treating:
+ (*) keypresses with lCtrl-rAlt modifiers as if they are carrying
+ ONLY the character payload (no matter what the actual keyboard
+ was defining: if lCtrl-rAlt-b was delivering U+03B2=beta, then
+ Emacs saw [beta]; if lCtrl-rAlt-b was undefined in the layout,
+ the keypress was completely ignored), and
+ (*) keypresses with the other combinations of handedness of Ctrl-Alt
+ modifiers (e.g., lCtrl-lAlt) as if they NEVER carry a character
+ payload (so they were reported "raw": if lCtrl-lAlt-b was
+ delivering beta, then Emacs saw event [C-A-b], and not [beta]).
+ This worked well for "traditional" layouts: users could type both
+ AltGr-x and Ctrl-Alt-x, and one was a character, another a bindable
+ event.
+
+ However, for layouts which deliver different characters for AltGr-x
+ and lCtrl-lAlt-x, this scheme makes the latter character inaccessible
+ in Emacs. While it is easy to access functionality of [C-M-x] in
+ Emacs by other means (for example, by the `controlify' prefix, or
+ using lCtrl-rCtrl-x, or rCtrl-rAlt-x [in this order]), missing
+ characters cannot be reconstructed without tedious manual work. */
+
+ /* These two cases are often going to be distinguishable, since at most
+ one of these characters is defined with KBDCTRL | KBDMENU modifier
+ bitmap. (This heuristic breaks if both lCtrl-lAlt- AND lCtrl-rAlt-
+ are translated to modifier bitmaps distinct from KBDCTRL | KBDMENU,
+ or in the cases when lCtrl-lAlt-* and lCtrl-rAlt-* are generally
+ different, but lCtrl-lAlt-x and lCtrl-rAlt-x happen to deliver the
+ same character.)
+
+ So we have 2 chunks of info:
+ (A) is it lCtrl-rAlt-, or lCtrl-lAlt, or some other combination?
+ (B) is the delivered character defined with KBDCTRL | KBDMENU bits?
+ Based on (A) and (B), we should decide whether to ignore the
+ delivered character. (Before, Emacs was completely ignoring (B), and
+ was treating the 3-state of (A) as a bit.) This means that we have 6
+ bits of customization.
+
+ Additionally, two Ctrl keys down may indicate AltGr-rCtrl-. */
+
+ /* Strip all non-Shift modifiers if:
+ - more than one UTF-16 code point delivered (can't call VkKeyScanW ())
+ - or the character is a result of combining with a prefix key. */
+ if (!after_dead && count == 1 && *b < 0x10000)
+ {
+ if (console_modifiers & (RIGHT_ALT_PRESSED | LEFT_ALT_PRESSED)
+ && console_modifiers & (RIGHT_CTRL_PRESSED | LEFT_CTRL_PRESSED))
+ {
+ type_CtrlAlt = "bB"; /* generic bindable Ctrl-Alt- modifiers */
+ if ((console_modifiers & (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED))
+ == (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED))
+ /* double-Ctrl:
+ e.g. AltGr-rCtrl on some layouts (in this order!) */
+ type_CtrlAlt = "dD";
+ else if ((console_modifiers
+ & (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED))
+ == (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED))
+ type_CtrlAlt = "lL"; /* Ctrl-Alt- modifiers on the left */
+ else if (!NILP (Vw32_recognize_altgr)
+ && (console_modifiers
+ & (RIGHT_ALT_PRESSED | LEFT_CTRL_PRESSED))
+ == (RIGHT_ALT_PRESSED | LEFT_CTRL_PRESSED))
+ type_CtrlAlt = "gG"; /* modifiers as in AltGr */
+ }
+ else if (wmsg.dwModifiers & (alt_modifier | meta_modifier)
+ || (console_modifiers
+ & (LEFT_WIN_PRESSED | RIGHT_WIN_PRESSED
+ | APPS_PRESSED | SCROLLLOCK_ON)))
+ {
+ /* pure Alt (or a combination of Alt, Win, APPS, ScrollLock) */
+ type_CtrlAlt = "aA";
+ }
+ if (type_CtrlAlt)
+ {
+ /* Out-of-range sentinel bitmap: */
+ SHORT r = VkKeyScanW (*b), bitmap = 0x1FF;
+
+ FPRINTF_WM_CHARS((stderr, "VkKeyScanW %#06x %#04x\n", (int)r,
+ (int)wParam));
+ if ((r & 0xFF) == wParam)
+ bitmap = r>>8; /* *b is reachable via simple interface */
+ if (*type_CtrlAlt == 'a') /* Simple Alt seen */
+ {
+ if ((bitmap & ~1) == 0) /* 1: KBDSHIFT */
+ {
+ /* In "traditional" layouts, Alt without Ctrl does not
+ change the delivered character. This detects that
+ situation; it is then safe to report this as
+ Alt-something, rather than delivering the reported
+ character without modifiers. */
if (legacy_alt_meta
&& *b > 0x7f && ('A' <= wParam && wParam <= 'Z'))
/* For backward-compatibility with older Emacsen, let
- this be processed by another branch below (which would convert
- it to Alt-Latin char via wParam). */
+ this be processed by another branch below (which
+ would convert it to Alt-Latin char via wParam). */
+ return 0;
+ }
+ else
+ {
+ hairy = 1;
+ }
+ }
+ /* Check whether the delivered character is accessible via the
+ KBDCTRL | KBDALT ( | KBDSHIFT ) modifier mask (6, or 7 with Shift). */
+ else if ((bitmap & ~1) != 6)
+ {
+ /* The character is not accessible via plain Ctrl-Alt(-Shift)
+ modifiers (which are probably the same as AltGr).
+ Either it came after a prefix key, or it is combined with
+ modifier keys which we don't see, or there is an asymmetry
+ between left-hand and right-hand modifiers, or other hairy
+ stuff. */
+ hairy = 1;
+ }
+ /* The best solution is to delegate these tough (but rarely
+ needed) choices to the user. For now (???), they are
+ implemented as C macros.
+
+ Essentially, there are 3 things to do: return 0 (hand over to the
+ legacy processing code, ignoring the character payload); keep
+ some modifiers (so that they are processed by the binding
+ system on top of the character payload); or strip modifiers (so
+ that `self-insert' is triggered with the character
+ payload).
+
+ The default below should cover 99.9999% of cases:
+ (a) strip Alt- in the hairy case only;
+ (stripping = not ignoring)
+ (l) for lAlt-lCtrl, ignore the char in simple cases only;
+ (g) for what looks like AltGr, ignore the modifiers;
+ (d) for what looks like lCtrl-rCtrl-Alt (probably
+ AltGr-rCtrl), ignore the character in simple cases only;
+ (b) for other cases of Ctrl-Alt, ignore the character in
+ simple cases only.
+
+ Essentially, in all hairy cases, and in looks-like-AltGr case,
+ we keep the character, ignoring the modifiers. In all the
+ other cases, we ignore the delivered character.
+ */
+#define S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD "aldb"
+#define S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS ""
+ if (strchr(S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD,
+ type_CtrlAlt[hairy]))
return 0;
- strip_Alt = 0;
+ /* If in the second list, report all the modifiers we see
+ COMBINED WITH the reported character; if in neither list,
+ strip all modifiers except Shift. */
+ if (strchr(S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS,
+ type_CtrlAlt[hairy]))
+ strip_ExtraMods = 0;
}
}
- if (strip_Alt)
- wmsg.dwModifiers = wmsg.dwModifiers & ~(alt_modifier | meta_modifier);
+ if (strip_ExtraMods)
+ wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier;
signal_user_input ();
while (count--)
@@ -3052,8 +3229,11 @@ deliver_wm_chars (int do_translate, HWND
else
FPRINTF_WM_CHARS((stderr, "extra ctrl char\n"));
return -1;
- } else if (is_dead >= 0) {
+ }
+ else if (is_dead >= 0)
+ {
FPRINTF_WM_CHARS((stderr, "dead %#06x\n", is_dead));
+ after_deadkey = is_dead;
return 1;
}
return 0;
@@ -3175,6 +3355,15 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
/* Inform lisp thread of keyboard layout changes. */
my_post_msg (&wmsg, hwnd, msg, wParam, lParam);
+ /* The state of the finite automaton is separate for each input
+ language environment (so it does not change when one switches
+ to a different window with the same environment). Moreover,
+ experiments show that the state is not remembered when one
+ switches back to an earlier environment. */
+ after_deadkey = -1;
+
+ /* XXXX??? What follows is a COMPLETE misunderstanding of Windows! */
+
/* Clear dead keys in the keyboard state; for simplicity only
preserve modifier key states. */
{
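A standalone sketch of the classification and policy lookup in this hunk may help when reviewing it. It is not Emacs code: the helpers classify_ctrl_alt, is_hairy_ctrl_alt and decide, the enum action and the M_* flag constants are hypothetical stand-ins; the pure-Alt ("aA") branch is left out; and the VkKeyScanW result is passed in as a number instead of being obtained from Windows. The modifier tests are written in the parenthesized form (mods & mask) == mask.

```c
/* Hypothetical sketch of the patch's Ctrl/Alt classification and
   policy lookup; not taken from Emacs. */
#include <string.h>   /* strchr, NULL */

/* Local stand-ins for the console modifier flags used in the patch. */
#define M_RIGHT_ALT  0x0001
#define M_LEFT_ALT   0x0002
#define M_RIGHT_CTRL 0x0004
#define M_LEFT_CTRL  0x0008

/* Policy strings as in the patch: lower case = simple case,
   upper case = "hairy" case. */
#define S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD "aldb"
#define S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS ""

enum action
{
  USE_LEGACY,       /* return 0: legacy code ignores the payload */
  KEEP_MODIFIERS,   /* payload plus all modifiers go to the bindings */
  STRIP_MODIFIERS   /* payload with Shift only: self-insert */
};

/* Classify a Ctrl+Alt chord in the patch's order of precedence;
   return NULL if the chord is not Ctrl+Alt at all. */
static const char *
classify_ctrl_alt (unsigned mods, int recognize_altgr)
{
  if (!(mods & (M_RIGHT_ALT | M_LEFT_ALT))
      || !(mods & (M_RIGHT_CTRL | M_LEFT_CTRL)))
    return NULL;
  if ((mods & (M_LEFT_CTRL | M_RIGHT_CTRL)) == (M_LEFT_CTRL | M_RIGHT_CTRL))
    return "dD";                /* both Ctrl keys down */
  if ((mods & (M_LEFT_CTRL | M_LEFT_ALT)) == (M_LEFT_CTRL | M_LEFT_ALT))
    return "lL";                /* Ctrl-Alt on the left */
  if (recognize_altgr
      && (mods & (M_RIGHT_ALT | M_LEFT_CTRL)) == (M_RIGHT_ALT | M_LEFT_CTRL))
    return "gG";                /* looks like AltGr */
  return "bB";                  /* generic Ctrl-Alt */
}

/* VKSCAN is a VkKeyScanW-style value: virtual key in the low byte,
   shift state in the high byte (1 Shift, 2 Ctrl, 4 Alt).  The case is
   "hairy" unless the character comes from key VK with Ctrl+Alt(+Shift). */
static int
is_hairy_ctrl_alt (short vkscan, unsigned vk)
{
  int bitmap = 0x1FF;           /* out-of-range sentinel, as in the patch */
  if ((unsigned) (vkscan & 0xFF) == vk)
    bitmap = (vkscan >> 8) & 0xFF;
  return (bitmap & ~1) != 6;
}

/* Look up TYPE[HAIRY] in the two policy strings. */
static enum action
decide (const char *type, int hairy)
{
  char c = type[hairy];
  if (strchr (S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD, c))
    return USE_LEGACY;
  if (strchr (S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS, c))
    return KEEP_MODIFIERS;
  return STRIP_MODIFIERS;
}
```

For instance, AltGr reported as lCtrl+rAlt, with a character that the VkKeyScanW-style value 0x0651 places on key 0x51 under Ctrl+Alt, classifies as "gG" and is not hairy, so decide returns STRIP_MODIFIERS and the payload reaches self-insert. The same chord typed as lCtrl+lAlt classifies as "lL" and yields USE_LEGACY.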