unofficial mirror of bug-gnu-emacs@gnu.org 
From: Ilya Zakharevich <ilya@math.berkeley.edu>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 19994@debbugs.gnu.org
Subject: bug#19994: 25.0.50; Unicode keyboard input on Windows
Date: Wed, 8 Jul 2015 17:02:59 -0700
Message-ID: <20150709000259.GA7163@math.berkeley.edu>
In-Reply-To: <20150701100712.GA24175@math.berkeley.edu>

[-- Attachment #1: Type: text/plain, Size: 2011 bytes --]

On Wed, Jul 01, 2015 at 03:07:12AM -0700, Ilya Zakharevich wrote:
> On Wed, Mar 04, 2015 at 08:01:01PM +0200, Eli Zaretskii wrote:

> > I suggest, indeed, to clean up the code so we could commit it to the
> > master branch.  That way, it will get wider testing, and we can fix

> I had no time to work on the code itself, but
>   • I fixed the formatting,
>   • I pumped up the docs,
>   • I put in the suggested eassert().

The variant I sent was too primitive: it did not cover a (common?)
use case where (with AltGr layouts) leftCtrl+rightCtrl behaves
differently from pressing AltGr:
   • leftCtrl+rightCtrl would trigger C-M-key;
   • altGr would enter the character payload.
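
For readers who want to see the distinction in isolation, here is a
minimal sketch of how the handedness of Ctrl/Alt could be classified.
It follows the order of checks in the attached patch, but it is not the
patch code: the KM_* names and the helper ctrl_alt_type() are
hypothetical, and the w32-recognize-altgr setting is reduced to a plain
int argument.  On a KLLF_ALTGR layout, AltGr is seen as lCtrl+rAlt, so
AltGr plus rCtrl shows up as "both Ctrl keys down".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag names.  The values mirror RIGHT_ALT_PRESSED,
   LEFT_ALT_PRESSED, RIGHT_CTRL_PRESSED and LEFT_CTRL_PRESSED from the
   Windows console API, but are defined locally so that the sketch
   builds on any platform.  */
enum
{
  KM_RALT  = 0x0001,
  KM_LALT  = 0x0002,
  KM_RCTRL = 0x0004,
  KM_LCTRL = 0x0008,
  KM_ALL   = 0x000f
};

/* Return a two-letter type string (simple case, hairy case), or NULL
   when MODS is not a Ctrl-Alt combination at all.  */
static const char *
ctrl_alt_type (unsigned mods, int recognize_altgr)
{
  const unsigned any_alt = KM_LALT | KM_RALT;
  const unsigned both_ctrl = KM_LCTRL | KM_RCTRL;
  const unsigned left_pair = KM_LCTRL | KM_LALT;
  const unsigned altgr_pair = KM_LCTRL | KM_RALT;

  /* Only the four Ctrl/Alt bits are meaningful in this sketch.  */
  assert ((mods & ~(unsigned) KM_ALL) == 0);

  if (!(mods & any_alt) || !(mods & both_ctrl))
    return NULL;
  /* Note the parentheses: == binds tighter than & in C.  */
  if ((mods & both_ctrl) == both_ctrl)
    return "dD";		/* double Ctrl, e.g. AltGr then rCtrl */
  if ((mods & left_pair) == left_pair)
    return "lL";		/* Ctrl-Alt on the left */
  if (recognize_altgr && (mods & altgr_pair) == altgr_pair)
    return "gG";		/* looks like AltGr */
  return "bB";			/* any other bindable Ctrl-Alt */
}
```

With these assumptions, lCtrl+rAlt classifies as "gG" only when AltGr
recognition is enabled; otherwise it falls through to the generic "bB".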

This update

  (0) fixes two formatting-style omissions;

  (A) adds A LOAD of new comments;
  (B) treats such important cases (as above) separately;

  (z) marks a piece of old code which does not make any sense
        (see the last chunk in the relative patch).

Notes:

  • In (B), there are some decisions to make.  I encapsulate these
    decisions into two strings.  For best results, these strings should
    be user-customizable.  However, currently they are just put into
    C #defines.

    Once I have sat on this longer, and if these customizations turn out
    to be useful, they can be made into Lisp variables.
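
    As an illustration only, the two-string scheme can be sketched as
    below.  The default string values are the ones in the attached
    patch; the enum, the variable names and classify_payload() are
    hypothetical (the patch itself uses `return 0' and a
    strip_ExtraMods flag rather than an enum).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for the three possible outcomes.  */
enum payload_action
{
  IGNORE_PAYLOAD,	/* hand the key over to the legacy code */
  PAYLOAD_WITH_MODS,	/* report the character plus the modifiers */
  PAYLOAD_STRIPPED	/* report the character, keeping only Shift */
};

/* Defaults as in the patch; these are the candidates for Lisp
   variables.  */
static const char *types_to_ignore_payload = "aldb";
static const char *types_to_report_with_mods = "";

/* TYPE_PAIR is a two-letter string such as "gG": index 0 names the
   simple case, index 1 the hairy one.  */
static enum payload_action
classify_payload (const char *type_pair, int hairy)
{
  char type;

  assert (type_pair != NULL && strlen (type_pair) == 2);
  type = type_pair[hairy != 0];
  if (strchr (types_to_ignore_payload, type))
    return IGNORE_PAYLOAD;
  if (strchr (types_to_report_with_mods, type))
    return PAYLOAD_WITH_MODS;
  return PAYLOAD_STRIPPED;
}
```

    With the defaults, lowercase (simple) a, l, d, b drop the
    character, while every uppercase (hairy) type and the AltGr-like
    "g" keep the character and strip the modifiers.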

  • There is a bug in the (old) Emacs code which prevents some cases
    treated in (B) from being really useful.  I did not fix it yet.

    To see the bug:
      ∘ switch to a layout with AltGr;
      ∘ assume that AltGr-s produces ß (as with US International);
      ∘ pressing AltGr-rightControl-s produces Meta-ß;
      ∘ pressing rightControl-AltGr-s produces C-M-s.
    (I do not think this effect is intentional.)

  • And, BTW, is it documented anywhere that
    leftControl-rightControl-key produces C-M-key?

I include two patches:
  □   absolute (ignore the previous patches)
  □   relative (with whitespace ignored) — for reading.

Enjoy,
Ilya

[-- Attachment #2: w32fns.c-diff-v2 --]
[-- Type: text/plain, Size: 24021 bytes --]

--- w32fns.c-ini	2015-01-30 15:33:23.505201400 -0800
+++ w32fns.c	2015-07-08 16:32:11.187197700 -0700
@@ -2832,6 +2832,413 @@ post_character_message (HWND hwnd, UINT
   my_post_msg (&wmsg, hwnd, msg, wParam, lParam);
 }
 
+static int
+get_wm_chars (HWND aWnd, int *buf, int buflen, int ignore_ctrl, int ctrl, 
+              int *ctrl_cnt, int *is_dead, int vk, int exp)
+{
+  MSG msg;
+  /* If doubled is at the end, ignore it */
+  int i = buflen, doubled = 0, code_unit;
+
+  if (ctrl_cnt)
+    *ctrl_cnt = 0;
+  if (is_dead)
+    *is_dead = -1;
+  eassert(w32_unicode_gui);
+  while (buflen
+  	 /* Should be called only when w32_unicode_gui: */
+         && PeekMessageW(&msg, aWnd, WM_KEYFIRST, WM_KEYLAST, 
+         	      PM_NOREMOVE | PM_NOYIELD)
+         && (msg.message == WM_CHAR || msg.message == WM_SYSCHAR 
+             || msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR 
+             || msg.message == WM_UNICHAR)) 
+    { 
+      /* We extract character payload, but in this call we handle only the 
+         characters which come BEFORE the next keyup/keydown message. */
+      int dead;
+
+      GetMessageW(&msg, aWnd, msg.message, msg.message);
+      dead = (msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR);
+      if (is_dead)
+        *is_dead = (dead ? msg.wParam : -1);
+      if (dead)
+        continue;
+      code_unit = msg.wParam;
+      if (doubled) 
+        { 
+          /* had surrogate */
+          if (msg.message == WM_UNICHAR 
+              || code_unit < 0xDC00 || code_unit > 0xDFFF) 
+            { /* Mismatched first surrogate.  
+                 Pass both code units as if they were two characters. */
+              *buf++ = doubled;
+              if (!--buflen)
+                return i; /* Drop the 2nd char if at the end of the buffer. */
+            } 
+          else /* see https://en.wikipedia.org/wiki/UTF-16 */
+            {
+              code_unit = (doubled << 10) + code_unit - 0x35FDC00;
+            }
+          doubled = 0;
+        } 
+      else if (code_unit >= 0xD800 && code_unit <= 0xDBFF) 
+        {    
+          /* Handle mismatched 2nd surrogate the same as a normal character. */
+          doubled = code_unit;
+          continue;
+        }
+
+      /* The only "fake" characters delivered by ToUnicode() or 
+         TranslateMessage() are: 
+         0x01 .. 0x1a for Ctrl-letter, Enter, Tab, Ctrl-Break, Esc, Backspace
+         0x00 and 0x1b .. 0x1f for Control- []\@^_ 
+         0x7f for Control-BackSpace
+         0x20 for Control-Space */
+      if (ignore_ctrl 
+          && (code_unit < 0x20 || code_unit == 0x7f 
+              || (code_unit == 0x20 && ctrl))) 
+        { 
+          /* Non-character payload in a WM_CHAR
+             (Ctrl-something pressed, see above).  Ignore, and report. */
+          if (ctrl_cnt)
+            (*ctrl_cnt)++;
+          continue;
+        }
+      /* Traditionally, Emacs would ignore the character payload of VK_NUMPAD* 
+         keys, and would treat them later via `function-key-map'.  In addition
+         to usual 102-key NUMPAD keys, this map also treats `kp-'-variants of
+         space, tab, enter, separator, equal.  TAB  and EQUAL, apparently, 
+         cannot be generated on Win-GUI branch.  ENTER is already handled 
+         by the code above.  According to `lispy_function_keys', kp_space is 
+         generated by not-extended VK_CLEAR.  (kp-tab !=  VK_OEM_NEC_EQUAL!). 
+       
+         We do similarly for backward-compatibility, but ignore only the
+         characters restorable later by `function-key-map'. */
+      if (code_unit < 0x7f 
+          && ((vk >= VK_NUMPAD0 && vk <= VK_DIVIDE) 
+              || (exp && ((vk >= VK_PRIOR && vk <= VK_DOWN) || 
+                     vk == VK_INSERT || vk == VK_DELETE || vk == VK_CLEAR))) 
+          && strchr("0123456789/*-+.,", code_unit))
+        continue;
+      *buf++ = code_unit;
+      buflen--;
+    }
+  return i - buflen;
+}
+
+#ifdef DBG_WM_CHARS
+#  define FPRINTF_WM_CHARS(ARG)	fprintf ARG
+#else
+#  define FPRINTF_WM_CHARS(ARG)	0
+#endif
+
+/* This is a heuristic only.  This is supposed to track the state of the
+   finite automaton in the language environment of Windows. 
+   
+   However, separate windows (if with different language 
+   environments!) should have different values.  Moreover, switching to a 
+   non-Emacs window with the same language environment, and using (dead)keys 
+   there would change the value stored in the kernel, but not this value. */
+static int after_deadkey = 0;
+
+int
+deliver_wm_chars (int do_translate, HWND hwnd, UINT msg, UINT wParam, 
+                  UINT lParam, int legacy_alt_meta)
+{
+  /* An "old style" keyboard description may assign up to 125 UTF-16 code 
+     points to a keypress. 
+     (However, the "old style" TranslateMessage() would deliver at most 16 of 
+     them.)  Be on a safe side, and prepare to treat many more. */
+  int ctrl_cnt, buf[1024], count, is_dead, after_dead = (after_deadkey != -1);
+
+  /* Since the keypress processing logic of Windows has a lot of state, it 
+     is important to call TranslateMessage() for every keyup/keydown, AND
+     do it exactly once.  (The actual change of state is done by
+     ToUnicode[Ex](), which is called by TranslateMessage().  So one can
+     call ToUnicode[Ex]() instead.)
+     
+     The "usual" message pump calls TranslateMessage() for EVERY event.
+     Emacs calls TranslateMessage() very selectively (is it needed for doing 
+     some tricky stuff with Win95???  With newer Windows, selectiveness is,
+     most probably, not needed - and harms a lot). 
+     
+     So, with the usual message pump, the following call to TranslateMessage() 
+     is not needed (and is going to be VERY harmful).  With Emacs' message 
+     pump, the call is needed.  */
+  if (do_translate) 
+    {
+      MSG windows_msg = { hwnd, msg, wParam, lParam, 0, {0,0} };
+
+      windows_msg.time = GetMessageTime ();
+      TranslateMessage (&windows_msg);
+    }
+  count = get_wm_chars (hwnd, buf, sizeof(buf)/sizeof(*buf), 1,
+                        /* The message may have been synthesized by 
+                           who knows what; be conservative. */
+                        modifier_set (VK_LCONTROL) 
+                          || modifier_set (VK_RCONTROL) 
+                          || modifier_set (VK_CONTROL), 
+                        &ctrl_cnt, &is_dead, wParam, 
+                        (lParam & 0x1000000L) != 0);
+  if (count) 
+    {
+      W32Msg wmsg;
+      DWORD console_modifiers = construct_console_modifiers ();
+      int *b = buf, strip_Alt = 1, strip_ExtraMods = 1, hairy = 0;
+      char *type_CtrlAlt = NULL;
+
+      /*  XXXX In fact, there may be another case when we need to do the same: 
+               What happens if the string defined in the LIGATURES has length
+               0?  Probably, we will get count==0, but the state of the finite
+               automaton would reset to 0???  */
+     after_deadkey = -1;
+      
+      /* wParam is checked when converting CapsLock to Shift; this is a clone 
+         of w32_get_key_modifiers (). */
+      wmsg.dwModifiers = w32_kbd_mods_to_emacs (console_modifiers, wParam);
+
+      /* What follows is just heuristics; the correct treatment requires 
+         non-destructive ToUnicode(): 
+           http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Can_an_application_on_Windows_accept_keyboard_events?_Part_IV:_application-specific_modifiers
+
+         What one needs to find is: 
+           * which of the present modifiers AFFECT the resulting char(s) 
+             (so should be stripped, since their EFFECT is "already
+              taken into account" in the string in buf), and 
+           * which modifiers are not affecting buf, so should be reported to
+             the application for further treatment.
+       
+         Example: assume that we know:
+           (A) lCtrl+rCtrl+rAlt modifiers with VK_A key produce a Latin "f"
+               ("may be logical" in JCUKEN-flavored Russian keyboard flavors);
+           (B) removing any of lCtrl, rCtrl, rAlt changes the produced char;
+           (C) Win-modifier is not affecting the produced character 
+               (this is the common case: happens with all "standard" layouts).
+
+         Suppose the user presses Win+lCtrl+rCtrl+rAlt modifiers with VK_A.
+         What is the intent of the user?  We need to guess the intent to decide
+         which event to deliver to the application.
+       
+         This looks like a reasonable logic: since Win- modifier doesn't affect
+         the output string, the user was pressing Win for SOME OTHER purpose.
+         So the user wanted to generate Win-SOMETHING event.  Now, what is
+         something?  If one takes the mantra that "character payload is more 
+         important than the combination of keypresses which resulted in this 
+         payload", then one should ignore lCtrl+rCtrl+rAlt, ignore VK_A, and
+         assume that the user wanted to generate Win-f.
+       
+         Unfortunately, without non-destructive ToUnicode(), checking (B),(C)
+         is out of question.  So we use heuristics (hopefully, covering 
+         99.9999% of cases).
+       */
+
+      /* Another thing to watch for is a possibility to use AltGr-* and 
+         Ctrl-Alt-* with different semantics.
+     
+         Background: layouts defining the KLLF_ALTGR bit are treated 
+         specially by the kernel: when VK_RMENU (=rightAlt, =AltGr) is pressed 
+         (released), a press (release) of VK_LCONTROL is emulated (unless Ctrl 
+         is already down).  As a result, any press/release of AltGr is seen 
+         by applications as a press/release of lCtrl AND rAlt.  This is 
+         applicable, in particular, to ToUnicode[Ex]().  (Keyrepeat is covered
+         the same way!)
+     
+           NOTE: it IS possible to see bare rAlt even with KLLF_ALTGR; but this
+           requires a good finger coordination: doing (physically)
+             Down-lCtrl Down-rAlt Up-lCtrl Down-a 
+           (doing quickly enough, so that key repeat of rAlt [which would 
+           generate new "fake" Down-lCtrl events] does not happen before 'a' 
+           is down) results in no "fake" events, so the application will see 
+           only rAlt down when 'a' is pressed.  (However, fake Up-lCtrl WILL 
+           be generated when rAlt goes UP.)
+       
+           In fact, note also that KLLF_ALTGR does not prohibit construction of 
+           rCtrl-rAlt (just press them in this order!).
+     
+         Moreover: "traditional" layouts do not define distinct modifier-masks
+         for VK_LMENU and VK_RMENU (same for VK_L/RCONTROL).  Instead, they 
+         rely on the KLLF_ALTGR bit to make the behaviour of VK_LMENU and 
+         VK_RMENU distinct.  As a corollary, for such layouts, the produced 
+         character is the same for AltGr-* (=rAlt-*) and Ctrl-Alt-* (in any 
+         combination of handedness).  For description of masks, see
+     
+           http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Keyboard_input_on_Windows,_Part_I:_what_is_the_kernel_doing?
+     
+         By default, Emacs was using these coincidences via the following
+         heuristics: it was treating: 
+          (*) keypresses with lCtrl-rAlt modifiers as if they are carrying 
+              ONLY the character payload (no matter what the actual keyboard 
+              was defining: if lCtrl-lAlt-b was delivering U+03b2=beta, then 
+              Emacs saw [beta]; if lCtrl-lAlt-b was undefined in the layout, 
+              the keypress was completely ignored), and
+          (*) keypresses with the other combinations of handedness of Ctrl-Alt 
+              modifiers (e.g., lCtrl-lAlt) as if they NEVER carry a character 
+              payload (so they were reported "raw": if lCtrl-lAlt-b was 
+              delivering beta, then Emacs saw event [C-A-b], and not [beta]).
+         This worked well for "traditional" layouts: users could type both 
+         AltGr-x and Ctrl-Alt-x, and one was a character, another a bindable 
+         event.
+     
+         However, for layouts which deliver different characters for AltGr-x 
+         and lCtrl-lAlt-x, this scheme makes the latter character inaccessible 
+         in Emacs.  While it is easy to access functionality of [C-M-x] in 
+         Emacs by other means (for example, by the `controlify' prefix, or 
+         using lCtrl-rCtrl-x, or rCtrl-rAlt-x [in this order]), missing 
+         characters cannot be reconstructed without tedious manual work. */
+     
+      /* These two cases are often going to be distinguishable, since at most 
+         one of these characters is defined with KBDCTRL | KBDMENU modifier 
+         bitmap.  (This heuristic breaks if both lCtrl-lAlt- AND lCtrl-rAlt- 
+         are translated to modifier bitmaps distinct from KBDCTRL | KBDMENU, 
+         or in the cases when lCtrl-lAlt-* and lCtrl-rAlt-* are generally 
+         different, but lCtrl-lAlt-x and lCtrl-rAlt-x happen to deliver the 
+         same character.)
+     
+         So we have 2 chunks of info:
+           (A) is it lCtrl-rAlt-, or lCtrl-lAlt, or some other combination?
+           (B) is the delivered character defined with KBDCTRL | KBDMENU bits?
+         Basing on (A) and (B), we should decide whether to ignore the 
+         delivered character.  (Before, Emacs was completely ignoring (B), and 
+         was treating the 3-state of (A) as a bit.)  This means that we have 6 
+         bits of customization. 
+         
+         Additionally, a presence of two Ctrl down may be AltGr-rCtrl-.  */
+
+      /* Strip all non-Shift modifiers if: 
+        - more than one UTF-16 code point delivered (can't call VkKeyScanW ())
+        - or the character is a result of combining with a prefix key. */
+      if (!after_dead && count == 1 && *b < 0x10000)
+        {
+          if (console_modifiers & (RIGHT_ALT_PRESSED | LEFT_ALT_PRESSED)
+              && console_modifiers & (RIGHT_CTRL_PRESSED | LEFT_CTRL_PRESSED))
+            {
+              type_CtrlAlt = "bB";   /* generic bindable Ctrl-Alt- modifiers */
+              if ((console_modifiers & (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED))
+                  == (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED))
+                 /* double-Ctrl: 
+                    e.g. AltGr-rCtrl on some layouts (in this order!) */
+                type_CtrlAlt = "dD";
+              else if ((console_modifiers 
+                        & (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED))
+                       == (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED))
+                type_CtrlAlt = "lL"; /* Ctrl-Alt- modifiers on the left */
+              else if (!NILP (Vw32_recognize_altgr)
+                       && (console_modifiers 
+                           & (RIGHT_ALT_PRESSED | LEFT_CTRL_PRESSED))
+                          == (RIGHT_ALT_PRESSED | LEFT_CTRL_PRESSED))
+                type_CtrlAlt = "gG"; /* modifiers as in AltGr */
+            }
+          else if (wmsg.dwModifiers & (alt_modifier | meta_modifier)
+                   || (console_modifiers 
+                       & (LEFT_WIN_PRESSED | RIGHT_WIN_PRESSED 
+                          | APPS_PRESSED | SCROLLLOCK_ON)))
+            {
+              /* pure Alt (or combination of Alt, Win, APPS, scrolllock) */
+              type_CtrlAlt = "aA";
+            }
+          if (type_CtrlAlt)
+            {
+              /* Out of bound bitmap: */
+              SHORT r = VkKeyScanW( *b ), bitmap = 0x1FF;
+
+              FPRINTF_WM_CHARS((stderr, "VkKeyScanW %#06x %#04x\n", (int)r, 
+                               wParam));
+              if ((r & 0xFF) == wParam)
+                bitmap = r>>8; /* *b is reachable via simple interface */
+              if (*type_CtrlAlt == 'a') /* Simple Alt seen */
+                {
+                  if ((bitmap & ~1) == 0) /* 1: KBDSHIFT */
+                    {
+                      /* In "traditional" layouts, Alt without Ctrl does not 
+                         change the delivered character.  This detects this 
+                         situation; it is safe to report this as Alt-something 
+                          - as opposed to delivering the reported character 
+                          without modifiers. */
+                      if (legacy_alt_meta 
+                          && *b > 0x7f && ('A' <= wParam && wParam <= 'Z'))
+                        /* For backward-compatibility with older Emacsen, let
+                           this be processed by another branch below (which 
+                           would convert it to Alt-Latin char via wParam). */
+                        return 0;
+                    }
+                  else
+                    {
+                      hairy = 1;
+                    }
+                }
+              /* Check whether the delivered character(s) is accessible via 
+                 KBDCTRL | KBDALT ( | KBDSHIFT ) modifier mask (which is 7). */
+              else if ((bitmap & ~1) != 6)
+                {
+                  /* The character is not accessible via plain Ctrl-Alt(-Shift) 
+                     (which is, probably, same as AltGr) modifiers. 
+                     Either it was after a prefix key, or is combined with 
+                     modifier keys which we don't see, or there is an asymmetry
+                     between left-hand and right-hand modifiers, or other hairy
+                     stuff. */
+                  hairy = 1;
+                }
+              /* The best solution is to delegate these tough (but rarely 
+                 needed) choices to the user.  Temporarily (???), it is 
+                 implemented as C macros.
+               
+                 Essentially, there are 3 things to do: return 0 (hand over to
+                 the legacy processing code, ignoring the character payload);
+                 keep some modifiers (so that they will be processed by the
+                 binding system on top of the character payload); strip
+                 modifiers (so that `self-insert' is going to be triggered
+                 with the character payload).
+                 
+                 The default below should cover 99.9999% of cases: 
+                   (a) strip Alt- in the hairy case only;  
+                       (stripping = not ignoring) 
+                   (l) for lAlt-lCtrl, ignore the char in simple cases only;
+                   (g) for what looks like AltGr, ignore the modifiers;
+                   (d) for what looks like lCtrl-rCtrl-Alt (probably
+                       AltGr-rCtrl), ignore the character in simple cases only;
+                   (b) for other cases of Ctrl-Alt, ignore the character in
+                       simple cases only.
+
+                 Essentially, in all hairy cases, and in looks-like-AltGr case,
+                 we keep the character, ignoring the modifiers.  In all the
+                 other cases, we ignore the delivered character.
+                */
+#define S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD "aldb"
+#define S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS ""
+              if (strchr(S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD, 
+			 type_CtrlAlt[hairy]))
+                return 0;
+              /* If in the second list, report all the modifiers we see COMBINED 
+                 WITH the reported character; if in neither list, strip them. */
+              if (strchr(S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS, 
+			  type_CtrlAlt[hairy]))
+                strip_ExtraMods = 0;
+            }
+        }
+      if (strip_ExtraMods)
+        wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier;
+    
+      signal_user_input ();
+      while (count--)
+        {
+          FPRINTF_WM_CHARS((stderr, "unichar %#06x\n", *b));
+          my_post_msg (&wmsg, hwnd, WM_UNICHAR, *b++, lParam);
+        }
+      if (!ctrl_cnt)
+        return 1;
+      else /* Process ALSO as ctrl */
+        FPRINTF_WM_CHARS((stderr, "extra ctrl char\n"));
+      return -1;
+    } 
+  else if (is_dead >= 0) 
+    {
+      FPRINTF_WM_CHARS((stderr, "dead %#06x\n", is_dead));
+      after_deadkey = is_dead;
+      return 1;
+    } 
+  return 0;
+}
+
 /* Main window procedure */
 
 static LRESULT CALLBACK
@@ -2948,6 +3355,15 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
       /* Inform lisp thread of keyboard layout changes.  */
       my_post_msg (&wmsg, hwnd, msg, wParam, lParam);
 
+      /* The state of the finite automaton is separate per every input 
+         language environment (so it does not change when one switches 
+         to a different window with the same environment).  Moreover,
+         the experiments show that the state is not remembered when
+         one switches back to the pre-previous environment. */
+      after_deadkey = -1;
+
+      /* XXXX??? What follows is a COMPLETE misunderstanding of Windows! */
+      
       /* Clear dead keys in the keyboard state; for simplicity only
          preserve modifier key states.  */
       {
@@ -3007,7 +3423,6 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
       /* Synchronize modifiers with current keystroke.  */
       sync_modifiers ();
       record_keydown (wParam, lParam);
-      wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
 
       windows_translate = 0;
 
@@ -3117,6 +3532,46 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
 	    wParam = VK_NUMLOCK;
 	  break;
 	default:
+	  if (w32_unicode_gui) {	
+	    /* If this event generates characters or deadkeys, do not interpret 
+	       it as a "raw combination of modifiers and keysym".  Hide  
+	       deadkeys, and use the generated character(s) instead of the  
+	       keysym.   (Backward compatibility: exceptions for numpad keys 
+	       generating 0-9 . , / * - +, and for extra-Alt combined with a 
+	       non-Latin char.) 
+	       
+	       Try to not report modifiers which have effect on which 
+	       character or deadkey is generated.
+	       
+	       Example (contrived): if rightAlt-? generates f (on a Cyrillic 
+	       keyboard layout), and Ctrl, leftAlt do not affect the generated
+	       character, one wants to report Ctrl-leftAlt-f if the user 
+	       presses Ctrl-leftAlt-rightAlt-?. */
+	    int res; 
+#if 0
+	    /* Some of WM_CHAR may be fed to us directly, some are results of 
+	       TranslateMessage().  Using 0 as the first argument (in a 
+	       separate call) might help us distinguish these two cases.
+
+	       However, the keypress feeders would most probably expect the
+	       "standard" message pump, when TranslateMessage() is called on 
+	       EVERY KeyDown/Keyup event.  So they may feed us Down-Ctrl
+	       Down-FAKE Char-o and expect us to recognize it as Ctrl-o.
+	       Using 0 as the first argument would interfere with this.  */
+	    deliver_wm_chars (0, hwnd, msg, wParam, lParam, 1);
+#endif
+	    /* Processing the generated WM_CHAR messages *WHILE* we handle 
+	       KEYDOWN/UP event is the best choice, since without any fuss, 
+	       we know all 3 of: scancode, virtual keycode, and expansion. 
+	       (Additionally, one knows boundaries of expansion of different
+	       keypresses.) */
+	    res = deliver_wm_chars (1, hwnd, msg, wParam, lParam, 1);
+	    windows_translate = -( res != 0 );
+	    if (res > 0) /* Bound to character(s) or a deadkey */
+	      break;
+	    /* deliver_wm_chars() may make some branches after this vestigial */
+	  }
+          wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
 	  /* If not defined as a function key, change it to a WM_CHAR message. */
 	  if (wParam > 255 || !lispy_function_keys[wParam])
 	    {
@@ -3184,6 +3639,8 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
 	    }
 	}
 
+    if (windows_translate == -1)
+      break;
     translate:
       if (windows_translate)
 	{

[-- Attachment #3: w32fns.c-diff-v2-relative --]
[-- Type: text/plain, Size: 17600 bytes --]

--- w32fns.c-sent2	2015-07-01 02:56:30.787672000 -0700
+++ w32fns.c	2015-07-08 16:32:11.187197700 -0700
@@ -2932,6 +2932,15 @@ get_wm_chars (HWND aWnd, int *buf, int b
 #  define FPRINTF_WM_CHARS(ARG)	0
 #endif
 
+/* This is a heuristic only.  This is supposed to track the state of the
+   finite automaton in the language environment of Windows. 
+   
+   However, separate windows (if with different language 
+   environments!) should have different values.  Moreover, switching to a 
+   non-Emacs window with the same language environment, and using (dead)keys 
+   there would change the value stored in the kernel, but not this value. */
+static int after_deadkey = 0;
+
 int
 deliver_wm_chars (int do_translate, HWND hwnd, UINT msg, UINT wParam, 
                   UINT lParam, int legacy_alt_meta)
@@ -2940,7 +2949,7 @@ deliver_wm_chars (int do_translate, HWND
      points to a keypress. 
      (However, the "old style" TranslateMessage() would deliver at most 16 of 
      them.)  Be on a safe side, and prepare to treat many more. */
-  int ctrl_cnt, buf[1024], count, is_dead;
+  int ctrl_cnt, buf[1024], count, is_dead, after_dead = (after_deadkey != -1);
 
   /* Since the keypress processing logic of Windows has a lot of state, it 
      is important to call TranslateMessage() for every keyup/keydown, AND
@@ -2956,7 +2965,8 @@ deliver_wm_chars (int do_translate, HWND
      So, with the usual message pump, the following call to TranslateMessage() 
      is not needed (and is going to be VERY harmful).  With Emacs' message 
      pump, the call is needed.  */
-  if (do_translate) {
+  if (do_translate) 
+    {
       MSG windows_msg = { hwnd, msg, wParam, lParam, 0, {0,0} };
 
       windows_msg.time = GetMessageTime ();
@@ -2970,13 +2980,22 @@ deliver_wm_chars (int do_translate, HWND
                           || modifier_set (VK_CONTROL), 
                         &ctrl_cnt, &is_dead, wParam, 
                         (lParam & 0x1000000L) != 0);
-  if (count) {
+  if (count) 
+    {
     W32Msg wmsg;
-    int *b = buf, strip_Alt = 1;
-
-    /* wParam is checked when converting CapsLock to Shift */
-    wmsg.dwModifiers = do_translate 
-	? w32_get_key_modifiers (wParam, lParam) : 0;
+      DWORD console_modifiers = construct_console_modifiers ();
+      int *b = buf, strip_Alt = 1, strip_ExtraMods = 1, hairy = 0;
+      char *type_CtrlAlt = NULL;
+
+      /*  XXXX In fact, there may be another case when we need to do the same: 
+               What happens if the string defined in the LIGATURES has length
+               0?  Probably, we will get count==0, but the state of the finite
+               automaton would reset to 0???  */
+     after_deadkey = -1;
+      
+      /* wParam is checked when converting CapsLock to Shift; this is a clone 
+         of w32_get_key_modifiers (). */
+      wmsg.dwModifiers = w32_kbd_mods_to_emacs (console_modifiers, wParam);
 
     /* What follows is just heuristics; the correct treatement requires 
        non-destructive ToUnicode(): 
@@ -2991,8 +3010,8 @@ deliver_wm_chars (int do_translate, HWND
        
        Example: assume that we know:
          (A) lCtrl+rCtrl+rAlt modifiers with VK_A key produce a Latin "f"
-             ("may be logical" with a JCUKEN-flavored Russian keyboard flavor);
-         (B) removing any one of lCtrl, rCtrl, rAlt changes the produced char;
+               ("may be logical" in JCUKEN-flavored Russian keyboard flavors);
+           (B) removing any of lCtrl, rCtrl, rAlt changes the produced char;
          (C) Win-modifier is not affecting the produced character 
              (this is the common case: happens with all "standard" layouts).
 
@@ -3000,7 +3019,7 @@ deliver_wm_chars (int do_translate, HWND
        What is the intent of the user?  We need to guess the intent to decide  
        which event to deliver to the application.
        
-       This looks like a reasonable logic: wince Win- modifier does not affect 
+         This looks like a reasonable logic: since Win- modifier doesn't affect
        the output string, the user was pressing Win for SOME OTHER purpose.
        So the user wanted to generate Win-SOMETHING event.  Now, what is
        something?  If one takes the mantra that "character payload is more 
@@ -3008,38 +3027,196 @@ deliver_wm_chars (int do_translate, HWND
        payload", then one should ignore lCtrl+rCtrl+rAlt, ignore VK_A, and
        assume that the user wanted to generate Win-f.
        
-       Unfortunately, without non-destructive ToUnicode(), checking (B) and (C)
-       is out of question.  So we use heuristics (hopefully, covering 99.9999%
-       of cases).
+         Unfortunately, without non-destructive ToUnicode(), checking (B),(C)
+         is out of question.  So we use heuristics (hopefully, covering 
+         99.9999% of cases).
      */
     
-    /* If ctrl-something delivers chars, ctrl and the rest should be hidden; 
-       so the consumer of key-event won't interpret it as an accelerator. */
-    if (wmsg.dwModifiers & ctrl_modifier)
-      wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier;
-    /* In many keyboard layouts, (left) Alt is not changing the character.  
-       Unless we are in this situation, strip Alt/Meta. */
-    if (wmsg.dwModifiers & (alt_modifier | meta_modifier) 
-        /* If alt-something delivers non-ASCIIchars, alt should be hidden */
-        && count == 1 && *b < 0x10000) 
-      {
-        SHORT r = VkKeyScanW( *b );
+      /* Another thing to watch for is a possibility to use AltGr-* and 
+         Ctrl-Alt-* with different semantics.
 
-        FPRINTF_WM_CHARS((stderr, "VkKeyScanW %#06x %#04x\n", (int)r, wParam));
-        if ((r & 0xFF) == wParam && !(r & ~0x1FF)) 
-          {	
-            /* Char available without Alt modifier, so Alt is "on top" */
+         Background: layouts defining the KLLF_ALTGR bit are treated 
+         specially by the kernel: when VK_RMENU (=rightAlt, =AltGr) is pressed 
+         (released), a press (release) of VK_LCONTROL is emulated (unless Ctrl 
+         is already down).  As a result, any press/release of AltGr is seen 
+         by applications as a press/release of lCtrl AND rAlt.  This is 
+         applicable, in particular, to ToUnicode[Ex]().  (Keyrepeat is covered
+         the same way!)
+     
+           NOTE: it IS possible to see bare rAlt even with KLLF_ALTGR; but this
+           requires good finger coordination: doing (physically)
+             Down-lCtrl Down-rAlt Up-lCtrl Down-a 
+           (quickly enough that key repeat of rAlt [which would 
+           generate new "fake" Down-lCtrl events] does not happen before 'a' 
+           is down) results in no "fake" events, so the application will see 
+           only rAlt down when 'a' is pressed.  (However, fake Up-lCtrl WILL 
+           be generated when rAlt goes UP.)
+       
+           In fact, note also that KLLF_ALTGR does not prohibit construction of 
+           rCtrl-rAlt (just press them in this order!).
+     
+         Moreover: "traditional" layouts do not define distinct modifier-masks
+         for VK_LMENU and VK_RMENU (same for VK_L/RCONTROL).  Instead, they 
+         rely on the KLLF_ALTGR bit to make the behaviour of VK_LMENU and 
+         VK_RMENU distinct.  As a corollary, for such layouts, the produced 
+         character is the same for AltGr-* (=rAlt-*) and Ctrl-Alt-* (in any 
+         combination of handedness).  For description of masks, see
+     
+           http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Keyboard_input_on_Windows,_Part_I:_what_is_the_kernel_doing?
+     
+         By default, Emacs was using these coincidences via the following
+         heuristics: it was treating: 
+          (*) keypresses with lCtrl-rAlt modifiers as if they are carrying 
+              ONLY the character payload (no matter what the actual keyboard 
+              was defining: if lCtrl-lAlt-b was delivering U+03b2=beta, then 
+              Emacs saw [beta]; if lCtrl-lAlt-b was undefined in the layout, 
+              the keypress was completely ignored), and
+          (*) keypresses with the other combinations of handedness of Ctrl-Alt 
+              modifiers (e.g., lCtrl-lAlt) as if they NEVER carry a character 
+              payload (so they were reported "raw": if lCtrl-lAlt-b was 
+              delivering beta, then Emacs saw event [C-A-b], and not [beta]).
+         This worked well for "traditional" layouts: users could type both 
+         AltGr-x and Ctrl-Alt-x, and one was a character, another a bindable 
+         event.
+     
+         However, for layouts which deliver different characters for AltGr-x 
+         and lCtrl-lAlt-x, this scheme makes the latter character inaccessible 
+         in Emacs.  While it is easy to access functionality of [C-M-x] in 
+         Emacs by other means (for example, by the `controlify' prefix, or 
+         using lCtrl-rCtrl-x, or rCtrl-rAlt-x [in this order]), missing 
+         characters cannot be reconstructed without tedious manual work. */
+     
+      /* These two cases are often going to be distinguishable, since at most 
+         one of these characters is defined with KBDCTRL | KBDMENU modifier 
+         bitmap.  (This heuristic breaks if both lCtrl-lAlt- AND lCtrl-rAlt- 
+         are translated to modifier bitmaps distinct from KBDCTRL | KBDMENU, 
+         or in the cases when lCtrl-lAlt-* and lCtrl-rAlt-* are generally 
+         different, but lCtrl-lAlt-x and lCtrl-rAlt-x happen to deliver the 
+         same character.)
+     
+         So we have 2 chunks of info:
+           (A) is it lCtrl-rAlt-, or lCtrl-lAlt, or some other combination?
+           (B) is the delivered character defined with KBDCTRL | KBDMENU bits?
+         Based on (A) and (B), we should decide whether to ignore the 
+         delivered character.  (Before, Emacs was completely ignoring (B), and 
+         was treating the 3-state of (A) as a bit.)  This means that we have 6 
+         bits of customization. 
+         
+         Additionally, two Ctrl keys being down may mean AltGr-rCtrl-.  */
+
+      /* Strip all non-Shift modifiers if: 
+        - more than one UTF-16 code point delivered (can't call VkKeyScanW ())
+        - or the character is a result of combining with a prefix key. */
+      if (!after_dead && count == 1 && *b < 0x10000)
+        {
+          if (console_modifiers & (RIGHT_ALT_PRESSED | LEFT_ALT_PRESSED)
+              && console_modifiers & (RIGHT_CTRL_PRESSED | LEFT_CTRL_PRESSED))
+            {
+              type_CtrlAlt = "bB";   /* generic bindable Ctrl-Alt- modifiers */
+              if ((console_modifiers & (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED))
+                  == (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED))
+                 /* double-Ctrl: 
+                    e.g. AltGr-rCtrl on some layouts (in this order!) */
+                type_CtrlAlt = "dD";
+              else if ((console_modifiers 
+                        & (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED))
+                       == (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED))
+                type_CtrlAlt = "lL"; /* Ctrl-Alt- modifiers on the left */
+              else if (!NILP (Vw32_recognize_altgr)
+                       && (console_modifiers 
+                           & (RIGHT_ALT_PRESSED | LEFT_CTRL_PRESSED))
+                          == (RIGHT_ALT_PRESSED | LEFT_CTRL_PRESSED))
+                type_CtrlAlt = "gG"; /* modifiers as in AltGr */
+            }
+          else if (wmsg.dwModifiers & (alt_modifier | meta_modifier)
+                   || (console_modifiers 
+                       & (LEFT_WIN_PRESSED | RIGHT_WIN_PRESSED 
+                          | APPS_PRESSED | SCROLLLOCK_ON)))
+            {
+              /* pure Alt (or combination of Alt, Win, APPS, scrolllock) */
+              type_CtrlAlt = "aA";
+            }
+          if (type_CtrlAlt)
+            {
+              /* Out of bound bitmap: */
+              SHORT r = VkKeyScanW( *b ), bitmap = 0x1FF;
+
+              FPRINTF_WM_CHARS((stderr, "VkKeyScanW %#06x %#04x\n", (int)r, 
+                               wParam));
+              if ((r & 0xFF) == wParam)
+                bitmap = r>>8; /* *b is reachable via simple interface */
+              if (*type_CtrlAlt == 'a') /* Simple Alt seen */
+                {
+                  if ((bitmap & ~1) == 0) /* 1: KBDSHIFT */
+                    {
+                      /* In "traditional" layouts, Alt without Ctrl does not 
+                         change the delivered character.  This detects this 
+                         situation; it is safe to report this as Alt-something 
+                          - as opposed to delivering the reported character 
+                          without modifiers. */
             if (legacy_alt_meta 
                 && *b > 0x7f && ('A' <= wParam && wParam <= 'Z'))
 	      /* For backward-compatibility with older Emacsen, let
-	         this be processed by another branch below (which would convert 
-	         it to Alt-Latin char via wParam). */
+                           this be processed by another branch below (which 
+                           would convert it to Alt-Latin char via wParam). */
+                        return 0;
+                    }
+                  else
+                    {
+                      hairy = 1;
+                    }
+                }
+              /* Check whether the delivered character(s) is accessible via 
+                 KBDCTRL | KBDALT ( | KBDSHIFT ) modifier mask (which is 7). */
+              else if ((bitmap & ~1) != 6)
+                {
+                  /* The character is not accessible via plain Ctrl-Alt(-Shift) 
+                     (which is, probably, same as AltGr) modifiers. 
+                     Either it was after a prefix key, or is combined with 
+                     modifier keys which we don't see, or there is an asymmetry
+                     between left-hand and right-hand modifiers, or other hairy
+                     stuff. */
+                  hairy = 1;
+                }
+              /* The best solution is to delegate these tough (but rarely 
+                 needed) choices to the user.  Temporarily (???), it is 
+                 implemented as C macros.
+               
+                 Essentially, there are 3 things to do: return 0 (hand over to
+                 the legacy processing code [ignoring the character payload]);
+                 keep some modifiers (so that they will be processed by the
+                 binding system [on top of the character payload]); strip
+                 modifiers (so that `self-insert' is going to be triggered
+                 with the character payload).
+                 
+                 The default below should cover 99.9999% of cases: 
+                   (a) strip Alt- in the hairy case only;  
+                       (stripping = not ignoring) 
+                   (l) for lAlt-lCtrl, ignore the char in simple cases only;
+                   (g) for what looks like AltGr, ignore the modifiers;
+                   (d) for what looks like lCtrl-rCtrl-Alt (probably
+                       AltGr-rCtrl), ignore the character in simple cases only;
+                   (b) for other cases of Ctrl-Alt, ignore the character in
+                       simple cases only.
+
+                 Essentially, in all hairy cases, and in looks-like-AltGr case,
+                 we keep the character, ignoring the modifiers.  In all the
+                 other cases, we ignore the delivered character.
+                */
+#define S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD "aldb"
+#define S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS ""
+              if (strchr(S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD, 
+			 type_CtrlAlt[hairy]))
               return 0;
-            strip_Alt = 0;
+              /* if in neither list, report all the modifiers we see COMBINED 
+                 WITH the reported character */
+              if (strchr(S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS, 
+			  type_CtrlAlt[hairy]))
+                strip_ExtraMods = 0;
           }
       }
-    if (strip_Alt)
-      wmsg.dwModifiers = wmsg.dwModifiers & ~(alt_modifier | meta_modifier);
+      if (strip_ExtraMods)
+        wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier;
     
     signal_user_input ();
     while (count--)
@@ -3052,8 +3229,11 @@ deliver_wm_chars (int do_translate, HWND
     else
         FPRINTF_WM_CHARS((stderr, "extra ctrl char\n"));
     return -1;
-  } else if (is_dead >= 0) {
+    } 
+  else if (is_dead >= 0) 
+    {
       FPRINTF_WM_CHARS((stderr, "dead %#06x\n", is_dead));
+      after_deadkey = is_dead;
       return 1;
   }
   return 0;
@@ -3175,6 +3355,15 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
       /* Inform lisp thread of keyboard layout changes.  */
       my_post_msg (&wmsg, hwnd, msg, wParam, lParam);
 
+      /* The state of the finite automaton is kept separately for each input 
+         language environment (so it does not change when one switches 
+         to a different window with the same environment).  Moreover,
+         the experiments show that the state is not remembered when
+         one switches back to the pre-previous environment. */
+      after_deadkey = -1;
+
+      /* XXXX??? What follows is a COMPLETE misunderstanding of Windows! */
+      
       /* Clear dead keys in the keyboard state; for simplicity only
          preserve modifier key states.  */
       {

Thread overview: 12+ messages
2015-03-03 23:09 bug#19994: 25.0.50; Unicode keyboard input on Windows Ilya Zakharevich
2015-03-04 18:01 ` Eli Zaretskii
2015-03-06  0:43   ` Ilya Zakharevich
2015-03-06 10:52     ` Eli Zaretskii
2015-03-06 11:40       ` Ilya Zakharevich
2015-03-06 14:00         ` Eli Zaretskii
2015-07-01 10:07   ` Ilya Zakharevich
2015-07-09  0:02     ` Ilya Zakharevich [this message]
2015-07-31  9:23       ` Eli Zaretskii
2015-08-01  7:40         ` Eli Zaretskii
2015-08-02 14:42           ` Eli Zaretskii
2020-08-12 16:32             ` Stefan Kangas
