unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#19994: 25.0.50; Unicode keyboard input on Windows
@ 2015-03-03 23:09 Ilya Zakharevich
  2015-03-04 18:01 ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Ilya Zakharevich @ 2015-03-03 23:09 UTC (permalink / raw)
  To: 19994

I’m working on a patch to make Unicode keyboard input work properly on
Windows (in graphic mode).  The problems with the current implementation 
stem from the facts that

  • on Windows, it IS possible to implement a bullet-proof system of Unicode
    input (at least, for GUI applications);

  • However, how to do it is completely undocumented.

      [See
        http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Keyboard_input_on_Windows:_interaction_of_applications_and_the_kernel
      ]

So, essentially, all developers of applications try to design their own 
set of heuristic approaches which 

  • cover several keyboard layouts they can put their hands on;

  • more or less follow the design goals of their applications.

The approach taken by Emacs is to break the keyboard keys (VK’s) into 
several groups, and treat different groups differently.  Only the keys on
the main island of the keyboard may input characters.  Moreover, only 
the most common combinations of modifiers are allowed to be used for
the character input.  (In addition, there are plain bugs — like treating
UTF-16 as if it were UTF-32.)

  [I gave a very terse description on
     https://groups.google.com/forum/?hl=en#!search/emacs$20keyboard$20windows$20ilya/gnu.emacs.help/ZHpZK2YfFuo/aAyZFUxrFeEJ
  ]

The “correct” approach should proceed in exactly the opposite direction:
if a keypress produces a character, it should be treated as a 
character — no matter where on the physical keyboard the key is residing,
and which modifiers were pressed.

The patch below

  • Implements this “primacy of characters” doctrine;
  
  • As far as I could see, is compatible with the current work of Emacs
    on “simple keyboard layouts”;
  
  • Worked at some moment (before I started a massive addition of 
    comments ;-] — and maybe it is still working, I did not touch it for a
    month);
  
  • (Currently) ignores the indent coding rules;
  
  • Passes all the tests thrown at it by my super-puper-all-bells-and-whistles
    layouts; see e.g.
       http://k.ilyaz.org/windows/izKeys-visual-maps.html#examples
  
  • Is not bullet-proof: 
      ∘ I use one heuristic to detect which modifiers are “consumed” by the
        character input, and which are “on top” of character input;

      ∘ It does not (same as the current Emacs) support 
          Unicode-entered-by-Alt-numbers.
  
  • Does not fix a bug with UTF-16 of stand-alone (pumped to us) WM_CHAR’s.
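
The modifier heuristic mentioned in the list above rests on the value returned by VkKeyScanW(): the low byte is a virtual-key code and the high byte holds shift-state bits (1 = Shift, 2 = Ctrl, 4 = Alt).  What follows is only a portable sketch of that check, not the code of the patch; the macro and function names are hypothetical, and the result of VkKeyScanW() is passed in as a plain `short` so that the sketch compiles without the Windows headers.

```c
/* Shift-state bits found in the high byte of a VkKeyScanW() result.
   The names are hypothetical; the bit values are the documented ones.  */
#define VKS_SHIFT 0x100
#define VKS_CTRL  0x200
#define VKS_ALT   0x400

/* Return nonzero if SCAN (a VkKeyScanW() result for some character)
   says that the character is produced by the key VK using at most
   Shift.  In that case an Alt that was pressed did not take part in
   producing the character, so it is "on top" and should be reported.
   A failed lookup (SCAN == -1) never matches.  */
static int
char_reachable_without_alt (short scan, unsigned vk)
{
  return (scan & 0xFF) == (int) vk
         && !(scan & ~(0xFF | VKS_SHIFT));
}
```

With this check, a result of 0x0141 (Shift + VK_A) for the key 'A' counts as reachable without Alt, while 0x0641 (Ctrl + Alt + VK_A) and -1 do not.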

If I ever find more time to work on it, I plan to:

  1) Add yet more documentation;

  2) Slightly change the logic for detecting consumed/extra 
     modifiers.  This change may be cosmetic only; or maybe, with some 
     extremely devilish layouts, it may be beneficial.
     
     (I have not seen layouts where this change would matter, though!
      And I looked through the source code of hundred(s).)

  3) Bring it in sync with the Emacs coding style.

Meanwhile, I would greatly appreciate all input related to the current 
state of the patch.  (I *HOPE* that I did not break (many!) special cases
in the current implementation, but such things are hard to be sure of!)

Thanks for the parts of Emacs which ARE working great,
Ilya

=======================================================

--- w32fns.c-ini	2015-01-30 15:33:23.505201400 -0800
+++ w32fns.c	2015-02-15 02:46:12.070091800 -0800
@@ -2832,6 +2832,126 @@ post_character_message (HWND hwnd, UINT
   my_post_msg (&wmsg, hwnd, msg, wParam, lParam);
 }
 
+static int
+get_wm_chars (HWND aWnd, int *buf, int buflen, int ignore_ctrl, int ctrl, int *ctrl_cnt, int *is_dead, int vk, int exp)
+{
+  MSG msg;
+  int i = buflen, doubled = 0, code_unit;	/* If doubled is at the end, ignore it */
+  if (ctrl_cnt)
+    *ctrl_cnt = 0;
+  if (is_dead)
+    *is_dead = -1;
+  while (buflen &&				/* Should be called only when w32_unicode_gui */
+         PeekMessageW(&msg, aWnd, WM_KEYFIRST, WM_KEYLAST, PM_NOREMOVE | PM_NOYIELD) &&
+         (msg.message == WM_CHAR || msg.message == WM_SYSCHAR || 
+          msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR || msg.message == WM_UNICHAR)) {	/* Not contiguous */
+    int dead;
+
+    GetMessageW(&msg, aWnd, msg.message, msg.message);
+    dead = (msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR);
+    if (is_dead)
+      *is_dead = (dead ? msg.wParam : -1);
+    if (dead)
+      continue;
+    code_unit = msg.wParam;
+    if (doubled) {				/* had surrogate */
+      if (msg.message == WM_UNICHAR || code_unit < 0xDC00 || code_unit > 0xDFFF) {
+        /* Mismatched first surrogate.  Pass both code units as if they were two characters. */
+        *buf++ = doubled;
+        if (!--buflen)	// Drop the second char if at the end of the buffer
+          return i;
+      } else {
+        code_unit = (doubled << 10) + code_unit - 0x35FDC00;
+      }
+      doubled = 0;
+    } else if (code_unit >= 0xD800 && code_unit <= 0xDBFF) {
+      doubled = code_unit;
+      continue;
+    }    /* We handle mismatched second surrogate the same as a normal character. */
+    /* The only "fake" characters delivered by ToUnicode() or TranslateMessage() are: 
+       0x01 .. 0x1a for Control-chars, 
+       0x00 and 0x1b .. 0x1f for Control- []\@^_ 
+       0x7f for Control-BackSpace
+       0x20 for Control-Space */
+    if (ignore_ctrl && (code_unit < 0x20 || code_unit == 0x7f || (code_unit == 0x20 && ctrl))) {
+      /* Non-character payload in a WM_CHAR (Ctrl-something pressed).  Ignore. */
+      if (ctrl_cnt)
+        (*ctrl_cnt)++;
+      continue;
+    }
+    if (code_unit < 0x7f && 
+        ((vk >= VK_NUMPAD0 && vk <= VK_DIVIDE) ||
+         (exp && ((vk >= VK_PRIOR && vk <= VK_DOWN) || 
+                   vk == VK_INSERT || vk == VK_DELETE || vk == VK_CLEAR))) &&
+         strchr("0123456789/*-+.,", code_unit))	/* Traditionally, Emacs translates these to characters later, in `self-insert-character' */
+	continue;
+    *buf++ = code_unit;
+    buflen--;
+  }
+  return i - buflen;
+}
+
+int
+deliver_wm_chars (int do_translate, HWND hwnd, UINT msg, UINT wParam, UINT lParam)
+{
+  /* An "old style" keyboard description may assign up to 125 UTF-16 code points to a keypress. 
+     (However, the "old style" TranslateMessage() would deliver at most 16 of them.)  Be on a
+     safe side, and prepare to treat many more. */
+  int ctrl_cnt, buf[1024], count, is_dead;
+
+  if (do_translate) {
+      MSG windows_msg = { hwnd, msg, wParam, lParam, 0, {0,0} };
+
+      windows_msg.time = GetMessageTime ();
+      TranslateMessage (&windows_msg);
+  }
+  count = get_wm_chars (hwnd, buf, sizeof(buf)/sizeof(*buf), 1,
+                        /* The message may have been synthesized by who knows what; be conservative. */
+                        modifier_set (VK_LCONTROL) || modifier_set (VK_RCONTROL) || modifier_set (VK_CONTROL), 
+                        &ctrl_cnt, &is_dead, wParam, (lParam & 0x1000000L) != 0);
+  if (count) {
+    W32Msg wmsg;
+    int *b = buf, strip_Alt = 1;
+
+    /* wParam is checked when converting CapsLock to Shift */
+    wmsg.dwModifiers = do_translate ? w32_get_key_modifiers (wParam, lParam) : 0;
+
+    /* What follows is just heuristics; the correct treatment requires non-destructive ToUnicode(). */
+    if (wmsg.dwModifiers & ctrl_modifier)	/* If ctrl-something delivers chars, ctrl and the rest should be hidden */
+      wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier;
+    /* In many keyboard layouts, (left) Alt is not changing the character.  Unless we are in this situation, strip Alt/Meta. */
+    if (wmsg.dwModifiers & (alt_modifier | meta_modifier) &&	/* If alt-something delivers non-ASCIIchars, alt should be hidden */
+        count == 1 && *b < 0x10000) {
+      SHORT r = VkKeyScanW( *b );
+
+      fprintf(stderr, "VkKeyScanW %#06x %#04x\n", (int)r, wParam);
+      if ((r & 0xFF) == wParam && !(r & ~0x1FF)) {	/* Char available without Alt modifier, so Alt is "on top" */
+         if (*b > 0x7f && ('A' <= wParam && wParam <= 'Z'))
+           return 0;					/* Another branch below would convert it to Alt-Latin char via wParam */	
+         strip_Alt = 0;
+      }
+    }
+    if (strip_Alt)
+      wmsg.dwModifiers = wmsg.dwModifiers & ~(alt_modifier | meta_modifier);
+    
+    signal_user_input ();
+    while (count--)
+      {
+        fprintf(stderr, "unichar %#06x\n", *b);
+        my_post_msg (&wmsg, hwnd, WM_UNICHAR, *b++, lParam);
+      }
+    if (!ctrl_cnt)	/* Process ALSO as ctrl */
+      return 1;
+    else
+        fprintf(stderr, "extra ctrl char\n");
+    return -1;
+  } else if (is_dead >= 0) {
+      fprintf(stderr, "dead %#06x\n", is_dead);
+      return 1;
+  }
+  return 0;
+}
+
 /* Main window procedure */
 
 static LRESULT CALLBACK
@@ -3007,7 +3127,6 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
       /* Synchronize modifiers with current keystroke.  */
       sync_modifiers ();
       record_keydown (wParam, lParam);
-      wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
 
       windows_translate = 0;
 
@@ -3117,6 +3236,45 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
 	    wParam = VK_NUMLOCK;
 	  break;
 	default:
+	  if (w32_unicode_gui) {	
+	    /* If this event generates characters or deadkeys, do not interpret 
+	       it as a "raw combination of modifiers and keysym".  Hide  
+	       deadkeys, and use the generated character(s) instead of the  
+	       keysym.   (Backward compatibility: exceptions for numpad keys 
+	       generating 0-9 . , / * - +, and for extra-Alt combined with a 
+	       non-Latin char.) 
+	       
+	       Try to not report modifiers which have effect on which 
+	       character or deadkey is generated.
+	       
+	       Example (contrived): if rightAlt-? generates f (on a Cyrillic 
+	       keyboard layout), and Ctrl, leftAlt do not affect the generated
+	       character, one wants to report Ctrl-leftAlt-f if the user 
+	       presses Ctrl-leftAlt-rightAlt-?. */
+	    int res; 
+#if 0
+	    /* Some of WM_CHAR may be fed to us directly, some are results of 
+	       TranslateMessage().  Using 0 as the first argument (in a 
+	       separate call) might help us distinguish these two cases.
+
+	       However, the keypress feeders would most probably expect the
+	       "standard" message pump, when TranslateMessage() is called on 
+	       EVERY KeyDown/Keyup event.  So they may feed us Down-Ctrl
+	       Down-FAKE Char-o and expect us to recognize it as Ctrl-o.
+	       Using 0 as the first argument would interfere with this.  */
+	    deliver_wm_chars (0, hwnd, msg, wParam, lParam);
+#endif
+	    /* Processing the generated WM_CHAR messages *WHILE* we handle 
+	       KEYDOWN/UP event is the best choice, since without any fuss, 
+	       we know all 3 of: scancode, virtual keycode, and expansion. 
+	       (Additionally, one knows boundaries of expansion of different
+	       keypresses.) */
+	    res = deliver_wm_chars (1, hwnd, msg, wParam, lParam);
+	    windows_translate = -( res != 0 );
+	    if (res > 0)		/* Bound to character(s) or a deadkey */
+	      break;
+	  }				/* Some branches after this one may be not needed */
+          wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
 	  /* If not defined as a function key, change it to a WM_CHAR message. */
 	  if (wParam > 255 || !lispy_function_keys[wParam])
 	    {
@@ -3184,6 +3342,8 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
 	    }
 	}
 
+    if (windows_translate == -1)
+      break;
     translate:
       if (windows_translate)
 	{


=======================================================



In GNU Emacs 25.0.50.20 (i686-pc-mingw32)
 of 2015-02-08 on BUCEFAL
Repository revision: d5e3922e08587e7eb9e5aec2e9f84cbda405f857
Windowing system distributor `Microsoft Corp.', version 6.1.7601
Configured using:
 `configure --prefix=/k/test'

Configured features:
SOUND NOTIFY ACL

Important settings:
  value of $LANG: ENU
  locale-coding-system: cp1252

Major mode: Fundamental

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr emacsbug message dired format-spec
rfc822 mml easymenu mml-sec mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045
ietf-drums mm-util help-fns mail-prsvr mail-utils time-date tooltip
eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel
dos-w32 ls-lisp disp-table w32-win w32-vars tool-bar dnd fontset image
regexp-opt fringe tabulated-list newcomment elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow timer select scroll-bar
mouse jit-lock font-lock syntax facemenu font-core frame cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev
minibuffer cl-preloaded nadvice loaddefs button faces cus-face macroexp
files text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote make-network-process
w32notify w32 multi-tty emacs)

Memory information:
((conses 8 80324 9864)
 (symbols 32 17968 0)
 (miscs 32 85 128)
 (strings 16 12688 4007)
 (string-bytes 1 324435)
 (vectors 8 9470)
 (vector-slots 4 390690 6074)
 (floats 8 65 62)
 (intervals 28 243 45)
 (buffers 516 13))






* bug#19994: 25.0.50; Unicode keyboard input on Windows
  2015-03-03 23:09 bug#19994: 25.0.50; Unicode keyboard input on Windows Ilya Zakharevich
@ 2015-03-04 18:01 ` Eli Zaretskii
  2015-03-06  0:43   ` Ilya Zakharevich
  2015-07-01 10:07   ` Ilya Zakharevich
  0 siblings, 2 replies; 12+ messages in thread
From: Eli Zaretskii @ 2015-03-04 18:01 UTC (permalink / raw)
  To: Ilya Zakharevich; +Cc: 19994

> Date: Tue, 3 Mar 2015 15:09:49 -0800
> From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
> 
> I’m working on a patch to make Unicode keyboard input work properly on
> Windows (in graphic mode).

Thanks!

> The patch below
> 
>   • Implements this “primacy of characters” doctrine;
>  
>   • As far as I could see, is compatible with the current work of Emacs
>     on “simple keyboard layouts”;
>  
>   • Worked at some moment (before I started a massive addition of 
>     comments ;-] — and maybe it is still working, I did not touch it for a
>     month);
>  
>   • (Currently) ignores the indent coding rules;
>  
>   • Passes all the tests thrown at it by my super-puper-all-bells-and-whistles
>     layouts; see e.g.
>        http://k.ilyaz.org/windows/izKeys-visual-maps.html#examples

Any chance of coming up with a few tests for this code, and adding
them to the test/ directory?

> If I ever find more time to work on it, I plan to:
>
>   1) Add yet more documentation;
> 
>   2) Slightly change the logic for detecting consumed/extra 
>      modifiers.  This change may be cosmetic only; or maybe, with some 
>      extremely devilish layouts, it may be beneficial.
>     
>      (I have not seen layouts where this change would matter, though!
>       And I looked through the source code of hundred(s).)
> 
>   3) Bring it in sync with the Emacs coding style.

I suggest, indeed, to clean up the code so we could commit it to the
master branch.  That way, it will get wider testing, and we can fix
whatever problems it might cause.  Any deficiencies that don't cause
regressions wrt the current code can be fixed later, or even not at
all (if we decide they are not important enough).

Question: did you try this code with IME input methods?

> Meanwhile, I would greatly appreciate all input related to the current 
> state of the patch.

Some of that (but not much) below.

> +static int
> +get_wm_chars (HWND aWnd, int *buf, int buflen, int ignore_ctrl, int ctrl, int
                            ^^^^^^^^
Why 'int' and not 'wchar_t'?

> +  while (buflen &&                             /* Should be called only when  w32_unicode_gui */
> +         PeekMessageW(&msg, aWnd, WM_KEYFIRST, WM_KEYLAST, PM_NOREMOVE | PM_NOYIELD) &&

Indeed, any "wide" APIs should only be called when w32_unicode_gui is
on, and there should be alternative code for when w32_unicode_gui is
off.  We still try to support Windows 9X.

> +      if (msg.message == WM_UNICHAR || code_unit < 0xDC00 || code_unit > 0xDFFF) {
> +        /* Mismatched first surrogate.  Pass both code units as if they were two characters. */
> +        *buf++ = doubled;
> +        if (!--buflen) // Drop the second char if at the end of the buffer
> +          return i;
> +      } else {
> +        code_unit = (doubled << 10) + code_unit - 0x35FDC00;
> +      }
> +      doubled = 0;
> +    } else if (code_unit >= 0xD800 && code_unit <= 0xDBFF) {

Either explain the "magic" constants in comments, or, better, use
macros with descriptive names.
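
One possible shape for such macros is sketched below.  This is an illustration only, not the code of the patch: all macro and function names are hypothetical.  It also shows where the constant comes from: 0x35FDC00 equals (0xD800 << 10) + 0xDC00 - 0x10000, i.e. the biases of both surrogates minus the base of the supplementary planes, folded into one number.

```c
/* UTF-16 surrogate ranges and the first supplementary code point.
   The macro names are hypothetical; the values are those of UTF-16.  */
#define HIGH_SURROGATE_FIRST 0xD800
#define HIGH_SURROGATE_LAST  0xDBFF
#define LOW_SURROGATE_FIRST  0xDC00
#define LOW_SURROGATE_LAST   0xDFFF
#define SUPPLEMENTARY_BASE   0x10000

/* Both surrogate biases and the base folded into a single offset;
   this evaluates to 0x35FDC00, the constant used in the patch.  */
#define SURROGATE_OFFSET \
  ((HIGH_SURROGATE_FIRST << 10) + LOW_SURROGATE_FIRST - SUPPLEMENTARY_BASE)

static int
is_high_surrogate (int unit)
{
  return unit >= HIGH_SURROGATE_FIRST && unit <= HIGH_SURROGATE_LAST;
}

static int
is_low_surrogate (int unit)
{
  return unit >= LOW_SURROGATE_FIRST && unit <= LOW_SURROGATE_LAST;
}

/* Combine a valid high/low pair into one Unicode code point.  */
static int
combine_surrogates (int high, int low)
{
  return (high << 10) + low - SURROGATE_OFFSET;
}
```

For example, the pair 0xD83D 0xDE00 combines to U+1F600, and the extreme pairs map to U+10000 and U+10FFFF.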

> +  int ctrl_cnt, buf[1024], count, is_dead;

I think buf[] should be an array of wchar_t.  Also, will this code
work for the non-w32_unicode_gui mode?

> +  if (count) {
> +    W32Msg wmsg;
> +    int *b = buf, strip_Alt = 1;

Likewise with 'b'.

> +      SHORT r = VkKeyScanW( *b );

VkKeyScanW should be called only if w32_unicode_gui is on.  (Or maybe
the caller is only called when w32_unicode_gui is on, in which case
maybe we should have an eassert there.)

> +      fprintf(stderr, "VkKeyScanW %#06x %#04x\n", (int)r, wParam);
> +      if ((r & 0xFF) == wParam && !(r & ~0x1FF)) {     /* Char available without Alt modifier, so Alt is "on top" */
> +         if (*b > 0x7f && ('A' <= wParam && wParam <= 'Z'))
> +           return 0;                                   /* Another branch below would convert it to Alt-Latin char via wParam */
> +         strip_Alt = 0;
> +      }
> +    }
> +    if (strip_Alt)
> +      wmsg.dwModifiers = wmsg.dwModifiers & ~(alt_modifier | meta_modifier);
> +    
> +    signal_user_input ();
> +    while (count--)
> +      {
> +        fprintf(stderr, "unichar %#06x\n", *b);
> +        my_post_msg (&wmsg, hwnd, WM_UNICHAR, *b++, lParam);
> +      }
> +    if (!ctrl_cnt)     /* Process ALSO as ctrl */
> +      return 1;
> +    else
> +        fprintf(stderr, "extra ctrl char\n");
> +    return -1;
> +  } else if (is_dead >= 0) {
> +      fprintf(stderr, "dead %#06x\n", is_dead);
> +      return 1;
> +  }

Lots of debugging output here that should be removed.

Thanks again for working on this.






* bug#19994: 25.0.50; Unicode keyboard input on Windows
  2015-03-04 18:01 ` Eli Zaretskii
@ 2015-03-06  0:43   ` Ilya Zakharevich
  2015-03-06 10:52     ` Eli Zaretskii
  2015-07-01 10:07   ` Ilya Zakharevich
  1 sibling, 1 reply; 12+ messages in thread
From: Ilya Zakharevich @ 2015-03-06  0:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19994

On Wed, Mar 04, 2015 at 08:01:01PM +0200, Eli Zaretskii wrote:
> > +static int
> > +get_wm_chars (HWND aWnd, int *buf, int buflen, int ignore_ctrl, int ctrl, int
>                             ^^^^^^^^
> Why 'int' and not 'wchar_t'?

This is for Unicode characters.  They won’t fit into (Windows’ style) wchar_t.
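
To illustrate the width argument (a sketch only, not part of the patch): on Windows wchar_t is 16 bits wide, so a code point above U+FFFF needs two UTF-16 code units, while an int holds it whole.  The helper names below are hypothetical, and uint16_t stands in for Windows' wchar_t so that the sketch behaves the same on other platforms.

```c
#include <stdint.h>

/* Stand-in for Windows' 16-bit wchar_t (an assumption made so that
   the sketch does not depend on the platform's own wchar_t).  */
typedef uint16_t win_wchar;

/* Number of UTF-16 code units needed to represent code point CP.  */
static int
utf16_units_needed (uint32_t cp)
{
  return cp >= 0x10000 ? 2 : 1;
}

/* What is left when a code point is squeezed into one 16-bit unit:
   everything above the low 16 bits is lost.  */
static win_wchar
truncate_to_unit (uint32_t cp)
{
  return (win_wchar) cp;
}
```

For U+1F600 the first helper reports two units and the second one loses the high bits, which is why a buffer of already assembled characters is declared with int elements.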

> > +  while (buflen &&                             /* Should be called only when  w32_unicode_gui */
> > +         PeekMessageW(&msg, aWnd, WM_KEYFIRST, WM_KEYLAST, PM_NOREMOVE | PM_NOYIELD) &&
> 
> Indeed, any "wide" APIs should only be called when w32_unicode_gui is
> on, and there should be alternative code for when w32_unicode_gui is
> off.  We still try to support Windows 9X.

The caller ensures this.  Yes, assert() would be beneficial here.

> > +  int ctrl_cnt, buf[1024], count, is_dead;
> 
> I think buf[] should be an array of wchar_t.  Also, will this code
> work for the non-w32_unicode_gui mode?

This code is pure-GUI.  For non-GUI “bindable” input on Windows the
major hurdle is that 

  (A) I know no way to distinguish a “prefix key” (deadkey) keypress
      from a keypress which should trigger user bindings;

  (B) with “non-destructive ToUnicode()”, one WOULD be able to
      distinguish these two cases, but I have no clue how to find
      out the current keyboard layout of a console session.

      (There are a lot of examples of code which returns the keyboard
       layout of a window, but these examples do not work for
       console sessions.  I suppose that the reason is that the window
       is actually owned by a system process, and one does not have
       permissions to access its properties.)

Thanks,
Ilya






* bug#19994: 25.0.50; Unicode keyboard input on Windows
  2015-03-06  0:43   ` Ilya Zakharevich
@ 2015-03-06 10:52     ` Eli Zaretskii
  2015-03-06 11:40       ` Ilya Zakharevich
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2015-03-06 10:52 UTC (permalink / raw)
  To: Ilya Zakharevich; +Cc: 19994

> Date: Thu, 5 Mar 2015 16:43:32 -0800
> From: Ilya Zakharevich <ilya@math.berkeley.edu>
> Cc: 19994@debbugs.gnu.org
> 
> On Wed, Mar 04, 2015 at 08:01:01PM +0200, Eli Zaretskii wrote:
> > > +static int
> > > +get_wm_chars (HWND aWnd, int *buf, int buflen, int ignore_ctrl, int ctrl, int
> >                             ^^^^^^^^
> > Why 'int' and not 'wchar_t'?
> 
> This is for Unicode characters.  They won’t fit into (Windows’ style) wchar_t.

Right.

> > Also, will this code work for the non-w32_unicode_gui mode?
> 
> This code is pure-GUI.  For non-GUI “bindable” input on Windows the
> major hurdle is that 

No, that's not what I meant.  I meant GUI sessions in which
w32_unicode_gui is zero, i.e. Windows 9X systems.

Console input is a different matter (and is handled separately, see
w32inevt.c).

Thanks.






* bug#19994: 25.0.50; Unicode keyboard input on Windows
  2015-03-06 10:52     ` Eli Zaretskii
@ 2015-03-06 11:40       ` Ilya Zakharevich
  2015-03-06 14:00         ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Ilya Zakharevich @ 2015-03-06 11:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19994

On Fri, Mar 06, 2015 at 12:52:08PM +0200, Eli Zaretskii wrote:
> > > Also, will this code work for the non-w32_unicode_gui mode?
> > 
> > This code is pure-GUI.  For non-GUI “bindable” input on Windows the
> > major hurdle is that 
> 
> No, that's not what I meant.  I meant GUI sessions in which
> w32_unicode_gui is zero, i.e. Windows 9X systems.

Unless w32_unicode_gui is set, the changes made by this patch are a NOP.

Ilya






* bug#19994: 25.0.50; Unicode keyboard input on Windows
  2015-03-06 11:40       ` Ilya Zakharevich
@ 2015-03-06 14:00         ` Eli Zaretskii
  0 siblings, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2015-03-06 14:00 UTC (permalink / raw)
  To: Ilya Zakharevich; +Cc: 19994

> Date: Fri, 6 Mar 2015 03:40:03 -0800
> From: Ilya Zakharevich <ilya@math.berkeley.edu>
> Cc: 19994@debbugs.gnu.org
> 
> On Fri, Mar 06, 2015 at 12:52:08PM +0200, Eli Zaretskii wrote:
> > > > Also, will this code work for the non-w32_unicode_gui mode?
> > > 
> > > This code is pure-GUI.  For non-GUI “bindable” input on Windows the
> > > major hurdle is that 
> > 
> > No, that's not what I meant.  I meant GUI sessions in which
> > w32_unicode_gui is zero, i.e. Windows 9X systems.
> 
> Unless w32_unicode_gui is set, the changes made by this patch are a NOP.

That's fine, thanks.






* bug#19994: 25.0.50; Unicode keyboard input on Windows
  2015-03-04 18:01 ` Eli Zaretskii
  2015-03-06  0:43   ` Ilya Zakharevich
@ 2015-07-01 10:07   ` Ilya Zakharevich
  2015-07-09  0:02     ` Ilya Zakharevich
  1 sibling, 1 reply; 12+ messages in thread
From: Ilya Zakharevich @ 2015-07-01 10:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19994

On Wed, Mar 04, 2015 at 08:01:01PM +0200, Eli Zaretskii wrote:
> > Date: Tue, 3 Mar 2015 15:09:49 -0800
> > From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
> > 
> > I’m working on a patch to make Unicode keyboard input work properly on
> > Windows (in graphic mode).

> I suggest, indeed, to clean up the code so we could commit it to the
> master branch.  That way, it will get wider testing, and we can fix
> whatever problems it might cause.  Any deficiencies that don't cause
> regressions wrt the current code can be fixed later, or even not at
> all (if we decide they are not important enough).

I had no time to work on the code itself, but
  • I fixed the formatting,
  • I pumped up the docs,
  • I put in the suggested eassert().

----------------

As it was before, the patch
  • defines two new static functions,
  • delays the modification of wParam until it is needed (moves 1 LoC in
    w32_wnd_proc()), and
  • adds 8 LoC to w32_wnd_proc().
The call to these static functions is conditional on w32_unicode_gui.

Enjoy,
Ilya

--- w32fns.c-ini	2015-01-30 15:33:23.505201400 -0800
+++ w32fns.c	2015-07-01 02:56:30.787672000 -0700
@@ -2832,6 +2832,233 @@ post_character_message (HWND hwnd, UINT
   my_post_msg (&wmsg, hwnd, msg, wParam, lParam);
 }
 
+static int
+get_wm_chars (HWND aWnd, int *buf, int buflen, int ignore_ctrl, int ctrl, 
+              int *ctrl_cnt, int *is_dead, int vk, int exp)
+{
+  MSG msg;
+  /* If doubled is at the end, ignore it */
+  int i = buflen, doubled = 0, code_unit;
+
+  if (ctrl_cnt)
+    *ctrl_cnt = 0;
+  if (is_dead)
+    *is_dead = -1;
+  eassert(w32_unicode_gui);
+  while (buflen
+  	 /* Should be called only when w32_unicode_gui: */
+         && PeekMessageW(&msg, aWnd, WM_KEYFIRST, WM_KEYLAST, 
+         	      PM_NOREMOVE | PM_NOYIELD)
+         && (msg.message == WM_CHAR || msg.message == WM_SYSCHAR 
+             || msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR 
+             || msg.message == WM_UNICHAR)) 
+    { 
+      /* We extract character payload, but in this call we handle only the 
+         characters which come BEFORE the next keyup/keydown message. */
+      int dead;
+
+      GetMessageW(&msg, aWnd, msg.message, msg.message);
+      dead = (msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR);
+      if (is_dead)
+        *is_dead = (dead ? msg.wParam : -1);
+      if (dead)
+        continue;
+      code_unit = msg.wParam;
+      if (doubled) 
+        { 
+          /* had surrogate */
+          if (msg.message == WM_UNICHAR 
+              || code_unit < 0xDC00 || code_unit > 0xDFFF) 
+            { /* Mismatched first surrogate.  
+                 Pass both code units as if they were two characters. */
+              *buf++ = doubled;
+              if (!--buflen)
+                return i; /* Drop the 2nd char if at the end of the buffer. */
+            } 
+          else /* see https://en.wikipedia.org/wiki/UTF-16 */
+            {
+              code_unit = (doubled << 10) + code_unit - 0x35FDC00;
+            }
+          doubled = 0;
+        } 
+      else if (code_unit >= 0xD800 && code_unit <= 0xDBFF) 
+        {    
+          /* 1st surrogate; a stray 2nd one falls through as a normal char. */
+          doubled = code_unit;
+          continue;
+        }
+
+      /* The only "fake" characters delivered by ToUnicode() or 
+         TranslateMessage() are: 
+         0x01 .. 0x1a for Ctrl-letter, Enter, Tab, Ctrl-Break, Esc, Backspace
+         0x00 and 0x1b .. 0x1f for Control- []\@^_ 
+         0x7f for Control-BackSpace
+         0x20 for Control-Space */
+      if (ignore_ctrl 
+          && (code_unit < 0x20 || code_unit == 0x7f 
+              || (code_unit == 0x20 && ctrl))) 
+        { 
+          /* Non-character payload in a WM_CHAR
+             (Ctrl-something pressed, see above).  Ignore, and report. */
+          if (ctrl_cnt)
+            (*ctrl_cnt)++;
+          continue;
+        }
+      /* Traditionally, Emacs would ignore the character payload of VK_NUMPAD* 
+         keys, and would treat them later via `function-key-map'.  In addition
+         to usual 102-key NUMPAD keys, this map also treats `kp-'-variants of
+         space, tab, enter, separator, equal.  TAB  and EQUAL, apparently, 
+         cannot be generated on Win-GUI branch.  ENTER is already handled 
+         by the code above.  According to `lispy_function_keys', kp_space is 
+         generated by not-extended VK_CLEAR.  (kp-tab !=  VK_OEM_NEC_EQUAL!). 
+       
+         We do similarly for backward-compatibility, but ignore only the
+         characters restorable later by `function-key-map'. */
+      if (code_unit < 0x7f 
+          && ((vk >= VK_NUMPAD0 && vk <= VK_DIVIDE) 
+              || (exp && ((vk >= VK_PRIOR && vk <= VK_DOWN) || 
+                     vk == VK_INSERT || vk == VK_DELETE || vk == VK_CLEAR))) 
+          && strchr("0123456789/*-+.,", code_unit))
+        continue;
+      *buf++ = code_unit;
+      buflen--;
+    }
+  return i - buflen;
+}
+
+#ifdef DBG_WM_CHARS
+#  define FPRINTF_WM_CHARS(ARG)	fprintf ARG
+#else
+#  define FPRINTF_WM_CHARS(ARG)	0
+#endif
+
+int
+deliver_wm_chars (int do_translate, HWND hwnd, UINT msg, UINT wParam, 
+                  UINT lParam, int legacy_alt_meta)
+{
+  /* An "old style" keyboard description may assign up to 125 UTF-16 code 
+     points to a keypress. 
+     (However, the "old style" TranslateMessage() would deliver at most 16 of 
+     them.)  Be on a safe side, and prepare to treat many more. */
+  int ctrl_cnt, buf[1024], count, is_dead;
+
+  /* Since the keypress processing logic of Windows has a lot of state, it 
+     is important to call TranslateMessage() for every keyup/keydown, AND
+     do it exactly once.  (The actual change of state is done by
+     ToUnicode[Ex](), which is called by TranslateMessage().  So one can
+     call ToUnicode[Ex]() instead.)
+     
+     The "usual" message pump calls TranslateMessage() for EVERY event.
+     Emacs calls TranslateMessage() very selectively (is it needed for doing 
+     some tricky stuff with Win95???  With newer Windows, selectiveness is,
+     most probably, not needed - and harms a lot). 
+     
+     So, with the usual message pump, the following call to TranslateMessage() 
+     is not needed (and is going to be VERY harmful).  With Emacs' message 
+     pump, the call is needed.  */
+  if (do_translate) {
+      MSG windows_msg = { hwnd, msg, wParam, lParam, 0, {0,0} };
+
+      windows_msg.time = GetMessageTime ();
+      TranslateMessage (&windows_msg);
+  }
+  count = get_wm_chars (hwnd, buf, sizeof(buf)/sizeof(*buf), 1,
+                        /* The message may have been synthesized by 
+                           who knows what; be conservative. */
+                        modifier_set (VK_LCONTROL) 
+                          || modifier_set (VK_RCONTROL) 
+                          || modifier_set (VK_CONTROL), 
+                        &ctrl_cnt, &is_dead, wParam, 
+                        (lParam & 0x1000000L) != 0);
+  if (count) {
+    W32Msg wmsg;
+    int *b = buf, strip_Alt = 1;
+
+    /* wParam is checked when converting CapsLock to Shift */
+    wmsg.dwModifiers = do_translate 
+	? w32_get_key_modifiers (wParam, lParam) : 0;
+
+    /* What follows is just heuristics; the correct treatment requires 
+       non-destructive ToUnicode(): 
+         http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Can_an_application_on_Windows_accept_keyboard_events?_Part_IV:_application-specific_modifiers
+
+       What one needs to find is: 
+         * which of the present modifiers AFFECT the resulting char(s) 
+           (so should be stripped, since their EFFECT is "already
+            taken into account" in the string in buf), and 
+         * which modifiers are not affecting buf, so should be reported to
+           the application for further treatment.
+       
+       Example: assume that we know:
+         (A) lCtrl+rCtrl+rAlt modifiers with VK_A key produce a Latin "f"
+             ("may be logical" on a JCUKEN-flavored Russian keyboard layout);
+         (B) removing any one of lCtrl, rCtrl, rAlt changes the produced char;
+         (C) Win-modifier is not affecting the produced character 
+             (this is the common case: happens with all "standard" layouts).
+
+       Suppose the user presses Win+lCtrl+rCtrl+rAlt modifiers with VK_A.
+       What is the intent of the user?  We need to guess the intent to decide  
+       which event to deliver to the application.
+       
+       This looks like reasonable logic: since the Win- modifier does not affect 
+       the output string, the user was pressing Win for SOME OTHER purpose.
+       So the user wanted to generate Win-SOMETHING event.  Now, what is
+       something?  If one takes the mantra that "character payload is more 
+       important than the combination of keypresses which resulted in this 
+       payload", then one should ignore lCtrl+rCtrl+rAlt, ignore VK_A, and
+       assume that the user wanted to generate Win-f.
+       
+       Unfortunately, without non-destructive ToUnicode(), checking (B) and (C)
+       is out of the question.  So we use heuristics (hopefully, covering
+       99.9999% of cases).
+     */
+    
+    /* If ctrl-something delivers chars, ctrl and the rest should be hidden; 
+       so the consumer of key-event won't interpret it as an accelerator. */
+    if (wmsg.dwModifiers & ctrl_modifier)
+      wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier;
+    /* In many keyboard layouts, (left) Alt is not changing the character.  
+       Unless we are in this situation, strip Alt/Meta. */
+    if (wmsg.dwModifiers & (alt_modifier | meta_modifier) 
+        /* If alt-something delivers non-ASCII chars, alt should be hidden */
+        && count == 1 && *b < 0x10000) 
+      {
+        SHORT r = VkKeyScanW( *b );
+
+        FPRINTF_WM_CHARS((stderr, "VkKeyScanW %#06x %#04x\n", (int)r, wParam));
+        if ((r & 0xFF) == wParam && !(r & ~0x1FF)) 
+          {	
+            /* Char available without Alt modifier, so Alt is "on top" */
+            if (legacy_alt_meta 
+                && *b > 0x7f && ('A' <= wParam && wParam <= 'Z'))
+	      /* For backward-compatibility with older Emacsen, let
+	         this be processed by another branch below (which would convert 
+	         it to Alt-Latin char via wParam). */
+              return 0;
+            strip_Alt = 0;
+          }
+      }
+    if (strip_Alt)
+      wmsg.dwModifiers = wmsg.dwModifiers & ~(alt_modifier | meta_modifier);
+    
+    signal_user_input ();
+    while (count--)
+      {
+        FPRINTF_WM_CHARS((stderr, "unichar %#06x\n", *b));
+        my_post_msg (&wmsg, hwnd, WM_UNICHAR, *b++, lParam);
+      }
+    if (!ctrl_cnt) /* Process ALSO as ctrl */
+      return 1;
+    else
+        FPRINTF_WM_CHARS((stderr, "extra ctrl char\n"));
+    return -1;
+  } else if (is_dead >= 0) {
+      FPRINTF_WM_CHARS((stderr, "dead %#06x\n", is_dead));
+      return 1;
+  }
+  return 0;
+}
+
 /* Main window procedure */
 
 static LRESULT CALLBACK
@@ -3007,7 +3234,6 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
       /* Synchronize modifiers with current keystroke.  */
       sync_modifiers ();
       record_keydown (wParam, lParam);
-      wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
 
       windows_translate = 0;
 
@@ -3117,6 +3343,46 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
 	    wParam = VK_NUMLOCK;
 	  break;
 	default:
+	  if (w32_unicode_gui) {	
+	    /* If this event generates characters or deadkeys, do not interpret 
+	       it as a "raw combination of modifiers and keysym".  Hide  
+	       deadkeys, and use the generated character(s) instead of the  
+	       keysym.   (Backward compatibility: exceptions for numpad keys 
+	       generating 0-9 . , / * - +, and for extra-Alt combined with a 
+	       non-Latin char.) 
+	       
+	       Try to not report modifiers which have effect on which 
+	       character or deadkey is generated.
+	       
+	       Example (contrived): if rightAlt-? generates f (on a Cyrillic 
+	       keyboard layout), and Ctrl, leftAlt do not affect the generated
+	       character, one wants to report Ctrl-leftAlt-f if the user 
+	       presses Ctrl-leftAlt-rightAlt-?. */
+	    int res; 
+#if 0
+	    /* Some of WM_CHAR may be fed to us directly, some are results of 
+	       TranslateMessage().  Using 0 as the first argument (in a 
+	       separate call) might help us distinguish these two cases.
+
+	       However, the keypress feeders would most probably expect the
+	       "standard" message pump, when TranslateMessage() is called on 
+	       EVERY KeyDown/Keyup event.  So they may feed us Down-Ctrl
+	       Down-FAKE Char-o and expect us to recognize it as Ctrl-o.
+	       Using 0 as the first argument would interfere with this.  */
+	    deliver_wm_chars (0, hwnd, msg, wParam, lParam, 1);
+#endif
+	    /* Processing the generated WM_CHAR messages *WHILE* we handle 
+	       KEYDOWN/UP event is the best choice, since without any fuss, 
+	       we know all 3 of: scancode, virtual keycode, and expansion. 
+	       (Additionally, one knows boundaries of expansion of different
+	       keypresses.) */
+	    res = deliver_wm_chars (1, hwnd, msg, wParam, lParam, 1);
+	    windows_translate = -( res != 0 );
+	    if (res > 0) /* Bound to character(s) or a deadkey */
+	      break;
+	    /* deliver_wm_chars() may make some branches after this vestigial */
+	  }
+          wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
 	  /* If not defined as a function key, change it to a WM_CHAR message. */
 	  if (wParam > 255 || !lispy_function_keys[wParam])
 	    {
@@ -3184,6 +3450,8 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
 	    }
 	}
 
+    if (windows_translate == -1)
+      break;
     translate:
       if (windows_translate)
 	{






* bug#19994: 25.0.50; Unicode keyboard input on Windows
  2015-07-01 10:07   ` Ilya Zakharevich
@ 2015-07-09  0:02     ` Ilya Zakharevich
  2015-07-31  9:23       ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Ilya Zakharevich @ 2015-07-09  0:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19994

[-- Attachment #1: Type: text/plain, Size: 2011 bytes --]

On Wed, Jul 01, 2015 at 03:07:12AM -0700, Ilya Zakharevich wrote:
> On Wed, Mar 04, 2015 at 08:01:01PM +0200, Eli Zaretskii wrote:

> > I suggest, indeed, to clean up the code so we could commit it to the
> > master branch.  That way, it will get wider testing, and we can fix

> I had no time to work on the code itself, but
>   • I fixed the formatting,
>   • I pumped up the docs,
>   • I put in the suggested eassert().

The variant I sent was too primitive: it did not cover a (common?)
use case where, with AltGr layouts, leftCtrl+rightCtrl behaves
differently from pressing AltGr:
   • leftCtrl+rightCtrl would trigger C-M-key;
   • AltGr would enter the character payload.
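
A toy model may help show why these two differ.  It assumes the
behaviour described in the patch comments: on a layout with the
KLLF_ALTGR bit set, pressing rightAlt also yields an emulated leftCtrl
press, unless a Ctrl key is already down.  The bit names and functions
below are made up for this sketch; nothing here calls Windows.

```c
/* Hypothetical modifier bits, for this sketch only.  */
enum { L_CTRL = 1, R_CTRL = 2, R_ALT = 4 };

/* Modifier state as an application would observe it.  */
struct kbd { unsigned seen; };

/* Press one modifier on an AltGr layout.  Assumption taken from the
   patch comments: rAlt brings an emulated lCtrl with it, unless some
   Ctrl key is already down.  */
void
press (struct kbd *k, unsigned key)
{
  if (key == R_ALT && !(k->seen & (L_CTRL | R_CTRL)))
    k->seen |= L_CTRL;          /* the emulated ("fake") lCtrl */
  k->seen |= key;
}

/* Replay a sequence of presses and return the observed state.  */
unsigned
after_presses (const unsigned *keys, int n)
{
  struct kbd k = { 0 };
  int i;

  for (i = 0; i < n; i++)
    press (&k, keys[i]);
  return k.seen;
}
```

In this model AltGr alone is seen as leftCtrl+rightAlt.  AltGr followed
by rightCtrl leaves both Ctrl keys down, while rightCtrl followed by
AltGr does not, so the order of the presses matters.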

This update

  (0) fixes two formatting-style omissions;

  (A) adds A LOAD of new comments;
  (B) treats such important cases (as above) separately;

  (z) Marks a piece of old code which does not make any sense.
        (see the last chunk in the relative patch)
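
For (B), the separate treatment starts by sorting a Ctrl+Alt modifier
state into one of a few types.  A minimal sketch of that sorting
follows, assuming four modifier bits; the SK_* names and the function
are hypothetical stand-ins, and the pure-Alt type is left out.

```c
/* Hypothetical modifier bits, for this sketch only.  */
enum { SK_RALT = 0x1, SK_LALT = 0x2, SK_RCTRL = 0x4, SK_LCTRL = 0x8 };

/* Return a one-letter type for a Ctrl+Alt combination, or 0 when MODS
   does not contain both a Ctrl and an Alt.
     'd'  both Ctrl keys are down (e.g. AltGr then rCtrl)
     'l'  lCtrl and lAlt are down
     'g'  lCtrl and rAlt are down, and AltGr recognition is enabled
     'b'  any other Ctrl+Alt combination  */
char
classify_ctrl_alt (unsigned mods, int recognize_altgr)
{
  if (!(mods & (SK_RALT | SK_LALT)) || !(mods & (SK_RCTRL | SK_LCTRL)))
    return 0;
  /* Each `&' needs its own parentheses: `==' binds tighter than `&'.  */
  if ((mods & (SK_LCTRL | SK_RCTRL)) == (SK_LCTRL | SK_RCTRL))
    return 'd';
  if ((mods & (SK_LCTRL | SK_LALT)) == (SK_LCTRL | SK_LALT))
    return 'l';
  if (recognize_altgr
      && (mods & (SK_RALT | SK_LCTRL)) == (SK_RALT | SK_LCTRL))
    return 'g';
  return 'b';
}
```

The order of the tests matters: a state with both Ctrl keys down is
typed 'd' even if it would also match the AltGr test.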

Notes:

  • In (B), there are some decisions to make.  I encapsulate these
    decisions into two strings.  For best result, these strings should
    be user-customizable.  However, currently they are just put into
    C #defines.

    Once I have spent more time with this, and if these customizations
    turn out to be useful, they can be made into Lisp variables.

  • There is a bug in the (old) Emacs code which prevents some cases
    treated in (B) from being really useful.  I did not fix it yet.

    To see the bug:
      ∘ switch to a layout with AltGr;
      ∘ assume that AltGr-s produces ß (as with US International);
      ∘ pressing AltGr-rightControl-s produces Meta-ß;
      ∘ pressing rightControl-AltGr-s produces C-M-s.
    (I do not think this effect is intentional.)

  • And, BTW, is it documented anywhere that
    leftControl-rightControl-key produces C-M-key?
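
The two strings mentioned in the first note can be read as a small
lookup table.  A minimal sketch, assuming the defaults from the patch
("aldb" and the empty string); the enum and the function name are
hypothetical.  Each type is a two-letter string such as "gG": the
lowercase letter stands for the simple case, the uppercase one for the
hairy case.

```c
#include <string.h>

/* Defaults as in the patch: lowercase letters only, so every hairy
   (uppercase) case and the AltGr-like type "g" fall through.  */
#define TYPES_TO_IGNORE_PAYLOAD        "aldb"
#define TYPES_TO_REPORT_WITH_MODIFIERS ""

enum action { LEGACY_PATH, KEEP_MODIFIERS, STRIP_MODIFIERS };

/* TYPE_PAIR must be a two-letter string like "aA"; HAIRY selects the
   second letter when nonzero.  */
enum action
decide (const char *type_pair, int hairy)
{
  char t = type_pair[hairy != 0];

  if (strchr (TYPES_TO_IGNORE_PAYLOAD, t))
    return LEGACY_PATH;         /* ignore the character payload */
  if (strchr (TYPES_TO_REPORT_WITH_MODIFIERS, t))
    return KEEP_MODIFIERS;      /* payload plus the modifiers seen */
  return STRIP_MODIFIERS;       /* payload only (Shift aside) */
}
```

Making the two strings user-customizable would then amount to
replacing the two macros with values read at run time.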

I include two patches:
  □   absolute (ignore the previous patches)
  □   relative (with whitespace ignored) — for reading.

Enjoy,
Ilya

[-- Attachment #2: w32fns.c-diff-v2 --]
[-- Type: text/plain, Size: 24021 bytes --]

--- w32fns.c-ini	2015-01-30 15:33:23.505201400 -0800
+++ w32fns.c	2015-07-08 16:32:11.187197700 -0700
@@ -2832,6 +2832,413 @@ post_character_message (HWND hwnd, UINT
   my_post_msg (&wmsg, hwnd, msg, wParam, lParam);
 }
 
+static int
+get_wm_chars (HWND aWnd, int *buf, int buflen, int ignore_ctrl, int ctrl, 
+              int *ctrl_cnt, int *is_dead, int vk, int exp)
+{
+  MSG msg;
+  /* If doubled is at the end, ignore it */
+  int i = buflen, doubled = 0, code_unit;
+
+  if (ctrl_cnt)
+    *ctrl_cnt = 0;
+  if (is_dead)
+    *is_dead = -1;
+  eassert(w32_unicode_gui);
+  while (buflen
+  	 /* Should be called only when w32_unicode_gui: */
+         && PeekMessageW(&msg, aWnd, WM_KEYFIRST, WM_KEYLAST, 
+         	      PM_NOREMOVE | PM_NOYIELD)
+         && (msg.message == WM_CHAR || msg.message == WM_SYSCHAR 
+             || msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR 
+             || msg.message == WM_UNICHAR)) 
+    { 
+      /* We extract character payload, but in this call we handle only the 
+         characters which come BEFORE the next keyup/keydown message. */
+      int dead;
+
+      GetMessageW(&msg, aWnd, msg.message, msg.message);
+      dead = (msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR);
+      if (is_dead)
+        *is_dead = (dead ? msg.wParam : -1);
+      if (dead)
+        continue;
+      code_unit = msg.wParam;
+      if (doubled) 
+        { 
+          /* had surrogate */
+          if (msg.message == WM_UNICHAR 
+              || code_unit < 0xDC00 || code_unit > 0xDFFF) 
+            { /* Mismatched first surrogate.  
+                 Pass both code units as if they were two characters. */
+              *buf++ = doubled;
+              if (!--buflen)
+                return i; /* Drop the 2nd char if at the end of the buffer. */
+            } 
+          else /* see https://en.wikipedia.org/wiki/UTF-16 */
+            {
+              code_unit = (doubled << 10) + code_unit - 0x35FDC00;
+            }
+          doubled = 0;
+        } 
+      else if (code_unit >= 0xD800 && code_unit <= 0xDBFF) 
+        {    
+          /* Handle mismatched 2nd surrogate the same as a normal character. */
+          doubled = code_unit;
+          continue;
+        }
+
+      /* The only "fake" characters delivered by ToUnicode() or 
+         TranslateMessage() are: 
+         0x01 .. 0x1a for Ctrl-letter, Enter, Tab, Ctrl-Break, Esc, Backspace
+         0x00 and 0x1b .. 0x1f for Control- []\@^_ 
+         0x7f for Control-BackSpace
+         0x20 for Control-Space */
+      if (ignore_ctrl 
+          && (code_unit < 0x20 || code_unit == 0x7f 
+              || (code_unit == 0x20 && ctrl))) 
+        { 
+          /* Non-character payload in a WM_CHAR
+             (Ctrl-something pressed, see above).  Ignore, and report. */
+          if (ctrl_cnt)
+            (*ctrl_cnt)++;
+          continue;
+        }
+      /* Traditionally, Emacs would ignore the character payload of VK_NUMPAD* 
+         keys, and would treat them later via `function-key-map'.  In addition
+         to usual 102-key NUMPAD keys, this map also treats `kp-'-variants of
+         space, tab, enter, separator, equal.  TAB  and EQUAL, apparently, 
+         cannot be generated on Win-GUI branch.  ENTER is already handled 
+         by the code above.  According to `lispy_function_keys', kp_space is 
+         generated by not-extended VK_CLEAR.  (kp-tab !=  VK_OEM_NEC_EQUAL!). 
+       
+         We do similarly for backward-compatibility, but ignore only the
+         characters restorable later by `function-key-map'. */
+      if (code_unit < 0x7f 
+          && ((vk >= VK_NUMPAD0 && vk <= VK_DIVIDE) 
+              || (exp && ((vk >= VK_PRIOR && vk <= VK_DOWN) || 
+                     vk == VK_INSERT || vk == VK_DELETE || vk == VK_CLEAR))) 
+          && strchr("0123456789/*-+.,", code_unit))
+        continue;
+      *buf++ = code_unit;
+      buflen--;
+    }
+  return i - buflen;
+}
+
+#ifdef DBG_WM_CHARS
+#  define FPRINTF_WM_CHARS(ARG)	fprintf ARG
+#else
+#  define FPRINTF_WM_CHARS(ARG)	0
+#endif
+
+/* This is a heuristic only.  This is supposed to track the state of the
+   finite automaton in the language environment of Windows. 
+   
+   However, separate windows (if they have different language 
+   environments!) should  have different values.  Moreover, switching to a 
+   non-Emacs window with the same language environment, and using (dead)keys 
+   there would change the value stored in the kernel, but not this value. */
+static int after_deadkey = 0;
+
+int
+deliver_wm_chars (int do_translate, HWND hwnd, UINT msg, UINT wParam, 
+                  UINT lParam, int legacy_alt_meta)
+{
+  /* An "old style" keyboard description may assign up to 125 UTF-16 code 
+     points to a keypress. 
+     (However, the "old style" TranslateMessage() would deliver at most 16 of 
+     them.)  Be on the safe side, and prepare to handle many more. */
+  int ctrl_cnt, buf[1024], count, is_dead, after_dead = (after_deadkey != -1);
+
+  /* Since the keypress processing logic of Windows has a lot of state, it 
+     is important to call TranslateMessage() for every keyup/keydown, AND
+     do it exactly once.  (The actual change of state is done by
+     ToUnicode[Ex](), which is called by TranslateMessage().  So one can
+     call ToUnicode[Ex]() instead.)
+     
+     The "usual" message pump calls TranslateMessage() for EVERY event.
+     Emacs calls TranslateMessage() very selectively (is it needed for doing 
+     some tricky stuff with Win95???  With newer Windows, selectiveness is,
+     most probably, not needed - and harms a lot). 
+     
+     So, with the usual message pump, the following call to TranslateMessage() 
+     is not needed (and is going to be VERY harmful).  With Emacs' message 
+     pump, the call is needed.  */
+  if (do_translate) 
+    {
+      MSG windows_msg = { hwnd, msg, wParam, lParam, 0, {0,0} };
+
+      windows_msg.time = GetMessageTime ();
+      TranslateMessage (&windows_msg);
+    }
+  count = get_wm_chars (hwnd, buf, sizeof(buf)/sizeof(*buf), 1,
+                        /* The message may have been synthesized by 
+                           who knows what; be conservative. */
+                        modifier_set (VK_LCONTROL) 
+                          || modifier_set (VK_RCONTROL) 
+                          || modifier_set (VK_CONTROL), 
+                        &ctrl_cnt, &is_dead, wParam, 
+                        (lParam & 0x1000000L) != 0);
+  if (count) 
+    {
+      W32Msg wmsg;
+      DWORD console_modifiers = construct_console_modifiers ();
+      int *b = buf, strip_Alt = 1, strip_ExtraMods = 1, hairy = 0;
+      char *type_CtrlAlt = NULL;
+
+      /*  XXXX In fact, there may be another case when we need to do the same: 
+               What happens if the string defined in the LIGATURES has length
+               0?  Probably, we will get count==0, but the state of the finite
+               automaton would reset to 0???  */
+      after_deadkey = -1;
+      
+      /* wParam is checked when converting CapsLock to Shift; this is a clone 
+         of w32_get_key_modifiers (). */
+      wmsg.dwModifiers = w32_kbd_mods_to_emacs (console_modifiers, wParam);
+
+      /* What follows is just heuristics; the correct treatment requires 
+         non-destructive ToUnicode(): 
+           http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Can_an_application_on_Windows_accept_keyboard_events?_Part_IV:_application-specific_modifiers
+
+         What one needs to find is: 
+           * which of the present modifiers AFFECT the resulting char(s) 
+             (so should be stripped, since their EFFECT is "already
+              taken into account" in the string in buf), and 
+           * which modifiers are not affecting buf, so should be reported to
+             the application for further treatment.
+       
+         Example: assume that we know:
+           (A) lCtrl+rCtrl+rAlt modifiers with VK_A key produce a Latin "f"
+               ("may be logical" on JCUKEN-flavored Russian keyboard layouts);
+           (B) removing any of lCtrl, rCtrl, rAlt changes the produced char;
+           (C) Win-modifier is not affecting the produced character 
+               (this is the common case: happens with all "standard" layouts).
+
+         Suppose the user presses Win+lCtrl+rCtrl+rAlt modifiers with VK_A.
+         What is the intent of the user?  We need to guess the intent to decide
+         which event to deliver to the application.
+       
+         This looks like a reasonable logic: since Win- modifier doesn't affect
+         the output string, the user was pressing Win for SOME OTHER purpose.
+         So the user wanted to generate Win-SOMETHING event.  Now, what is
+         something?  If one takes the mantra that "character payload is more 
+         important than the combination of keypresses which resulted in this 
+         payload", then one should ignore lCtrl+rCtrl+rAlt, ignore VK_A, and
+         assume that the user wanted to generate Win-f.
+       
+         Unfortunately, without non-destructive ToUnicode(), checking (B),(C)
+         is out of the question.  So we use heuristics (hopefully, covering 
+         99.9999% of cases).
+       */
+
+      /* Another thing to watch for is the possibility of using AltGr-* and 
+         Ctrl-Alt-* with different semantics.
+     
+         Background: layouts defining the KLLF_ALTGR bit are treated 
+         specially by the kernel: when VK_RMENU (=rightAlt, =AltGr) is pressed 
+         (released), a press (release) of VK_LCONTROL is emulated (unless Ctrl 
+         is already down).  As a result, any press/release of AltGr is seen 
+         by applications as a press/release of lCtrl AND rAlt.  This is 
+         applicable, in particular, to ToUnicode[Ex]().  (Keyrepeat is covered
+         the same way!)
+     
+           NOTE: it IS possible to see bare rAlt even with KLLF_ALTGR; but this
+           requires good finger coordination: doing (physically)
+             Down-lCtrl Down-rAlt Up-lCtrl Down-a 
+           (quickly enough that key repeat of rAlt [which would 
+           generate new "fake" Down-lCtrl events] does not happen before 'a' 
+           is down) results in no "fake" events, so the application will see 
+           only rAlt down when 'a' is pressed.  (However, fake Up-lCtrl WILL 
+           be generated when rAlt goes UP.)
+       
+           In fact, note also that KLLF_ALTGR does not prohibit construction of 
+           rCtrl-rAlt (just press them in this order!).
+     
+         Moreover: "traditional" layouts do not define distinct modifier-masks
+         for VK_LMENU and VK_RMENU (same for VK_L/RCONTROL).  Instead, they 
+         rely on the KLLF_ALTGR bit to make the behaviour of VK_LMENU and 
+         VK_RMENU distinct.  As a corollary, for such layouts, the produced 
+         character is the same for AltGr-* (=rAlt-*) and Ctrl-Alt-* (in any 
+         combination of handedness).  For description of masks, see
+     
+           http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Keyboard_input_on_Windows,_Part_I:_what_is_the_kernel_doing?
+     
+         By default, Emacs was using these coincidences via the following
+         heuristics: it was treating: 
+          (*) keypresses with lCtrl-rAlt modifiers as if they are carrying 
+              ONLY the character payload (no matter what the actual keyboard 
+              was defining: if lCtrl-lAlt-b was delivering U+03B2=beta, then 
+              Emacs saw [beta]; if lCtrl-lAlt-b was undefined in the layout, 
+              the keypress was completely ignored), and
+          (*) keypresses with the other combinations of handedness of Ctrl-Alt 
+              modifiers (e.g., lCtrl-lAlt) as if they NEVER carry a character 
+              payload (so they were reported "raw": if lCtrl-lAlt-b was 
+              delivering beta, then Emacs saw event [C-A-b], and not [beta]).
+         This worked well for "traditional" layouts: users could type both 
+         AltGr-x and Ctrl-Alt-x, and one was a character, another a bindable 
+         event.
+     
+         However, for layouts which deliver different characters for AltGr-x 
+         and lCtrl-lAlt-x, this scheme makes the latter character inaccessible 
+         in Emacs.  While it is easy to access functionality of [C-M-x] in 
+         Emacs by other means (for example, by the `controlify' prefix, or 
+         using lCtrl-rCtrl-x, or rCtrl-rAlt-x [in this order]), missing 
+         characters cannot be reconstructed without tedious manual work. */
+     
+      /* These two cases are often going to be distinguishable, since at most 
+         one of these characters is defined with KBDCTRL | KBDMENU modifier 
+         bitmap.  (This heuristic breaks if both lCtrl-lAlt- AND lCtrl-rAlt- 
+         are translated to modifier bitmaps distinct from KBDCTRL | KBDMENU, 
+         or in the cases when lCtrl-lAlt-* and lCtrl-rAlt-* are generally 
+         different, but lCtrl-lAlt-x and lCtrl-rAlt-x happen to deliver the 
+         same character.)
+     
+         So we have 2 chunks of info:
+           (A) is it lCtrl-rAlt-, or lCtrl-lAlt, or some other combination?
+           (B) is the delivered character defined with KBDCTRL | KBDMENU bits?
+         Based on (A) and (B), we should decide whether to ignore the 
+         delivered character.  (Before, Emacs was completely ignoring (B), and 
+         was treating the 3-state of (A) as a bit.)  This means that we have 6 
+         bits of customization. 
+         
+         Additionally, having two Ctrl keys down may mean AltGr-rCtrl-.  */
+
+      /* Strip all non-Shift modifiers if: 
+        - more than one UTF-16 code point delivered (can't call VkKeyScanW ())
+        - or the character is a result of combining with a prefix key. */
+      if (!after_dead && count == 1 && *b < 0x10000)
+        {
+          if (console_modifiers & (RIGHT_ALT_PRESSED | LEFT_ALT_PRESSED)
+              && console_modifiers & (RIGHT_CTRL_PRESSED | LEFT_CTRL_PRESSED))
+            {
+              type_CtrlAlt = "bB";   /* generic bindable Ctrl-Alt- modifiers */
+              if ((console_modifiers & (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED))
+                  == (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED))
+                 /* double-Ctrl: 
+                    e.g. AltGr-rCtrl on some layouts (in this order!) */
+                type_CtrlAlt = "dD";
+              else if ((console_modifiers 
+                        & (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED))
+                       == (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED))
+                type_CtrlAlt = "lL"; /* Ctrl-Alt- modifiers on the left */
+              else if (!NILP (Vw32_recognize_altgr)
+                       && (console_modifiers 
+                           & (RIGHT_ALT_PRESSED | LEFT_CTRL_PRESSED))
+                          == (RIGHT_ALT_PRESSED | LEFT_CTRL_PRESSED))
+                type_CtrlAlt = "gG"; /* modifiers as in AltGr */
+            }
+          else if (wmsg.dwModifiers & (alt_modifier | meta_modifier)
+                   || (console_modifiers 
+                       & (LEFT_WIN_PRESSED | RIGHT_WIN_PRESSED 
+                          | APPS_PRESSED | SCROLLLOCK_ON)))
+            {
+              /* Pure Alt (or a combination of Alt, Win, APPS, ScrollLock).  */
+              type_CtrlAlt = "aA";
+            }
+          if (type_CtrlAlt)
+            {
+              /* Out of bound bitmap: */
+              SHORT r = VkKeyScanW( *b ), bitmap = 0x1FF;
+
+              FPRINTF_WM_CHARS((stderr, "VkKeyScanW %#06x %#04x\n", (int)r, 
+                               wParam));
+              if ((r & 0xFF) == wParam)
+                bitmap = r>>8; /* *b is reachable via simple interface */
+              if (*type_CtrlAlt == 'a') /* Simple Alt seen */
+                {
+                  if ((bitmap & ~1) == 0) /* 1: KBDSHIFT */
+                    {
+                      /* In "traditional" layouts, Alt without Ctrl does not 
+                         change the delivered character.  This detects this 
+                         situation; it is safe to report this as Alt-something 
+                          - as opposed to delivering the reported character 
+                          without modifiers. */
+                      if (legacy_alt_meta 
+                          && *b > 0x7f && ('A' <= wParam && wParam <= 'Z'))
+                        /* For backward-compatibility with older Emacsen, let
+                           this be processed by another branch below (which 
+                           would convert it to Alt-Latin char via wParam). */
+                        return 0;
+                    }
+                  else
+                    {
+                      hairy = 1;
+                    }
+                }
+              /* Check whether the delivered character(s) is accessible via 
+                 KBDCTRL | KBDALT ( | KBDSHIFT ) modifier mask (which is 7). */
+              else if ((bitmap & ~1) != 6)
+                {
+                  /* The character is not accessible via plain Ctrl-Alt(-Shift) 
+                     (which is, probably, same as AltGr) modifiers. 
+                     Either it was after a prefix key, or is combined with 
+                     modifier keys which we don't see, or there is an asymmetry
+                     between left-hand and right-hand modifiers, or other hairy
+                     stuff. */
+                  hairy = 1;
+                }
+              /* The best solution is to delegate these tough (but rarely 
+                 needed) choices to the user.  Temporarily (???), it is 
+                 implemented as C macros.
+               
+                 Essentially, there are 3 options: return 0 (hand over to the
+                 legacy processing code, ignoring the character payload); keep 
+                 some modifiers (so that they will be processed by the binding 
+                 system on top of the character payload); or strip modifiers
+                 (so that `self-insert' is going to be triggered with the
+                 character payload).
+                 
+                 The default below should cover 99.9999% of cases: 
+                   (a) strip Alt- in the hairy case only;  
+                       (stripping = not ignoring) 
+                   (l) for lAlt-lCtrl, ignore the char in simple cases only;
+                   (g) for what looks like AltGr, ignore the modifiers;
+                   (d) for what looks like lCtrl-rCtrl-Alt (probably
+                       AltGr-rCtrl), ignore the character in simple cases only;
+                   (b) for other cases of Ctrl-Alt, ignore the character in
+                       simple cases only.
+
+                 Essentially, in all hairy cases, and in looks-like-AltGr case,
+                 we keep the character, ignoring the modifiers.  In all the
+                 other cases, we ignore the delivered character.
+                */
+#define S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD "aldb"
+#define S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS ""
+              if (strchr(S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD, 
+			 type_CtrlAlt[hairy]))
+                return 0;
+              /* If in the second list, report all the modifiers we see COMBINED 
+                 WITH the reported character.  */
+              if (strchr(S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS, 
+			  type_CtrlAlt[hairy]))
+                strip_ExtraMods = 0;
+            }
+        }
+      if (strip_ExtraMods)
+        wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier;
+    
+      signal_user_input ();
+      while (count--)
+        {
+          FPRINTF_WM_CHARS((stderr, "unichar %#06x\n", *b));
+          my_post_msg (&wmsg, hwnd, WM_UNICHAR, *b++, lParam);
+        }
+      if (!ctrl_cnt) /* Process ALSO as ctrl */
+        return 1;
+      else
+          FPRINTF_WM_CHARS((stderr, "extra ctrl char\n"));
+      return -1;
+    } 
+  else if (is_dead >= 0) 
+    {
+      FPRINTF_WM_CHARS((stderr, "dead %#06x\n", is_dead));
+      after_deadkey = is_dead;
+      return 1;
+    } 
+  return 0;
+}
+
 /* Main window procedure */
 
 static LRESULT CALLBACK
@@ -2948,6 +3355,15 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
       /* Inform lisp thread of keyboard layout changes.  */
       my_post_msg (&wmsg, hwnd, msg, wParam, lParam);
 
+      /* The state of the finite automaton is separate per every input 
+         language environment (so it does not change when one switches 
+         to a different window with the same environment).  Moreover,
+         the experiments show that the state is not remembered when
+         one switches back to the pre-previous environment. */
+      after_deadkey = -1;
+
+      /* XXXX??? What follows is a COMPLETE misunderstanding of Windows! */
+      
       /* Clear dead keys in the keyboard state; for simplicity only
          preserve modifier key states.  */
       {
@@ -3007,7 +3423,6 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
       /* Synchronize modifiers with current keystroke.  */
       sync_modifiers ();
       record_keydown (wParam, lParam);
-      wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
 
       windows_translate = 0;
 
@@ -3117,6 +3532,46 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
 	    wParam = VK_NUMLOCK;
 	  break;
 	default:
+	  if (w32_unicode_gui) {	
+	    /* If this event generates characters or deadkeys, do not interpret 
+	       it as a "raw combination of modifiers and keysym".  Hide  
+	       deadkeys, and use the generated character(s) instead of the  
+	       keysym.   (Backward compatibility: exceptions for numpad keys 
+	       generating 0-9 . , / * - +, and for extra-Alt combined with a 
+	       non-Latin char.) 
+	       
+	       Try to not report modifiers which have effect on which 
+	       character or deadkey is generated.
+	       
+	       Example (contrived): if rightAlt-? generates f (on a Cyrillic 
+	       keyboard layout), and Ctrl, leftAlt do not affect the generated
+	       character, one wants to report Ctrl-leftAlt-f if the user 
+	       presses Ctrl-leftAlt-rightAlt-?. */
+	    int res; 
+#if 0
+	    /* Some of WM_CHAR may be fed to us directly, some are results of 
+	       TranslateMessage().  Using 0 as the first argument (in a 
+	       separate call) might help us distinguish these two cases.
+
+	       However, the keypress feeders would most probably expect the
+	       "standard" message pump, when TranslateMessage() is called on 
+	       EVERY KeyDown/Keyup event.  So they may feed us Down-Ctrl
+	       Down-FAKE Char-o and expect us to recognize it as Ctrl-o.
+	       Using 0 as the first argument would interfere with this.  */
+	    deliver_wm_chars (0, hwnd, msg, wParam, lParam, 1);
+#endif
+	    /* Processing the generated WM_CHAR messages *WHILE* we handle 
+	       KEYDOWN/UP event is the best choice, since without any fuss, 
+	       we know all 3 of: scancode, virtual keycode, and expansion. 
+	       (Additionally, one knows boundaries of expansion of different
+	       keypresses.) */
+	    res = deliver_wm_chars (1, hwnd, msg, wParam, lParam, 1);
+	    windows_translate = -( res != 0 );
+	    if (res > 0) /* Bound to character(s) or a deadkey */
+	      break;
+	    /* deliver_wm_chars() may make some branches after this vestigial */
+	  }
+          wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
 	  /* If not defined as a function key, change it to a WM_CHAR message. */
 	  if (wParam > 255 || !lispy_function_keys[wParam])
 	    {
@@ -3184,6 +3639,8 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
 	    }
 	}
 
+    if (windows_translate == -1)
+      break;
     translate:
       if (windows_translate)
 	{

[-- Attachment #3: w32fns.c-diff-v2-relative --]
[-- Type: text/plain, Size: 17600 bytes --]

--- w32fns.c-sent2	2015-07-01 02:56:30.787672000 -0700
+++ w32fns.c	2015-07-08 16:32:11.187197700 -0700
@@ -2932,6 +2932,15 @@ get_wm_chars (HWND aWnd, int *buf, int b
 #  define FPRINTF_WM_CHARS(ARG)	0
 #endif
 
+/* This is a heuristic only.  This is supposed to track the state of the
+   finite automaton in the language environment of Windows. 
+   
+   However, separate windows (if with different language 
+   environments!) should  have different values.  Moreover, switching to a 
+   non-Emacs window with the same language environment, and using (dead)keys 
+   there would change the value stored in the kernel, but not this value. */
+static int after_deadkey = 0;
+
 int
 deliver_wm_chars (int do_translate, HWND hwnd, UINT msg, UINT wParam, 
                   UINT lParam, int legacy_alt_meta)
@@ -2940,7 +2949,7 @@ deliver_wm_chars (int do_translate, HWND
      points to a keypress. 
      (However, the "old style" TranslateMessage() would deliver at most 16 of 
      them.)  Be on a safe side, and prepare to treat many more. */
-  int ctrl_cnt, buf[1024], count, is_dead;
+  int ctrl_cnt, buf[1024], count, is_dead, after_dead = (after_deadkey != -1);
 
   /* Since the keypress processing logic of Windows has a lot of state, it 
      is important to call TranslateMessage() for every keyup/keydown, AND
@@ -2956,7 +2965,8 @@ deliver_wm_chars (int do_translate, HWND
      So, with the usual message pump, the following call to TranslateMessage() 
      is not needed (and is going to be VERY harmful).  With Emacs' message 
      pump, the call is needed.  */
-  if (do_translate) {
+  if (do_translate) 
+    {
       MSG windows_msg = { hwnd, msg, wParam, lParam, 0, {0,0} };
 
       windows_msg.time = GetMessageTime ();
@@ -2970,13 +2980,22 @@ deliver_wm_chars (int do_translate, HWND
                           || modifier_set (VK_CONTROL), 
                         &ctrl_cnt, &is_dead, wParam, 
                         (lParam & 0x1000000L) != 0);
-  if (count) {
+  if (count) 
+    {
     W32Msg wmsg;
-    int *b = buf, strip_Alt = 1;
-
-    /* wParam is checked when converting CapsLock to Shift */
-    wmsg.dwModifiers = do_translate 
-	? w32_get_key_modifiers (wParam, lParam) : 0;
+      DWORD console_modifiers = construct_console_modifiers ();
+      int *b = buf, strip_Alt = 1, strip_ExtraMods = 1, hairy = 0;
+      char *type_CtrlAlt = NULL;
+
+      /*  XXXX In fact, there may be another case when we need to do the same: 
+               What happens if the string defined in the LIGATURES has length
+               0?  Probably, we will get count==0, but the state of the finite
+               automaton would reset to 0???  */
+     after_deadkey = -1;
+      
+      /* wParam is checked when converting CapsLock to Shift; this is a clone 
+         of w32_get_key_modifiers (). */
+      wmsg.dwModifiers = w32_kbd_mods_to_emacs (console_modifiers, wParam);
 
     /* What follows is just heuristics; the correct treatement requires 
        non-destructive ToUnicode(): 
@@ -2991,8 +3010,8 @@ deliver_wm_chars (int do_translate, HWND
        
        Example: assume that we know:
          (A) lCtrl+rCtrl+rAlt modifiers with VK_A key produce a Latin "f"
-             ("may be logical" with a JCUKEN-flavored Russian keyboard flavor);
-         (B) removing any one of lCtrl, rCtrl, rAlt changes the produced char;
+               ("may be logical" in JCUKEN-flavored Russian keyboard flavors);
+           (B) removing any of lCtrl, rCtrl, rAlt changes the produced char;
          (C) Win-modifier is not affecting the produced character 
              (this is the common case: happens with all "standard" layouts).
 
@@ -3000,7 +3019,7 @@ deliver_wm_chars (int do_translate, HWND
        What is the intent of the user?  We need to guess the intent to decide  
        which event to deliver to the application.
        
-       This looks like a reasonable logic: wince Win- modifier does not affect 
+         This looks like a reasonable logic: since Win- modifier doesn't affect
        the output string, the user was pressing Win for SOME OTHER purpose.
        So the user wanted to generate Win-SOMETHING event.  Now, what is
        something?  If one takes the mantra that "character payload is more 
@@ -3008,38 +3027,196 @@ deliver_wm_chars (int do_translate, HWND
        payload", then one should ignore lCtrl+rCtrl+rAlt, ignore VK_A, and
        assume that the user wanted to generate Win-f.
        
-       Unfortunately, without non-destructive ToUnicode(), checking (B) and (C)
-       is out of question.  So we use heuristics (hopefully, covering 99.9999%
-       of cases).
+         Unfortunately, without non-destructive ToUnicode(), checking (B),(C)
+         is out of question.  So we use heuristics (hopefully, covering 
+         99.9999% of cases).
      */
     
-    /* If ctrl-something delivers chars, ctrl and the rest should be hidden; 
-       so the consumer of key-event won't interpret it as an accelerator. */
-    if (wmsg.dwModifiers & ctrl_modifier)
-      wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier;
-    /* In many keyboard layouts, (left) Alt is not changing the character.  
-       Unless we are in this situation, strip Alt/Meta. */
-    if (wmsg.dwModifiers & (alt_modifier | meta_modifier) 
-        /* If alt-something delivers non-ASCIIchars, alt should be hidden */
-        && count == 1 && *b < 0x10000) 
-      {
-        SHORT r = VkKeyScanW( *b );
+      /* Another thing to watch for is a possibility to use AltGr-* and 
+         Ctrl-Alt-* with different semantics.
 
-        FPRINTF_WM_CHARS((stderr, "VkKeyScanW %#06x %#04x\n", (int)r, wParam));
-        if ((r & 0xFF) == wParam && !(r & ~0x1FF)) 
-          {	
-            /* Char available without Alt modifier, so Alt is "on top" */
+         Background: layouts defining the KLLF_ALTGR bit are treated 
+         specially by the kernel: when VK_RMENU (=rightAlt, =AltGr) is pressed 
+         (released), a press (release) of VK_LCONTROL is emulated (unless Ctrl 
+         is already down).  As a result, any press/release of AltGr is seen 
+         by applications as a press/release of lCtrl AND rAlt.  This is 
+         applicable, in particular, to ToUnicode[Ex]().  (Keyrepeat is covered
+         the same way!)
+     
+           NOTE: it IS possible to see bare rAlt even with KLLF_ALTGR; but this
+           requires a good finger coordination: doing (physically)
+             Down-lCtrl Down-rAlt Up-lCtrl Down-a 
+           (doing quick enough, so that key repeat of rAlt [which would 
+           generate new "fake" Down-lCtrl events] does not happen before 'a' 
+           is down) results in no "fake" events, so the application will see 
+           only rAlt down when 'a' is pressed.  (However, fake Up-lCtrl WILL 
+           be generated when rAlt goes UP.)
+       
+           In fact, note also that KLLF_ALTGR does not prohibit construction of 
+           rCtrl-rAlt (just press them in this order!).
+     
+         Moreover: "traditional" layouts do not define distinct modifier-masks
+         for VK_LMENU and VK_RMENU (same for VK_L/RCONTROL).  Instead, they 
+         rely on the KLLF_ALTGR bit to make the behaviour of VK_LMENU and 
+         VK_RMENU distinct.  As a corollary, for such layouts, the produced 
+         character is the same for AltGr-* (=rAlt-*) and Ctrl-Alt-* (in any 
+         combination of handedness).  For description of masks, see
+     
+           http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Keyboard_input_on_Windows,_Part_I:_what_is_the_kernel_doing?
+     
+         By default, Emacs was using these coincidences via the following
+         heuristics: it was treating: 
+          (*) keypresses with lCtrl-rAlt modifiers as if they are carrying 
+              ONLY the character payload (no matter what the actual keyboard 
+              was defining: if lCtrl-lAlt-b was delivering U+03b2=beta, then 
+              Emacs saw [beta]; if lCtrl-lAlt-b was undefined in the layout, 
+              the keypress was completely ignored), and
+          (*) keypresses with the other combinations of handedness of Ctrl-Alt 
+              modifiers (e.g., lCtrl-lAlt) as if they NEVER carry a character 
+              payload (so they were reported "raw": if lCtrl-lAlt-b was 
+              delivering beta, then Emacs saw event [C-A-b], and not [beta]).
+         This worked well for "traditional" layouts: users could type both 
+         AltGr-x and Ctrl-Alt-x, and one was a character, another a bindable 
+         event.
+     
+         However, for layouts which deliver different characters for AltGr-x 
+         and lCtrl-lAlt-x, this scheme makes the latter character inaccessible 
+         in Emacs.  While it is easy to access functionality of [C-M-x] in 
+         Emacs by other means (for example, by the `controlify' prefix, or 
+         using lCtrl-rCtrl-x, or rCtrl-rAlt-x [in this order]), missing 
+         characters cannot be reconstructed without tedious manual work. */
+     
+      /* These two cases are often going to be distinguishable, since at most 
+         one of these characters is defined with KBDCTRL | KBDMENU modifier 
+         bitmap.  (This heuristic breaks if both lCtrl-lAlt- AND lCtrl-rAlt- 
+         are translated to modifier bitmaps distinct from KBDCTRL | KBDMENU, 
+         or in the cases when lCtrl-lAlt-* and lCtrl-rAlt-* are generally 
+         different, but lCtrl-lAlt-x and lCtrl-rAlt-x happen to deliver the 
+         same character.)
+     
+         So we have 2 chunks of info:
+           (A) is it lCtrl-rAlt-, or lCtrl-lAlt, or some other combination?
+           (B) is the delivered character defined with KBDCTRL | KBDMENU bits?
+         Based on (A) and (B), we should decide whether to ignore the 
+         delivered character.  (Before, Emacs was completely ignoring (B), and 
+         was treating the 3-state of (A) as a bit.)  This means that we have 6 
+         bits of customization. 
+         
+         Additionally, the presence of two Ctrl down may be AltGr-rCtrl-.  */
+
+      /* Strip all non-Shift modifiers if: 
+        - more than one UTF-16 code point delivered (can't call VkKeyScanW ())
+        - or the character is a result of combining with a prefix key. */
+      if (!after_dead && count == 1 && *b < 0x10000)
+        {
+          if (console_modifiers & (RIGHT_ALT_PRESSED | LEFT_ALT_PRESSED)
+              && console_modifiers & (RIGHT_CTRL_PRESSED | LEFT_CTRL_PRESSED))
+            {
+              type_CtrlAlt = "bB";   /* generic bindable Ctrl-Alt- modifiers */
+              if ((console_modifiers & (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED))
+                  == (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED))
+                 /* double-Ctrl: 
+                    e.g. AltGr-rCtrl on some layouts (in this order!) */
+                type_CtrlAlt = "dD";
+              else if ((console_modifiers 
+                        & (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED))
+                       == (LEFT_CTRL_PRESSED | LEFT_ALT_PRESSED))
+                type_CtrlAlt = "lL"; /* Ctrl-Alt- modifiers on the left */
+              else if (!NILP (Vw32_recognize_altgr)
+                       && (console_modifiers 
+                           & (RIGHT_ALT_PRESSED | LEFT_CTRL_PRESSED))
+                          == (RIGHT_ALT_PRESSED | LEFT_CTRL_PRESSED))
+                type_CtrlAlt = "gG"; /* modifiers as in AltGr */
+            }
+          else if (wmsg.dwModifiers & (alt_modifier | meta_modifier)
+                   || (console_modifiers 
+                       & (LEFT_WIN_PRESSED | RIGHT_WIN_PRESSED 
+                          | APPS_PRESSED | SCROLLLOCK_ON)))
+            {
+              /* pure Alt (or combination of Alt, Win, APPS, scrolllock) */
+              type_CtrlAlt = "aA";
+            }
+          if (type_CtrlAlt)
+            {
+              /* Out of bound bitmap: */
+              SHORT r = VkKeyScanW( *b ), bitmap = 0x1FF;
+
+              FPRINTF_WM_CHARS((stderr, "VkKeyScanW %#06x %#04x\n", (int)r, 
+                               wParam));
+              if ((r & 0xFF) == wParam)
+                bitmap = r>>8; /* *b is reachable via simple interface */
+              if (*type_CtrlAlt == 'a') /* Simple Alt seen */
+                {
+                  if ((bitmap & ~1) == 0) /* 1: KBDSHIFT */
+                    {
+                      /* In "traditional" layouts, Alt without Ctrl does not 
+                         change the delivered character.  This detects this 
+                         situation; it is safe to report this as Alt-something 
+                          - as opposed to delivering the reported character 
+                          without modifiers. */
             if (legacy_alt_meta 
                 && *b > 0x7f && ('A' <= wParam && wParam <= 'Z'))
 	      /* For backward-compatibility with older Emacsen, let
-	         this be processed by another branch below (which would convert 
-	         it to Alt-Latin char via wParam). */
+                           this be processed by another branch below (which 
+                           would convert it to Alt-Latin char via wParam). */
+                        return 0;
+                    }
+                  else
+                    {
+                      hairy = 1;
+                    }
+                }
+              /* Check whether the delivered character(s) is accessible via 
+                 KBDCTRL | KBDALT ( | KBDSHIFT ) modifier mask (which is 7). */
+              else if ((bitmap & ~1) != 6)
+                {
+                  /* The character is not accessible via plain Ctrl-Alt(-Shift) 
+                     (which is, probably, same as AltGr) modifiers. 
+                     Either it was after a prefix key, or is combined with 
+                     modifier keys which we don't see, or there is an asymmetry
+                     between left-hand and right-hand modifiers, or other hairy
+                     stuff. */
+                  hairy = 1;
+                }
+              /* The best solution is to delegate these tough (but rarely 
+                 needed) choices to the user.  Temporarily (???), it is 
+                 implemented as C macros.
+               
+                 Essentially, there are 3 things to do: return 0 (hand over to
+                 the legacy processing code [ignoring the character payload]);
+                 keep some modifiers (so that they will be processed by the 
+                 binding system [on top of the character payload]); strip 
+                 modifiers (so that `self-insert' is going to be triggered with 
+                 the character payload). 
+                 
+                 The default below should cover 99.9999% of cases: 
+                   (a) strip Alt- in the hairy case only;  
+                       (stripping = not ignoring) 
+                   (l) for lAlt-lCtrl, ignore the char in simple cases only;
+                   (g) for what looks like AltGr, ignore the modifiers;
+                   (d) for what looks like lCtrl-rCtrl-Alt (probably
+                       AltGr-rCtrl), ignore the character in simple cases only;
+                   (b) for other cases of Ctrl-Alt, ignore the character in
+                       simple cases only.
+
+                 Essentially, in all hairy cases, and in looks-like-AltGr case,
+                 we keep the character, ignoring the modifiers.  In all the
+                 other cases, we ignore the delivered character.
+                */
+#define S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD "aldb"
+#define S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS ""
+              if (strchr(S_TYPES_TO_IGNORE_CHARACTER_PAYLOAD, 
+			 type_CtrlAlt[hairy]))
               return 0;
-            strip_Alt = 0;
+              /* if in neither list, report all the modifiers we see COMBINED 
+                 WITH the reported character */
+              if (strchr(S_TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS, 
+			  type_CtrlAlt[hairy]))
+                strip_ExtraMods = 0;
           }
       }
-    if (strip_Alt)
-      wmsg.dwModifiers = wmsg.dwModifiers & ~(alt_modifier | meta_modifier);
+      if (strip_ExtraMods)
+        wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier;
     
     signal_user_input ();
     while (count--)
@@ -3052,8 +3229,11 @@ deliver_wm_chars (int do_translate, HWND
     else
         FPRINTF_WM_CHARS((stderr, "extra ctrl char\n"));
     return -1;
-  } else if (is_dead >= 0) {
+    } 
+  else if (is_dead >= 0) 
+    {
       FPRINTF_WM_CHARS((stderr, "dead %#06x\n", is_dead));
+      after_deadkey = is_dead;
       return 1;
   }
   return 0;
@@ -3175,6 +3355,15 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
       /* Inform lisp thread of keyboard layout changes.  */
       my_post_msg (&wmsg, hwnd, msg, wParam, lParam);
 
+      /* The state of the finite automaton is separate per every input 
+         language environment (so it does not change when one switches 
+         to a different window with the same environment).  Moreover,
+         the experiments show that the state is not remembered when
+         one switches back to the pre-previous environment. */
+      after_deadkey = -1;
+
+      /* XXXX??? What follows is a COMPLETE misunderstanding of Windows! */
+      
       /* Clear dead keys in the keyboard state; for simplicity only
          preserve modifier key states.  */
       {


* bug#19994: 25.0.50; Unicode keyboard input on Windows
  2015-07-09  0:02     ` Ilya Zakharevich
@ 2015-07-31  9:23       ` Eli Zaretskii
  2015-08-01  7:40         ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2015-07-31  9:23 UTC (permalink / raw)
  To: Ilya Zakharevich; +Cc: 19994

> Date: Wed, 8 Jul 2015 17:02:59 -0700
> From: Ilya Zakharevich <ilya@math.berkeley.edu>
> Cc: 19994@debbugs.gnu.org
> 
> On Wed, Jul 01, 2015 at 03:07:12AM -0700, Ilya Zakharevich wrote:
> > On Wed, Mar 04, 2015 at 08:01:01PM +0200, Eli Zaretskii wrote:
> 
> > > I suggest, indeed, to clean up the code so we could commit it to the
> > > master branch.  That way, it will get wider testing, and we can fix
> 
> > I had no time to work on the code itself, but
> >   • I fixed the formatting,
> >   • I pumped up the docs,
> >   • I put in the suggested eassert().
> 
> The variant I sent was too primitive — it was not covering a (common?)
> usage case when (with AltGr-layouts) leftCtrl+rightCtrl was behaving
> differently than pressing AltGr:
>    • leftCtrl+rightCtrl would trigger C-M-key;
>    • altGr would enter the character payload.
> 
> This update
> 
>   (0) fixes two formatting-style omissions;
> 
>   (A) adds A LOAD of new comments;
>   (B) treats such important cases (as above) separately;
> 
>   (z) Marks a piece of old code which does not make any sense.
>         (see the last chunk in the relative patch)
> 
> Notes:
> 
>   • In (B), there are some decisions to make.  I encapsulate these
>     decisions into two strings.  For best result, these strings should
>     be user-customizable.  However, currently they are just put into
>     C #defines.
> 
>     When I sit on this more, and if these customizations turn out to
>     be useful, one can make them into Lisp variables.
> 
>   • There is a bug in the (old) Emacs code which prevents some cases
>     treated in (B) from being really useful.  I did not fix it yet.
> 
>     To see the bug:
>       ∘ switch to layout with AltGr;
>       ∘ assume that AltGr-s produces ß (as with US International);
>       ∘ pressing AltGr-rightControl-s produces Meta-ß;
>       ∘ pressing rightControl-AltGr-s produces C-M-s.
>     (I do not think this effect is intentional.)
> 
>   • And, BTW, is it documented anywhere that
>     leftControl-rightControl-key produces C-M-key?
> 
> I include two patches:
>   □   absolute (ignore the previous patches)
>   □   relative (with whitespace ignored) — for reading.
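
The two decision strings described in the quoted notes can be illustrated
in isolation.  What follows is only a minimal sketch, not the committed
code: the enum and the function name decide () are hypothetical, and the
macro values simply copy the defaults from the patch.  Each class of
keypress carries a two-character tag ("aA", "lL", "gG", "dD", "bB"); the
lowercase letter stands for the simple case, the uppercase one for the
hairy case.

```c
#include <string.h>

/* Defaults copied from the patch; the shortened names are hypothetical. */
#define TYPES_TO_IGNORE_CHARACTER_PAYLOAD "aldb"
#define TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS ""

enum decision
{
  USE_LEGACY_PATH,  /* ignore the character, let the old code handle it */
  KEEP_MODIFIERS,   /* report the character together with the modifiers */
  STRIP_MODIFIERS   /* report the character with Shift only */
};

/* TYPE is a two-character tag such as "aA" or "gG"; HAIRY selects the
   second character.  Hypothetical helper, for illustration only.  */
static enum decision
decide (const char *type, int hairy)
{
  char tag = type[hairy != 0];

  /* strchr would also match the terminating NUL, so guard against it. */
  if (tag == '\0')
    return STRIP_MODIFIERS;
  if (strchr (TYPES_TO_IGNORE_CHARACTER_PAYLOAD, tag))
    return USE_LEGACY_PATH;
  if (strchr (TYPES_TO_REPORT_CHARACTER_PAYLOAD_WITH_MODIFIERS, tag))
    return KEEP_MODIFIERS;
  return STRIP_MODIFIERS;
}
```

With these defaults, the looks-like-AltGr class and every hairy case end
up as STRIP_MODIFIERS (the character is delivered), while the simple "a",
"l", "d" and "b" cases go to the legacy path.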

Thanks.  I committed this in your name, with a few minor stylistic
changes, and also fixed a few typos in the comments.  Sorry for a long
delay in doing that.

I also added a new variable, w32-use-fallback-wm-chars-method, which,
when non-nil, makes Emacs use the old code from before your changes.
This is meant to be a handy debugging aid, in case we discover some
issues with the new code.

Do you think there are any user-visible effects of your changes that
are worthy of mentioning in NEWS?  If so, please propose the text for
NEWS.

I leave it up to you to decide whether this bug should be closed, or
if there's something else to be done about it.

Thanks again for working on this.

* bug#19994: 25.0.50; Unicode keyboard input on Windows
  2015-07-31  9:23       ` Eli Zaretskii
@ 2015-08-01  7:40         ` Eli Zaretskii
  2015-08-02 14:42           ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2015-08-01  7:40 UTC (permalink / raw)
  To: ilya; +Cc: 19994

> Date: Fri, 31 Jul 2015 12:23:00 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 19994@debbugs.gnu.org
> 
> Thanks.  I committed this in your name, with a few minor stylistic
> changes, and also fixed a few typos in the comments.  Sorry for a long
> delay in doing that.
> 
> I also added a new variable, w32-use-fallback-wm-chars-method, which,
> when non-nil, makes Emacs use the old code from before your changes.
> This is meant to be a handy debugging aid, in case we discover some
> issues with the new code.
> 
> Do you think there are any user-visible effects of your changes that
> are worthy of mentioning in NEWS?  If so, please propose the text for
> NEWS.
> 
> I leave it up to you to decide whether this bug should be closed, or
> if there's something else to be done about it.

Here's one problem evidently caused by the new code: invoke "emacs -Q"
and type "M-x" after it starts => you will see "x" being inserted into
*scratch*.  This doesn't happen if w32-use-fallback-wm-chars-method is
non-nil.

This is a one-time problem: all the subsequent "M-x" are handled
correctly.  It sounds like some initialization somewhere is missing?

Could you please look into that ASAP?  TIA.

* bug#19994: 25.0.50; Unicode keyboard input on Windows
  2015-08-01  7:40         ` Eli Zaretskii
@ 2015-08-02 14:42           ` Eli Zaretskii
  2020-08-12 16:32             ` Stefan Kangas
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2015-08-02 14:42 UTC (permalink / raw)
  To: ilya; +Cc: 19994

> Date: Sat, 01 Aug 2015 10:40:05 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 19994@debbugs.gnu.org
> 
> Here's one problem evidently caused by the new code: invoke "emacs -Q"
> and type "M-x" after it starts => you will see "x" being inserted into
> *scratch*.  This doesn't happen if w32-use-fallback-wm-chars-method is
> non-nil.
> 
> This is a one-time problem: all the subsequent "M-x" are handled
> correctly.  It sounds like some initialization somewhere is missing?

I've found that the simple change below fixes this problem.  I
committed it; if you feel it's not the right fix, please propose an
alternative.

Thanks.
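
One reading of why the initial value matters, as a minimal sketch under
assumptions: the struct and function names are hypothetical, and only the
after_deadkey bookkeeping of deliver_wm_chars () is modelled.  The patch
looks at the modifiers only when the keystroke does not follow a dead key
(after_deadkey == -1); otherwise everything except Shift is stripped.
With the static initializer 0, the very first keystroke of the session
looks as if it followed a dead key.

```c
#define NO_DEADKEY (-1)

/* Hypothetical container for the one piece of state modelled here. */
struct sketch_kbd_state
{
  int after_deadkey;  /* NO_DEADKEY, or the last dead key seen */
};

/* Model of a keystroke that delivers one character.  Returns 1 if the
   modifiers would be examined (so M-x can stay M-x), 0 if they would be
   stripped down to Shift (so M-x degrades to x).  */
static int
sketch_modifiers_examined (struct sketch_kbd_state *s)
{
  int after_dead = (s->after_deadkey != NO_DEADKEY);

  /* A character was delivered, so the automaton is idle again. */
  s->after_deadkey = NO_DEADKEY;
  return !after_dead;
}
```

Starting from { 0 }, the first call returns 0 and every later call
returns 1, which matches the one-time nature of the problem; starting
from { -1 }, the first call already returns 1.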

commit 0afb8fab99951262e81d6095302de4c84d7e8847
Author: Eli Zaretskii <eliz@gnu.org>
Date:   Sun Aug 2 17:40:19 2015 +0300

    Fix handling of 1st keystroke on MS-Windows
    
    * src/w32fns.c (globals_of_w32fns): Initialize after_deadkey to -1.
    This is needed to correctly handle the session's first keystroke,
    if it has any modifiers.  (Bug#19994)

diff --git a/src/w32fns.c b/src/w32fns.c
index 1c72974..31d23c4 100644
--- a/src/w32fns.c
+++ b/src/w32fns.c
@@ -9442,6 +9442,8 @@ typedef USHORT (WINAPI * CaptureStackBackTrace_proc) (ULONG, ULONG, PVOID *,
   else
     w32_unicode_gui = 0;
 
+  after_deadkey = -1;
+
   /* MessageBox does not work without this when linked to comctl32.dll 6.0.  */
   InitCommonControls ();
 






* bug#19994: 25.0.50; Unicode keyboard input on Windows
  2015-08-02 14:42           ` Eli Zaretskii
@ 2020-08-12 16:32             ` Stefan Kangas
  0 siblings, 0 replies; 12+ messages in thread
From: Stefan Kangas @ 2020-08-12 16:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19994-done, ilya

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Sat, 01 Aug 2015 10:40:05 +0300
>> From: Eli Zaretskii <eliz@gnu.org>
>> Cc: 19994@debbugs.gnu.org
>>
>> Here's one problem evidently caused by the new code: invoke "emacs -Q"
>> and type "M-x" after it starts => you will see "x" being inserted into
>> *scratch*.  This doesn't happen if w32-use-fallback-wm-chars-method is
>> non-nil.
>>
>> This is a one-time problem: all the subsequent "M-x" are handled
>> correctly.  It sounds like some initialization somewhere is missing?
>
> I've found that the simple change below fixes this problem.  I
> committed it; if you feel it's not the right fix, please propose an
> alternative.

It seems like the patch here was installed, an additional fix was
committed, and there has been no further progress within 5 years.

I'm therefore closing this bug report.

Best regards,
Stefan Kangas
