unofficial mirror of bug-gnu-emacs@gnu.org 
From: Ilya Zakharevich <ilya@math.berkeley.edu>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 19994@debbugs.gnu.org
Subject: bug#19994: 25.0.50; Unicode keyboard input on Windows
Date: Wed, 1 Jul 2015 03:07:12 -0700
Message-ID: <20150701100712.GA24175@math.berkeley.edu>
In-Reply-To: <83bnk8prqa.fsf@gnu.org>

On Wed, Mar 04, 2015 at 08:01:01PM +0200, Eli Zaretskii wrote:
> > Date: Tue, 3 Mar 2015 15:09:49 -0800
> > From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
> > 
> > I’m working on a patch to make Unicode keyboard input to work properly on
> > Windows (in graphic mode).

> I suggest, indeed, to clean up the code so we could commit it to the
> master branch.  That way, it will get wider testing, and we can fix
> whatever problems it might cause.  Any deficiencies that don't cause
> regressions wrt the current code can be fixed later, or even not at
> all (if we decide them to not be important enough).

I had no time to work on the code itself, but
  • I fixed the formatting,
  • I pumped up the docs,
  • I put in the suggested eassert().

----------------

As it was before, the patch
  • defines two new static functions,
  • delays modification of wParam as late as needed (moves 1 LoC in
    w32_wnd_proc()), and
  • adds 8 LoC to w32_wnd_proc().
The call to these static functions is conditional on w32_unicode_gui.

Enjoy,
Ilya
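
A note on the surrogate arithmetic in get_wm_chars() in the patch below:
the constant 0x35FDC00 folds the usual UTF-16 decoding offsets into one
subtraction.  The following is only a minimal standalone sketch for
checking that; the helper names are hypothetical and not part of the
patch.

```c
#include <assert.h>

/* Hypothetical helper (not in the patch): combine a high (lead) and a
   low (trail) surrogate using the same single-constant form as
   get_wm_chars().  0x35FDC00 == (0xD800 << 10) + 0xDC00 - 0x10000.  */
static int
combine_surrogates (int hi, int lo)
{
  return (hi << 10) + lo - 0x35FDC00;
}

/* Textbook form of the same computation, for comparison.  */
static int
combine_surrogates_textbook (int hi, int lo)
{
  return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

/* Both forms must agree at the ends of the surrogate ranges.  */
static void
surrogate_self_check (void)
{
  assert (combine_surrogates (0xD800, 0xDC00)
          == combine_surrogates_textbook (0xD800, 0xDC00));
  assert (combine_surrogates (0xDBFF, 0xDFFF)
          == combine_surrogates_textbook (0xDBFF, 0xDFFF));
}
```

For instance, U+1F600 is encoded in UTF-16 as D83D DE00, and
combine_surrogates (0xD83D, 0xDE00) gives 0x1F600.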

--- w32fns.c-ini	2015-01-30 15:33:23.505201400 -0800
+++ w32fns.c	2015-07-01 02:56:30.787672000 -0700
@@ -2832,6 +2832,233 @@ post_character_message (HWND hwnd, UINT
   my_post_msg (&wmsg, hwnd, msg, wParam, lParam);
 }
 
+static int
+get_wm_chars (HWND aWnd, int *buf, int buflen, int ignore_ctrl, int ctrl, 
+              int *ctrl_cnt, int *is_dead, int vk, int exp)
+{
+  MSG msg;
+  /* If doubled is at the end, ignore it */
+  int i = buflen, doubled = 0, code_unit;
+
+  if (ctrl_cnt)
+    *ctrl_cnt = 0;
+  if (is_dead)
+    *is_dead = -1;
+  eassert(w32_unicode_gui);
+  while (buflen
+  	 /* Should be called only when w32_unicode_gui: */
+         && PeekMessageW(&msg, aWnd, WM_KEYFIRST, WM_KEYLAST, 
+         	      PM_NOREMOVE | PM_NOYIELD)
+         && (msg.message == WM_CHAR || msg.message == WM_SYSCHAR 
+             || msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR 
+             || msg.message == WM_UNICHAR)) 
+    { 
+      /* We extract character payload, but in this call we handle only the 
+         characters which come BEFORE the next keyup/keydown message. */
+      int dead;
+
+      GetMessageW(&msg, aWnd, msg.message, msg.message);
+      dead = (msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR);
+      if (is_dead)
+        *is_dead = (dead ? msg.wParam : -1);
+      if (dead)
+        continue;
+      code_unit = msg.wParam;
+      if (doubled) 
+        { 
+          /* had surrogate */
+          if (msg.message == WM_UNICHAR 
+              || code_unit < 0xDC00 || code_unit > 0xDFFF) 
+            { /* Mismatched first surrogate.  
+                 Pass both code units as if they were two characters. */
+              *buf++ = doubled;
+              if (!--buflen)
+                return i; /* Drop the 2nd char if at the end of the buffer. */
+            } 
+          else /* see https://en.wikipedia.org/wiki/UTF-16 */
+            {
+              code_unit = (doubled << 10) + code_unit - 0x35FDC00;
+            }
+          doubled = 0;
+        } 
+      else if (code_unit >= 0xD800 && code_unit <= 0xDBFF) 
+        {    
+          /* Handle mismatched 2nd surrogate the same as a normal character. */
+          doubled = code_unit;
+          continue;
+        }
+
+      /* The only "fake" characters delivered by ToUnicode() or 
+         TranslateMessage() are: 
+         0x01 .. 0x1a for Ctrl-letter, Enter, Tab, Ctrl-Break, Esc, Backspace
+         0x00 and 0x1b .. 0x1f for Control- []\@^_ 
+         0x7f for Control-BackSpace
+         0x20 for Control-Space */
+      if (ignore_ctrl 
+          && (code_unit < 0x20 || code_unit == 0x7f 
+              || (code_unit == 0x20 && ctrl))) 
+        { 
+          /* Non-character payload in a WM_CHAR
+             (Ctrl-something pressed, see above).  Ignore, and report. */
+          if (ctrl_cnt)
+            (*ctrl_cnt)++;
+          continue;
+        }
+      /* Traditionally, Emacs would ignore the character payload of VK_NUMPAD* 
+         keys, and would treat them later via `function-key-map'.  In addition
+         to usual 102-key NUMPAD keys, this map also treats `kp-'-variants of
+         space, tab, enter, separator, equal.  TAB  and EQUAL, apparently, 
+         cannot be generated on Win-GUI branch.  ENTER is already handled 
+         by the code above.  According to `lispy_function_keys', kp_space is 
+         generated by not-extended VK_CLEAR.  (kp-tab !=  VK_OEM_NEC_EQUAL!). 
+       
+         We do similarly for backward-compatibility, but ignore only the
+         characters restorable later by `function-key-map'. */
+      if (code_unit < 0x7f 
+          && ((vk >= VK_NUMPAD0 && vk <= VK_DIVIDE) 
+              || (exp && ((vk >= VK_PRIOR && vk <= VK_DOWN) || 
+                     vk == VK_INSERT || vk == VK_DELETE || vk == VK_CLEAR))) 
+          && strchr("0123456789/*-+.,", code_unit))
+        continue;
+      *buf++ = code_unit;
+      buflen--;
+    }
+  return i - buflen;
+}
+
+#ifdef DBG_WM_CHARS
+#  define FPRINTF_WM_CHARS(ARG)	fprintf ARG
+#else
+#  define FPRINTF_WM_CHARS(ARG)	0
+#endif
+
+static int
+deliver_wm_chars (int do_translate, HWND hwnd, UINT msg, UINT wParam, 
+                  UINT lParam, int legacy_alt_meta)
+{
+  /* An "old style" keyboard description may assign up to 125 UTF-16 code 
+     points to a keypress. 
+     (However, the "old style" TranslateMessage() would deliver at most 16 of 
+     them.)  Be on the safe side, and prepare to treat many more. */
+  int ctrl_cnt, buf[1024], count, is_dead;
+
+  /* Since the keypress processing logic of Windows has a lot of state, it 
+     is important to call TranslateMessage() for every keyup/keydown, AND
+     do it exactly once.  (The actual change of state is done by
+     ToUnicode[Ex](), which is called by TranslateMessage().  So one can
+     call ToUnicode[Ex]() instead.)
+     
+     The "usual" message pump calls TranslateMessage() for EVERY event.
+     Emacs calls TranslateMessage() very selectively (is it needed for doing 
+     some tricky stuff with Win95???  With newer Windows, selectiveness is,
+     most probably, not needed - and harms a lot). 
+     
+     So, with the usual message pump, the following call to TranslateMessage() 
+     is not needed (and is going to be VERY harmful).  With Emacs' message 
+     pump, the call is needed.  */
+  if (do_translate) {
+      MSG windows_msg = { hwnd, msg, wParam, lParam, 0, {0,0} };
+
+      windows_msg.time = GetMessageTime ();
+      TranslateMessage (&windows_msg);
+  }
+  count = get_wm_chars (hwnd, buf, sizeof(buf)/sizeof(*buf), 1,
+                        /* The message may have been synthesized by 
+                           who knows what; be conservative. */
+                        modifier_set (VK_LCONTROL) 
+                          || modifier_set (VK_RCONTROL) 
+                          || modifier_set (VK_CONTROL), 
+                        &ctrl_cnt, &is_dead, wParam, 
+                        (lParam & 0x1000000L) != 0);
+  if (count) {
+    W32Msg wmsg;
+    int *b = buf, strip_Alt = 1;
+
+    /* wParam is checked when converting CapsLock to Shift */
+    wmsg.dwModifiers = do_translate 
+	? w32_get_key_modifiers (wParam, lParam) : 0;
+
+    /* What follows is just heuristics; the correct treatment requires 
+       non-destructive ToUnicode(): 
+         http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Can_an_application_on_Windows_accept_keyboard_events?_Part_IV:_application-specific_modifiers
+
+       What one needs to find is: 
+         * which of the present modifiers AFFECT the resulting char(s) 
+           (so should be stripped, since their EFFECT is "already
+            taken into account" in the string in buf), and 
+         * which modifiers are not affecting buf, so should be reported to
+           the application for further treatment.
+       
+       Example: assume that we know:
+         (A) lCtrl+rCtrl+rAlt modifiers with VK_A key produce a Latin "f"
+             ("may be logical" with a JCUKEN-flavored Russian keyboard layout);
+         (B) removing any one of lCtrl, rCtrl, rAlt changes the produced char;
+         (C) Win-modifier is not affecting the produced character 
+             (this is the common case: happens with all "standard" layouts).
+
+       Suppose the user presses Win+lCtrl+rCtrl+rAlt modifiers with VK_A.
+       What is the intent of the user?  We need to guess the intent to decide  
+       which event to deliver to the application.
+       
+       This looks like a reasonable logic: since Win- modifier does not affect 
+       the output string, the user was pressing Win for SOME OTHER purpose.
+       So the user wanted to generate Win-SOMETHING event.  Now, what is
+       something?  If one takes the mantra that "character payload is more 
+       important than the combination of keypresses which resulted in this 
+       payload", then one should ignore lCtrl+rCtrl+rAlt, ignore VK_A, and
+       assume that the user wanted to generate Win-f.
+       
+       Unfortunately, without non-destructive ToUnicode(), checking (B) and (C)
+       is out of the question.  So we use heuristics (hopefully, covering 99.9999%
+       of cases).
+     */
+    
+    /* If ctrl-something delivers chars, ctrl and the rest should be hidden; 
+       so the consumer of key-event won't interpret it as an accelerator. */
+    if (wmsg.dwModifiers & ctrl_modifier)
+      wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier;
+    /* In many keyboard layouts, (left) Alt is not changing the character.  
+       Unless we are in this situation, strip Alt/Meta. */
+    if (wmsg.dwModifiers & (alt_modifier | meta_modifier) 
+        /* If alt-something delivers non-ASCII chars, alt should be hidden */
+        && count == 1 && *b < 0x10000) 
+      {
+        SHORT r = VkKeyScanW( *b );
+
+        FPRINTF_WM_CHARS((stderr, "VkKeyScanW %#06x %#04x\n", (int)r, wParam));
+        if ((r & 0xFF) == wParam && !(r & ~0x1FF)) 
+          {	
+            /* Char available without Alt modifier, so Alt is "on top" */
+            if (legacy_alt_meta 
+                && *b > 0x7f && ('A' <= wParam && wParam <= 'Z'))
+	      /* For backward-compatibility with older Emacsen, let
+	         this be processed by another branch below (which would convert 
+	         it to Alt-Latin char via wParam). */
+              return 0;
+            strip_Alt = 0;
+          }
+      }
+    if (strip_Alt)
+      wmsg.dwModifiers = wmsg.dwModifiers & ~(alt_modifier | meta_modifier);
+    
+    signal_user_input ();
+    while (count--)
+      {
+        FPRINTF_WM_CHARS((stderr, "unichar %#06x\n", *b));
+        my_post_msg (&wmsg, hwnd, WM_UNICHAR, *b++, lParam);
+      }
+    if (!ctrl_cnt) /* Process ALSO as ctrl */
+      return 1;
+    else
+        FPRINTF_WM_CHARS((stderr, "extra ctrl char\n"));
+    return -1;
+  } else if (is_dead >= 0) {
+      FPRINTF_WM_CHARS((stderr, "dead %#06x\n", is_dead));
+      return 1;
+  }
+  return 0;
+}
+
 /* Main window procedure */
 
 static LRESULT CALLBACK
@@ -3007,7 +3234,6 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
       /* Synchronize modifiers with current keystroke.  */
       sync_modifiers ();
       record_keydown (wParam, lParam);
-      wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
 
       windows_translate = 0;
 
@@ -3117,6 +3343,46 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
 	    wParam = VK_NUMLOCK;
 	  break;
 	default:
+	  if (w32_unicode_gui) {	
+	    /* If this event generates characters or deadkeys, do not interpret 
+	       it as a "raw combination of modifiers and keysym".  Hide  
+	       deadkeys, and use the generated character(s) instead of the  
+	       keysym.   (Backward compatibility: exceptions for numpad keys 
+	       generating 0-9 . , / * - +, and for extra-Alt combined with a 
+	       non-Latin char.) 
+	       
+	       Try to not report modifiers which have effect on which 
+	       character or deadkey is generated.
+	       
+	       Example (contrived): if rightAlt-? generates f (on a Cyrillic 
+	       keyboard layout), and Ctrl, leftAlt do not affect the generated
+	       character, one wants to report Ctrl-leftAlt-f if the user 
+	       presses Ctrl-leftAlt-rightAlt-?. */
+	    int res; 
+#if 0
+	    /* Some of WM_CHAR may be fed to us directly, some are results of 
+	       TranslateMessage().  Using 0 as the first argument (in a 
+	       separate call) might help us distinguish these two cases.
+
+	       However, the keypress feeders would most probably expect the
+	       "standard" message pump, when TranslateMessage() is called on 
+	       EVERY KeyDown/Keyup event.  So they may feed us Down-Ctrl
+	       Down-FAKE Char-o and expect us to recognize it as Ctrl-o.
+	       Using 0 as the first argument would interfere with this.  */
+	    deliver_wm_chars (0, hwnd, msg, wParam, lParam, 1);
+#endif
+	    /* Processing the generated WM_CHAR messages *WHILE* we handle 
+	       KEYDOWN/UP event is the best choice, since without any fuss, 
+	       we know all 3 of: scancode, virtual keycode, and expansion. 
+	       (Additionally, one knows boundaries of expansion of different
+	       keypresses.) */
+	    res = deliver_wm_chars (1, hwnd, msg, wParam, lParam, 1);
+	    windows_translate = -( res != 0 );
+	    if (res > 0) /* Bound to character(s) or a deadkey */
+	      break;
+	    /* deliver_wm_chars() may make some branches after this vestigial */
+	  }
+          wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
 	  /* If not defined as a function key, change it to a WM_CHAR message. */
 	  if (wParam > 255 || !lispy_function_keys[wParam])
 	    {
@@ -3184,6 +3450,8 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
 	    }
 	}
 
+    if (windows_translate == -1)
+      break;
     translate:
       if (windows_translate)
 	{
