unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#29837: UTF-16 char display problems and the macOS "character palette"
From: Alan Third @ 2017-12-24 16:00 UTC (permalink / raw)
  To: 29837

[-- Attachment #1: Type: text/plain, Size: 1236 bytes --]

Hi, I’ve had a go at enabling the macOS character palette, which is
just a virtual keyboard that helps you to enter special characters,
emoji, etc.

It’s easy enough to bring it up (patch attached) but some special
characters are put into Emacs incorrectly. I think the problem is that
we have multi code‐point UTF‐16 characters, and when they are ‘typed’
into Emacs they are entered as individual 16 bit code‐points and are
therefore displayed as a series of blank spaces.

An example is '🢫' (RIGHTWARDS FRONT-TILTED SHADOWED WHITE ARROW). If I
enter it using C‐x 8 RET, it appears correctly, but if I use the
character palette it shows up as two blank spaces. Describe-char
reveals these to be HIGH SURROGATE-D83E and LOW SURROGATE-DCAB, in
that order.

I can’t work out if Emacs should be able to handle these multi
code‐point characters being entered from a ‘keyboard’ input or not. If
so, does anyone have any idea what I need to do?

(Another minor irritation is that some characters (like pointing
hands) seem to insert the desired character then follow up with
VARIATION SELECTOR-15. I assume this is supposed to tell us what
colour we want the hand? If so should it be displayed?)
-- 
Alan Third

[-- Attachment #2: 0001-Add-macOS-character-palette.patch --]
[-- Type: text/plain, Size: 2908 bytes --]

From ad16b98288abe91732217535e308ae445303ab59 Mon Sep 17 00:00:00 2001
From: Alan Third <alan@idiocy.org>
Date: Sun, 24 Dec 2017 15:40:03 +0000
Subject: [PATCH] Add macOS character-palette

---
 lisp/term/ns-win.el |  8 ++++++++
 src/nsfns.m         | 14 ++++++++++++++
 src/nsterm.m        |  7 ++++++-
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/lisp/term/ns-win.el b/lisp/term/ns-win.el
index d512e8e506..7955ae0cb0 100644
--- a/lisp/term/ns-win.el
+++ b/lisp/term/ns-win.el
@@ -144,6 +144,8 @@ global-map
 (define-key global-map [?\s-z] 'undo)
 (define-key global-map [?\s-|] 'shell-command-on-region)
 (define-key global-map [s-kp-bar] 'shell-command-on-region)
+;; The key-chord below is C-s-SPC
+(define-key global-map [C-s-268632064] 'ns-do-show-character-palette)
 ;; (as in Terminal.app)
 (define-key global-map [s-right] 'ns-next-frame)
 (define-key global-map [s-left] 'ns-prev-frame)
@@ -575,6 +577,12 @@ ns-do-emacs-info-panel
   (interactive)
   (ns-emacs-info-panel))
 
+(declare-function ns-show-character-palette "nsfns.m" ())
+
+(defun ns-do-show-character-palette ()
+  (interactive)
+  (ns-show-character-palette))
+
 (defun ns-next-frame ()
   "Switch to next visible frame."
   (interactive)
diff --git a/src/nsfns.m b/src/nsfns.m
index 05605bf657..402771e2f8 100644
--- a/src/nsfns.m
+++ b/src/nsfns.m
@@ -3135,6 +3135,19 @@ The position is returned as a cons cell (X . Y) of the
                            (pt.y - screen.frame.origin.y)));
 }
 
+DEFUN ("ns-show-character-palette",
+       Fns_show_character_palette,
+       Sns_show_character_palette, 0, 0, 0,
+       doc: /* Show the macOS character palette.  */)
+       (void)
+{
+  struct frame *f = SELECTED_FRAME ();
+  EmacsView *view = FRAME_NS_VIEW (f);
+  [NSApp orderFrontCharacterPalette:view];
+
+  return Qnil;
+}
+
 /* ==========================================================================
 
     Class implementations
@@ -3326,6 +3339,7 @@ - (NSString *)panel: (id)sender userEnteredFilename: (NSString *)filename
   defsubr (&Sns_frame_restack);
   defsubr (&Sns_set_mouse_absolute_pixel_position);
   defsubr (&Sns_mouse_absolute_pixel_position);
+  defsubr (&Sns_show_character_palette);
   defsubr (&Sx_display_mm_width);
   defsubr (&Sx_display_mm_height);
   defsubr (&Sx_display_screens);
diff --git a/src/nsterm.m b/src/nsterm.m
index 07ac8f978f..65a9aac4a7 100644
--- a/src/nsterm.m
+++ b/src/nsterm.m
@@ -6284,11 +6284,16 @@ flag set (this is probably a bug in the OS).
 - (void)insertText: (id)aString
 {
   int code;
-  int len = [(NSString *)aString length];
+  int len;
   int i;
 
   NSTRACE ("[EmacsView insertText:]");
 
+  if ([aString isKindOfClass:[NSAttributedString class]])
+      aString = [aString string];
+
+  len = [(NSString *)aString length];
+
   if (NS_KEYLOG)
     NSLog (@"insertText '%@'\tlen = %d", aString, len);
   processingCompose = NO;
-- 
2.14.3



* bug#29837: UTF-16 char display problems and the macOS "character palette"
From: Eli Zaretskii @ 2017-12-24 16:56 UTC (permalink / raw)
  To: Alan Third; +Cc: 29837

> Date: Sun, 24 Dec 2017 16:00:53 +0000
> From: Alan Third <alan@idiocy.org>
> 
> It’s easy enough to bring it up (patch attached) but some special
> characters are put into Emacs incorrectly. I think the problem is that
> we have multi code‐point UTF‐16 characters, and when they are ‘typed’
> into Emacs they are entered as individual 16 bit code‐points and are
> therefore displayed as a series of blank spaces.
> 
> An example is '🢫' (RIGHTWARDS FRONT-TILTED SHADOWED WHITE ARROW). If I
> enter it using C‐x 8 RET, it appears correctly, but if I use the
> character palette it shows up as two blank spaces. Describe-char
> reveals these to be HIGH SURROGATE-D83E and LOW SURROGATE-DCAB, in
> that order.

You need to tell Emacs that keyboard input is in UTF-16.  Did you try
"C-x RET k"?

> (Another minor irritation is that some characters (like pointing
> hands) seem to insert the desired character then follow up with
> VARIATION SELECTOR-15. I assume this is supposed to tell us what
> colour we want the hand? If so should it be displayed?)

Emacs doesn't yet support variation selectors.  Patches to add that
are welcome (I guess it will need some change in our interface with
font back-ends?).






* bug#29837: UTF-16 char display problems and the macOS "character palette"
From: Alan Third @ 2017-12-24 18:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 29837

On Sun, Dec 24, 2017 at 06:56:29PM +0200, Eli Zaretskii wrote:
> > An example is '🢫' (RIGHTWARDS FRONT-TILTED SHADOWED WHITE ARROW). If I
> > enter it using C‐x 8 RET, it appears correctly, but if I use the
> > character palette it shows up as two blank spaces. Describe-char
> > reveals these to be HIGH SURROGATE-D83E and LOW SURROGATE-DCAB, in
> > that order.
> 
> You need to tell Emacs that keyboard input is in UTF-16.  Did you try
> "C-x RET k"?

I have now but I can’t find a utf-16 option that is ‘suitable’ for
keyboard input.

-- 
Alan Third






* bug#29837: UTF-16 char display problems and the macOS "character palette"
From: Eli Zaretskii @ 2017-12-24 18:57 UTC (permalink / raw)
  To: Alan Third; +Cc: 29837

> Date: Sun, 24 Dec 2017 18:23:21 +0000
> From: Alan Third <alan@idiocy.org>
> Cc: 29837@debbugs.gnu.org
> 
> > You need to tell Emacs that keyboard input is in UTF-16.  Did you try
> > "C-x RET k"?
> 
> I have now but I can’t find a utf-16 option that is ‘suitable’ for
> keyboard input.

What do you mean by "option" and by "suitable"?






* bug#29837: UTF-16 char display problems and the macOS "character palette"
From: Alan Third @ 2017-12-24 19:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 29837

On Sun, Dec 24, 2017 at 08:57:04PM +0200, Eli Zaretskii wrote:
> > Date: Sun, 24 Dec 2017 18:23:21 +0000
> > From: Alan Third <alan@idiocy.org>
> > Cc: 29837@debbugs.gnu.org
> > 
> > > You need to tell Emacs that keyboard input is in UTF-16.  Did you try
> > > "C-x RET k"?
> > 
> > I have now but I can’t find a utf-16 option that is ‘suitable’ for
> > keyboard input.
> 
> What do you mean by "option" and by "suitable"?

If I try to select utf-16 I get this

    set-keyboard-coding-system: Unsuitable coding system for keyboard: utf-16

and I used tab completion to find which other coding systems were
available but all the ones beginning utf-16 that I tried return the
same message.
-- 
Alan Third






* bug#29837: UTF-16 char display problems and the macOS "character palette"
From: Eli Zaretskii @ 2017-12-24 19:34 UTC (permalink / raw)
  To: Alan Third; +Cc: 29837

> Date: Sun, 24 Dec 2017 19:28:07 +0000
> From: Alan Third <alan@idiocy.org>
> Cc: 29837@debbugs.gnu.org
> 
> If I try to select utf-16 I get this
> 
>     set-keyboard-coding-system: Unsuitable coding system for keyboard: utf-16
> 
> and I used tab completion to find which other coding systems were
> available but all the ones beginning utf-16 that I tried return the
> same message.

Oh, I now recollect that Handa-san said at some point that keyboard
input doesn't support UTF-16...

How do other macOS programs read UTF-16 keyboard input?  Maybe you
could use the same way to read the sequences, and then decode them
internally as UTF-16 using coding.c facilities, and feed them into the
Emacs event queue?  Just a thought.






* bug#29837: UTF-16 char display problems and the macOS "character palette"
From: Philipp Stephani @ 2017-12-25 20:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Alan Third, 29837

[-- Attachment #1: Type: text/plain, Size: 1456 bytes --]

Eli Zaretskii <eliz@gnu.org> wrote on Sun, 24 Dec 2017 at 20:35:

> > Date: Sun, 24 Dec 2017 19:28:07 +0000
> > From: Alan Third <alan@idiocy.org>
> > Cc: 29837@debbugs.gnu.org
> >
> > If I try to select utf-16 I get this
> >
> >     set-keyboard-coding-system: Unsuitable coding system for keyboard: utf-16
> >
> > and I used tab completion to find which other coding systems were
> > available but all the ones beginning utf-16 that I tried return the
> > same message.
>
> Oh, I now recollect that Handa-san said at some point that keyboard
> input doesn't support UTF-16...
>
> How do other macOS programs read UTF-16 keyboard input?  Maybe you
> could use the same way to read the sequences, and then decode them
> internally as UTF-16 using coding.c facilities, and feed them into the
> Emacs event queue?  Just a thought.
>
>
IIUC Emacs receives the input as a single UTF-16 string (in insertText),
then iterates over the UTF-16 code units, converting each into an Emacs
event. That's wrong, no matter whether the input comes from the character
palette or from the keyboard; normal keyboard layouts just happen to not
contain non-BMP characters. The loop needs to account for surrogates.
As a small optimization (which is warranted because the function is
probably called on every keystroke), this should use [NSString
getCharacters:range:] to copy all the UTF-16 code units to a buffer first,
to avoid repeated calls to characterAtIndex.


* bug#29837: UTF-16 char display problems and the macOS "character palette"
From: Philipp Stephani @ 2017-12-25 21:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Alan Third, 29837

[-- Attachment #1: Type: text/plain, Size: 1158 bytes --]

Philipp Stephani <p.stephani2@gmail.com> wrote on Mon, 25 Dec 2017 at 21:13:

>
>
> Eli Zaretskii <eliz@gnu.org> wrote on Sun, 24 Dec 2017 at 20:35:
>
>> > Date: Sun, 24 Dec 2017 19:28:07 +0000
>> > From: Alan Third <alan@idiocy.org>
>> > Cc: 29837@debbugs.gnu.org
>> >
>> > If I try to select utf-16 I get this
>> >
>> >     set-keyboard-coding-system: Unsuitable coding system for keyboard: utf-16
>> >
>> > and I used tab completion to find which other coding systems were
>> > available but all the ones beginning utf-16 that I tried return the
>> > same message.
>>
>> Oh, I now recollect that Handa-san said at some point that keyboard
>> input doesn't support UTF-16...
>>
>> How do other macOS programs read UTF-16 keyboard input?  Maybe you
>> could use the same way to read the sequences, and then decode them
>> internally as UTF-16 using coding.c facilities, and feed them into the
>> Emacs event queue?  Just a thought.
>>
>>
> IIUC Emacs receives the input as a single UTF-16 string (in insertText) ...
>

On a somewhat related note, insertText: is itself deprecated and should be
replaced with insertText:replacementRange:.


* bug#29837: UTF-16 char display problems and the macOS "character palette"
From: Alan Third @ 2017-12-26  1:34 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: 29837

On Mon, Dec 25, 2017 at 08:13:55PM +0000, Philipp Stephani wrote:
> IIUC Emacs receives the input as a single UTF-16 string (in
> insertText), then iterates over the UTF-16 code units, converting
> each into an Emacs event. That's wrong, no matter whether the input
> comes from the character palette or from the keyboard; normal
> keyboard layouts just happen to not contain non-BMP characters. The
> loop needs to account for surrogates.

I finally came to this conclusion myself. I now know a lot more about
UTF‐16 than I did yesterday. :)

Wish I’d looked at my email earlier, though.

> As a small optimization (which is warranted because the function is
> probably called on every keystroke), this should use [NSString
> getCharacters:range:] to copy all the UTF-16 code units to a buffer
> first, to avoid repeated calls to characterAtIndex.

Presumably the vast majority of input will consist of just one code
unit, though?
-- 
Alan Third





