From mboxrd@z Thu Jan 1 00:00:00 1970
From: Philipp Stephani
Newsgroups: gmane.emacs.bugs
Subject: bug#29837: UTF-16 char display problems and the macOS "character palette"
Date: Mon, 25 Dec 2017 20:13:55 +0000
References: <20171224160053.GA71863@breton.holly.idiocy.org> <83bmiojc8y.fsf@gnu.org> <20171224182321.GA72021@breton.holly.idiocy.org> <834logj6nz.fsf@gnu.org> <20171224192807.GA73590@breton.holly.idiocy.org> <83zi67j4xe.fsf@gnu.org>
In-Reply-To: <83zi67j4xe.fsf@gnu.org>
To: Eli Zaretskii
Cc: Alan Third, 29837@debbugs.gnu.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="001a114c563ee1d2b405612fce1d"
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:141501

--001a114c563ee1d2b405612fce1d
Content-Type: text/plain; charset="UTF-8"

Eli Zaretskii wrote on Sun, 24 Dec 2017 at 20:35:

> > Date: Sun, 24 Dec 2017 19:28:07 +0000
> > From: Alan Third
> > Cc: 29837@debbugs.gnu.org
> >
> > If I try to select utf-16 I get this
> >
> >     set-keyboard-coding-system: Unsuitable coding system for keyboard: utf-16
> >
> > and I used tab completion to find which other coding systems were
> > available but all the ones beginning utf-16 that I tried return the
> > same message.
>
> Oh, I now recollect that Handa-san said at some point that keyboard
> input doesn't support UTF-16...
>
> How do other macOS programs read UTF-16 keyboard input?  Maybe you
> could use the same way to read the sequences, and then decode them
> internally as UTF-16 using coding.c facilities, and feed them into the
> Emacs event queue?  Just a thought.

IIUC Emacs receives the input as a single UTF-16 string (in
insertText), then iterates over the UTF-16 code units, converting each
into an Emacs event. That's wrong, no matter whether the input comes
from the character palette or from the keyboard; normal keyboard
layouts just happen to not contain non-BMP characters. The loop needs
to account for surrogates: a high surrogate (U+D800..U+DBFF) followed
by a low surrogate (U+DC00..U+DFFF) has to be combined into a single
code point, and that code point should produce a single event.

As a small optimization (which is warranted because the function is
probably called on every keystroke), this should use
[NSString getCharacters:range:] to copy all the UTF-16 code units to a
buffer first, to avoid repeated calls to characterAtIndex:.
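
A minimal sketch of the surrogate-aware loop suggested above, in plain
C rather than the Objective-C of the actual Emacs source. It assumes
the UTF-16 code units have already been copied into a buffer (for
example with getCharacters:range:). The name decode_utf16, the output
array, and the choice to replace unpaired surrogates with U+FFFD are
assumptions made for illustration; in Emacs each decoded code point
would presumably become one input event rather than an array entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Emacs: decode N_UNITS UTF-16 code
   units from UNITS into Unicode code points.  Stores at most OUT_CAP
   code points in OUT and returns the number stored.  Unpaired
   surrogates are replaced with U+FFFD (an assumption of this sketch). */
static size_t
decode_utf16 (const uint16_t *units, size_t n_units,
              uint32_t *out, size_t out_cap)
{
  assert (out != NULL || out_cap == 0);

  size_t n_out = 0;
  for (size_t i = 0; i < n_units && n_out < out_cap; i++)
    {
      uint32_t c = units[i];

      if (c >= 0xD800 && c <= 0xDBFF)
        {
          /* High surrogate: only valid if a low surrogate follows.  */
          if (i + 1 < n_units
              && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF)
            {
              c = 0x10000 + ((c - 0xD800) << 10)
                  + ((uint32_t) units[i + 1] - 0xDC00);
              i++;              /* The pair consumed two code units.  */
            }
          else
            c = 0xFFFD;
        }
      else if (c >= 0xDC00 && c <= 0xDFFF)
        c = 0xFFFD;             /* Low surrogate without a high one.  */

      out[n_out++] = c;         /* One code point, hence one event.  */
    }
  return n_out;
}
```

With this, the two code units 0xD83D 0xDE00 yield the single code
point U+1F600 instead of two separate (and invalid) events.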
--001a114c563ee1d2b405612fce1d--