From: Alan Third
To: Philipp Stephani
Cc: 29837@debbugs.gnu.org
Subject: bug#29837: UTF-16 char display problems and the macOS "character palette"
Date: Tue, 26 Dec 2017 01:34:23 +0000
Message-ID: <20171226013423.GB79310@breton.holly.idiocy.org>

On Mon, Dec 25, 2017 at 08:13:55PM +0000, Philipp Stephani wrote:

> IIUC Emacs receives the input as a single UTF-16 string (in
> insertText), then iterates over the UTF-16 code units, converting
> each into an Emacs event. That's wrong, no matter whether the input
> comes from the character palette or from the keyboard; normal
> keyboard layouts just happen to not contain non-BMP characters. The
> loop needs to account for surrogates.

I finally came to this conclusion myself. I now know a lot more about
UTF‐16 than I did yesterday. :)

Wish I’d looked at my email earlier, though.

> As a small optimization (which is warranted because the function is
> probably called on every keystroke), this should use [NSString
> getCharacters:range:] to copy all the UTF-16 code units to a buffer
> first, to avoid repeated calls to characterAtIndex.

Presumably the vast majority of input will consist of just one code
unit, though?

-- 
Alan Third
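
For illustration only, here is a rough sketch in plain C of a loop
that accounts for surrogates when walking a buffer of UTF-16 code
units. It is not the Emacs code: the names next_code_point and
decode_utf16 are hypothetical, and replacing an unpaired surrogate
with U+FFFD is an assumption (other policies are possible).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one code point starting at units[*i] and advance *i past the
   code units consumed.  Hypothetical helper, not taken from Emacs.  */
static uint32_t
next_code_point (const uint16_t *units, size_t len, size_t *i)
{
  uint16_t u = units[(*i)++];

  /* A high surrogate followed by a low surrogate encodes one
     non-BMP character.  */
  if (u >= 0xD800 && u <= 0xDBFF && *i < len)
    {
      uint16_t lo = units[*i];
      if (lo >= 0xDC00 && lo <= 0xDFFF)
        {
          (*i)++;
          return 0x10000 + (((uint32_t) (u - 0xD800) << 10)
                            | (uint32_t) (lo - 0xDC00));
        }
    }

  /* Assumption: an unpaired surrogate becomes U+FFFD.  */
  if (u >= 0xD800 && u <= 0xDFFF)
    return 0xFFFD;

  /* Ordinary BMP character: one code unit, one code point.  */
  return u;
}

/* Decode up to CAP code points from LEN code units into OUT and
   return the number of code points written.  */
size_t
decode_utf16 (const uint16_t *units, size_t len, uint32_t *out, size_t cap)
{
  assert (units != NULL && out != NULL);

  size_t n = 0;
  size_t i = 0;
  while (i < len && n < cap)
    out[n++] = next_code_point (units, len, &i);
  return n;
}
```

In this sketch each decoded code point, rather than each code unit,
would become one input event. The units array stands in for a buffer
filled in a single call, as the quoted suggestion describes.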