From: Philipp Stephani
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] Allow inserting non-BMP characters
Date: Tue, 26 Dec 2017 10:35:42 +0000
To: Eli Zaretskii
Cc: phst@google.com, emacs-devel@gnu.org
References: <20171225210115.13789-1-phst@google.com> <83d132hz9e.fsf@gnu.org>
In-Reply-To: <83d132hz9e.fsf@gnu.org>

Eli Zaretskii <eliz@gnu.org> wrote on Tue, 26 Dec 2017 at 05:46:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Mon, 25 Dec 2017 22:01:15 +0100
> > Cc: Philipp Stephani <phst@google.com>
> >
> > +/* Return the Unicode code point for the given UTF-16 surrogates.  */
> > +
> > +INLINE int
> > +surrogates_to_codepoint (int low, int high)
> > +{
> > +  eassert (char_low_surrogate_p (low));
> > +  eassert (char_high_surrogate_p (high));
> > +  return 0x10000 + (low - 0xDC00) + ((high - 0xD800) * 0x400);
> > +}
> > +
> >  /* Data type for Unicode general category.
>
> Suggest to move surrogates_to_codepoint to coding.c, and then use the
> macros UTF_16_HIGH_SURROGATE_P and UTF_16_LOW_SURROGATE_P defined
> there.

Hmm, I'd rather go the other way round and remove these macros later.
They are macros, thus worse than functions, and don't seem to be correct
either (what about a value such as 0x11DC00?).

> Also, a single-liner sounds like too little to justify a
> function, so maybe make all of that macros in coding.h, and include
> the latter in nsterm.m.

No new macros please if we can avoid it. Functions are strictly better.
I don't care much whether they are in character.h or coding.h, but
char_surrogate_p is already in character.h.

> > +  USE_SAFE_ALLOCA;
> > +  unichar *utf16_buffer;
> > +  SAFE_NALLOCA (utf16_buffer, 1, len);
>
> Maximum length of a UTF-16 sequence is known in advance, so why do you
> need SAFE_NALLOCA here?  Couldn't you use a buffer of fixed length
> instead?

The text being inserted can be arbitrarily long. Even single characters
(i.e. extended grapheme clusters) can be arbitrarily long.
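
To make the 0x11DC00 concern concrete, here is a minimal standalone sketch. It assumes the coding.c macros are mask-based tests of the form `((val) & 0xFC00) == 0xDC00`; the macro `MASK_LOW_SURROGATE_P` below is a hypothetical stand-in written for this example, not a copy of the Emacs source, and the range-based helpers `low_surrogate_p` and `high_surrogate_p` are likewise illustrative names rather than the functions in character.h. Plain `assert` stands in for `eassert`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mask-based test, assumed to resemble the macro under
   discussion.  It looks only at bits 10..15 of VAL, so any higher
   bits are silently ignored.  */
#define MASK_LOW_SURROGATE_P(val) (((val) & 0xFC00) == 0xDC00)

/* Range-based predicates (illustrative names).  They compare the whole
   value, so out-of-range inputs are rejected.  */
static inline bool
low_surrogate_p (int c)
{
  return 0xDC00 <= c && c <= 0xDFFF;
}

static inline bool
high_surrogate_p (int c)
{
  return 0xD800 <= c && c <= 0xDBFF;
}

/* Same arithmetic as the function in the quoted patch, with the
   range-based predicates as preconditions.  */
static inline int
surrogates_to_codepoint (int low, int high)
{
  assert (low_surrogate_p (low));
  assert (high_surrogate_p (high));
  return 0x10000 + (low - 0xDC00) + ((high - 0xD800) * 0x400);
}
```

Under that assumption, `MASK_LOW_SURROGATE_P (0x11DC00)` is true because the low 16 bits are 0xDC00, while `low_surrogate_p (0x11DC00)` is false. For a valid pair, `surrogates_to_codepoint (0xDE00, 0xD83D)` yields 0x1F600.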
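
As an illustration of why a fixed-size buffer does not fit here, the following sketch decodes a UTF-16 buffer of arbitrary length into code points. It is not the nsterm.m code: the function name `utf16_to_codepoints` is hypothetical, and it uses plain `malloc` where the patch uses `SAFE_NALLOCA`. The point it shows is that the buffer size has to follow the input length, since one code point never needs more than one output slot per input code unit but the number of code units is unbounded.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Decode LEN UTF-16 code units from SRC into a newly allocated array
   of code points and store the number of code points in *OUT_LEN.
   Return NULL on allocation failure or on an unpaired surrogate.
   The caller frees the result.  (Hypothetical helper for this
   example.)  */
static uint32_t *
utf16_to_codepoints (const uint16_t *src, size_t len, size_t *out_len)
{
  /* Every code point consumes at least one code unit, so LEN slots
     are always enough.  Guard the multiplication against overflow.  */
  if (len > SIZE_MAX / sizeof (uint32_t))
    return NULL;
  uint32_t *out = malloc ((len ? len : 1) * sizeof *out);
  if (!out)
    return NULL;

  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    {
      uint32_t unit = src[i];
      if (0xD800 <= unit && unit <= 0xDBFF)
        {
          /* High surrogate: it must be followed by a low surrogate.  */
          if (i + 1 >= len || src[i + 1] < 0xDC00 || src[i + 1] > 0xDFFF)
            {
              free (out);
              return NULL;
            }
          uint32_t low = src[++i];
          out[n++] = 0x10000 + (low - 0xDC00) + ((unit - 0xD800) * 0x400);
        }
      else if (0xDC00 <= unit && unit <= 0xDFFF)
        {
          /* Low surrogate without a preceding high surrogate.  */
          free (out);
          return NULL;
        }
      else
        out[n++] = unit;
    }

  *out_len = n;
  return out;
}
```

For the input { 0x61, 0xD83D, 0xDE00, 0x62 } this produces the three code points 0x61, 0x1F600 and 0x62. A single user-perceived character made of a base and many combining marks would simply be a longer input to the same loop.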