From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Philipp Stephani <p.stephani2@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] Allow inserting non-BMP characters
Date: Tue, 26 Dec 2017 10:35:42 +0000
Message-ID: <CAArVCkSMeQcjxz0CCsjaOU55e7g=AwsE+dU9LDCajye6JzujeA@mail.gmail.com>
References: <CAArVCkRx8p_vaFKJ_kXRuoZCKVBSYr=94RJANGpU0NXvkEZv6A@mail.gmail.com>
	<20171225210115.13789-1-phst@google.com> <83d132hz9e.fsf@gnu.org>
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="001a114c563ee4f20d05613bd8bd"
X-Trace: blaine.gmane.org 1514284449 11770 195.159.176.226 (26 Dec 2017 10:34:09 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Tue, 26 Dec 2017 10:34:09 +0000 (UTC)
Cc: phst@google.com, emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Dec 26 11:34:05 2017
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
In-Reply-To: <83d132hz9e.fsf@gnu.org>
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel/>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: "Emacs-devel" <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.devel:221417
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/221417>

--001a114c563ee4f20d05613bd8bd
Content-Type: text/plain; charset="UTF-8"

Eli Zaretskii <eliz@gnu.org> wrote on Tue, 26 Dec 2017 at 05:46:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Mon, 25 Dec 2017 22:01:15 +0100
> > Cc: Philipp Stephani <phst@google.com>
> >
> > +/* Return the Unicode code point for the given UTF-16 surrogates.  */
> > +
> > +INLINE int
> > +surrogates_to_codepoint (int low, int high)
> > +{
> > +  eassert (char_low_surrogate_p (low));
> > +  eassert (char_high_surrogate_p (high));
> > +  return 0x10000 + (low - 0xDC00) + ((high - 0xD800) * 0x400);
> > +}
> > +
> >  /* Data type for Unicode general category.
>
> I suggest moving surrogates_to_codepoint to coding.c and then using the
> macros UTF_16_HIGH_SURROGATE_P and UTF_16_LOW_SURROGATE_P defined
> there.


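Hmm, I'd rather go the other way round and remove those macros later. Being
macros, they are worse than functions, and they don't seem to be correct
either: what about a value such as 0x11DC00? It is not a surrogate (it is
not even a Unicode code point), but its low 16 bits are 0xDC00, so a check
that only looks at those bits would accept it.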
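To make the 0x11DC00 case concrete, here is a minimal sketch contrasting a predicate that only masks the low 16 bits with functions that check the full range. The mask-only form is an assumption about how such macros are typically written, and all names below (`MASK_ONLY_LOW_SURROGATE_P`, `low_surrogate_p`, `high_surrogate_p`) are hypothetical, not the actual Emacs definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mask-only test, assumed to resemble the coding.c
   macros: it inspects bits 10..15 only, so any higher bits are
   silently ignored.  */
#define MASK_ONLY_LOW_SURROGATE_P(val) (((val) & 0xFC00) == 0xDC00)

/* Range-checked replacement: true only for U+DC00..U+DFFF.  */
static inline bool
low_surrogate_p (int c)
{
  return 0xDC00 <= c && c <= 0xDFFF;
}

/* Range-checked high-surrogate test: true only for U+D800..U+DBFF.  */
static inline bool
high_surrogate_p (int c)
{
  return 0xD800 <= c && c <= 0xDBFF;
}
```

With this sketch, `MASK_ONLY_LOW_SURROGATE_P (0x11DC00)` is true while `low_surrogate_p (0x11DC00)` is false. The functions also evaluate their argument exactly once and are type-checked.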


>   Also, a one-liner sounds like too little to justify a
> function, so maybe make all of those macros in coding.h, and include
> the latter in nsterm.m.
>

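No new macros, please, if we can avoid it. Functions are strictly better:
they evaluate their arguments exactly once, are type-checked, and are
visible to the debugger. I don't care much whether they live in character.h
or coding.h, but char_surrogate_p is already in character.h, so that seems
the more natural place.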
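As an illustration that a one-line function costs nothing over a macro, here is a minimal sketch of the conversion as a `static inline` function. Emacs uses its own `INLINE` and `eassert` macros; the sketch substitutes standard `inline` and `assert` so that it compiles on its own. The inverse `codepoint_to_surrogates` is a hypothetical helper used only to check the round trip.

```c
#include <assert.h>

/* Return the code point encoded by the UTF-16 pair HIGH, LOW.
   Parameter order (low first) follows the patch.  */
static inline int
surrogates_to_codepoint (int low, int high)
{
  assert (0xDC00 <= low && low <= 0xDFFF);
  assert (0xD800 <= high && high <= 0xDBFF);
  /* Each surrogate carries 10 bits: HIGH the upper, LOW the lower.  */
  return 0x10000 + (low - 0xDC00) + ((high - 0xD800) * 0x400);
}

/* Hypothetical inverse: split a supplementary-plane code point C
   (U+10000..U+10FFFF) into its high and low surrogates.  */
static inline void
codepoint_to_surrogates (int c, int *high, int *low)
{
  assert (0x10000 <= c && c <= 0x10FFFF);
  c -= 0x10000;
  *high = 0xD800 + (c >> 10);
  *low = 0xDC00 + (c & 0x3FF);
}
```

For example, U+1F600 splits into the pair D83D DE00, and `surrogates_to_codepoint (0xDE00, 0xD83D)` maps that pair back to 0x1F600.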


>
> > +  USE_SAFE_ALLOCA;
> > +  unichar *utf16_buffer;
> > +  SAFE_NALLOCA (utf16_buffer, 1, len);
>
> Maximum length of a UTF-16 sequence is known in advance, so why do you
> need SAFE_NALLOCA here?  Couldn't you use a buffer of fixed length
> instead?
>
>
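The text being inserted can be arbitrarily long, and the buffer has to hold
all of it, not just one character. Even a single character (i.e. an extended
grapheme cluster) can be arbitrarily long, e.g. a base letter followed by any
number of combining marks.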
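Here is a minimal sketch of why no fixed bound works. A base letter followed by any number of combining marks (here U+0301 COMBINING ACUTE ACCENT) is still one extended grapheme cluster, so the buffer has to be sized from the input. The helper `encode_utf16` is hypothetical and uses plain `malloc` rather than Emacs's SAFE_NALLOCA. It sizes the buffer for the worst case of two UTF-16 code units per code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Encode N code points from CPS as UTF-16 into a freshly allocated
   buffer and store the number of code units in *OUT_LEN.  Returns
   NULL on overflow or allocation failure; the caller frees it.  */
static uint16_t *
encode_utf16 (const int *cps, size_t n, size_t *out_len)
{
  /* Worst case: every code point needs a surrogate pair.  The extra
     unit avoids a zero-size allocation when N is 0.  */
  if (n > (SIZE_MAX / sizeof (uint16_t) - 1) / 2)
    return NULL;
  uint16_t *buf = malloc ((2 * n + 1) * sizeof *buf);
  if (!buf)
    return NULL;

  size_t len = 0;
  for (size_t i = 0; i < n; i++)
    {
      int c = cps[i];
      assert (0 <= c && c <= 0x10FFFF);
      if (c < 0x10000)
        buf[len++] = (uint16_t) c;        /* BMP: one code unit.  */
      else
        {
          c -= 0x10000;                   /* Supplementary: a pair.  */
          buf[len++] = (uint16_t) (0xD800 + (c >> 10));
          buf[len++] = (uint16_t) (0xDC00 + (c & 0x3FF));
        }
    }
  *out_len = len;
  return buf;
}
```

Encoding "e" followed by 1000 combining acute accents yields 1001 code units for what the user sees as a single character. Any fixed-size buffer would overflow for a long enough cluster.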


--001a114c563ee4f20d05613bd8bd--