From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] Allow inserting non-BMP characters
Date: Tue, 26 Dec 2017 18:11:18 +0200
Message-ID: <834lodii55.fsf@gnu.org>
References: <CAArVCkRx8p_vaFKJ_kXRuoZCKVBSYr=94RJANGpU0NXvkEZv6A@mail.gmail.com>
	<20171225210115.13789-1-phst@google.com> <83d132hz9e.fsf@gnu.org>
	<CAArVCkSMeQcjxz0CCsjaOU55e7g=AwsE+dU9LDCajye6JzujeA@mail.gmail.com>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: blaine.gmane.org
X-Trace: blaine.gmane.org 1514304567 1938 195.159.176.226 (26 Dec 2017 16:09:27 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Tue, 26 Dec 2017 16:09:27 +0000 (UTC)
Cc: phst@google.com, emacs-devel@gnu.org
To: Philipp Stephani <p.stephani2@gmail.com>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Dec 26 17:09:23 2017
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by blaine.gmane.org with esmtp (Exim 4.84_2)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1eTrn7-0008MI-Mb
	for ged-emacs-devel@m.gmane.org; Tue, 26 Dec 2017 17:09:21 +0100
Original-Received: from localhost ([::1]:47117 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1eTrp3-00035R-87
	for ged-emacs-devel@m.gmane.org; Tue, 26 Dec 2017 11:11:21 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:54659)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1eTrow-000353-Vz
	for emacs-devel@gnu.org; Tue, 26 Dec 2017 11:11:15 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1eTrow-0005Vk-2g
	for emacs-devel@gnu.org; Tue, 26 Dec 2017 11:11:14 -0500
Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:54611)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>)
	id 1eTroq-0005Rr-Gr; Tue, 26 Dec 2017 11:11:08 -0500
Original-Received: from [176.228.60.248] (port=3660 helo=home-c4e4a596f7)
	by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
	(Exim 4.82) (envelope-from <eliz@gnu.org>)
	id 1eTrop-0000cC-Rk; Tue, 26 Dec 2017 11:11:08 -0500
In-reply-to: <CAArVCkSMeQcjxz0CCsjaOU55e7g=AwsE+dU9LDCajye6JzujeA@mail.gmail.com>
	(message from Philipp Stephani on Tue, 26 Dec 2017 10:35:42 +0000)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2001:4830:134:3::e
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel/>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: "Emacs-devel" <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.devel:221418
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/221418>

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Tue, 26 Dec 2017 10:35:42 +0000
> Cc: emacs-devel@gnu.org, phst@google.com
> 
>  Suggest to move surrogates_to_codepoint to coding.c, and then use the
>  macros UTF_16_HIGH_SURROGATE_P and UTF_16_LOW_SURROGATE_P defined
>  there.
> 
> Hmm, I'd rather go the other way round and remove these macros later.
> They are macros, thus worse than functions,

I don't think we have a policy to prefer inline functions to macros,
and I don't think we should have such a policy.  We use inline
functions when that's necessary, but we don't in general prefer them.
They have their own problems, see the comments in lisp.h for some of
that.

> and don't seem to be correct either (what about a value such as 0x11DC00?).

??? They are correct for UTF-16 sequences, which are 16-bit numbers.
If you need to augment them by testing the high-order bits to be zero
in your case, that's okay, but I don't see any need for introducing
similar but different functionality.
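
To make that concrete, here is a minimal sketch in C.  The SKETCH_*
macros and the wide_* functions are hypothetical names invented for
this illustration, and the mask-and-compare form is an assumption
about how such tests are usually written, not a quote from coding.h.
It shows that a test which looks only at the low 16 bits accepts
0x11DC00, and that adding a check on the high-order bits rejects it
without replacing the 16-bit test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for 16-bit surrogate tests: they inspect
   only the low 16 bits, so they are meant for UTF-16 code units.  */
#define SKETCH_HIGH_SURROGATE_P(val) (((val) & 0xFC00) == 0xD800)
#define SKETCH_LOW_SURROGATE_P(val)  (((val) & 0xFC00) == 0xDC00)

/* Augmented tests for wider integers: additionally require that
   all bits above the low 16 are zero.  */
static bool
wide_high_surrogate_p (uint32_t c)
{
  return c <= 0xFFFF && SKETCH_HIGH_SURROGATE_P (c);
}

static bool
wide_low_surrogate_p (uint32_t c)
{
  return c <= 0xFFFF && SKETCH_LOW_SURROGATE_P (c);
}

/* Small self-check showing where the two forms differ.  */
static void
sketch_self_check (void)
{
  assert (SKETCH_LOW_SURROGATE_P (0xDC00));    /* genuine code unit */
  assert (SKETCH_LOW_SURROGATE_P (0x11DC00));  /* high bits ignored */
  assert (wide_low_surrogate_p (0xDC00));
  assert (!wide_low_surrogate_p (0x11DC00));   /* high bits checked */
}
```

The wide_* functions reuse the 16-bit tests and only add the range
check, so no second, slightly different definition of "surrogate" is
introduced.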

> No new macros please if we can avoid it. Functions are strictly better.

Sorry, I disagree.  Each has its advantages, and on balance I find
macros to be slightly better, certainly not worse.  There's no need to
avoid them in C.

> I don't care much whether they are in character.h or coding.h, but
> char_surrogate_p is already in character.h.

char_surrogate_p should have used the coding.h macros as well.

>  > +  USE_SAFE_ALLOCA;
>  > +  unichar *utf16_buffer;
>  > +  SAFE_NALLOCA (utf16_buffer, 1, len);
> 
>  Maximum length of a UTF-16 sequence is known in advance, so why do you
>  need SAFE_NALLOCA here?  Couldn't you use a buffer of fixed length
>  instead?
> 
> The text being inserted can be arbitrarily long. Even single
> characters (i.e. extended grapheme clusters) can be arbitrarily long.

Yes, but why do you first copy the input into a separate buffer?  Why
not convert each UTF-16 sequence separately, as you go through the
loop?
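
A rough sketch of what I mean, in plain C.  Everything here is
hypothetical: next_codepoint is a name made up for this example, and
the input is modeled as a plain array of 16-bit code units, which may
not match how the patch actually reads its input.  Each call decodes
one code point, combining a surrogate pair when it finds one, so the
only state needed is the current index and no intermediate buffer is
allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the code point starting at UNITS[*POS] and advance *POS
   past it.  LEN is the number of code units in UNITS.  A high
   surrogate followed by a low surrogate is combined into one
   supplementary-plane code point; an unpaired surrogate is returned
   unchanged, leaving the policy for it to the caller.  */
static uint32_t
next_codepoint (const uint16_t *units, size_t len, size_t *pos)
{
  assert (*pos < len);
  uint32_t c = units[(*pos)++];

  if ((c & 0xFC00) == 0xD800
      && *pos < len
      && (units[*pos] & 0xFC00) == 0xDC00)
    {
      uint32_t low = units[(*pos)++];
      c = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
    }
  return c;
}
```

The caller's loop then becomes "for (size_t i = 0; i < len; )" with
one next_codepoint call per iteration, handing each resulting code
point to whatever does the insertion.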