From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] Allow inserting non-BMP characters
Date: Tue, 26 Dec 2017 18:11:18 +0200
Message-ID: <834lodii55.fsf@gnu.org>
References: <20171225210115.13789-1-phst@google.com> <83d132hz9e.fsf@gnu.org>
To: Philipp Stephani
Cc: phst@google.com, emacs-devel@gnu.org
In-reply-to: (message from Philipp Stephani on Tue, 26 Dec 2017 10:35:42 +0000)

> From: Philipp Stephani
> Date: Tue, 26 Dec 2017 10:35:42 +0000
> Cc: emacs-devel@gnu.org, phst@google.com
>
> Suggest to move surrogates_to_codepoint to coding.c, and then use the
> macros UTF_16_HIGH_SURROGATE_P and UTF_16_LOW_SURROGATE_P defined
> there.
>
> Hmm, I'd rather go the other way round and remove these macros later. They are macros, thus worse than
> functions,

I don't think we have a policy to prefer inline functions to macros,
and I don't think we should have such a policy.  We use inline
functions when that's necessary, but we don't in general prefer them.
They have their own problems, see the comments in lisp.h for some of
that.

> and don't seem to be correct either (what about a value such as 0x11DC00?).

???  They are correct for UTF-16 sequences, which are 16-bit numbers.
If you need to augment them by testing the high-order bits to be zero
in your case, that's okay, but I don't see any need for introducing
similar but different functionality.

> No new macros please if we can avoid it. Functions are strictly better.

Sorry, I disagree.  Each has its advantages, and on balance I find
macros to be slightly better, certainly not worse.  There's no need to
avoid them in C.

> I don't care much whether they are in character.h or coding.h, but char_surrogate_p is already in character.h.

char_surrogate_p should have used the coding.h macros as well.
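
To make the 0x11DC00 point concrete, here is a minimal sketch.  The
EXAMPLE_* macros are illustrative stand-ins written in the usual
mask-and-compare form; that the coding.c macros look exactly like this
is an assumption, and the example_wide_* wrappers are hypothetical
names, not existing Emacs functions.  The sketch shows that such a
macro is fine for 16-bit code units, and that a caller holding a wider
value only needs to add a test that the high-order bits are zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for surrogate tests on UTF-16 code units.
   Assumption: mask-and-compare form; not copied from coding.c.  */
#define EXAMPLE_HIGH_SURROGATE_P(val) (((val) & 0xFC00) == 0xD800)
#define EXAMPLE_LOW_SURROGATE_P(val) (((val) & 0xFC00) == 0xDC00)

/* Hypothetical wrappers for callers whose values may be wider than
   16 bits: first require the high-order bits to be zero, then reuse
   the 16-bit test.  */
static bool
example_wide_high_surrogate_p (uint32_t c)
{
  return (c >> 16) == 0 && EXAMPLE_HIGH_SURROGATE_P (c);
}

static bool
example_wide_low_surrogate_p (uint32_t c)
{
  return (c >> 16) == 0 && EXAMPLE_LOW_SURROGATE_P (c);
}
```

With these definitions the bare macro accepts 0x11DC00, because the
mask ignores everything above bit 15, while the wrapper rejects it.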
> > + USE_SAFE_ALLOCA;
> > + unichar *utf16_buffer;
> > + SAFE_NALLOCA (utf16_buffer, 1, len);
>
> Maximum length of a UTF-16 sequence is known in advance, so why do you
> need SAFE_NALLOCA here?  Couldn't you use a buffer of fixed length
> instead?
>
> The text being inserted can be arbitrarily long. Even single characters (i.e. extended grapheme clusters) can
> be arbitrarily long.

Yes, but why do you first copy the input into a separate buffer?  Why
not convert each UTF-16 sequence separately, as you go through the
loop?
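
As a rough illustration of converting as you go, the sketch below
walks the code units once and hands each decoded code point to a
callback, so no intermediate copy of the whole input is needed.  All
names here (example_for_each_code_point, example_collect, struct
example_collector) are hypothetical and are not part of the patch or
of Emacs; passing unpaired surrogates through unchanged is likewise an
assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine a high/low surrogate pair into one code point.  */
static uint32_t
example_combine_surrogates (uint16_t high, uint16_t low)
{
  return 0x10000 + (((uint32_t) (high - 0xD800) << 10)
                    | (uint32_t) (low - 0xDC00));
}

/* Call EMIT once per code point found in UNITS[0..LEN).  A high
   surrogate followed by a low surrogate is combined; any other unit,
   including an unpaired surrogate, is passed through as is.  */
static void
example_for_each_code_point (const uint16_t *units, size_t len,
                             void (*emit) (uint32_t, void *), void *ctx)
{
  for (size_t i = 0; i < len; i++)
    {
      uint32_t c = units[i];
      if ((c & 0xFC00) == 0xD800 && i + 1 < len
          && (units[i + 1] & 0xFC00) == 0xDC00)
        {
          c = example_combine_surrogates (units[i], units[i + 1]);
          i++;                  /* Skip the low surrogate just used.  */
        }
      emit (c, ctx);
    }
}

/* Sample consumer: remember up to eight code points.  */
struct example_collector
{
  uint32_t cp[8];
  size_t n;
};

static void
example_collect (uint32_t c, void *ctx)
{
  struct example_collector *col = ctx;
  if (col->n < 8)
    col->cp[col->n++] = c;
}
```

The only state the loop keeps is the current index, so the size of the
input does not affect how much memory the conversion needs.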