From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: master 02bca34: Utilize new string decoding feature in GTK native input
Date: Sat, 19 Feb 2022 14:36:43 +0200
Message-ID: <838ru7yqw4.fsf@gnu.org>
References: <83czjjyzao.fsf@gnu.org> <87y227mal9.fsf@yahoo.com>
In-Reply-To: <87y227mal9.fsf@yahoo.com> (message from Po Lu on Sat, 19 Feb 2022 18:09:38 +0800)
To: Po Lu
Cc: emacs-devel@gnu.org

> From: Po Lu
> Cc: emacs-devel@gnu.org
> Date: Sat, 19 Feb 2022 18:09:38 +0800
>
> Eli Zaretskii writes:
>
> > Is this a good idea? Consing a string when we process input increases
> > GC pressure, and what issues does this change solve as a
> > counter-weight for that disadvantage? Is g_utf8_to_ucs4 a problematic
> > API or something?
>
> No, but some input method modules don't always return valid UTF-8 like
> they're supposed to, thereby causing crashes in g_utf8_to_ucs4_fast.
>
> I should have explained that in the commit message.

You can still explain that in a comment to the code.

> > But in general, decoding UTF-8 encoded C string is better done without
> > consing a string and then using the coding.c stuff. After all, if the
> > original string is 100% guaranteed to be in UTF-8, the decoding is
> > almost trivial.
>
> It's supposedly guaranteed, but some input method modules break that
> guarantee.

And what do we want to do with those invalid UTF-8 sequences?
The way you did it will produce raw bytes for them -- is that really
TRT in this case?

In any case, at the very least consider using decode_string_utf_8
instead of consing a Lisp string and then using the "usual" decoding
stuff -- decode_string_utf_8 will cons only one string.
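
To make the "decoding is almost trivial" point concrete, here is a minimal standalone sketch of decoding a UTF-8 C string into code points without allocating anything, while tolerating malformed input. It is not Emacs code and does not use GLib or coding.c: the names decode_one, decode_utf8_lenient and REPLACEMENT_CHAR are hypothetical, and substituting U+FFFD for each bad byte is only one possible policy, assumed here for illustration. What should actually happen to invalid sequences is the open question above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy: every undecodable byte becomes U+FFFD.  */
#define REPLACEMENT_CHAR 0xFFFDu

/* Decode one code point from S, which has N > 0 bytes remaining.
   Store it in *CP and return the number of bytes consumed.  On any
   malformed sequence, store REPLACEMENT_CHAR and consume one byte.  */
static size_t
decode_one (const unsigned char *s, size_t n, uint32_t *cp)
{
  unsigned char c = s[0];
  size_t len;
  uint32_t val, min;

  *cp = REPLACEMENT_CHAR;	/* Pessimistic default.  */

  if (c < 0x80)
    {
      *cp = c;			/* Plain ASCII.  */
      return 1;
    }
  if ((c & 0xE0) == 0xC0)
    { len = 2; val = c & 0x1F; min = 0x80; }
  else if ((c & 0xF0) == 0xE0)
    { len = 3; val = c & 0x0F; min = 0x800; }
  else if ((c & 0xF8) == 0xF0)
    { len = 4; val = c & 0x07; min = 0x10000; }
  else
    return 1;			/* Stray continuation or invalid lead byte.  */

  if (len > n)
    return 1;			/* Sequence truncated by end of input.  */

  for (size_t i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
	return 1;		/* Not a continuation byte.  */
      val = (val << 6) | (s[i] & 0x3F);
    }

  /* Reject overlong forms, surrogates and values beyond U+10FFFF.  */
  if (val < min || val > 0x10FFFF || (val >= 0xD800 && val <= 0xDFFF))
    return 1;

  *cp = val;
  return len;
}

/* Decode up to OUT_CAP code points from the NBYTES bytes at STR into
   OUT, without allocating.  Return the number of code points stored.  */
size_t
decode_utf8_lenient (const char *str, size_t nbytes,
		     uint32_t *out, size_t out_cap)
{
  const unsigned char *s = (const unsigned char *) str;
  size_t i = 0, count = 0;

  assert (str != NULL && out != NULL);

  while (i < nbytes && count < out_cap)
    {
      i += decode_one (s + i, nbytes - i, &out[count]);
      count++;
    }
  return count;
}
```

A caller would pass the byte string and a fixed-size buffer, e.g. decode_utf8_lenient (str, strlen (str), buf, 64). Since a bad byte consumes exactly one input byte, the loop always makes progress and never reads past NBYTES, which is the property that an unchecked "fast" decoder lacks when its input is malformed.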