From: Gregory Heytings
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode confusables and reordering characters considered harmful
Date: Wed, 03 Nov 2021 11:31:37 +0000
Message-ID: <11d5fecb443892de13b1@heytings.org>
To: Stefan Kangas
Cc: Eli Zaretskii, emacs-devel@gnu.org, Clément Pit-Claudel, Stefan Monnier

>> There's some data that shows that this is extremely rare in general:
>> the Rust Security Response WG analyzed the 70322 crates and found only
>> 5 in which these codepoints were present (see [1]). That's ~0.01 %.
>>
>> Moreover such highlighting does not make the source code or text
>> unreadable, even in those few legitimate cases.
>
> Depending on how you define it, there is at least one major world
> language (Arabic) that has a RTL script, and other major languages such
> as Urdu, Farsi and Hebrew also use it (and a couple of others too). So
> I think we should consider to what extent your proposal might hurt users
> of such languages.
>
> Are these characters important to write comments and strings in any of
> those languages? Will your proposal make it harder to type in such
> languages? If yes, are there less invasive solutions?
>

Thanks for your comments!

AFAIK, these specific characters are not necessary to write comments and strings in these languages. Here are two randomly chosen files that use RTL strings and comments, and in which these characters are not used:

https://raw.githubusercontent.com/01walid/goarabic/master/stringutils_test.go
https://raw.githubusercontent.com/AbdullahDiaa/garabic/main/garabic.go
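
For anyone who wants to check other files, here is a minimal sketch of such a scan. It assumes that "these characters" means the nine Unicode bidirectional embedding, override and isolate controls (U+202A to U+202E and U+2066 to U+2069); that set, and the helper name `find_bidi_controls`, are assumptions made for illustration and are not taken from any existing tool.

```python
import unicodedata

# Assumption: the characters under discussion are the explicit bidi
# embedding/override/isolate controls. Adjust the set if needed.
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def find_bidi_controls(text):
    """Return a list of (line, column, name) for each bidi control in text.

    Line and column numbers are 1-based; name is the Unicode character name.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python scan.py file1 file2 ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for lineno, col, name in find_bidi_controls(f.read()):
                print(f"{path}:{lineno}:{col}: {name}")
```

Ordinary RTL text, such as an Arabic string literal or comment, contains none of these control characters, so the function returns an empty list for it. Only the explicit directional controls are reported.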