From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode confusables and reordering characters considered harmful
Date: Wed, 03 Nov 2021 15:44:08 +0200
Message-ID: <837ddpicbb.fsf@gnu.org>
To: Stefan Kangas
Cc: gregory@heytings.org, emacs-devel@gnu.org, cpitclaudel@gmail.com, monnier@iro.umontreal.ca

> From: Stefan Kangas
> Date: Wed, 3 Nov 2021 12:19:58 +0100
> Cc: Eli Zaretskii ,
>  Clément Pit-Claudel ,
>  Stefan Monnier ,
>  Emacs developers
>
> Depending on how you define it, there is at least one major world
> language (Arabic) that has a RTL script, and other major languages
> such as Urdu, Farsi and Hebrew also use it (and a couple of others
> too). So I think we should consider to what extent your proposal
> might hurt users of such languages.
>
> Are these characters important to write comments and strings in any of
> those languages?

Yes, definitely.
Especially when the comments mix RTL characters with ASCII punctuation
and separators, which have "weak" directionality: their actual display
direction depends on the surrounding strong directional text. This
happens quite frequently, because comments can include arithmetic
operators and other similar symbols and punctuation characters.
Without the formatting controls, such comments and strings can become
almost unreadable in some cases.

> Will your proposal make it harder to type in such languages?

Yes, in some cases.

> If yes, are there less invasive solutions?

Yes: detect the situations where the use of these controls is
suspicious. For example, the current implementation of
bidi-find-overridden-directionality detects when characters that
normally have left-to-right directionality (example: 'a') are forced to
behave as strong right-to-left characters instead. This is something
"normal" human-readable text should rarely, if ever, need to do, while
its potential to confuse is very high.
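To illustrate the kind of check described above, here is a minimal
sketch in Python. It is NOT the Emacs implementation of
bidi-find-overridden-directionality (that is part of Emacs itself); the
function name find_overridden_ltr and the constant names are
hypothetical. It assumes a simplified model of the explicit-formatting
rules of Unicode UAX #9 (no nesting-depth limit, each line treated as
its own paragraph) and uses Python's unicodedata.bidirectional() to
look up each character's bidi class. Ordinary RTL text mixed with
digits and operators, or text that uses embeddings and isolates, is
not flagged. Only a strong left-to-right character (class 'L') that
sits under an active right-to-left override (RLO) is reported.

```python
import unicodedata

# Explicit directional formatting characters (Unicode UAX #9).
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"


def find_overridden_ltr(text):
    """Return the index of the first strong-LTR char forced to RTL, else None.

    Hypothetical helper; simplified UAX #9 model: no depth limit, and the
    directional stack is reset at every newline.
    """
    # Each stack entry is (override, is_isolate); override is None, "L" or "R".
    stack = [(None, False)]
    for i, ch in enumerate(text):
        if ch in (LRE, RLE):
            stack.append((None, False))   # embedding: clears any override
        elif ch == LRO:
            stack.append(("L", False))
        elif ch == RLO:
            stack.append(("R", False))
        elif ch in (LRI, RLI, FSI):
            stack.append((None, True))    # isolate: clears any override
        elif ch == PDF:
            # PDF closes the innermost embedding/override, never an isolate.
            if len(stack) > 1 and not stack[-1][1]:
                stack.pop()
        elif ch == PDI:
            # PDI closes the innermost isolate and everything opened inside it.
            if any(is_iso for _, is_iso in stack):
                while not stack.pop()[1]:
                    pass
        elif ch == "\n":
            stack = [(None, False)]
        elif stack[-1][0] == "R" and unicodedata.bidirectional(ch) == "L":
            return i                      # e.g. 'a' displayed as RTL
    return None
```

For example, find_overridden_ltr("# " + RLO + "abc" + PDF) returns 3
(the position of 'a'), while a Hebrew comment such as "שלום 1+2"
returns None, because digits and '+' are weak types and no override is
involved.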