From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Sat, 06 Nov 2021 17:34:47 +0200
Message-ID: <8335o9b8mg.fsf@gnu.org>
In-Reply-To: <87mtmhwflk.fsf@turtle-trading.net> (message from Benjamin Riefenstahl on Sat, 06 Nov 2021 14:58:31 +0100)
To: Benjamin Riefenstahl
Cc: emacs-devel@gnu.org
Xref: news.gmane.io gmane.emacs.devel:278874

> From: Benjamin Riefenstahl
> Date: Sat, 06 Nov 2021 14:58:31 +0100
>
> Eli Zaretskii writes:
> > The Unicode Bidirectional Algorithm (UBA) mandates
> > (https://unicode.org/reports/tr9/#X8):
> >
> >   X8. All explicit directional embeddings, overrides and isolates are
> >   completely terminated at the end of each paragraph.
> >
> > [...]
> >
> > So when the UBA says "at the end of each paragraph", it means in
> > practice at EOL, since all the other paragraph separators are rarely
> > if ever used in human-readable text.  (And Emacs, of course,
> > implements that rule.)
>
> Should the end of a comment or string in source code then also qualify
> as the end of a paragraph in this sense?

It could be, but the way the UBA is implemented in Emacs makes that
very hard to do, if not impossible.  And that's even before you
consider comment styles which make that hard even in principle.  For
example:

  /* This is the beginning of a comment, */
  /* and this is its continuation. */
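
To make rule X8 concrete, here is a minimal Python sketch, not the way
Emacs implements the UBA.  It assumes that each line is one paragraph
(the "in practice at EOL" reading above) and uses a simplified stack
model of the explicit formatting characters.  The function name
unterminated_at_paragraph_end is hypothetical.  It reports which
embeddings, overrides and isolates are still open at the end of each
line, that is, the ones X8 terminates implicitly.

```python
# Explicit directional formatting characters defined by UAX #9.
EMBEDDING_INITIATORS = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO
ISOLATE_INITIATORS = {"\u2066", "\u2067", "\u2068"}              # LRI, RLI, FSI
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def unterminated_at_paragraph_end(text):
    """Return, for each line, the initiators still open at its end.

    Assumption: a newline is the only paragraph separator considered.
    Simplification: the UBA's depth limits and overflow handling are ignored.
    """
    results = []
    for line in text.split("\n"):
        stack = []
        for ch in line:
            if ch in EMBEDDING_INITIATORS or ch in ISOLATE_INITIATORS:
                stack.append(ch)
            elif ch == PDF:
                # PDF closes only the most recent embedding or override,
                # and never reaches across an open isolate.
                if stack and stack[-1] in EMBEDDING_INITIATORS:
                    stack.pop()
            elif ch == PDI:
                # PDI closes the most recent isolate and everything opened
                # inside it; an unmatched PDI is ignored.
                if any(c in ISOLATE_INITIATORS for c in stack):
                    while stack.pop() not in ISOLATE_INITIATORS:
                        pass
        # X8: whatever is still on the stack is terminated here.
        results.append(stack)
    return results


# An RLI opened in the first comment line is closed by the end of that
# line, so the PDI on the second line matches nothing.
example = "/* \u2067 first */\n/* second \u2069 */"
print(unterminated_at_paragraph_end(example))
```

In this model the two-line comment above is already two separate
paragraphs.  Treating the end of a comment as a paragraph end instead
would require the scanner to know where comments begin and end, which
this sketch does not attempt.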