From: Vasilij Schneidermann
Newsgroups: gmane.emacs.devel
Subject: Unicode confusables and reordering characters considered harmful
Date: Tue, 2 Nov 2021 13:57:20 +0100
To: emacs-devel@gnu.org

There's a paper going around that demonstrates how two Unicode features can be used to trick source code auditors into misinterpreting program logic. The authors suggest that language specifications be amended, that implementations warn or raise errors, and that editor tooling display visual warnings. The two issues are tracked as CVE-2021-42574 and CVE-2021-42694, respectively.

The first issue is about bidirectional reordering characters. If bidi text rendering is not needed, it's easy enough to work around with `(setq-default bidi-display-reordering nil)`. Some people already do this to speed up redisplay. Maybe there's a better solution, such as automatically detecting whether the user is working with an RTL script and only then enabling bidi text rendering.

The second issue is about mixed-script confusable characters. Emacs does not appear to have a workaround for that. I've come across the uni-confusables package in GNU ELPA, but it merely sets up character tables.
The only mention of confusables I can find in the Emacs sources is `help-uni-confusables`, which contains a much smaller list for quotation marks and is used in help buffers and elisp buffers. A possible solution would be to implement the confusable detection algorithm from UTS #39 and expose it in the uni-confusables package.

Vasilij

https://trojansource.codes/
https://www.trojansource.codes/trojan-source.pdf
https://github.com/nickboucher/trojan-source
https://krebsonsecurity.com/2021/11/trojan-source-bug-threatens-the-security-of-all-code/
https://unicode.org/reports/tr39/#Confusable_Detection
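
To make the two checks discussed in this message a bit more concrete, here is a rough, editor-independent sketch in Python (not Emacs Lisp, and not how Emacs or uni-confusables works). It assumes that flagging the explicit bidi formatting characters is enough for the first issue, and it uses a simplified form of the UTS #39 "skeleton" comparison for the second. The names `find_bidi_controls`, `skeleton`, `confusable` and the `SAMPLE_PROTOTYPES` table are made up for illustration; a real implementation would load the full confusables.txt data instead of the five hand-picked entries below.

```python
import unicodedata

# Explicit bidirectional formatting characters: embeddings, overrides,
# isolates and their terminators (the ones relevant to CVE-2021-42574).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}

# ASSUMPTION: a tiny hand-picked sample of Cyrillic -> Latin mappings.
# The real data set is confusables.txt from UTS #39 and is far larger.
SAMPLE_PROTOTYPES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def find_bidi_controls(text):
    """Return (index, character name) for every bidi control in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


def skeleton(text, prototypes=SAMPLE_PROTOTYPES):
    """Simplified skeleton: NFD, map each character to its prototype, NFD."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(prototypes.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable(a, b):
    """True if a and b are different strings with the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)


if __name__ == "__main__":
    print(find_bidi_controls("if admin:\u202e # check"))
    # "scope" spelled with Cyrillic letters after the initial Latin "s"
    print(confusable("scope", "s\u0441\u043e\u0440\u0435"))
```

An editor could run a check like `find_bidi_controls` over a buffer and highlight the reported positions, and compare identifiers pairwise with `confusable` to warn about look-alike names. Both are only starting points: the sample table covers a handful of letters, and the simplified skeleton skips details of the full UTS #39 procedure.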