From: Gregory Heytings
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Mon, 08 Nov 2021 19:58:56 +0000
Message-ID: <7cc91c798e37e63cc6fa@heytings.org>
In-Reply-To: <83k0hlblvm.fsf@gnu.org>
To: Eli Zaretskii
Cc: cpitclaudel@gmail.com, stefan@marxist.se, yuri.v.khan@gmail.com, db48x@db48x.net, monnier@iro.umontreal.ca, emacs-devel@gnu.org

>> In fact, it did not take me much time to create a case that your
>> algorithm doesn't detect (and AFAIU cannot detect without also
>> displaying warnings about many legitimate uses). I attach the example
>> code, how that code is displayed by Emacs, and how that code would be
>> displayed with the patch I proposed.
>
> Thanks, I've now enhanced the code which detects suspiciously reordered
> source to cover this kind of cases as well.
> I didn't see any legitimate uses flagged after the change, but if you
> can find any such cases, please show them and I will take a look.
>

Clearly, you failed to understand the meaning of my post. It did *not*
mean: Your algorithm could be improved. It meant: Your algorithm cannot
be trusted. It took less than 24 hours (after your commit) for a
non-malevolent actor to find a way to escape the detection algorithm you
implemented, which you claimed was the proper solution to the problem
pointed out by the "Trojan Source" paper. Your slightly improved
algorithm will evidently not hold out any longer if an actually
malevolent actor tries to find a way to escape it (and of course they
won't tell you when and how they did it).

So I'll say it one more time: The only proper solution to that problem
is to highlight, by default, these control characters in prog-mode and
its descendants. That's the only 100% foolproof solution that guarantees
that such constructs will never be missed, and this is what about 99.99%
of Emacs users need. The remaining 0.01% are those who:

1. Use RTL languages in their source code, AND
2. Use these reordering control characters in their source code, AND
3. Would find such highlighted characters annoying.

Those few users can turn that highlighting option off, either globally
or by turning the minor mode off in this or that buffer.

>>> The right balance is where the percent of false positives is very low.
>>
>> IMO, that's not the right balance: the right balance is where the
>> percentage of false negatives is zero.
>
> If you need zero false negatives, and don't care about the level of
> noise (i.e. false positives), you have the features for that already:
> customize glyphless-char-display-control to show the control characters
> as acronyms or hex codes.
>

Again you clearly fail to understand what I said.
The problem has nothing to do with me; the problem is, as the "Trojan
Source" paper rightly explains, what the default settings of the various
available editors are. Claiming that asking every Emacs user (except the
few users mentioned above) to set an obscure configuration option (which
is only mentioned once, in passing, in the manual) is a solution to that
problem is just wrong.

Anyway, it's now clear that this problem will remain unfixed in Emacs.
Given this, I can only applaud the Rust developers for taking the
decision to ban these control characters from Rust code files. If
editors cannot be trusted to do a proper job on this matter, compilers
should do it, and I hope that a similar solution will soon be adopted in
other compilers.

And I leave this discussion with this post.
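
For illustration only, the "flag every occurrence" approach argued for
above can be sketched in a few lines of Python. This is a minimal sketch,
not what Emacs or rustc actually implement: the function names are
hypothetical, and the character list is an assumption, namely the nine
explicit bidirectional formatting characters (U+202A..U+202E and
U+2066..U+2069) discussed in the "Trojan Source" paper.

```python
import unicodedata

# Assumed list: the explicit bidirectional formatting characters
# (embeddings, overrides, isolates, and their terminators).
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)


def find_bidi_controls(text):
    """Return (line, column, character name) for every bidi control found.

    Hypothetical helper: it reports every occurrence, with no attempt to
    decide whether a given use is legitimate.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits


def make_visible(text):
    """Replace each bidi control with a visible <U+XXXX> marker."""
    return "".join(
        f"<U+{ord(ch):04X}>" if ch in BIDI_CONTROLS else ch for ch in text
    )


if __name__ == "__main__":
    sample = 'role = "user\u202e \u2066// check admin\u2069 \u2066"\n'
    for lineno, col, name in find_bidi_controls(sample):
        print(f"line {lineno}, column {col}: {name}")
    print(make_visible(sample), end="")
```

Because the check is a plain membership test on each character, it has no
false negatives for the listed characters, at the cost of also reporting
legitimate uses; that is the trade-off defended in the message above.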