From: Daniel Brooks
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Fri, 05 Nov 2021 17:54:37 -0700
Message-ID: <87v916p0he.fsf@db48x.net>
In-Reply-To: <5ad1d47cbdc0838d598c@heytings.org> (Gregory Heytings's message of "Fri, 05 Nov 2021 23:33:39 +0000")
To: Gregory Heytings
Cc: cpitclaudel@gmail.com, Stefan Kangas, yuri.v.khan@gmail.com, monnier@iro.umontreal.ca, Eli Zaretskii, emacs-devel@gnu.org

Gregory Heytings writes:

> This "I consider" is the problem of your approach.  Malevolent actors
> are always more inventive, and will find a way to escape the safety
> net you created.

Absolutely.

> The cases you consider suspicious are cases where the directionality
> of one or more characters is overridden by reordering control
> characters, but this is not what the "Trojan Source" paper is about.
> The problem it points to is much broader, it's about using these
> invisible control characters to make the source code appear different
> to a human reader and to a compiler.

Specifically, reordering the source so that something which is inside a
comment or string appears to be outside of it, or vice versa. However,
as you say, arbitrary rearrangement is on the table. The paper
specifically mentions that the line can be treated as an anagram, and
the characters rearranged into an arbitrary order.

It would be fun to find a nice example where one enum variant was
substituted for another, with no string or comment on the line to
supply the necessary characters.
It would require enum variants whose names are anagrams…

> In fact, it did not take me much time to create a case that your
> algorithm doesn't detect (and AFAIU cannot detect without also
> displaying warnings about many legitimate uses).  I attach the example
> code, how that code is displayed by Emacs, and how that code would be
> displayed with the patch I proposed.
>
> #define is_restricted_user(user) \
>   !strcmp (user, "root") ? 0 : \
>   !strcmp (user, "admin") ? 0 : \
>   !strcmp (user, "superuser<U+202E><U+2066>? 0 : 1<U+2069> <U+2066>")

I love this example. I think that it can be detected, though. As the
paper says, we should be on the lookout for unterminated overrides.
This example has a LEFT-TO-RIGHT ISOLATE that is never terminated by a
POP DIRECTIONAL ISOLATE; it thus stays in effect long enough to reach
the string delimiter.

Personally I don’t mind detecting these sorts of errors, as long as we
recognize that we cannot reliably do so unless we also know the syntax
of the language; not every language terminates a string the same way.
Imagine this were Perl, and we were manipulating not a double-quoted
string but a q{}, a qx{}, or worse: a regex match (m//). Recall that
regex matches can use arbitrary punctuation characters as delimiters;
m[] is just as valid as m//.

But perhaps it would suffice to find isolates which are only terminated
by a newline character.

db48x
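
To make the last suggestion concrete, here is a rough sketch of a syntax-agnostic check that reports bidi controls which are still open when a line ends. It is written in Python for brevity and is not code from Emacs or from the paper; the names `unterminated_bidi_controls` and `scan` are made up for illustration. It assumes that each line is its own bidi paragraph, and it also reports unterminated embeddings and overrides, which goes slightly beyond "isolates only".

```python
# Hypothetical sketch: flag bidi controls that are terminated only by a newline.
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"
LRE, RLE, LRO, RLO, PDF = "\u202a", "\u202b", "\u202d", "\u202e", "\u202c"

ISOLATE_OPENERS = {LRI, RLI, FSI}
EMBEDDING_OPENERS = {LRE, RLE, LRO, RLO}


def unterminated_bidi_controls(line):
    """Return (column, char) pairs for controls still open at the end of line."""
    stack = []
    for col, ch in enumerate(line):
        if ch in ISOLATE_OPENERS or ch in EMBEDDING_OPENERS:
            stack.append((col, ch))
        elif ch == PDI:
            # PDI closes the nearest open isolate, together with any
            # embeddings or overrides opened inside it.  A PDI with no
            # open isolate is ignored.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][1] in ISOLATE_OPENERS:
                    del stack[i:]
                    break
        elif ch == PDF:
            # PDF closes the most recent embedding or override, but it
            # does not reach across an isolate boundary.
            if stack and stack[-1][1] in EMBEDDING_OPENERS:
                stack.pop()
    return stack


def scan(text):
    """Return (line number, column, code point) for every unterminated control."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in unterminated_bidi_controls(line):
            findings.append((lineno, col, f"U+{ord(ch):04X}"))
    return findings
```

Run over the last line of the quoted macro, this reports both the RIGHT-TO-LEFT OVERRIDE and the final LEFT-TO-RIGHT ISOLATE, without knowing anything about C string syntax. Whether it would also warn about too many legitimate uses is a separate question that this sketch does not answer.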