From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Sat, 06 Nov 2021 12:56:12 +0200
Message-ID: <83ilx5blir.fsf@gnu.org>
To: Daniel Brooks
Cc: cpitclaudel@gmail.com, stefan@marxist.se, yuri.v.khan@gmail.com, gregory@heytings.org, monnier@iro.umontreal.ca, emacs-devel@gnu.org
In-Reply-To: <87v916p0he.fsf@db48x.net> (message from Daniel Brooks on Fri, 05 Nov 2021 17:54:37 -0700)

> From: Daniel Brooks
> Cc: Eli Zaretskii , cpitclaudel@gmail.com, Stefan Kangas
>  , emacs-devel@gnu.org, monnier@iro.umontreal.ca,
>  yuri.v.khan@gmail.com
> Date: Fri, 05 Nov 2021 17:54:37 -0700
>
> > #define is_restricted_user(user) \
> >   !strcmp (user, "root") ? 0 : \
> >   !strcmp (user, "admin") ? 0 : \
> >   !strcmp (user, "superuser‮⁦? 0 : 1⁩ ⁦")
>
> I love this example.

Well, then maybe you'll also like the solution I just installed.

> I think that it can be detected though.
> As the paper says, we should be
> on the lookout for unterminated overrides. This example has a
> LEFT-TO-RIGHT ISOLATE that is left unterminated by a POP DIRECTIONAL
> ISOLATE; it thus applies long enough to hit the string delimiter.

No, this example (and others as well) will display the same even if
all the embeddings and isolates are terminated by the corresponding
POP controls.  In fact, the test case I installed does just that.  As
I wrote elsewhere, the UBA (the Unicode Bidirectional Algorithm) says
that unterminated embeddings and overrides are perfectly legitimate.
So searching for "unterminated" overrides and isolates cannot be the
solution; it can only detect the cases where the malicious parties got
sloppy.

> Personally I don’t mind detecting these sorts of errors, as long as we
> recognize that we cannot reliably do so unless we also know the syntax
> of the language; not every language terminates a string the same
> way. Imagine this were Perl, and we were manipulating not a
> double–quoted string but a q{}, a qx{}, or worse: a regex match
> (m//). Recall that regex matches can use arbitrary punctuation
> characters as delimiters; m[] is just as valid as m//.

I don't see how this is relevant, as long as the detection doesn't
care about the syntax and just looks at the characters whose
bidirectional properties are being tweaked.  The parties that concoct
these malicious code samples do indeed have to consider the syntax of
the language, since they want to dupe human readers and also avoid
having the compiler flag the source as invalid.  But detection doesn't
have to know anything about the syntax, at least not for some class of
detection algorithms.
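To make the two positions concrete, here is a minimal C sketch of two syntax-agnostic checks over UTF-8 source text. It is not the Emacs implementation, and every name in it (`scan_bidi`, `classify`, `decode_utf8`, the `BIDI_*` macros) is hypothetical. Heuristic A is the "unterminated" check: it flags a line (a UBA paragraph, ended by LF) that still has an embedding, override, or isolate open. Heuristic B just counts explicit bidi formatting controls wherever they appear. Neither one looks at string delimiters, so whether the text is a C string, a Perl q{}, or an m[] makes no difference. The bracket counting is deliberately simplified and does not model the full UBA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* UTF-8 bytes of a few explicit bidi controls, so sample strings stay
   visible in source form.  */
#define BIDI_PDF "\xE2\x80\xAC" /* U+202C POP DIRECTIONAL FORMATTING */
#define BIDI_RLO "\xE2\x80\xAE" /* U+202E RIGHT-TO-LEFT OVERRIDE */
#define BIDI_LRI "\xE2\x81\xA6" /* U+2066 LEFT-TO-RIGHT ISOLATE */
#define BIDI_PDI "\xE2\x81\xA9" /* U+2069 POP DIRECTIONAL ISOLATE */

enum bidi_ctl { CTL_NONE, CTL_OPEN_EMBED, CTL_POP_EMBED,
                CTL_OPEN_ISOLATE, CTL_POP_ISOLATE, CTL_MARK };

/* Minimal UTF-8 decoder: return the code point at S[*I], advance *I.
   Malformed or truncated input yields U+FFFD and skips one byte.
   No overlong or surrogate checks; this is only a sketch.  */
static unsigned long
decode_utf8 (const unsigned char *s, size_t len, size_t *i)
{
  unsigned char c = s[*i];
  unsigned long cp;
  int n;

  if (c < 0x80)                 { n = 0; cp = c; }
  else if ((c & 0xE0) == 0xC0)  { n = 1; cp = c & 0x1F; }
  else if ((c & 0xF0) == 0xE0)  { n = 2; cp = c & 0x0F; }
  else if ((c & 0xF8) == 0xF0)  { n = 3; cp = c & 0x07; }
  else                          { (*i)++; return 0xFFFD; }

  if (*i + n >= len)            /* sequence runs past the end */
    { (*i)++; return 0xFFFD; }
  for (int k = 1; k <= n; k++)
    {
      unsigned char cc = s[*i + k];
      if ((cc & 0xC0) != 0x80)  /* not a continuation byte */
        { (*i)++; return 0xFFFD; }
      cp = (cp << 6) | (cc & 0x3F);
    }
  *i += n + 1;
  return cp;
}

/* Map a code point to the kind of explicit bidi control it is.  */
static enum bidi_ctl
classify (unsigned long cp)
{
  switch (cp)
    {
    case 0x202A: case 0x202B: case 0x202D: case 0x202E: /* LRE RLE LRO RLO */
      return CTL_OPEN_EMBED;
    case 0x202C: return CTL_POP_EMBED;                   /* PDF */
    case 0x2066: case 0x2067: case 0x2068:               /* LRI RLI FSI */
      return CTL_OPEN_ISOLATE;
    case 0x2069: return CTL_POP_ISOLATE;                 /* PDI */
    case 0x200E: case 0x200F: case 0x061C:               /* LRM RLM ALM */
      return CTL_MARK;
    default: return CTL_NONE;
    }
}

struct bidi_scan
{
  size_t controls;    /* heuristic B: controls seen anywhere */
  bool unterminated;  /* heuristic A: a line ended with an opener pending */
};

static struct bidi_scan
scan_bidi (const char *text)
{
  assert (text != NULL);
  const unsigned char *s = (const unsigned char *) text;
  size_t len = strlen (text), i = 0;
  struct bidi_scan r = { 0, false };
  int embeds = 0, isolates = 0;   /* openers pending on this line */

  while (i < len)
    {
      unsigned long cp = decode_utf8 (s, len, &i);
      enum bidi_ctl k = classify (cp);

      if (k != CTL_NONE)
        r.controls++;
      /* Simplified: ignores UBA depth limits and the rule that a PDI
         also closes embeddings opened inside its isolate.  */
      if (k == CTL_OPEN_EMBED) embeds++;
      else if (k == CTL_POP_EMBED && embeds > 0) embeds--;
      else if (k == CTL_OPEN_ISOLATE) isolates++;
      else if (k == CTL_POP_ISOLATE && isolates > 0) isolates--;

      /* LF (Bidi_Class B) ends the paragraph, and with it every pending
         control; so does the end of the text.  */
      if (cp == '\n' || i >= len)
        {
          if (embeds > 0 || isolates > 0)
            r.unterminated = true;
          embeds = isolates = 0;
        }
    }
  return r;
}
```

On the sloppy `superuser` line from the quoted macro, both heuristics fire. Appending a PDI and a PDF before the end of that line silences A but not B, which mirrors the point above: looking for unterminated controls only catches careless attackers. Flagging every control, as B does, is crude and will also fire on legitimate right-to-left text; it is shown only as the simplest member of the syntax-agnostic class.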