From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Fri, 05 Nov 2021 10:31:39 +0200
Message-ID: <838ry3dmvo.fsf@gnu.org>
In-Reply-To: <87k0hnqr1v.fsf@db48x.net> (message from Daniel Brooks on Thu, 04 Nov 2021 19:23:08 -0700)
To: Daniel Brooks
Cc: cpitclaudel@gmail.com, emacs-devel@gnu.org, stefan@marxist.se, monnier@iro.umontreal.ca, yuri.v.khan@gmail.com

> From: Daniel Brooks
> Cc: cpitclaudel@gmail.com, yuri.v.khan@gmail.com, stefan@marxist.se,
>  monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Thu, 04 Nov 2021 19:23:08 -0700
>
> Eli Zaretskii writes:
>
> > Then this visual noise will get in the way of people's reading those
> > comments and strings, and, for strings, will make it very hard to
> > understand what will be presented to the user when those strings are
> > output in some UI.
> >
> >> That’s where the problem is.
> >
> > No, the problem is elsewhere entirely: it's in the punctuation
> > characters unrelated to strings and comments whose directionality is
> > overridden, and which thus display in places that cause incorrect
> > visual interpretation of the program during a casual read.
>
> Look at the examples again. In many of them, all of the bidi override
> characters are inside a string or comment.

Not relevant to the point I was trying to make. (And what about those
cases where the directional controls are outside the comments or
strings?)

> When that is the case, these characters are only a problem if they
> cause characters that are inside the string or comment to appear to
> be outside of it, by reordering those characters relative to the
> syntactic markers for the string or comment. In other examples these
> characters are _outside_ the string or comment.
>
> Unless Emacs has specific knowledge of the language syntax, showing the
> characters is the only sure way to know if there is a problem or not.

The command I installed achieves this without requiring any knowledge
of the language syntax. So no, yours is not the only way.

> > You misunderstand the cause. The mere presence of these characters is
> > NOT the root cause. These characters are legitimate and helpful when
> > used as intended. See TUTORIAL.he for a pertinent example.
>
> Please don’t presume to tell me what I do or don’t understand. Yes,
> there are use cases which are not harmful, but as I have said it must be
> up to either the programmer or the compiler to answer that
> question. Emacs doesn’t know the syntax of every programming language.

Emacs should do a good job of not crying wolf too much, or else the
programmer will turn off these safety nets. The feature you propose
as THE solution for the issue flags each and every use of these
characters, the vast majority of which are completely legitimate.
That is bad for safety/security related warnings: if they have too low
a signal-to-noise ratio, people will disable them and lose all the
safety.

> >> Furthermore, I have not suggested that showing the characters needs to
> >> preclude any other form of highlighting. If you wish to develop some
> >> additional way of warning the developer, please do so.
> >
> > We are talking about what should be in Emacs. What you suggest
> > shouldn't.
>
> No other suggested feature will be useful to me. This one will. I
> suggest to you that you do not know what all users want.

I submit that users who'd want your feature indeed don't know what
they want. They are perhaps alarmed by the brouhaha around this
issue, whose details they don't understand, but that is all.

> > Since the Rust compiler evidently does this when it finds these
> > characters inside comments (and probably also inside strings), IMNSHO
> > this is a terrible misfeature, because it means code that uses those
> > controls in legitimate ways cannot be compiled without tweaking
> > non-default options. That's a cop-out, not the way to flag the
> > problematic cases.
>
> Your conclusion here is incorrect. Rust has chosen a fast strategy,
> where they implement a broad error today (well, four days ago) knowing
> that it does not prevent them from introducing a more refined error or
> set of errors later.

Then let's withdraw our approval of what they did until they do
introduce that more refined set of errors, shall we? For now, their
cure is worse than the disease, because it will fail completely
legitimate programs out of fear of the illegitimate ones, which might
never come.

> Rust also has a very flexible annotation system that allows the
> programmer to annotate specific statements and language items. If a use
> of these characters is determined to be legitimate, the programmer can
> annotate the comment, or the function the comment is in, so that this
> error is disabled.
IME, programmers don't like to do stuff that doesn't directly help
them, and will do anything to evade that. Especially in the Free
Software world, where usually there's no boss telling them what to do.

> > I think this is terrible. At best, it only tells you that something
> > non-trivial goes on here (but what exactly?). At worst, it looks like
> > corruption of the source. And while in the malicious case treating
> > that as corruption is not such a bad idea, all the valid uses of these
> > characters will also look like corruption. Which means the cure is
> > probably worse than the disease, because the malicious cases are a
> > tiny fraction of the valid ones.
>
> I cannot believe that you really think this. It shows up with exactly the
> same highlighting that your recently-introduced
> highlight-confusing-reorderings function uses.

In those few examples, carefully chosen to include only the malicious
reordering, yes. But try it on legitimate uses of those control
characters, and you will see that highlight-confusing-reorderings
doesn't highlight anything (barring bugs), unlike your proposal, which
does. And that's the main point I'm trying to make: features such as
this one cannot afford to cry wolf too much.

> Yours doesn’t even work with `next-error`.

It wasn't supposed to. It was supposed to be similar to
flyspell-mode, which also "doesn't work" with next-error. Of course,
if we decide that next-error should be able to find such places, we
can always add that (Emacs 29 is still very far from a release, and we
have ample time for that), but I doubt it would be a good idea,
because next-error is about messages emitted by compilers, and this is
not a compiler-based feature.

That said, if the new command doesn't help you, you are free not to
use it, of course. Hopefully, people who are really interested in
finding the maliciously reordered code will.
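
To make the "flags each and every use" concern concrete, here is a
minimal sketch of a scanner that reports every directional formatting
character it sees. It is an illustration only: it is not code from
this thread, and not how Emacs or rustc implement anything. The name
find_bidi_controls and the sample string are hypothetical; the only
library call used is Python's standard unicodedata.bidirectional.

```python
import unicodedata

# Bidi classes of the explicit directional formatting characters
# (embeddings, overrides, isolates, and their terminators).
BIDI_FORMATTING_CLASSES = {
    "LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI",
}


def find_bidi_controls(text):
    """Return (index, bidi class) for every directional formatting character.

    Hypothetical helper: it looks only at each character's bidi class and
    knows nothing about strings, comments, or actual reordering.
    """
    hits = []
    for index, char in enumerate(text):
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class in BIDI_FORMATTING_CLASSES:
            hits.append((index, bidi_class))
    return hits


# Hypothetical sample: a Hebrew word wrapped in an RLI ... PDI isolate
# inside a string literal, i.e. the controls used as intended.
legit = 'greeting = "\u2067\u05e9\u05dc\u05d5\u05dd\u2069"'
print(find_bidi_controls(legit))
```

A scanner of this kind reports the isolate pair in the sample even
though nothing is reordered across the string delimiters, so it cannot
tell a legitimate use from a malicious one.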