From: Lars Ingebrigtsen
Newsgroups: gmane.emacs.devel
Subject: Re: bidi-string-strip-control-characters
Date: Thu, 20 Jan 2022 10:29:26 +0100
Message-ID: <8735li69hl.fsf@gnus.org>
References: <83ee52rcar.fsf@gnu.org>
In-Reply-To: <83ee52rcar.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 20 Jan 2022 11:23:08 +0200")
To: Eli Zaretskii
Cc: emacs-devel@gnu.org

Eli Zaretskii writes:

> Lars, I'm not sure I understand the purpose of this function.  Can you
> explain?

Like the NEWS item says, it's for cases where you want to ensure that
there's no bidiness going on.

> The way it is currently used is also strange, to say the least: you
> apply it to a string made of a single character, so either it does
> nothing to the string, or it will return an empty string.
> So the following code will present the user with a riddle:
>
>   (textsec-email-address-header-suspicious-p
>    "Lars Ingebrigtsen ")
>   "Disallowed character: `' (#x202e, RIGHT-TO-LEFT OVERRIDE)"
>
> The empty string between quotes is the riddle.

Well... perhaps not optimal, but not really a riddle.  The function
will probably be used elsewhere in textsec, too, but I haven't gotten
round to auditing all the strings yet.

> I think I understand the original problem: displaying a literal U+202E
> there will mess up the text on display, but if that is the reason, the
> right way is not to remove the character, it is to append to it the
> necessary bidi controls to prevent the messup (and make the appended
> controls be invisible).
>
> Here's an example:
>
>   (insert (format "Disallowed character: `%s' (#x202e, RIGHT-TO-LEFT OVERRIDE)"
>                   (concat (string ?\x202e)
>                           (propertize (string ?\x202c ?\x200e) 'invisible t))))
>
> This displays the RLO character, but doesn't mess up the description
> after it.

The display is identical to the one we have now, though:

"Disallowed character: `' (#x202e, RIGHT-TO-LEFT OVERRIDE)"

So still a riddle.

But removing the bidi chars is "obviously correct" (and impervious to
future attacks) for somebody who's not that familiar with the bidi
machinery, so I prefer to remove the chars here instead.

> We do something like that in descr-text.el, so I guess we need to
> factor out that code and use it here.

Isn't that bidi-string-mark-left-to-right?  I forget.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
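
For comparison, the two approaches discussed in this message (removing
the bidi controls versus keeping the character and appending invisible
terminating controls) can be sketched side by side.  This is only a
rough sketch and not the code in Emacs: `my-bidi-controls',
`my-strip-bidi-controls' and `my-neutralize-bidi-control' are
hypothetical names, and the list of characters is an assumption about
which controls are relevant.

```elisp
;; Hypothetical helpers for illustration only; not the Emacs implementation.

(defconst my-bidi-controls
  '(#x061c #x200e #x200f                  ; ALM, LRM, RLM
    #x202a #x202b #x202c #x202d #x202e    ; LRE, RLE, PDF, LRO, RLO
    #x2066 #x2067 #x2068 #x2069)          ; LRI, RLI, FSI, PDI
  "Assumed list of Unicode bidi formatting characters.")

(defun my-strip-bidi-controls (string)
  "Return a copy of STRING without the characters in `my-bidi-controls'."
  (apply #'string
         (delq nil
               (mapcar (lambda (ch)
                         (unless (memq ch my-bidi-controls) ch))
                       string))))

(defun my-neutralize-bidi-control (char)
  "Return CHAR as a string, followed by invisible PDF and LRM characters.
This mirrors the quoted example: the control stays in the text, and the
appended characters are meant to keep it from affecting what follows."
  (concat (string char)
          (propertize (string #x202c #x200e) 'invisible t)))
```

With the first helper, a one-character string holding only U+202E
becomes the empty string, which is the "riddle" case above.  The second
helper returns a three-character string in which only the two appended
characters carry the `invisible' property.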