From: Daniel Brooks
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Wed, 03 Nov 2021 12:54:31 -0700
Message-ID: <87fssdrp54.fsf@db48x.net>
In-Reply-To: <83ee7xgio2.fsf@gnu.org> (Eli Zaretskii's message of "Wed, 03 Nov 2021 21:09:49 +0200")
To: Eli Zaretskii
Cc: cpitclaudel@gmail.com, emacs-devel@gnu.org, stefan@marxist.se, monnier@iro.umontreal.ca, Yuri Khan

Eli Zaretskii writes:

>> From: Yuri Khan
>> Date: Thu, 4 Nov 2021 01:45:17 +0700
>> Cc: Daniel Brooks, Clément Pit-Claudel,
>>  Stefan Kangas, Stefan Monnier,
>>  Emacs developers
>>
>> On Thu, 4 Nov 2021 at 00:56, Eli Zaretskii wrote:
>>
>> > The problem with these remappings is that you then get to somehow
>> > discern between the remapped characters and the real characters which
>> > look identically on display.
>>
>> Real characters are fontified as whichever syntax unit they belong to.
>> Remapped characters are fontified as whitespace-space-face or
>> whitespace-hspace-face depending on whether you add them to
>> whitespace-space-regexp or whitespace-hspace-regexp.
>
> I just used what Daniel posted, and that doesn't display the remapped
> characters in any distinct face.  Gotta tinker?

Yea, I intend to tinker in order to add a new category that has its
own face and can be toggled on and off separately and so on. I haven’t
actually started yet though.

> Do you read Hebrew?  Those characters look like line noise there,
> whereas the text with the default display is perfectly readable, and
> most people won't even know these controls are there (as intended).

My suggestion is to only enable it by default in _programming modes_.
It should remain disabled in ordinary prose like a TUTORIAL file.

> What for?  The absolute majority of people won't have any idea what is
> the effect of each of these controls, and how it differs from others.
> Even I many times need to talk myself through their effect on display.
> The UBA spec weighs in at more than 30 pages of highly technical text,
> and I don't expect people to memorize it by heart.

I totally agree, but I think that this is not very relevant. The whole
point is for a programmer who is unaware of BiDi in general to go
“WTF‽” when these characters show up in a source file one day, so that
they can have something to ask questions about.

`what-cursor-position' will show the face, once a face is available,
and it also shows the name of the character. Both are good ways for the
user to find more information, and in principle we could have it show
other information as well. We could pull a description from the Unicode
database perhaps, or just add extra help messages for individual
characters.

Now that I think about it, maybe we should just show the docstring for
the face right there next to the name. That would save me a step from
time to time, if nothing else.

db48x
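
PS: The idea above (a dedicated face, toggled separately, on by default
only in programming modes) could be sketched roughly as follows. This is
an untested illustration and not the patch discussed in this thread: every
`my/' name is hypothetical, the choice of acronyms and of the
`escape-glyph' face is an assumption, and it uses a buffer-local display
table rather than whitespace-mode. It only changes how the controls
themselves are drawn; it makes no attempt to change how the surrounding
text is reordered.

```elisp
;; Hypothetical face; the docstring is what a user would read when
;; asking about the face.
(defface my/bidi-control-face
  '((t :inherit escape-glyph :box t))
  "Face for bidirectional formatting controls shown in source code.
These invisible characters can change the visual order of text.")

;; Explicit bidi formatting characters and the acronyms to draw for them.
(defconst my/bidi-controls
  '((#x202A . "LRE") (#x202B . "RLE") (#x202C . "PDF")
    (#x202D . "LRO") (#x202E . "RLO")
    (#x2066 . "LRI") (#x2067 . "RLI") (#x2068 . "FSI") (#x2069 . "PDI")))

(define-minor-mode my/visible-bidi-controls-mode
  "Show bidi formatting controls as boxed acronyms in this buffer."
  :lighter " BiDi!"
  ;; Note: creating a buffer-local display table means this buffer no
  ;; longer consults `standard-display-table'.
  (let ((table (or buffer-display-table
                   (setq buffer-display-table (make-display-table)))))
    (dolist (entry my/bidi-controls)
      ;; A vector of glyphs replaces the character on display;
      ;; nil restores the default display when the mode is turned off.
      (aset table (car entry)
            (and my/visible-bidi-controls-mode
                 (vconcat (mapcar (lambda (c)
                                    (make-glyph-code c 'my/bidi-control-face))
                                  (cdr entry))))))))

;; Enabled in programming modes only; prose buffers are left alone.
(add-hook 'prog-mode-hook #'my/visible-bidi-controls-mode)

(defun my/describe-bidi-control-at-point ()
  "Report the Unicode name and bidi class of a bidi control at point."
  (interactive)
  (let ((ch (char-after)))
    (if (and ch (assq ch my/bidi-controls))
        (message "U+%04X %s (bidi class %s)" ch
                 (get-char-code-property ch 'name)
                 (get-char-code-property ch 'bidi-class))
      (message "No bidi control at point"))))
```

The last command pulls the name and bidi class from Emacs's copy of the
Unicode database, which is the kind of extra information that could be
shown next to the character name.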