From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode confusables and reordering characters considered harmful
Date: Wed, 03 Nov 2021 19:24:29 +0200
Message-ID: <83pmrhgnjm.fsf@gnu.org>
References: <YYE1sEv6yS1bBUcu@odonien.localdomain>
 <8b09eed8-36dd-61f5-2a8f-8525122df98c@gmail.com>
 <CAHiT=DHQN34ba5pYvdLy7kWb_02G4SuWmDxkL4P66BhXNX3B5A@mail.gmail.com>
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="778"; mail-complaints-to="usenet@ciao.gmane.io"
Cc: emacs-devel@gnu.org
To: Reini Urban <reini.urban@gmail.com>
In-Reply-To: <CAHiT=DHQN34ba5pYvdLy7kWb_02G4SuWmDxkL4P66BhXNX3B5A@mail.gmail.com>
 (message from Reini Urban on Wed, 3 Nov 2021 16:07:51 +0100)
Xref: news.gmane.io gmane.emacs.devel:278591
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/278591>

> From: Reini Urban <reini.urban@gmail.com>
> Date: Wed, 3 Nov 2021 16:07:51 +0100
> 
> The issue is that libc, the C standard committee, linux and most others are ignoring the unicode identifier
> security guidelines.
> Identifiers must be identifiable, but strings should not be touched.
> 
> Identifiers are all names, pathnames, variable names, user names, ... but not arbitrary strings.
> IDE's are just one place to fix it (that's why glib does it), but the core is more important.
> 
> The ones who do care about, like java (the compiler), my cperl (the compiler and runtime, because it is
> dynamic), rust (the compiler), glib (the library), do follow these guidelines.
> All C compilers and most others are insecure. Linux Filesystems are insecure. The old APPLE Filesystem
> was secure, the new is again insecure.
> Also the libc's cannot deal with de-normalized characters at all. grep, sed, coreutils all have outstanding
> unorm patches, because libunicode is too slow. Because it iterates over the string via callbacks.
> 
> In short you need to normalize each identifier, check for proper XID_Start/XID_Continue, 
> check your document for mixed scripts (several combinations are allowed, several disallowed, 
> HAN unification did a good job, but greek vs cyrillic is the worst), and forbid bidi changes.

I'm not sure I follow: the examples in the original paper which
sparked all this brouhaha didn't touch any identifiers.  All the
identifiers in those examples were perfectly compliant with the
Unicode guidelines, AFAIR.  What the examples did was insert
directional format controls so as to reorder _punctuation_ characters,
in a way that changes the visual appearance and the interpreted
semantics of the code.  All of the format controls were inserted
within whitespace, not inside any identifiers.
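
To make the distinction concrete, here is a minimal sketch of a check
that looks for directional format controls anywhere in a source text,
independently of identifiers.  The function name `find_bidi_controls`
and the exact set of characters are assumptions made for illustration
(the nine explicit embedding, override, and isolate controls and their
terminators); this is not code from Emacs or from the paper.

```python
import unicodedata

# Explicit directional format controls: embeddings, overrides,
# isolates, and their terminators (U+202A..U+202E, U+2066..U+2069).
BIDI_CONTROLS = {
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",
    "\u2066", "\u2067", "\u2068", "\u2069",
}

def find_bidi_controls(text):
    """Return a list of (line, column, character name) tuples, one
    for each directional format control found in TEXT.  Lines and
    columns are 1-based."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits

# Every identifier here is plain ASCII; the control sits in whitespace.
sample = "a = 1\nb = 2 \u202E;\n"
print(find_bidi_controls(sample))
```

A check restricted to identifiers would report nothing for such
input, since the control character is not part of any identifier.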

So I'm not sure how what you say is relevant to the issue at hand;
could you perhaps explain?