From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Richard Stallman <rms@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Can watermarking Unicode text using invisible differences sneak
 through Emacs, or can Emacs detect it?
Date: Mon, 07 Feb 2022 00:11:28 -0500
Message-ID: <E1nGwJE-000834-21@fencepost.gnu.org>
References: <E1nA2O7-0005jJ-RT@fencepost.gnu.org> <87sftk49ih.fsf@yahoo.com>
 <ac1b33c3ee4372818f4f081ae0e83fb2@webmail.orcon.net.nz>
 <837dawt0h4.fsf@gnu.org> <E1nAlIo-00039J-V9@fencepost.gnu.org>
 <838rv9plyf.fsf@gnu.org> <E1nB8A3-0002U0-JI@fencepost.gnu.org>
 <837dasntoj.fsf@gnu.org> <E1nBr3H-0002nk-HU@fencepost.gnu.org>
 <834k5tl4a9.fsf@gnu.org> <87mtjkt6m9.fsf@gmail.com> <83ilu8htws.fsf@gnu.org>
 <E1nCZ9R-0005I0-5n@fencepost.gnu.org>
 <3E718CA2-889F-4AEE-B79C-EB3A221D1CB2@gnu.org>
 <E1nDQvp-0005vd-Sz@fencepost.gnu.org> <83o83wc7gs.fsf@gnu.org>
 <E1nE1eT-0006oH-WB@fencepost.gnu.org> <8335l5brov.fsf@gnu.org>
 <E1nENtO-000603-J6@fencepost.gnu.org> <83mtjc838i.fsf@gnu.org>
 <E1nElMz-0003cv-CT@fencepost.gnu.org> <83zgna7hyd.fsf@gnu.org>
 <E1nF6nB-0006D1-WE@fencepost.gnu.org> <83ee4l78rw.fsf@gnu.org>
 <E1nFTf2-0001mh-6s@fencepost.gnu.org> <E1nFpdn-00053c-5X@fencepost.gnu.org>
 <83tudf2h4z.fsf@gnu.org> <E1nGYvh-00058o-9o@fencepost.gnu.org>
 <83sfswz834.fsf@gnu.org>
Reply-To: rms@gnu.org
Content-Type: text/plain; charset=Utf-8
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="38734"; mail-complaints-to="usenet@ciao.gmane.io"
Cc: psainty@orcon.net.nz, luangruo@yahoo.com, kevin.legouguec@gmail.com,
 emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Mon Feb 07 06:15:16 2022
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1nGwMu-0009or-2w
	for ged-emacs-devel@m.gmane-mx.org; Mon, 07 Feb 2022 06:15:16 +0100
Original-Received: from localhost ([::1]:47480 helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1nGwMs-0003HA-FY
	for ged-emacs-devel@m.gmane-mx.org; Mon, 07 Feb 2022 00:15:14 -0500
Original-Received: from eggs.gnu.org ([209.51.188.92]:55876)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <rms@gnu.org>) id 1nGwJG-0002OH-8B
 for emacs-devel@gnu.org; Mon, 07 Feb 2022 00:11:30 -0500
Original-Received: from [2001:470:142:3::e] (port=44012 helo=fencepost.gnu.org)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <rms@gnu.org>)
 id 1nGwJF-00069e-T5; Mon, 07 Feb 2022 00:11:29 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=Date:References:Subject:In-Reply-To:To:From:
 mime-version; bh=DpTLzjhynro19ofvVlIVRsdIryT5msfQ72WeLpxHhR4=; b=VlEPz/KqTQ1U
 5X81J3pahJXrtyiEWMbbUtJaW56OsOjwCc00Bw8PWdn0eaJrDoOUXvRZWLAsUvBhtk2KJPr+Dwv1x
 f6rxjyF/VljynPJpkRTqw7oJXmw9cKkekjHabLYHJrsCwDAF0pyYEsLmLOZYw97sjCUWULCYYv+x1
 AOoCHmBW6DXoLKUjCfpHyAygQX52ZEW67t4jrbtevMyDeQR9OSvizCtJWjOx56oU/7Pd/3Sgyq4Rp
 fu3xnY4+7pFSWGdtdAQfPBKMDh3HgybOSHXpK8OWXXH/gPVOeR+DgJb5wV3zjUwltUlig/jF7TjjZ
 oKG7jBf9ckedTuiuTlUbuw==;
Original-Received: from rms by fencepost.gnu.org with local (Exim 4.90_1)
 (envelope-from <rms@gnu.org>)
 id 1nGwJE-000834-21; Mon, 07 Feb 2022 00:11:29 -0500
In-Reply-To: <83sfswz834.fsf@gnu.org> (message from Eli Zaretskii on Sun, 06
 Feb 2022 10:56:47 +0200)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Original-Sender: "Emacs-devel"
 <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Xref: news.gmane.io gmane.emacs.devel:286014
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/286014>

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > So you mean we should create a database of ASCII characters that
  > approximate the combining diacriticals?  But if so, how is it better
  > than having a database of complete characters and their ASCII
  > equivalents, like we have now in latin1-disp.el?

I think there are only around 20 diacritics.  There must be hundreds
of letters-with-diacritics.  The method I've proposed can handle
everything automatically, given a table about the 20-odd diacritics.
That's a great simplification from a table of hundreds of elements,
set up by hand.
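
As a rough illustration of the table-driven method described above,
here is a minimal sketch in Python, using the standard unicodedata
module as a stand-in for ucs-normalize-NFD-string.  The table
DIACRITIC_ASCII, its six entries, the chosen ASCII stand-ins, and the
function name ascii_approximation are all assumptions made for this
example; they are not taken from Emacs or from latin1-disp.el.

```python
import unicodedata

# Hypothetical table: combining diacritic -> ASCII stand-in.
# Illustrative and incomplete; a real table would list every
# diacritic that needs to be supported.
DIACRITIC_ASCII = {
    "\u0300": "`",   # COMBINING GRAVE ACCENT
    "\u0301": "'",   # COMBINING ACUTE ACCENT
    "\u0302": "^",   # COMBINING CIRCUMFLEX ACCENT
    "\u0303": "~",   # COMBINING TILDE
    "\u0308": '"',   # COMBINING DIAERESIS
    "\u0327": ",",   # COMBINING CEDILLA
}


def ascii_approximation(char):
    """Return an ASCII spelling of char, or None if a piece is not covered."""
    parts = []
    # NFD splits a precomposed letter into base letter + combining marks.
    for c in unicodedata.normalize("NFD", char):
        if ord(c) < 128:
            parts.append(c)                    # ASCII base letter, keep as-is
        elif c in DIACRITIC_ASCII:
            parts.append(DIACRITIC_ASCII[c])   # known diacritic
        else:
            return None                        # no decomposition or unknown mark
    return "".join(parts)
```

With this table, ascii_approximation("ã") gives "a~" and a letter
with two marks such as "ế" gives "e^'", with no per-letter entries.
A character that has no canonical decomposition, such as "ø", gives
None and would still need separate handling.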

  >  but a database of complete characters makes it easier to
  > make sure the results are optimal, because you see the original
  > complete character and the complete equivalent,

I don't follow you here.  In particular, what does "complete
equivalent" mean?  Concretely, how would a result be "less than
optimal"?  Can you illustrate with an example?

  > I think reasonable appearance is more important than memory
  > consumption in this case,

What makes an appearance more or less reasonable when we're talking
about replacing one character with two or three that express
_symbolically_ which character it is?  I don't get it.

  > You can use ucs-normalize-NFKD-string for the job of
  > ucs-normalize-NFD-string as well:

  >   (append (ucs-normalize-NFKD-string "ã") nil) => (97 771)

Great!  That does most of the job, I think.
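
The same check can be sketched outside Emacs.  The snippet below is
Python, with unicodedata.normalize standing in for the
ucs-normalize-NFKD-string and ucs-normalize-NFD-string functions; the
helper name decomposed_code_points is an assumption made for this
example.

```python
import unicodedata


def decomposed_code_points(text, form="NFKD"):
    """Return the code points of text after normalizing to the given form."""
    return [ord(c) for c in unicodedata.normalize(form, text)]
```

For "ã", both decomposed_code_points("ã") and
decomposed_code_points("ã", "NFD") give [97, 771], that is, "a"
followed by COMBINING TILDE, matching the Emacs result quoted above.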

  > (I used 'append' here to make it evident that the result of the
  > decomposition is 2 characters, not one, since the Emacs display will
  > by default combine them into the same glyph as the original non-ASCII
  > character,

Not on a Linux console, I think.  When I have f and i in the buffer,
Emacs does not convert them into a ligature.  The only time it has to
try to deal with a ligature is when there is a Unicode ligature
code point in the buffer.
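
To make the ligature case concrete, here is a small Python sketch
(again with unicodedata standing in for the Emacs functions; the
helper names are assumptions made for this example).  It finds
characters that carry a compatibility decomposition, such as U+FB01
LATIN SMALL LIGATURE FI, and expands them with NFKD.  NFD alone
leaves such a ligature code point untouched.

```python
import unicodedata


def compat_code_points(text):
    """Return the characters in text that carry a <compat> decomposition."""
    return [c for c in text
            if unicodedata.decomposition(c).startswith("<compat>")]


def expand_compat(text):
    """Expand compatibility characters (ligatures among them) via NFKD."""
    return unicodedata.normalize("NFKD", text)
```

For the string "\ufb01le", compat_code_points reports the single
ligature character and expand_compat returns "file".  Plain "f"
followed by "i" is not reported and passes through unchanged.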

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)