From mboxrd@z Thu Jan 1 00:00:00 1970
From: Richard Stallman
Newsgroups: gmane.emacs.devel
Subject: Re: Can watermarking Unicode text using invisible differences sneak through Emacs, or can Emacs detect it?
Date: Mon, 07 Feb 2022 00:11:28 -0500
References: <87sftk49ih.fsf@yahoo.com> <837dawt0h4.fsf@gnu.org>
 <838rv9plyf.fsf@gnu.org> <837dasntoj.fsf@gnu.org> <834k5tl4a9.fsf@gnu.org>
 <87mtjkt6m9.fsf@gmail.com> <83ilu8htws.fsf@gnu.org>
 <3E718CA2-889F-4AEE-B79C-EB3A221D1CB2@gnu.org> <83o83wc7gs.fsf@gnu.org>
 <8335l5brov.fsf@gnu.org> <83mtjc838i.fsf@gnu.org> <83zgna7hyd.fsf@gnu.org>
 <83ee4l78rw.fsf@gnu.org> <83tudf2h4z.fsf@gnu.org> <83sfswz834.fsf@gnu.org>
In-Reply-To: <83sfswz834.fsf@gnu.org> (message from Eli Zaretskii on Sun, 06 Feb 2022 10:56:47 +0200)
Reply-To: rms@gnu.org
To: Eli Zaretskii
Cc: psainty@orcon.net.nz, luangruo@yahoo.com, kevin.legouguec@gmail.com, emacs-devel@gnu.org
Content-Type: text/plain; charset=Utf-8
Xref: news.gmane.io gmane.emacs.devel:286014

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> So you mean we should create a database of ASCII characters that
> approximate the combining diacriticals? But if so, how is it better
> than having a database of complete characters and their ASCII
> equivalents, like we have now in latin1-disp.el?

I think there are only around 20 diacritics. There must be hundreds
of letters-with-diacritics. The method I've proposed can handle
everything automatically, given a table about the 20-odd diacritics.
That's a great simplification from a table of hundreds of elements,
set up by hand.

> but a database of complete characters makes it easier to
> make sure the results are optimal, because you see the original
> complete character and the complete equivalent,

I don't follow you here. In particular, what does "complete
equivalent" mean? Concretely how would a result be "less than
optimal"? Can you illustrate with an example?

> I think reasonable appearance is more important than memory
> consumption in this case,

What makes an appearance more or less reasonable when we're talking
about replacing one character with two or three that express
_symbolically_ which character it is? I don't get it.

> You can use ucs-normalize-NFKD-string for the job of
> ucs-normalize-NFD-string as well:

> (append (ucs-normalize-NFKD-string "ã") nil) => (97 771)

Great! That does most of the job, I think.

> (I used 'append' here to make it evident that the result of the
> decomposition is 2 characters, not one, since the Emacs display will
> by default combine them into the same glyph as the original non-ASCII
> character,

Not on a Linux console, I think. When I have f and i in the buffer,
Emacs does not convert them into a ligature. The only time it has to
try to deal with a ligature is when there is a Unicode ligature code
point in the buffer.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
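
Editorial illustration, not part of the original message: a minimal
sketch of the table-driven idea discussed above, written in Python
rather than Emacs Lisp. It decomposes text with NFKD (the same
normalization form as ucs-normalize-NFKD-string) and then replaces each
combining diacritic with an ASCII stand-in. The table
DIACRITIC_TO_ASCII, the particular stand-in characters, and the function
name ascii_approximation are all assumptions made for this example; they
do not come from the thread or from Emacs.

```python
import unicodedata

# Hypothetical table mapping combining diacritics to ASCII stand-ins.
# The choice of stand-in characters is illustrative only; a real table
# would cover the 20-odd diacritics mentioned in the message.
DIACRITIC_TO_ASCII = {
    "\u0300": "`",   # combining grave accent
    "\u0301": "'",   # combining acute accent
    "\u0302": "^",   # combining circumflex accent
    "\u0303": "~",   # combining tilde
    "\u0308": '"',   # combining diaeresis
    "\u0327": ",",   # combining cedilla
}


def ascii_approximation(text):
    """Decompose TEXT with NFKD, then substitute known diacritics.

    Characters that are not in the table are passed through unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(DIACRITIC_TO_ASCII.get(ch, ch) for ch in decomposed)


if __name__ == "__main__":
    # U+00E3 (a with tilde) decomposes into two code points, 97 and 771,
    # matching the (97 771) result quoted in the message.
    print([ord(ch) for ch in unicodedata.normalize("NFKD", "\u00e3")])
    print(ascii_approximation("\u00e3"))
```

With a table like this, each letter-with-diacritic needs no entry of its
own: the decomposition supplies the base letter and the table supplies
the symbol for the mark. NFKD also expands compatibility characters such
as the U+FB01 ligature into plain "fi".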