From: "T.V Raman"
Newsgroups: gmane.emacs.devel
Subject: Re: Can watermarking Unicode text using invisible differences sneak through Emacs, or can Emacs detect it?
Date: Wed, 19 Jan 2022 09:36:56 -0800
To: Richard Stallman
Cc: emacs-devel@gnu.org
In-Reply-To: (Richard Stallman's message of "Tue, 18 Jan 2022 23:15:59 -0500")

Richard Stallman writes:

This is indeed worrisome and has been around for a while. There is an
even more insidious form of this hack, in which Unicode characters that
"appear like English letters" are used and a quick visual scan will
miss them. Spammers often use the trick in domain names within URLs.
As an example, there are Cyrillic letters that "look like" Roman
letters.

> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
> There is a thread now about confusables.
>
> I read this,
>
>     Unicode allows user tracking by means of invisible text marking. Any
>     string can be converted into its binary form and then recoded into a
>     string of zero-width characters, which can then be invisibly inserted
>     into the text. If the text is posted elsewhere, the zero-width
>     character string can be extracted and the process reversed to figure
>     out the identity of the person who copied it.
>
> which seems to be about a special case of confusables, and it makes me
> wonder whether Emacs does, or could, show users when Unicode confusion
> occurs, or prevent or fix it somehow.
>
> First, is that issue of invisible characters real?
>
> Second, does Emacs do anything now such that these tricks
> won't succeed?
>
> If the problem exists in Emacs now, could we prevent it? I see a few
> ways to try. I don't know whether they would work well.
>
> * Indicate the different encodings on the screen somehow.
>
> * Canonicalize such sequences (perhaps when reading text into Emacs),
>   so that different encodings of the same text become identical.
>
> * Use a stand-alone canonicalizer program.

-- 
Thanks,

--Raman(I Search, I Find, I Misplace, I Research)
♉ Id: kg:/m/0285kf1  🦮
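
For readers who want to see the zero-width marking quoted above in
concrete terms, here is a minimal Python sketch. It is not taken from
the thread and is not how Emacs handles such text. The choice of
U+200B and U+200C as the two "bit" characters, the set of code points
treated as zero-width, and the names embed, extract and
strip_zero_width are all assumptions made for illustration.

```python
# Hypothetical sketch: code points and function names are assumptions.
ZERO = "\u200b"  # ZERO WIDTH SPACE, used here to encode bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER, used here to encode bit 1

# A small, non-exhaustive set of zero-width code points to strip.
ZERO_WIDTH = {ZERO, ONE, "\u200d", "\u2060", "\ufeff"}


def embed(text: str, tag: str) -> str:
    """Hide `tag` as zero-width characters after the first character of `text`."""
    bits = "".join(f"{byte:08b}" for byte in tag.encode("utf-8"))
    mark = "".join(ONE if bit == "1" else ZERO for bit in bits)
    return text[:1] + mark + text[1:]


def extract(text: str) -> str:
    """Read the ZERO/ONE characters back as bits and decode them as UTF-8."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")


def strip_zero_width(text: str) -> str:
    """Crude canonicalizer: drop every code point listed in ZERO_WIDTH."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


if __name__ == "__main__":
    marked = embed("hello world", "user42")
    print(len("hello world"), len(marked))  # the lengths differ
    print(extract(marked))
    print(strip_zero_width(marked) == "hello world")
```

The stripping step corresponds roughly to the "canonicalize" idea in
the quoted list. It is deliberately blunt: U+200C and U+200D are
meaningful in some scripts and in emoji sequences, so a real
canonicalizer would need to be more selective than this one.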