From: Stefan Kangas
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode confusables and reordering characters considered harmful
Date: Wed, 3 Nov 2021 12:19:58 +0100
To: Gregory Heytings
Cc: Eli Zaretskii, Clément Pit-Claudel, Stefan Monnier, Emacs developers

Gregory Heytings writes:

> There's some data that shows that this is extremely rare in general: the
> Rust Security Response WG analyzed the 70322 crates and found only 5 in
> which these codepoints were present (see [1]). That's ~0.01 %.
>
> Moreover such highlighting does not make the source code or text
> unreadable, even in those few legitimate cases.

Depending on how you define it, at least one major world language
(Arabic) uses an RTL script, and so do other major languages such as
Urdu, Farsi and Hebrew (and a couple of others too).

So I think we should consider to what extent your proposal might hurt
users of such languages. Are these characters important for writing
comments and strings in any of those languages? Will your proposal make
it harder to type in such languages? If yes, are there less invasive
solutions?

The Rust data point is relevant, but in my opinion not sufficient to
outweigh the above considerations. But even if it were, we would still
need to consider programming languages like C, Fortran, PHP, JavaScript,
etc.

We are, after all, talking about hundreds of millions of native speakers
of the natural languages mentioned above, a certain proportion of whom
will be Emacs users interested in writing strings and comments in their
own language.