From: Daniel Brooks
Newsgroups: gmane.emacs.devel
Subject: Re: character sets as they relate to “Raw” string literals for elisp
Date: Mon, 04 Oct 2021 13:49:53 -0700
Message-ID: <87a6jotszy.fsf@db48x.net>
References: <4209edd83cfee7c84b2d75ebfcd38784fa21b23c.camel@crossproduct.net> <87v92ft9z6.fsf@db48x.net> <87o885tyle.fsf@db48x.net> <83k0it6lu5.fsf@gnu.org> <87k0isu7hz.fsf_-_@db48x.net>
To: Stefan Monnier, Eli Zaretskii
Cc: emacs-devel@gnu.org
In-Reply-To: (Stefan Monnier's message of "Mon, 04 Oct 2021 12:34:19 -0400")
Xref: news.gmane.io gmane.emacs.devel:276270

Eli Zaretskii writes:

>> From: Daniel Brooks
>> Cc: emacs-devel@gnu.org, rms@gnu.org, anna@crossproduct.net
>> Date: Mon, 04 Oct 2021 08:36:40 -0700
>>
>> Eli Zaretskii writes:
>>
>> > We can only do this much. We don't develop any terminal emulators
>> > here, except the two built into Emacs.
>>
>> I was referring broadly to the whole GNU project, not trying to assign
>> the work specifically to the Emacs project. :)
>
> Then this is not necessarily the best place to raise these issues.

I was replying directly to RMS concerning his statement about non–ascii
characters. RMS is known to have opinions with a wider scope than will
fit in any single mailing list, and I was responding in kind. I
apologize for using “we” so broadly without thinking; it is certainly
the kind of thing that is confusing, so I should have been much more
explicit.

>> Suppose our hypothetical contributor wanted to contribute a new mode
>> with this type of code in it:
>>
>> (defun 日本 () (message "日本"))
>
> It would be very inconvenient to have such code.

Absolutely! Possibly almost as inconvenient as having to learn some
English in order to develop the thing. But it doesn’t answer my
question.

I see that prolog-mode only gets a few commits per year (9 last year and
5 so far this year; the high water mark is 10 in a single year). It
imposes a pretty minimal support burden and if it has bugs you can
simply ignore them until a Prolog user brings you a patch, because those
bugs can only affect Prolog users.
There is a lot of code in Emacs which fits this description. Suppose
this hypothetical contribution were a language mode for a Japanese
programming language, and thus had the same support profile? Suppose
also that all messages to the user have already been localized into
English, and that there is an English alias for the mode name (that is,
`日本-mode' toggles the mode, but there’s an alias like `ja-mode' or
something), while the rest of the identifiers are in Japanese. Would
there be any reason to turn away that contribution, or to make the
contributor rewrite it?

Stefan Monnier writes:

> FWIW, I consider this case quite different from your raw-string case,
> because here the main issue for me is whether the code is maintainable
> and reviewable by someone else. So, in the context of Emacs, GNU
> ELPA, and NonGNU ELPA, I find such uses problematic. If I could count
> on having someone else I trust do the reviewing, I might reconsider.

I think that if I read between the lines, you are saying that the Emacs
project _could_ grow to become multi–lingual at all levels, with a
sufficient number of invested contributors who could each review and
maintain different parts of the code. Also that like Eli, you would find
it inconvenient or problematic in the short term. Is that a fair
reading?

> We have that where it's inevitable (like in some packages that define
> features specific to some languages), but even there we prefer to use
> the likes of \u672c instead of the literal characters. At the very
> least, that avoids the problem with not having a suitable font to
> display them.

As an aside, I think that this is a sensible enough choice, though I
would prefer to choose a more automatic solution. That is, relying on
particular viewers of the source code to tweak their Emacs settings to
present the source differently instead of relying on contributors to use
the codepoint numbers directly.
As you suggested in bug#50865, changing the encoding will automatically
render those characters with their codepoint numbers, which is nicer
than forcing a human to type them in before committing. This has the
advantage of working on identifiers as well as string literals.

>> If we could see our way to accepting such code, then I don’t see why we
>> couldn’t accept code that uses Unicode in much smaller ways, such as
>> this:
>>
>> (defvar variable-containing-html #r｢click here｣)
>
> If we avoid non-ASCII characters, we avoid some problems, so all else
> being equal, it's better.

Hmm. If we (speaking as broadly as possible!) avoid a problem forever,
how will the problem ever get fixed?

Personally, I think that the problems are now mostly fixed. Emacs has
very complete support for character sets, better than virtually all
other applications. Outside of Emacs, support for Unicode is practically
omnipresent as well. There are still notable gaps, like the Linux
console, but they are the exception rather than the rule. I don’t think
that there is much of a problem left to avoid!

>> PS: it occurs to me to wonder if my use of Unicode in the prose of this
>> message, outside of the examples, detracted from its readability in any
>> way?
>
> If someone is reading this on a text-mode terminal, it could.

I am asking if anyone reading my messages, either this one or any of the
last dozen I have sent to the list, has noticed any specific
problems. I have used non–ascii characters in all of them. I’m
wondering if anyone even noticed. If nobody noticed, or if they didn’t
detract from readability, then it is unlikely that Unicode is a problem
in general.

Yuri Khan writes:

> On Tue, 5 Oct 2021 at 01:58, Eli Zaretskii wrote:
>
>> If someone is reading this on a text-mode terminal, it could.
>
> We should probably invent a term more accurate than “text-mode
> terminal” for things that fail to display text.

True! :D

I prefer to say “Linux console” in reference to the one terminal
emulator that we know has severe problems with Unicode. There are many
terminal emulators out there, and I’m sure a few of them have problems,
but for the most part I think all of them can handle Unicode pretty well
primarily because they all rely on OS libraries to do the heavy
lifting. The Linux console is handicapped in this area primarily because
it is inside the kernel, and thus cannot dynamically load libharfbuzz
and libfreetype. (But I can imagine a hypothetical future kernel module
which statically links against them in order to provide a full–featured
terminal in the console.)

db48x