From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Re: Encoding of etc/HELLO
Date: Fri, 20 Apr 2018 14:26:10 -0700
Organization: UCLA Computer Science Department
References: <83sh7qxb5j.fsf@gnu.org> <87po2t6gdm.fsf@gmx.de> <83muxxyijl.fsf@gnu.org> <83lgdhyeqv.fsf@gnu.org>
To: Stefan Monnier, Eli Zaretskii
Cc: emacs-devel@gnu.org

On 04/20/2018 01:42 PM, Stefan Monnier wrote:
> Clearly this problem is not specific to Emacs, so what do people do?
> Hold on to iso-2022 for as long as they can (like we do in Emacs)?
> Give up on these "details" of rendering for files using a mix of C, J, and K?
> Rely on higher-level info (XML tags and friends) to carry the charset info?

For most uses, people typically just use UTF-8 and give up on the
details, which tend to be in areas that many users don't care much about
anyway.
In practice if (say) a Japanese reader sees a Chinese quotation
in a page of Japanese text, there's an excellent chance the reader won't
much mind that the Chinese characters are rendered in Japanese-style, as
this has long been common practice in Japanese printing anyway.

There are of course exceptions where it really matters which font you
use, such as the Wikipedia page on Chinese character variants that
Clément mentioned. But these are rare, and are typically handled by
means other than plain text. It's like the Wikipedia page on kerning,
which uses images rather than plain UTF-8 text to illustrate how to kern
characters properly.

I mildly prefer multilingual text to be rendered in a consistent style
for my language, as opposed to having it rendered separately for readers
of each of its component languages, as this makes the text a bit easier
for me to read (which is the point of text, isn't it?). But this of
course is merely a style preference.

For what it's worth, the April 2018 w3techs.com numbers say that UTF-8
is used by 91.3% of websites whose character encoding they know, and
that this number is steadily growing (it was 88.9% a year ago). In
contrast, ISO 2022 usage is declining steadily. Of course the web is not
the entire universe; still, it's pretty clear which way the world is going.