From: David Kastrup
Newsgroups: gmane.emacs.devel
Subject: Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF-8 encoded Lisp files
Date: Sun, 27 Sep 2015 09:42:06 +0200
Message-ID: <87io6wqpf5.fsf@fencepost.gnu.org>
In-Reply-To: <83pp14fhj5.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 27 Sep 2015 10:27:58 +0300")
To: Eli Zaretskii
Cc: Paul Eggert, monnier@iro.umontreal.ca, emacs-devel@gnu.org

Eli Zaretskii writes:

> I've also looked at the *.po files in the latest releases of GNU Make,
> Gawk, Texinfo, and Binutils, and I find that between 20% and 25% of
> such files still use non-UTF-8 encodings.

Which, btw, I consider crazy. It's one thing to pick an encoding for
local language processing and display. But for an internationalization
system, it does not really make sense to venture to local encodings
outside of I/O. There is a really strong case for using only UTF-8 in
PO files instead of juggling with many-to-many encoding setups.

> I see similar figures for the txi-*.tex files that came with Texinfo
> 6.0. Presumably, that follows the default conventions of the
> respective locales.

Texinfo uses PDFTeX for its encoding processing, and PDFTeX is firmly
an 8-bit system.
TeX wouldn't be TeX if it weren't macro-programmed to deal with that,
but with Texinfo being a rather low-level format, UTF-8 processing time
dwarfs anything else. So if you have, say, a German input file for
Texinfo and can process it either in Latin-1 or UTF-8, chances are that
the Latin-1 version runs more than twice as fast.

Now that's of course just the processing for printed output. Thanks to
Texinfo now being written in Perl, the PDFTeX backend is likely the
fastest right now either way, so it may not be as much of a concern.
But many Texinfo sources originate from a time when UTF-8 was either
not supported at all or was a major contributor to conversion time.

-- 
David Kastrup
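
For anyone wanting to repeat the kind of survey quoted above (the share of *.po files that declare a non-UTF-8 encoding), here is a minimal Python sketch. It is not the method used in the thread. It assumes each PO file declares its encoding in the usual header line of the form "Content-Type: text/plain; charset=...", and the helper names (declared_charset, tally_charsets, non_utf8_share) are made up for illustration.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: the PO header carries a line such as
#   "Content-Type: text/plain; charset=ISO-8859-1\n"
# The character class stops at the backslash of the literal "\n".
CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def declared_charset(po_bytes):
    """Return the charset declared in PO file contents, lower-cased, or None."""
    match = CHARSET_RE.search(po_bytes)
    return match.group(1).decode("ascii").lower() if match else None


def tally_charsets(po_dir):
    """Count declared charsets over all *.po files directly inside po_dir."""
    counts = Counter()
    for path in sorted(Path(po_dir).glob("*.po")):
        # Files without a recognizable declaration are counted as "unknown".
        counts[declared_charset(path.read_bytes()) or "unknown"] += 1
    return counts


def non_utf8_share(counts):
    """Fraction of files not declared as UTF-8 ("unknown" counts as non-UTF-8)."""
    total = sum(counts.values())
    non_utf8 = sum(n for name, n in counts.items() if name not in ("utf-8", "utf8"))
    return non_utf8 / total if total else 0.0
```

Calling tally_charsets on the po/ directory of an unpacked release tarball and passing the result to non_utf8_share gives a figure comparable to the percentages quoted above. It only reports what each file declares, not whether the contents actually match that declaration.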