From mboxrd@z Thu Jan 1 00:00:00 1970
From: Random832
Newsgroups: gmane.emacs.devel
Subject: Re: Casting as wide a net as possible
Date: Tue, 15 Dec 2015 14:03:55 -0500
Message-ID: <87lh8vjzz8.fsf@fastmail.com>
References: <87zixcud1t.fsf@fastmail.com> <87poy7k0f7.fsf@fastmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
To: emacs-devel@gnu.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:196326

I’ve been too clever for my own good. My “C1 controls” example was
not properly encoded as UTF-8, and I ignored the warnings provided by
Gnus for this situation. Below is, I hope, my message as it was
intended to appear (all properly encoded as UTF-8).

Random832 writes:

> There are occasional accented words e.g. naïve, borrowed from
> other languages. And also punctuation marks (more common with
> people who use certain word processing software packages that
> automatically replace typewriter quotes with them).
>
> Well, obviously there’s Latin-1 and UTF-8. There’s also
> Windows-1252, which is semi-compatible with Latin-1. You can
> sometimes end up with the Windows-1252 bytes treated as if they
> were Latin-1 C1 controls (and perhaps encoded further into
> UTF-8). There are also older encodings that aren’t used much
> anymore e.g. DOS 437/850, MacRoman, etc.
>
> I¹ve also seen content that was mechanically translated from one
> to another using an 8-bit mapping table, with incompatible
> characters mapped arbitrarily. For example, if you ever see
> something with quotes/apostrophes replaced with superscripts,
> like in this paragraph, this probably means the text originated
> in MacRoman and was translated to Latin-1 with the ³André
> Pirard² mapping.
>
> Anyway, the point is, since non-ASCII characters aren’t
> pervasive, it’s easy to miss noticing that something’s wrong
> with them. For one last demo, this paragraph features UTF-8,
> treated as Windows-1252, and then re-encoded as UTF-8 again.

P.S. It may be instructive to note that my message was apparently
detected by Gnus as being in some kind of Japanese encoding.
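For anyone who wants to reproduce two of the mix-ups described above, here is a minimal Python sketch. It is only an illustration: the sample string and all variable names are made up for this example, and it uses Python's standard "cp1252" and "latin-1" codecs as stand-ins for whatever software actually mangled a given message.

```python
# Sample text containing U+2019 RIGHT SINGLE QUOTATION MARK (illustrative only).
text = "it\u2019s"

# (1) Windows-1252 bytes treated as Latin-1: byte 0x92 turns into the
#     C1 control character U+0092 instead of the curly apostrophe.
cp1252_bytes = text.encode("cp1252")        # b"it\x92s"
as_latin1 = cp1252_bytes.decode("latin-1")  # "it\x92s" (contains a C1 control)
further_utf8 = as_latin1.encode("utf-8")    # the C1 control encoded further into UTF-8

# (2) UTF-8 bytes treated as Windows-1252, then re-encoded as UTF-8.
utf8_bytes = text.encode("utf-8")           # b"it\xe2\x80\x99s"
mojibake = utf8_bytes.decode("cp1252")      # three junk characters replace the apostrophe
double_encoded = mojibake.encode("utf-8")

# Case (2) can be undone by reversing the steps, provided every byte
# involved happens to be defined in Windows-1252.
repaired = double_encoded.decode("utf-8").encode("cp1252").decode("utf-8")

print(repr(as_latin1), further_utf8)
print(repr(mojibake), double_encoded)
print(repaired == text)
```

The reversal in the last step is not guaranteed in general: a few byte values have no assignment in Python's "cp1252" codec, so text whose UTF-8 bytes include them would fail at the first wrong decode rather than round-trip.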