From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Random832 Newsgroups: gmane.emacs.devel Subject: Re: Casting as wide a net as possible Date: Tue, 15 Dec 2015 13:54:20 -0500 Message-ID: <87poy7k0f7.fsf@fastmail.com> References: <87zixcud1t.fsf@fastmail.com> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 8bit X-Trace: ger.gmane.org 1450205719 30866 80.91.229.3 (15 Dec 2015 18:55:19 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Tue, 15 Dec 2015 18:55:19 +0000 (UTC) To: emacs-devel@gnu.org Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Dec 15 19:55:10 2015 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1a8ukd-00018O-TF for ged-emacs-devel@m.gmane.org; Tue, 15 Dec 2015 19:55:08 +0100 Original-Received: from localhost ([::1]:38717 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a8ukd-0004uv-67 for ged-emacs-devel@m.gmane.org; Tue, 15 Dec 2015 13:55:07 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:60694) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a8ukK-0004um-DK for emacs-devel@gnu.org; Tue, 15 Dec 2015 13:54:49 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a8ukG-0003nn-6G for emacs-devel@gnu.org; Tue, 15 Dec 2015 13:54:48 -0500 Original-Received: from plane.gmane.org ([80.91.229.3]:55978) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a8ukF-0003nj-W8 for emacs-devel@gnu.org; Tue, 15 Dec 2015 13:54:44 -0500 Original-Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1a8ukD-0000C3-7G for emacs-devel@gnu.org; Tue, 15 Dec 2015 19:54:41 +0100 Original-Received: from c-68-39-146-59.hsd1.in.comcast.net ([68.39.146.59]) by main.gmane.org with esmtp 
(Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 15 Dec 2015 19:54:41 +0100 Original-Received: from random832 by c-68-39-146-59.hsd1.in.comcast.net with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 15 Dec 2015 19:54:41 +0100 X-Injected-Via-Gmane: http://gmane.org/ Original-Lines: 31 Original-X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: c-68-39-146-59.hsd1.in.comcast.net User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux) Cancel-Lock: sha1:ab04WkqCxpoZF00QpJgoJZvCVjA= X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 80.91.229.3 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:196324 Archived-At:

Filipp Gunbin writes:
> I see. However, this doesn't seem to affect English and American
> English languages, but rather European ones.

There are occasional accented words e.g. naïve, borrowed from other languages. And also punctuation marks (more common with people who use certain word processing software packages that automatically replace typewriter quotes with them).

> Honestly, I always though that those languages do not have many
> encodings in use, probably I'm wrong.

Well, obviously there’s Latin-1 and UTF-8. There’s also Windows-1252, which is semi-compatible with Latin-1. You can sometimes end up with the Windows-1252 bytes treated as if they were Latin-1 C1 controls (and perhaps encoded further into UTF-8). There are also older encodings that aren’t used much anymore e.g. DOS 437/850, MacRoman, etc.

I¹ve also seen content that was mechanically translated from one to another using an 8-bit mapping table, with incompatible characters mapped arbitrarily.
For example, if you ever see something with quotes/apostrophes replaced with superscripts, like in this paragraph, this probably means the text originated in MacRoman and was translated to Latin-1 with the ³André Pirard² mapping.

Anyway, the point is, since non-ASCII characters arenâ€™t pervasive, itâ€™s easy to miss noticing that somethingâ€™s wrong with them. For one last demo, this paragraph features UTF-8, treated as Windows-1252, and then re-encoded as UTF-8 again.
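
The last demo, and the Windows-1252-as-Latin-1 case mentioned earlier, can be reproduced in a few lines. This is only a minimal sketch, assuming Python 3 and its standard `cp1252` and `latin-1` codecs; the variable names are illustrative and not taken from anything in this thread.

```python
# Illustrative sketch only: reproduce two of the mix-ups described above.
original = "aren\u2019t"  # "aren't" written with U+2019 RIGHT SINGLE QUOTATION MARK

# Case 1: UTF-8 bytes misread as Windows-1252, then re-encoded as UTF-8.
utf8_bytes = original.encode("utf-8")        # the apostrophe becomes E2 80 99
misread = utf8_bytes.decode("cp1252")        # three bytes read as three characters
double_encoded = misread.encode("utf-8")     # what ends up stored or sent

# Undoing the same two steps in reverse order recovers the original text.
recovered = double_encoded.decode("utf-8").encode("cp1252").decode("utf-8")

# Case 2: a Windows-1252 byte (0x92 for the apostrophe) read as Latin-1
# lands on the C1 control character U+0092 instead of a quote.
c1_misread = original.encode("cp1252").decode("latin-1")
c1_as_utf8 = c1_misread.encode("utf-8")      # "encoded further into UTF-8"

print(repr(misread), repr(recovered), repr(c1_misread), c1_as_utf8)
```

The reversal in case 1 only works when every misread character has a Windows-1252 byte to map back to; text that has been through an arbitrary 8-bit mapping table, as in the superscript example, generally cannot be undone this way.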