From: Emanuel Berg
Newsgroups: gmane.emacs.help
Subject: Re: Getting Emacs to play nice with Hunspell and apostrophes
Date: Sat, 14 Jun 2014 03:35:05 +0200
Message-ID: <87a99gw9x2.fsf@debian.uxu>
References: <87ha3s71mt.fsf@debian.uxu> <87tx7rsevi.fsf@debian.uxu> <8738fbscao.fsf@debian.uxu>
To: help-gnu-emacs@gnu.org

Yuri Khan writes:

> The fact that everybody uses " and ' and ` is a
> historical artifact, a workaround of sorts, due to
> the limitations of the mechanical typewriter. We need
> not be affected by it any more.
>
> There was no possibility of including all the
> required typographical characters or accented letters
> into the printing ball, so both quotes (“ and ”) and
> the diaeresis got conflated into a straight quote ",
> both single quotes (‘ and ’) into a straight single
> quote/apostrophe ', and the backtick ` and tilde ~
> were there to facilitate typing accented letters.
>
> This limitation then crept into computers, because
> this way the character set could be encoded in 7
> bits. The computer keyboard was just modeled after
> the typewriter keyboard, with a few extensions.
>
> Then the inevitable struck: computers expanded from
> the US and UK into Germany, Sweden, Finland, France,
> Canada, and then countries with non-Latin scripts
> (Greek, Cyrillic, and CJK). And all of them wanted to
> have dedicated code points for their characters,
> e.g. type a single ä instead of [a,
> backspace-no-delete, "].
>
> For a good while, we lived in a nightmare of ten
> thousand code pages.
> In Russia, you could receive an
> email and see a jumble of utterly meaningless words
> because the message could be re-encoded (or the
> Content-Type charset= stripped or re-labeled) on any
> of the intermediate servers; there existed programs
> which were able to heuristically detect the chain of
> re-encodings applied on the way and decode your
> message for you. You could order a book in an
> Internet shop, have them completely b0rk up the
> encoding of the shipping address:
> http://cdn.imagepush.to/in/625x2090/i/3/30/301/24.jpg
> Then somebody at the postal system might decode the
> characters and the package would still be delivered
> at the intended address.
>
> Now that every widely used operating system supports
> Unicode, we don’t have an excuse for clinging to
> those workarounds of the past century. We are not
> limited by the 7-bit ASCII encoding and can store
> texts in their true form. We also are not constrained
> by the typewriter keyboard, having input methods
> based on Compose or Level3 allowing us to
> conveniently enter all the necessary diverse
> characters. On X11/GNU/Linux in particular it comes
> bundled with the system; on Windows, one has to
> install a third-party package.
>
> Much of the software has already evolved to support
> Unicode. That which hasn’t, has to catch up. From a
> spell checker, in particular, I expect that it should
> (perhaps with an optional switch) be able to flag as
> error any spelling of “isn’t” where the character
> between n and t is not the preferred apostrophe
> character U+2019.

First, let me tell you I very much appreciated this
post!

We agree that ', ", and the rest of the non-Unicode
chars that may (not) be used in more or less the same
context - we agree that those are there (not there) for
techno-historical reasons.

Where we *don't* agree is that you think that, if I'm
allowed to pseudo-quote you:

- Today, now that there aren't any technical
  limitations, we should go for the more advanced
  chars.
Here is where I say: Just because it is possible
doesn't mean it is desirable if there is no gain.

It is possible to change all the software in the world
to be able to use those chars. But why?

For the reasons you stated, on the Internet, on Usenet,
and in computer culture in general, many, many people
have come to use English, and the 7-bit (or 8-bit)
chars have spread and become a de facto standard. So
people's eyes and brains and fingers are trained to use
those.

We have all come together from different starting
points. The UK and US people had the shortest way to go
(as the pioneers, perhaps they earned it). The Swedes
had to learn English. The Russians had to go somewhat
further because Russian is farther from English than
Swedish is. And so on.

So when we finally have something in common - why break
it just because it is possible?

With some computer languages, like Java, it is possible
for me to program in Swedish, using ä, å, and ö. But
why would I want to do that? It would wreak havoc on my
brain, as the rest of the language would still be
English. But more importantly, it would isolate my
program from the rest of the world. I couldn't
communicate about it (ask questions, tell people about
it with the support of code snippets, etc.) and it
couldn't be configured or extended by a
non-Swedish-speaking person.

So I'll just stick to C, in English. Just as I will
stick to ' as that is the correct way (as I see it) to
write in "Computer English".

--
underground experts united: http://user.it.uu.se/~embe8573