* Getting Emacs to play nice with Hunspell and apostrophes @ 2014-06-07 15:39 Nikolai Weibull 2014-06-07 17:43 ` Robert Thorpe ` (2 more replies) 0 siblings, 3 replies; 41+ messages in thread From: Nikolai Weibull @ 2014-06-07 15:39 UTC (permalink / raw) To: Emacs Users Hi! How do I get Emacs to play nice with Hunspell and apostrophes. I thought I had it covered, but it seems that something has changed and now M-x ispell won’t recognize “isn’t” as a word anymore. First off, what English dictionary should I be using? Second, how do I get Emacs to send words containing apostrophes to Hunspell? Fiddling with WORDCHARS in en_US.aff seems so wrong, as Emacs will then send stuff like 'isn't' as a word. The final step is getting “isn’t” to work with Unicode apostrophes, but let’s take it one step at a time. It’s beyond me how this isn’t working, but I’m sure I’m doing something wrong. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-07 15:39 Getting Emacs to play nice with Hunspell and apostrophes Nikolai Weibull @ 2014-06-07 17:43 ` Robert Thorpe 2014-06-07 17:59 ` Yuri Khan 2014-06-07 18:18 ` Nikolai Weibull 2014-06-07 17:53 ` Sharon Kimble 2014-06-07 18:28 ` Nikolai Weibull 2 siblings, 2 replies; 41+ messages in thread From: Robert Thorpe @ 2014-06-07 17:43 UTC (permalink / raw) To: Nikolai Weibull; +Cc: help-gnu-emacs Nikolai Weibull <now@disu.se> writes: > “isn’t” In Britain and Ireland we generally use "isn't", notice there's no angle on the apostrophe. The one you're using "’" is the Unicode RIGHT QUOTATION MARK. So, to Emacs you are closing quotes around "isn" and putting a "t" straight after that. As far as I know, if you want to use Unicode that would be "isnʼt" which is MODIFIER LETTER APOSTROPHE. Have a look with C-u C-x =. I don't know if using that will work though. BR, Robert Thorpe ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-07 17:43 ` Robert Thorpe @ 2014-06-07 17:59 ` Yuri Khan 2014-06-07 18:18 ` Nikolai Weibull 1 sibling, 0 replies; 41+ messages in thread From: Yuri Khan @ 2014-06-07 17:59 UTC (permalink / raw) To: Robert Thorpe; +Cc: help-gnu-emacs@gnu.org On Sun, Jun 8, 2014 at 12:43 AM, Robert Thorpe <rt@robertthorpeconsulting.com> wrote: > Nikolai Weibull <now@disu.se> writes: >> “isn’t” > > In Britain and Ireland we generally use "isn't", notice there's no angle > on the apostrophe. The one you're using "’" is the Unicode RIGHT > QUOTATION MARK. So, to Emacs you are closing quotes around "isn" and > putting a "t" straight after that. > > As far as I know, if you want to use Unicode that would be "isnʼt" which > is MODIFIER LETTER APOSTROPHE. Have a look with C-u C-x =. I don't > know if using that will work though. The Unicode tables say: 0027 ' APOSTROPHE […] • 2019 ’ is preferred for apostrophe 2019 ’ RIGHT SINGLE QUOTATION MARK = single comma quotation mark • this is the preferred character to use for apostrophe 02BC ʼ MODIFIER LETTER APOSTROPHE = apostrophe • glottal stop, glottalization, ejective • many languages use this as a letter of their alphabets • used as a tone marker in Bodo, Dogri, and Maithili • 2019 ’ is the preferred character for a punctuation apostrophe In English, the apostrophe is neither a glottal stop mark, nor a letter, nor a tone marker, so 02BC does not apply. 2019 is the correct code, although it is unfortunate that it is overloaded with a closing single quote. ^ permalink raw reply [flat|nested] 41+ messages in thread
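As a complement to C-u C-x =, the three code points compared above can also be looked up from Lisp. This is only a minimal sketch, assuming an Emacs recent enough to provide `get-char-code-property`; evaluate it in *scratch* and read the result in *Messages*.

```elisp
;; Print code point, glyph and Unicode name for the three candidates
;; discussed in this thread: U+0027, U+2019 and U+02BC.
(dolist (ch '(#x0027 #x2019 #x02BC))
  (message "U+%04X %c %s"
           ch ch
           (get-char-code-property ch 'name)))
```

This only reports what the Unicode database bundled with Emacs says about each character; it does not change how ispell or Hunspell treat them.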
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-07 17:43 ` Robert Thorpe 2014-06-07 17:59 ` Yuri Khan @ 2014-06-07 18:18 ` Nikolai Weibull 1 sibling, 0 replies; 41+ messages in thread From: Nikolai Weibull @ 2014-06-07 18:18 UTC (permalink / raw) To: Robert Thorpe; +Cc: Emacs Users On Sat, Jun 7, 2014 at 7:43 PM, Robert Thorpe <rt@robertthorpeconsulting.com> wrote: > Nikolai Weibull <now@disu.se> writes: >> “isn’t” > > In Britain and Ireland we generally use "isn't", notice there's no angle > on the apostrophe. It’s generally used (in Britain and in other places) instead of the more correct “typographical” apostrophe/right single quotation mark because it’s more easily accessible on the standard computer keyboard, not because it’s preferred. > The one you're using "’" is the Unicode RIGHT > QUOTATION MARK. No, it’s the U+2019 RIGHT SINGLE QUOTATION MARK. > So, to Emacs you are closing quotes around "isn" and > putting a "t" straight after that. I realize that ‘’’ is seen as punctuation by Emacs, which is true in some cases, when, for example, doing quoting in British English, for example, ‘this is a quote’ or when doing nested quoting in American English, for example, “this quote quotes ‘another quote’”, but it’s also sometimes a character that should be seen as part of a word. > As far as I know, if you want to use Unicode that would be "isnʼt" which > is MODIFIER LETTER APOSTROPHE. Have a look with C-u C-x =. I don't > know if using that will work though. No, that’s incorrect, please see, for example, http://en.wikipedia.org/wiki/Apostrophe#Unicode for a description about the use of apostrophes and Unicode. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-07 15:39 Getting Emacs to play nice with Hunspell and apostrophes Nikolai Weibull 2014-06-07 17:43 ` Robert Thorpe @ 2014-06-07 17:53 ` Sharon Kimble 2014-06-07 18:17 ` Eli Zaretskii 2014-06-07 18:28 ` Nikolai Weibull 2 siblings, 1 reply; 41+ messages in thread
From: Sharon Kimble @ 2014-06-07 17:53 UTC (permalink / raw)
To: Nikolai Weibull; +Cc: Emacs Users

Nikolai Weibull <now@disu.se> writes:

> Hi!
>
> How do I get Emacs to play nice with Hunspell and apostrophes. I
> thought I had it covered, but it seems that something has changed and
> now M-x ispell won’t recognize “isn’t” as a word anymore.
>
> First off, what English dictionary should I be using?
>
> Second, how do I get Emacs to send words containing apostrophes to Hunspell?
>
> Fiddling with WORDCHARS in en_US.aff seems so wrong, as Emacs will
> then send stuff like 'isn't' as a word.
>
> The final step is getting “isn’t” to work with Unicode apostrophes,
> but let’s take it one step at a time.
>
> It’s beyond me how this isn’t working, but I’m sure I’m doing something wrong.
>

This is from my "init.el" -

--8<---------------cut here---------------start------------->8---
;; Use hunspell instead of ispell
(if (file-exists-p "/usr/bin/hunspell")
    (progn
      (setq ispell-program-name "hunspell")
      (eval-after-load "ispell"
        '(progn (defun ispell-get-coding-system () 'utf-8)))))
(setq ispell-program-name "hunspell")
(require 'rw-hunspell)
(require 'rw-language-and-country-codes)
(require 'rw-ispell)
(setq ispell-dictionary "en_GB_hunspell")
--8<---------------cut here---------------end--------------->8---

Hopefully this will help you get going with hunspell, which I have found is good! :) This is all I have set up for hunspell and it works okay with "isn’t" as I've just corrected it in this email.

If you want copies of "rw-hunspell, rw-language-and-country-codes, rw-ispell" I can priv mail them to you, but I think that I got them from EmacsWiki.

Sharon.
--
A taste of linux = http://www.sharons.org.uk
my git repo = https://bitbucket.org/boudiccas/dots
TGmeds = http://www.tgmeds.org.uk
Debian testing, fluxbox 1.3.5, emacs 24.3.91.1

^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-07 17:53 ` Sharon Kimble @ 2014-06-07 18:17 ` Eli Zaretskii 0 siblings, 0 replies; 41+ messages in thread
From: Eli Zaretskii @ 2014-06-07 18:17 UTC (permalink / raw)
To: help-gnu-emacs

> From: Sharon Kimble <boudiccas@skimble.plus.com>
> Date: Sat, 07 Jun 2014 18:53:47 +0100
> Cc: Emacs Users <help-gnu-emacs@gnu.org>
>
> ;; Use hunspell instead of ispell
> (if (file-exists-p "/usr/bin/hunspell")
>     (progn
>       (setq ispell-program-name "hunspell")
>       (eval-after-load "ispell"
>         '(progn (defun ispell-get-coding-system () 'utf-8)))))
> (setq ispell-program-name "hunspell")
> (require 'rw-hunspell)
> (require 'rw-language-and-country-codes)
> (require 'rw-ispell)
> (setq ispell-dictionary "en_GB_hunspell")
> --8<---------------cut here---------------end--------------->8---
>
> Hopefully this will help you get going with hunspell, which I have
> found is good! :) This is all I have set up for hunspell and it
> works okay with "isn’t" as I've just corrected it in this email.

I don't see how that would work, unless you are using a version of an English dictionary that knows about the ’ character. I don't think the rw-* packages have anything to do with that, or could have.

So I think a more interesting question for the OP is where you got the "en_GB_hunspell" dictionary you are using, because that dictionary might actually hold the answer to his questions.

P.S. Most, if not all, of what the rw-* packages do is already handled by the ispell.el that comes with the latest versions of Emacs, which includes full support for Hunspell. You might as well try using Emacs without those add-ons; you will probably find they are not needed anymore.

^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-07 15:39 Getting Emacs to play nice with Hunspell and apostrophes Nikolai Weibull 2014-06-07 17:43 ` Robert Thorpe 2014-06-07 17:53 ` Sharon Kimble @ 2014-06-07 18:28 ` Nikolai Weibull 2014-06-07 18:40 ` Eli Zaretskii 2 siblings, 1 reply; 41+ messages in thread From: Nikolai Weibull @ 2014-06-07 18:28 UTC (permalink / raw) To: Emacs Users On Sat, Jun 7, 2014 at 5:39 PM, Nikolai Weibull <now@disu.se> wrote: > It’s beyond me how this isn’t working, but I’m sure I’m doing something wrong. I should perhaps also note that the only word in the sentence above that is seen as an error is “isn’t”, as “isn” isn’t a word. I guess either Emacs or hunspell is ignoring single-character words “s” and “m” after each of the other instances of ‘’’ and “It” and “I” are of course seen as correctly spelled words… (…so the simple solution is to add “isn” as a word in my personal dictionary.) ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-07 18:28 ` Nikolai Weibull @ 2014-06-07 18:40 ` Eli Zaretskii 2014-06-07 19:59 ` Nikolai Weibull 0 siblings, 1 reply; 41+ messages in thread From: Eli Zaretskii @ 2014-06-07 18:40 UTC (permalink / raw) To: help-gnu-emacs > Date: Sat, 7 Jun 2014 20:28:08 +0200 > From: Nikolai Weibull <now@disu.se> > > On Sat, Jun 7, 2014 at 5:39 PM, Nikolai Weibull <now@disu.se> wrote: > > > It’s beyond me how this isn’t working, but I’m sure I’m doing something wrong. > > I should perhaps also note that the only word in the sentence above > that is seen as an error is “isn’t”, as “isn” isn’t a word. I guess > either Emacs or hunspell is ignoring single-character words “s” and > “m” after each of the other instances of ‘’’ and “It” and “I” are of > course seen as correctly spelled words… Emacs just goes with whatever the .aff file of the dictionary you use says. And it cannot do anything else, because the speller uses that dictionary, and decides by its rules what can and what cannot be in a word. Look in the .aff file you use, and you will see that it knows about ' and about n't and about 's, that's why these work. There's no magic here. So I think you must get a hold of a Hunspell-compliant dictionary that knows about the ’ apostrophe. ^ permalink raw reply [flat|nested] 41+ messages in thread
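To see what a given .aff file actually declares, its WORDCHARS line can be pulled out from within Emacs. The following is a rough sketch only: the path /usr/share/hunspell/en_US.aff is an assumption and varies between systems, and a dictionary is not required to have a WORDCHARS line at all.

```elisp
;; Return the WORDCHARS value of a Hunspell affix file, or nil if the
;; file declares none.  The path below is hypothetical; point it at
;; the .aff file of the dictionary you really use.
(with-temp-buffer
  (insert-file-contents "/usr/share/hunspell/en_US.aff")
  (goto-char (point-min))
  (when (re-search-forward "^WORDCHARS[ \t]+\\(.*\\)$" nil t)
    (match-string 1)))
```

If the returned string lacks ’ (U+2019), that would be consistent with the behaviour described above, where only the ASCII apostrophe is accepted inside a word.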
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-07 18:40 ` Eli Zaretskii @ 2014-06-07 19:59 ` Nikolai Weibull 0 siblings, 0 replies; 41+ messages in thread From: Nikolai Weibull @ 2014-06-07 19:59 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Emacs Users On Sat, Jun 7, 2014 at 8:40 PM, Eli Zaretskii <eliz@gnu.org> wrote: >> Date: Sat, 7 Jun 2014 20:28:08 +0200 >> From: Nikolai Weibull <now@disu.se> >> >> On Sat, Jun 7, 2014 at 5:39 PM, Nikolai Weibull <now@disu.se> wrote: >> >> > It’s beyond me how this isn’t working, but I’m sure I’m doing something wrong. >> >> I should perhaps also note that the only word in the sentence above >> that is seen as an error is “isn’t”, as “isn” isn’t a word. I guess >> either Emacs or hunspell is ignoring single-character words “s” and >> “m” after each of the other instances of ‘’’ and “It” and “I” are of >> course seen as correctly spelled words… > Emacs just goes with whatever the .aff file of the dictionary you use > says. And it cannot do anything else, because the speller uses that > dictionary, and decides by its rules what can and what cannot be in a > word. Yes, I realize that, but that raises the question of how ‘isn’t’ will be parsed if I straight up add ’ to WORDCHARS, but I guess that only matters for the curses interface that I don’t use. > Look in the .aff file you use, and you will see that it knows about ' > and about n't and about 's, that's why these work. There's no magic > here. OK, so having read hunspell(5), it seems that my .aff that comes from OpenOffice doesn’t include “n't” as a possible SFX. The .dic does list the word “isn't”, however, so I’m not sure what to make of this. The one from SCOWL, version 7.1.0, looks about the same as the OpenOffice one. The one from Mozilla is also about the same. > So I think you must get a hold of a Hunspell-compliant dictionary that > knows about the ’ apostrophe. Yes, I suppose so. 
One solution that seems to work is to add ‘’’ (or ‘'’) to WORDCHARS and then change ispell-dictionary-alist to include ‘’’ in the OTHERCHARS element. This works with hunspell 1.3.3 (which was released a couple of days ago and still doesn’t include the patch for handling offsets correctly). Perhaps this should be handled automatically for OTHERCHARS in ispell.el? ^ permalink raw reply [flat|nested] 41+ messages in thread
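The Emacs half of the workaround described above might look roughly like the following. It is a sketch under assumptions, not the poster's actual configuration: the dictionary name "en_US" and the "-d" argument are placeholders, and it uses `ispell-local-dictionary-alist`, which ispell.el documents as the place for user overrides of `ispell-dictionary-alist`. The WORDCHARS change in the .aff file still has to be made separately on the Hunspell side.

```elisp
;; Sketch only: "en_US" and ("-d" "en_US") are assumptions; adjust
;; them to the dictionary installed on your system.
(setq ispell-program-name "hunspell")

;; Entry layout: (NAME CASECHARS NOT-CASECHARS OTHERCHARS
;;                MANY-OTHERCHARS-P ISPELL-ARGS
;;                EXTENDED-CHARACTER-MODE CHARACTER-SET)
(setq ispell-local-dictionary-alist
      '(("en_US"
         "[[:alpha:]]"    ; CASECHARS: characters that make up words
         "[^[:alpha:]]"   ; NOT-CASECHARS: everything else
         "['’]"           ; OTHERCHARS: ASCII apostrophe and U+2019
         nil              ; MANY-OTHERCHARS-P: allow only one per word
         ("-d" "en_US")   ; arguments passed to the hunspell process
         nil              ; no extended character mode
         utf-8)))         ; coding system used to talk to hunspell

(setq ispell-dictionary "en_US")
```

With OTHERCHARS extended this way, Emacs should hand “isn’t” to the speller as a single word; whether the speller then accepts it depends on the dictionary, as discussed earlier in the thread.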
[parent not found: <mailman.3187.1402155569.1147.help-gnu-emacs@gnu.org>]
* Re: Getting Emacs to play nice with Hunspell and apostrophes [not found] <mailman.3187.1402155569.1147.help-gnu-emacs@gnu.org> @ 2014-06-11 0:04 ` Emanuel Berg 2014-06-11 5:23 ` Nikolai Weibull [not found] ` <mailman.3375.1402464243.1147.help-gnu-emacs@gnu.org> 2014-06-17 2:12 ` Rusi 1 sibling, 2 replies; 41+ messages in thread
From: Emanuel Berg @ 2014-06-11 0:04 UTC (permalink / raw)
To: help-gnu-emacs

Nikolai Weibull <now@disu.se> writes:

> How do I get Emacs to play nice with Hunspell and
> apostrophes. I thought I had it covered, but it
> seems that something has changed and now M-x ispell
> won’t recognize “isn’t” as a word anymore.

“isn’t” isn't an error according to my spellchecker, aspell.

But before I go on about that, I agree with everyone else saying don't use those silly chars - what's the benefit? They look stupid and they bring along problems like this (though not for aspell, it would seem, but for Hunspell in your case, and in other situations as well). Again, what's the gain using them?

(setq ispell-program-name "aspell")
(setq ispell-dictionary "english")
(setq ispell-silently-savep t)

Ignore within special delimiters:

(add-to-list 'ispell-skip-region-alist '("`" . "`"))
(add-to-list 'ispell-skip-region-alist '("`" . "'"))

> First off, what English dictionary should I be using?

With aspell, get it from the repositories - likewise dictionaries, which are called aspell-en, aspell-sv, etc.

More aspell: http://user.it.uu.se/~embe8573/conf/emacs-init/spell.el

To really get the blood pumping behind your ears, you need shortcuts for the different dictionaries, as well.

--
underground experts united: http://user.it.uu.se/~embe8573

^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-11 0:04 ` Emanuel Berg @ 2014-06-11 5:23 ` Nikolai Weibull [not found] ` <mailman.3375.1402464243.1147.help-gnu-emacs@gnu.org> 1 sibling, 0 replies; 41+ messages in thread From: Nikolai Weibull @ 2014-06-11 5:23 UTC (permalink / raw) To: Emanuel Berg; +Cc: Emacs Users On Wed, Jun 11, 2014 at 2:04 AM, Emanuel Berg <embe8573@student.uu.se> wrote: > Nikolai Weibull <now@disu.se> writes: > But before I go on about that, I agree with everyone > else saying don't use those silly chars - what's the > benefit? No one said that. ^ permalink raw reply [flat|nested] 41+ messages in thread
[parent not found: <mailman.3375.1402464243.1147.help-gnu-emacs@gnu.org>]
* Re: Getting Emacs to play nice with Hunspell and apostrophes [not found] ` <mailman.3375.1402464243.1147.help-gnu-emacs@gnu.org> @ 2014-06-11 14:24 ` Emanuel Berg 2014-06-11 15:03 ` Nikolai Weibull [not found] ` <mailman.3418.1402499010.1147.help-gnu-emacs@gnu.org> 0 siblings, 2 replies; 41+ messages in thread From: Emanuel Berg @ 2014-06-11 14:24 UTC (permalink / raw) To: help-gnu-emacs Nikolai Weibull <now@disu.se> writes: >> But before I go on about that, I agree with everyone >> else saying don't use those silly chars - what's the >> benefit? > > No one said that. OK, so I did. That is beside the point. Still, why use it? It looks stupid (people are used to the other way) and it isn't practical, as your post shows. Also, you use other chars that aren't practical - the three dots as one char, for example. Why? If you don't have real problems, I guess you can always be a snob and get a bunch of artificial problems, that you can then pretend to solve. -- underground experts united: http://user.it.uu.se/~embe8573 ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-11 14:24 ` Emanuel Berg @ 2014-06-11 15:03 ` Nikolai Weibull [not found] ` <mailman.3418.1402499010.1147.help-gnu-emacs@gnu.org> 1 sibling, 0 replies; 41+ messages in thread From: Nikolai Weibull @ 2014-06-11 15:03 UTC (permalink / raw) To: Emanuel Berg; +Cc: Emacs Users On Wed, Jun 11, 2014 at 4:24 PM, Emanuel Berg <embe8573@student.uu.se> wrote: > Nikolai Weibull <now@disu.se> writes: > >>> But before I go on about that, I agree with everyone >>> else saying don't use those silly chars - what's the >>> benefit? >> >> No one said that. > OK, so I did. That is beside the point. Still, why use > it? It looks stupid (people are used to the other way) > and it isn't practical, as your post shows. Also, you > use other chars that aren't practical - the three dots > as one char, for example. Why? If you don't have real > problems, I guess you can always be a snob and get a > bunch of artificial problems, that you can then pretend > to solve. I don’t have any interest in creating problems, real or otherwise, but you sure seem to want to, which is why I won’t discuss this further with you. (And now I can at least pretend to have solved this thread’s troll problem.) ^ permalink raw reply [flat|nested] 41+ messages in thread
[parent not found: <mailman.3418.1402499010.1147.help-gnu-emacs@gnu.org>]
* Re: Getting Emacs to play nice with Hunspell and apostrophes [not found] ` <mailman.3418.1402499010.1147.help-gnu-emacs@gnu.org> @ 2014-06-11 15:20 ` Emanuel Berg 2014-06-11 16:57 ` Teemu Likonen ` (3 more replies) 0 siblings, 4 replies; 41+ messages in thread From: Emanuel Berg @ 2014-06-11 15:20 UTC (permalink / raw) To: help-gnu-emacs Nikolai Weibull <now@disu.se> writes: > I don’t have any interest in creating problems, real > or otherwise, but you sure seem to want to, which is > why I won’t discuss this further with you. You still haven't said one word why anyone would benefit from using those chars instead of the standard " and ' (and ...) that works everywhere and that everyone is familiar with (having trained their eyes for them year-in, year-out). If you can't motivate why something is a problem, it is not a problem. Nonetheless I suggested another solution (using aspell, the first suggestion being stop using those chars). > (And now I can at least pretend to have solved this > thread’s troll problem.) (No comments.) -- underground experts united: http://user.it.uu.se/~embe8573 ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-11 15:20 ` Emanuel Berg @ 2014-06-11 16:57 ` Teemu Likonen [not found] ` <mailman.3437.1402505846.1147.help-gnu-emacs@gnu.org> ` (2 subsequent siblings) 3 siblings, 0 replies; 41+ messages in thread From: Teemu Likonen @ 2014-06-11 16:57 UTC (permalink / raw) To: help-gnu-emacs Emanuel Berg [2014-06-11 17:20:31 +02:00] wrote: > You still haven't said one word why anyone would benefit from using > those chars instead of the standard " and ' (and ...) that works > everywhere and that everyone is familiar with (having trained their > eyes for them year-in, year-out). For instance, every book uses real quotation marks and apostrophes. They are standard in the publishing world. Many people use Emacs to write text that will be published (web, printed material). ^ permalink raw reply [flat|nested] 41+ messages in thread
[parent not found: <mailman.3437.1402505846.1147.help-gnu-emacs@gnu.org>]
* Re: Getting Emacs to play nice with Hunspell and apostrophes [not found] ` <mailman.3437.1402505846.1147.help-gnu-emacs@gnu.org> @ 2014-06-11 21:32 ` Emanuel Berg 0 siblings, 0 replies; 41+ messages in thread
From: Emanuel Berg @ 2014-06-11 21:32 UTC (permalink / raw)
To: help-gnu-emacs

Teemu Likonen <tlikonen@iki.fi> writes:

> For instance, every book uses real quotation marks
> and apostrophes. They are standard in the publishing
> world. Many people use Emacs to write text that will
> be published (web, printed material).

Well, OK, sort of... On the other hand, that would typically be produced with LaTeX. (In latex-mode, there are some annoying `` and '' automatically when you do " - I never learned the reason for that.) But in short forms like "isn't", at least I simply write ' in the LaTeX source - I haven't noticed how they turn out - probably that can be tweaked (I'll get back to you on that, as I happen to work on such a document right now).

For web material I think '/" is preferable still, because people like to yank it into mails and the like, and it would just be extra work having them change to '/" whenever that happens.

Some examples from the computer world:

A quote from the ls man page:

    -G, --no-group
        in a long listing, don't print group names

From the emacs ditto:

    -Q, --quick
        Similar to "-q --no-site-file --no-splash".

From RFC 3676:

    If the line is flowed and DelSp is "yes", the trailing space
    immediately prior to the line's CRLF is logically deleted. If the
    DelSp parameter is "no" (or not specified, or set to an
    unrecognized value), the trailing space is not deleted.

And speaking of mails - we are using mails/posts right now - so why use it in mails and Usenet posts?

On a more general/human scale: you are Finnish (I take it), I am Swedish. (I'm not reeling you or anyone else to my side, just stating facts.) We have acquired English and use it because we accept that it is very practical and it is simply how it works. Not to mention the Russian and Chinese who had to get fluent with a whole new alphabet and language system!

Or this quote from this thread: "In Britain and Ireland we generally use "isn't", notice there's no angle on the apostrophe."

Besides, virtually all US computer people use '/", from what I can tell!

So yes, I feel it is close to arrogance that the OP cannot in one word tell me why this would benefit anyone, and even more so as I actually tried to help him in my first post!

Anyway, I'm not angry or anything. Peace in the Middle East. Feel free to carry on this discussion though (of course).

--
underground experts united: http://user.it.uu.se/~embe8573

^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-11 15:20 ` Emanuel Berg 2014-06-11 16:57 ` Teemu Likonen [not found] ` <mailman.3437.1402505846.1147.help-gnu-emacs@gnu.org> @ 2014-06-12 5:43 ` Yuri Khan 2014-06-12 12:51 ` Stefan Monnier 2014-06-12 16:58 ` Eli Zaretskii [not found] ` <mailman.3473.1402551809.1147.help-gnu-emacs@gnu.org> 3 siblings, 2 replies; 41+ messages in thread From: Yuri Khan @ 2014-06-12 5:43 UTC (permalink / raw) To: Emanuel Berg; +Cc: help-gnu-emacs@gnu.org On Wed, Jun 11, 2014 at 10:20 PM, Emanuel Berg <embe8573@student.uu.se> wrote: > You still haven't said one word why anyone would > benefit from using those chars instead of the standard > " and ' (and ...) that works everywhere and that > everyone is familiar with (having trained their eyes > for them year-in, year-out). The fact that everybody uses " and ' and ` is a historical artifact, a workaround of sorts, due to the limitations of the mechanical typewriter. We need not be affected by it any more. There was no possibility of including all the required typographical characters or accented letters into the printing ball, so both quotes (“ and ”) and the diaeresis got conflated into a straight quote ", both single quotes (‘ and ’) into a straight single quote/apostrophe ', and the backtick ` and tilde ~ were there to facilitate typing accented letters. This limitation then crept into computers, because this way the character set could be encoded in 7 bits. The computer keyboard was just modeled after the typewriter keyboard, with a few extensions. Then the inevitable struck: computers expanded from the US and UK into Germany, Sweden, Finland, France, Canada, and then countries with non-Latin scripts (Greek, Cyrillic, and CJK). And all of them wanted to have dedicated code points for their characters, e.g. type a single ä instead of [a, backspace-no-delete, "]. For a good while, we lived in a nightmare of ten thousand code pages. 
In Russia, you could receive an email and see a jumble of utterly meaningless words because the message could be re-encoded (or the Content-Type charset= stripped or re-labeled) on any of the intermediate servers; there existed programs which were able to heuristically detect the chain of re-encodings applied on the way and decode your message for you. You could order a book in an Internet shop, have them completely b0rk up the encoding of the shipping address: http://cdn.imagepush.to/in/625x2090/i/3/30/301/24.jpg Then somebody at the postal system might decode the characters and the package would still be delivered at the intended address. Now that every widely used operating system supports Unicode, we don’t have an excuse for clinging to those workarounds of the past century. We are not limited by the 7-bit ASCII encoding and can store texts in their true form. We also are not constrained by the typewriter keyboard, having input methods based on Compose or Level3 allowing us to conveniently enter all the necessary diverse characters. On X11/GNU/Linux in particular it comes bundled with the system; on Windows, one has to install a third-party package. Much of the software has already evolved to support Unicode. That which hasn’t, has to catch up. From a spell checker, in particular, I expect that it should (perhaps with an optional switch) be able to flag as error any spelling of “isn’t” where the character between n and t is not the preferred apostrophe character U+2019. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-12 5:43 ` Yuri Khan @ 2014-06-12 12:51 ` Stefan Monnier 2014-06-12 13:36 ` Nikolai Weibull [not found] ` <mailman.3496.1402580195.1147.help-gnu-emacs@gnu.org> 2014-06-12 16:58 ` Eli Zaretskii 1 sibling, 2 replies; 41+ messages in thread From: Stefan Monnier @ 2014-06-12 12:51 UTC (permalink / raw) To: help-gnu-emacs > The fact that everybody uses " and ' and ` is a historical artifact, a > workaround of sorts, due to the limitations of the mechanical > typewriter. We need not be affected by it any more. The limitation of "number of keys on a keyboard" along with the limitations of the human brain mean that it's still very convenient for users to be able to just use " and ' rather than having umpteen subvariants and needing to remember which to use where and how to type them in. Stefan ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-12 12:51 ` Stefan Monnier @ 2014-06-12 13:36 ` Nikolai Weibull 2014-06-12 14:48 ` Stefan Monnier [not found] ` <mailman.3496.1402580195.1147.help-gnu-emacs@gnu.org> 1 sibling, 1 reply; 41+ messages in thread From: Nikolai Weibull @ 2014-06-12 13:36 UTC (permalink / raw) To: Stefan Monnier; +Cc: Emacs Users On Thu, Jun 12, 2014 at 2:51 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote: >> The fact that everybody uses " and ' and ` is a historical artifact, a >> workaround of sorts, due to the limitations of the mechanical >> typewriter. We need not be affected by it any more. > > The limitation of "number of keys on a keyboard" along with the > limitations of the human brain mean that it's still very convenient for > users to be able to just use " and ' rather than having umpteen > subvariants and needing to remember which to use where and how to type > them in. Remembering when to use which of four symbols is hardly taxing (and – even when considering additional “variants” such as ‘′’, ‘″’ for prime and double prime – not close to the definition of umpteen, I’d say), though the “how to type them in” argument deserves a bit more consideration, such as the automatic replacement that many editors perform. Personally, keyboard bindings such as \C-k ' 9 (from Vim and now Evil) are wired deep into my fingers, so much so that I still haven’t been able to move over to using the more convenient & ' 9 from the rfc1345 input method. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-12 13:36 ` Nikolai Weibull @ 2014-06-12 14:48 ` Stefan Monnier 0 siblings, 0 replies; 41+ messages in thread From: Stefan Monnier @ 2014-06-12 14:48 UTC (permalink / raw) To: Nikolai Weibull; +Cc: Emacs Users > Remembering when to use which of four symbols is hardly taxing (and – I'd expect that most people would first have to *learn* before they could have a chance at remembering. After all, these are typographical conventions that aren't taught at school. And then seeing how people mix up "your/you're" and friends, I think you're being overly optimistic. Stefan ^ permalink raw reply [flat|nested] 41+ messages in thread
[parent not found: <mailman.3496.1402580195.1147.help-gnu-emacs@gnu.org>]
* Re: Getting Emacs to play nice with Hunspell and apostrophes [not found] ` <mailman.3496.1402580195.1147.help-gnu-emacs@gnu.org> @ 2014-06-14 1:49 ` Emanuel Berg 2014-06-14 5:45 ` Yuri Khan [not found] ` <mailman.3627.1402724759.1147.help-gnu-emacs@gnu.org> 0 siblings, 2 replies; 41+ messages in thread From: Emanuel Berg @ 2014-06-14 1:49 UTC (permalink / raw) To: help-gnu-emacs Nikolai Weibull <now@disu.se> writes: > Remembering when to use which of four symbols is > hardly taxing (and – even when considering additional > “variants” such as ‘′’, ‘″’ for prime and double > prime – not close to the definition of umpteen, I’d > say), though the “how to type them in” arguments > deserves a bit more consideration, such as the > automatic replacement that many editors perform. The “ and ’ just look silly and they are disruptive. The two chars after the words "such as" I cannot see (they are shown as diamonds). As for remembering/typing, it is again not a question of - "is it possible to do?" - not with respect to humans nor to technology - of course it is possible! - the question is - and from what I can see you still haven't answered it with one word - the question is *why* - what is the gain? who would benefit from it, and how so? This entire thread is an example why not to do it (though I agree a spellchecker should be fixed to cope, anyway, as some people have the poor taste to use those chars and those have to be accounted for) - and I just raised additional problems, on top of the fact that so much software around is just not up to it - so why this is (and can be) a problem (annoyance) is clear - the only thing that is a mystery is why anyone would want it to begin with. > Personally, keyboard bindings such as \C-k ' 9 (from > Vim and now Evil) are wired deep into my fingers, so > much so that I still haven’t been able to move over > to using the more convenient & ' 9 from the rfc1345 > input method. OK, let me tell you how I do ' and ". 
' I do by moving my right little finger one step (key) to the right. The " I do by moving the right little finger to the right shift, at the same time as the ring finger slides along to the ' key. So can you find one single area in which anyone (human or technology) benefits from those goofy chars? It is just snobbish, not reality. Don't do it! -- underground experts united: http://user.it.uu.se/~embe8573 ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-14 1:49 ` Emanuel Berg @ 2014-06-14 5:45 ` Yuri Khan [not found] ` <mailman.3627.1402724759.1147.help-gnu-emacs@gnu.org> 1 sibling, 0 replies; 41+ messages in thread From: Yuri Khan @ 2014-06-14 5:45 UTC (permalink / raw) To: Emanuel Berg; +Cc: help-gnu-emacs@gnu.org On Sat, Jun 14, 2014 at 8:49 AM, Emanuel Berg <embe8573@student.uu.se> wrote: > The “ and ’ just looks silly and they are > disruptive. The two chars after the words "such as" I > cannot see (they are shown as diamonds). This is where I disagree. Curly quotes (and, in Russian print tradition, double angle quotes) are what I am used to seeing in print and consider to be the correct way to write, independent of the medium. Straight quotes I recognize in both print and on screen as a no longer necessary homage to the old clunky typewriter, and perceive as silly. As for your problems seeing curly quotes, that’s because of your display engine. Text mode Linux console is limited to at most 512 character shapes; this limitation dates back to the original VGA card and is another one that should no longer affect us. Nowadays, you should be able to use a graphical-based text renderer — be it X11 or framebuffer. Myself, I haven’t bothered to set up a framebuffer console on any of my computers — I prefer working in an X11 environment with Freetype-rendered, subpixel-antialiased Unicode fonts and rich xkb customizability. > the question is *why* - > what is the gain? who would benefit from it, and how > so? By encoding more precise character semantics into our texts, we make them better suited for any kind of automated processing. Conflating similarly shaped characters, on the other hand, makes it more complicated. For example, the task of producing nice printouts from an ASCII-encoded source requires a complex piece of software like [La]TeX, or the mechanism of entity references in HTML (“). 
On the other hand, with UTF-8, we can directly encode the desired characters in a text document and print it out with any text editor or web browser. (You can, of course, argue that a printout of an ASCII document with straight quotes is not too ugly; or that TeX is not exceedingly complex; or that entity references are not very disrupting.) > OK, let me tell you how I do ' and ". ' I do by moving > my right little finger one step (key) to the right. The > " I do by moving the right little finger to the right > shift, at the same time as the ring finger slides along > to the ' key. Now let me tell you how I do curly quotes. First, with my right thumb, I hold the AltGr modifier. Then, I press k and l in sequence to get a balanced pair of double curly quotes, or ; and ' for single quotes (I customized my xkb configuration files to get this but it works similarly with the out-of-the-box config). This works for me in both Latin/English and Cyrillic/Russian layouts. On the other hand, the straight quote is only available in the Latin layout; in Russian, I would have to first switch to Latin, then type the single quote, and finally switch back to Russian. ^ permalink raw reply [flat|nested] 41+ messages in thread
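The contrast drawn in the message above, between HTML entity references and characters encoded directly in UTF-8, can be sketched in Python. This is only an illustration, not anything posted in the thread. It assumes the entity meant in the message is &ldquo; (LEFT DOUBLE QUOTATION MARK, U+201C), since the archive shows the decoded character rather than the reference itself. The variable names are made up for the example.

```python
import html

# Assumption: the entity reference meant in the message is &ldquo; (U+201C).
# An ASCII-only HTML source has to spell typographic quotes as references.
ascii_source = "&ldquo;isn&rsquo;t&rdquo;"

# A decoding step is needed to recover the intended characters.
decoded = html.unescape(ascii_source)

# With UTF-8 the same characters are stored directly in the file;
# each of these quotation marks takes three bytes.
utf8_bytes = decoded.encode("utf-8")

print(decoded)
print(utf8_bytes)
```

The ASCII form survives any ASCII-only channel but needs an HTML-aware reader, while the UTF-8 form is readable by any tool that decodes UTF-8.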
[parent not found: <mailman.3627.1402724759.1147.help-gnu-emacs@gnu.org>]
* Re: Getting Emacs to play nice with Hunspell and apostrophes [not found] ` <mailman.3627.1402724759.1147.help-gnu-emacs@gnu.org> @ 2014-06-14 11:14 ` Emanuel Berg 2014-06-14 14:51 ` Yuri Khan ` (2 more replies) 2014-06-15 2:48 ` Sergio Pokrovskij 1 sibling, 3 replies; 41+ messages in thread From: Emanuel Berg @ 2014-06-14 11:14 UTC (permalink / raw) To: help-gnu-emacs Yuri Khan <yuri.v.khan@gmail.com> writes: >> The “ and ’ just looks silly and they are >> disruptive. The two chars after the words "such as" >> I cannot see (they are shown as diamonds). > > This is where I disagree. Curly quotes (and, in > Russian print tradition, double angle quotes) are > what I am used to seeing in print and consider to be > the correct way to write OK, I believe you. However, the point I made with all people coming from different cultures is that it doesn't matter where we are from individually. When I went to school, I suppose I was most comfortable with Swedish. But I'm not supposing we all switch to Swedish! OK, that's a ridiculous example as it is extreme, while what we discuss now is perhaps trivial (' or ’) - but in principle it is the same. The computer language is English, and as I showed - the man pages for ls and emacs, as well as the RFC excerpt, as well as all experience with mails and Usenet and programming culture - all show that in "Computer English", ' (not ’) is correct. In a sense, this language is something that even the US, UK, etc. people have to acquire, though in another way altogether, of course. You see, kernel, allocation, dynamic, data structure, heap, process, deadlock, etc. are all English words. But put together a sentence and show it to a surfer in Southern California. You know what I'm saying? (By the way, do you know what they call a guy in Southern California who is interested in cars? 
Well, a "sensitive intellectual" :)) - now, the Scots and Irish are of course not calling their variables McDigit or O'String, but do they write <centre>, DialogueBox, background-colour, and so on? No - in Computer English it is <center>, DialogBox, and background-color, just as it is ', not ’. > independent of the medium There is no such independence. There are computers. > Straight quotes I recognize in both print and on > screen as a no longer necessary homage to the old > clunky typewriter, and perceive as silly. They are not homages to anything - they exist. It is of course interesting to know why they are there but as for as for this discussion it doesn't matter. What matters is that they are there, they exist. > As for your problems seeing curly quotes, that’s > because of your display engine. Yes, another reason why not to use them. > Text mode Linux console is limited to at most 512 > character shapes; this limitation dates back to the > original VGA card and is another one that should no > longer affect us. Nowadays, you should be able to use > a graphical-based text renderer — be it X11 or > framebuffer. Myself, I haven’t bothered to set up a > framebuffer console on any of my computers — I prefer > working in an X11 environment with Freetype-rendered, > subpixel-antialiased Unicode fonts and rich xkb > customizability. The Linux console is faster with text than Emacs running in for example xterm. I could get a faster computer hypothetically but then I'd also have to spend hours getting the keyboard and fonts and everything as I want them. But I already have that, so why do it? But I don't think the console is that much "better" than X/xterm in general - just in my case with all the configuration, I'm very happy with that and see no reason to do it again in X. And certainly not for this reason... > By encoding more precise character semantics into our > texts, we make them better suited for any kind of > automated processing. 
Conflating similarly shaped > characters, on the other hand, makes it more > complicated. > > For example, the task of producing nice printouts > from an ASCII-encoded source requires a complex piece > of software like [La]TeX, or the mechanism of entity > references in HTML (“). On the other hand, with > UTF-8, we can directly encode the desired characters > in a text document and print it out with any text > editor or web browser. > > (You can, of course, argue that a printout of an > ASCII document with straight quotes is not too ugly; > or that TeX is not exceedingly complex; or that > entity references are not very disrupting.) ASCII doesn't look ugly printed, it looks the same as it does on computers. But the main propose of ASCII of course isn't to be printed but to be processed and crunched... and read (on computers). I can't say I have that much respect for HTML as a technical system but yes, I think ' should be used, both when typing and in presentation - where the material will be read in a browser (i.e., a computer program) and sometimes yanked to a mail or post or configuration file. LaTeX is indeed complex but it is for a good reason - so there won't be any limitations creating complex documents. When you print LaTeX I don't really care what the chars look like because with LaTeX you typically print ambitious documents of several pages so then you get into the flow when reading, so you stop thinking about the chars really fast. However, every code/configuration file snippet, man page quote and so on should use '. Also, when you write LaTeX, only ' (and the like) should be used just as is the case for programming, HTML, and all other computer writing and programming. But after that, when a PDF has been created, that is sort of beyond the dynamic world of computers and more into the book world - there, I don't see any real benefits of using either ' or ’. However, since it doesn't really matter, why not stick to ' as it is the de facto standard? 
>> OK, let me tell you how I do ' and ". ' I do by >> moving my right little finger one step (key) to the >> right. The " I do by moving the right little finger >> to the right shift, at the same time as the ring >> finger slides along to the ' key. > > Now let me tell you how I do curly quotes. > > First, with my right thumb, I hold the AltGr > modifier. Then, I press k and l in sequence to get a > balanced pair of double curly quotes, or ; and ' for > single quotes (I customized my xkb configuration > files to get this but it works similarly with the > out-of-the-box config). This works for me in both > Latin/English and Cyrillic/Russian layouts. On the > other hand, the straight quote is only available in > the Latin layout; in Russian, I would have to first > switch to Latin, then type the single quote, and > finally switch back to Russian. Yes, but when you program and write in English (like now), don't you use the US keyboard layout? That's what I do to get the brackets and the semicolon and all that with no fuss - it is not that I use the Swedish chars that much, anyway! (Which is again the whole point.) And with the US layout, ' (and so on) are easier to type than the chars you suggest. -- underground experts united: http://user.it.uu.se/~embe8573 ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-14 11:14 ` Emanuel Berg @ 2014-06-14 14:51 ` Yuri Khan 2014-06-14 15:26 ` Teemu Likonen 2014-06-17 1:42 ` Garreau, Alexandre [not found] ` <mailman.3651.1402757512.1147.help-gnu-emacs@gnu.org> 2014-06-17 1:46 ` Garreau, Alexandre 2 siblings, 2 replies; 41+ messages in thread From: Yuri Khan @ 2014-06-14 14:51 UTC (permalink / raw) To: Emanuel Berg; +Cc: help-gnu-emacs@gnu.org On Sat, Jun 14, 2014 at 6:14 PM, Emanuel Berg <embe8573@student.uu.se> wrote: > Yuri Khan <yuri.v.khan@gmail.com> writes: > >> Curly quotes (and, in >> Russian print tradition, double angle quotes) are >> what I am used to seeing in print and consider to be >> the correct way to write > > OK, I believe you. However, the point I made with all > people coming from different cultures is that it > doesn't matter where we are from individually. When I > went to school, I suppose I was most comfortable with > Swedish. But I'm not supposing we all switch to > Swedish! OK, so what? I expect that people of all cultures who were exposed to books printed before the advent of the computer and the word processor are used to typographic characters. > OK, that's a ridiculous example as it is > extreme, while what we discuss now is perhaps trivial > (' or ’) - but in principle it is the same. The > computer language is English, and as I showed - the man > pages for ls and emacs, as well as the RFC excerpt, as > well as all experience with mails and Usenet and > programming culture - all show that in "Computer > English", ' (not ’) is correct. They are that way because they were written in the dark age of ten thousand code pages and never updated to Unicode. The GCC error messages in the en_US.utf8 locale, on the other hand, do use curly quotes. >> Straight quotes I recognize in both print and on >> screen as a no longer necessary homage to the old >> clunky typewriter, and perceive as silly. 
> > They are not homages to anything - they exist. It is of > course interesting to know why they are there but as > for as for this discussion it doesn't matter. What > matters is that they are there, they exist. They exist *because* there was a certain technical limitation in the last fifty years or so. Since this limitation has been removed, there is no reason for them. OK, I do not suggest that Perl should drop its backtick operator or that computer languages universally start using curly quotes for character and string literals (although that would make many languages more elegant by simplifying parsing). But how about we reserve all these artificial characters for computer languages, one of which English is not. >> As for your problems seeing curly quotes, that’s >> because of your display engine. > > Yes, another reason why not to use them. I believe users of the VGA text console are intelligent beings and respect their decision to suffer. > I can't say I have that much respect for HTML as a > technical system but yes, I think ' should be used, > both when typing and in presentation - where the > material will be read in a browser (i.e., a computer > program) and sometimes yanked to a mail or post or > configuration file. For configuration files, by all means, the character which is proper for that particular file format must be used. Otherwise, primarily, the material will be read by a human being, and only secondarily in a computer program. I wish for a future where the Web replaces the printed book, therefore, the Web must do all things books do, and then some. > LaTeX is indeed complex but it is for a good reason - > so there won't be any limitations creating complex > documents. When you print LaTeX I don't really care > what the chars look like because with LaTeX you > typically print ambitious documents of several pages so > then you get into the flow when reading, so you stop > thinking about the chars really fast. No. 
If I have to read a printed document, every straight quote, every hyphen used in place of a dash, every uneven space, pulls me out of the flow. The only way for me to stop thinking about the characters is if they are exactly as in a book typeset by a skilled typesetter on a pre-computer-era press. Yes, LaTeX does a lot to produce a beautifully typeset printout from an ASCII source. This is not enough; I want that same beautiful typesetting on screen, in browser, in any page width I happen to have, in my favorite typeface and font size, without having to recompile the document. And at the same time, it does too much. It has to maintain, and document authors have to utilize, a multitude of workarounds that are caused by TeX not using Unicode internally. > when you program and write in English (like > now), don't you use the US keyboard layout? That's what > I do to get the brackets and the semicolon and all that > with no fuss - it is not that I use the Swedish chars > that much, anyway! (Which is again the whole point.) > And with the US layout, ' (and so on) are easier to > type than the chars you suggest. The difference between ' and AltGr+' is almost negligible for me. Additionally, when I use an apostrophe in a string constant in a language where strings are delimited by single quotes, or double curly quotes where delimited by double quotes, I don’t have to backslash-quote them. I do understand we have engaged in a holy war not directly related to the original poster’s problem. Let’s agree to disagree. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-14 14:51 ` Yuri Khan @ 2014-06-14 15:26 ` Teemu Likonen 2014-06-17 1:42 ` Garreau, Alexandre 1 sibling, 0 replies; 41+ messages in thread From: Teemu Likonen @ 2014-06-14 15:26 UTC (permalink / raw) To: Yuri Khan; +Cc: help-gnu-emacs, Emanuel Berg [-- Attachment #1: Type: text/plain, Size: 829 bytes --] Yuri Khan [2014-06-14 21:51:43 +07:00] wrote: > Yes, LaTeX does a lot to produce a beautifully typeset printout from > an ASCII source. This is not enough; I want that same beautiful > typesetting on screen, in browser, in any page width I happen to have, > in my favorite typeface and font size, without having to recompile the > document. And at the same time, it does too much. It has to maintain, > and document authors have to utilize, a multitude of workarounds that > are caused by TeX not using Unicode internally. Yes, but you know, Xelatex with fontspec/mathspec package takes UTF-8 input files and uses Truetype and Opentype fonts. That’s the today’s Latex actually. I don’t use ``quotes'' and other character-level markup anymore. Just plain “ ” will do (or ” ”, as we do in Finnish). [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 835 bytes --] ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-14 14:51 ` Yuri Khan 2014-06-14 15:26 ` Teemu Likonen @ 2014-06-17 1:42 ` Garreau, Alexandre 1 sibling, 0 replies; 41+ messages in thread From: Garreau, Alexandre @ 2014-06-17 1:42 UTC (permalink / raw) To: Yuri Khan; +Cc: help-gnu-emacs [-- Attachment #1: Type: text/plain, Size: 2246 bytes --] On 2014-06-14 at 16:51, Yuri Khan wrote: > The GCC error messages in the en_US.utf8 locale, on the other hand, do > use curly quotes. Indeed, just because “computer English” is made for computers, not human beings, who prefer to have readable text, just like it was before typewriters. > OK, I do not suggest that Perl should drop its backtick operator or > that computer languages universally start using curly quotes for > character and string literals (although that would make many languages > more elegant by simplifying parsing). But how about we reserve all > these artificial characters for computer languages, one of which > English is not. Having more language neutral programming languages would be cool, even languages based on semantic interpretation of binary data that would move the complexity of syntactic representation of its content from data toward editor would be really more useful, clean, simple, egalitarian, etc. > Otherwise, primarily, the material will be read by a human being, and > only secondarily in a computer program. I wish for a future where the > Web replaces the printed book, therefore, the Web must do all things > books do, and then some. I hope that by “the Web” you mean “the concept of the ensemble of linked interpreted documents to read shared by the medium of computer networks and read on computers interfaces”, not the poor current implementation of it, which is still using obsolete and despotic client–server model (<http://thewebmustdie.com/>, <http://secushare.org/>). > Yes, LaTeX does a lot to produce a beautifully typeset printout from > an ASCII source. 
This is not enough; I want that same beautiful > typesetting on screen, in browser, in any page width I happen to have, > in my favorite typeface and font size, without having to recompile the > document. And at the same time, it does too much. It has to maintain, > and document authors have to utilize, a multitude of workarounds that > are caused by TeX not using Unicode internally. Having something technically and typographically good like LaTeX, semantic and interpreted like HTML and language-neutral like markdown/any-binary-interpreted-format would be great. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 948 bytes --] ^ permalink raw reply [flat|nested] 41+ messages in thread
[parent not found: <mailman.3651.1402757512.1147.help-gnu-emacs@gnu.org>]
* Re: Getting Emacs to play nice with Hunspell and apostrophes [not found] ` <mailman.3651.1402757512.1147.help-gnu-emacs@gnu.org> @ 2014-06-14 16:13 ` Emanuel Berg 2014-06-16 15:35 ` Joost Kremers ` (2 more replies) 0 siblings, 3 replies; 41+ messages in thread From: Emanuel Berg @ 2014-06-14 16:13 UTC (permalink / raw) To: help-gnu-emacs Yuri Khan <yuri.v.khan@gmail.com> writes: >>> Curly quotes (and, in Russian print tradition, >>> double angle quotes) are what I am used to seeing in >>> print and consider to be the correct way to write >> OK, I believe you. However, the point I made with all >> people coming from different cultures is that it >> doesn't matter where we are from individually. When I >> went to school, I suppose I was most comfortable with >> Swedish. But I'm not supposing we all switch to >> Swedish! > > OK, so what? I expect that people of all cultures who > were exposed to books printed before the advent of > the computer and the word processor are used to > typographic characters. I'm OK disagreeing but I want you to understand me. The point is: the cultures are in this discussion irrelevant. If the cultures were what decided things you should be speaking Russian and I Swedish. We don't, because we have travelled to a common point so that when we interact in the computer world, we are using the "Computer English" language, which I have described several times now. This is the English in the man pages, in the RFCs, in the C code, in the HTML, and all that. In this language you don't write <mitten> if you are Swedish, <centre> if you are British, etc., *all* write <center>, otherwise it doesn't work! Likewise, to quote in Usenet post we use >, to double quote, >>, and so on; to mark where the signature starts we use --, because otherwise highlighting/hiding of the quotes/signature doesn't work, because the clients are looking for those specific chars! 
In "Computer English", the de facto standard is ' and ", and it doesn't matter what books anyone read as kids. Because we are not doing that *now*! All of us have moved to a common culture which is common for practical reasons - it is not aesthetics or snobbism, it is reality - and there is no reason whatsoever to fight it. It only creates exactly the kind of problem that made the OP write to this list. >> OK, that's a ridiculous example as it is extreme, >> while what we discuss now is perhaps trivial (' or >> ’) - but in principle it is the same. The computer >> language is English, and as I showed - the man pages >> for ls and emacs, as well as the RFC excerpt, as >> well as all experience with mails and Usenet and >> programming culture - all show that in "Computer >> English", ' (not ’) is correct. > > They are that way because they were written in the > dark age of ten thousand code pages and never updated > to Unicode. It doesn't matter. That's the way it is. Like the sentence I just wrote. I don't care why the English word for "way" is "way". It just is, and it is very, very unpractical and extremely arrogant for anyone to say, I don't like it to be "way", for no reason whatsoever save for aesthetics (which isn't a consensus by the way) I like it to be "yaw" - and the argument for changing, is that there are (of course!) historical roots for the word "way" being "way" - if someone had thought about it really hard (and exactly like me, today) he or she would have decided the word for "way" should be "yaw" --- it doesn't make any sense! > They exist *because* there was a certain technical > limitation in the last fifty years or so. Since this > limitation has been removed, there is no reason for > them. They do not exist because there was a technical limitation fifty years ago. They exist, today, because they are useful, today! > I believe users of the VGA text console are > intelligent beings and respect their decision to > suffer. Forget it. 
I have Gnus configured to transparently replace your goofy chars with the correct ones. > Otherwise, primarily, the material will be read by a > human being, and only secondarily in a computer > program. I wish for a future where the Web replaces > the printed book Lunacy. > therefore, the Web must do all things books do, and > then some. The web can already do that in principle but that doesn't mean books, papers, libraries, and so on will disappear. That's a horrible thought but luckily it won't happen. > If I have to read a printed document, every straight > quote, every hyphen used in place of a dash, every > uneven space, pulls me out of the flow. The only way > for me to stop thinking about the characters is if > they are exactly as in a book typeset by a skilled > typesetter on a pre-computer-era press. Yes, this is only snobbism and aesthetics for the sake of it. This is what I have expected from day one. Yes, LaTeX can produce very good-looking documents and I have spent countless hours in that department - but that you aren't able to read a book without it is just - I don't know. It is not reality. In reality you read what you have to read. >> when you program and write in English (like now), >> don't you use the US keyboard layout? That's what I >> do to get the brackets and the semicolon and all >> that with no fuss - it is not that I use the Swedish >> chars that much, anyway! (Which is again the whole >> point.) And with the US layout, ' (and so on) are >> easier to type than the chars you suggest. > > The difference between ' and AltGr+' is almost > negligible for me. We don't have to "almost" that: ' is one key, AltGr+' is two. > I do understand we have engaged in a holy war not > directly related to the original poster’s > problem. Let’s agree to disagree. The OP had a problem because he used the incorrect chars. 
While the spellchecker still should cope, I still haven't heard one argument that makes sense why anyone should benefit from those goofy chars. -- underground experts united: http://user.it.uu.se/~embe8573 ^ permalink raw reply [flat|nested] 41+ messages in thread
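The idea mentioned above, replacing typographic quotes with their ASCII counterparts before a tool that only understands ASCII sees the text, can be sketched in Python. This is a hedged illustration only: it is not how Gnus, ispell.el, or Hunspell do it, and the names `ASCII_QUOTES` and `to_ascii_quotes` are hypothetical. It assumes the spellchecker treats U+0027 as a word character, which the original post says is not a given.

```python
# Hypothetical helper; not part of Emacs, Gnus, or Hunspell.
# Maps the quote-like characters discussed in this thread to ASCII.
ASCII_QUOTES = str.maketrans({
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK (preferred apostrophe)
    "\u02bc": "'",   # MODIFIER LETTER APOSTROPHE
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
})


def to_ascii_quotes(text: str) -> str:
    """Return text with typographic quotes replaced by ' and "."""
    return text.translate(ASCII_QUOTES)


# Normalize a word before handing it to an ASCII-only spellchecker.
print(to_ascii_quotes("isn\u2019t"))
```

The mapping is lossy: once an apostrophe and a closing single quote are both turned into ', the distinction cannot be recovered, so a sketch like this fits a throwaway copy sent to the spellchecker better than the text itself.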
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-14 16:13 ` Emanuel Berg @ 2014-06-16 15:35 ` Joost Kremers 2014-06-17 2:21 ` Garreau, Alexandre [not found] ` <mailman.3799.1402971688.1147.help-gnu-emacs@gnu.org> 2 siblings, 0 replies; 41+ messages in thread From: Joost Kremers @ 2014-06-16 15:35 UTC (permalink / raw) To: help-gnu-emacs Emanuel Berg wrote: > The OP had a problem because he used the incorrect > chars. While the spellchecker still should cope, I > still haven't heard one argument that makes sense why > anyone should benefit from those goofy chars. That is because you define the word "benefit" in your own way and refuse to accept that where you only see needless "goofiness", others actually see benefit. As for the topic of the discussion, Unicode is gradually replacing ASCII, and that will mean more and more people using typographic quotes. And more and more hoops to jump through for people that prefer to stick to older software that doesn't properly implement Unicode. -- Joost Kremers joostkremers@fastmail.fm Selbst in die Unterwelt dringt durch Spalten Licht EN:SiS(9) ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-14 16:13 ` Emanuel Berg 2014-06-16 15:35 ` Joost Kremers @ 2014-06-17 2:21 ` Garreau, Alexandre [not found] ` <mailman.3799.1402971688.1147.help-gnu-emacs@gnu.org> 2 siblings, 0 replies; 41+ messages in thread From: Garreau, Alexandre @ 2014-06-17 2:21 UTC (permalink / raw) To: Emanuel Berg; +Cc: help-gnu-emacs [-- Attachment #1: Type: text/plain, Size: 9737 bytes --] On 2014-06-14 at 18:13, Emanuel Berg wrote: > Yuri Khan <yuri.v.khan@gmail.com> writes: >>>> Curly quotes (and, in Russian print tradition, double angle quotes) >>>> are what I am used to seeing in print and consider to be the >>>> correct way to write >>> OK, I believe you. However, the point I made with all people coming >>> from different cultures is that it doesn't matter where we are from >>> individually. When I went to school, I suppose I was most >>> comfortable with Swedish. But I'm not supposing we all switch to >>> Swedish! >> >> OK, so what? I expect that people of all cultures who were exposed to >> books printed before the advent of the computer and the word >> processor are used to typographic characters. > > I'm OK disagreeing but I want you to understand me. The point is: the > cultures are in this discussion irrelevant. If the cultures were what > decided things you should be speaking Russian and I Swedish. As we still do, most of our time. > We don't, because we have travelled to a common point so that when we > interact in the computer world, we are using the “Computer English” > language, which I have described several times now. No, we travelled to a common point where colonialism and oppression imposed English as a poor (for that purpose) international language we all make decades of more or less hard work to poorly learn (that’s less visible when we write, but that’s really visible when we speak). 
While we could just more think to constructed languages that we make some weeks or months to perfectly speak (and however, it were demonstrated it takes less time learning Esperanto and *then* English than just learning English alone). > This is the English in the man pages, in the RFCs, Oh, good point. But let’s agree to disagree on this standard of standards. > in the C code, Let’s bet how much time again C will stay around… before we move to something more powerful (some interesting ideas: <https://www.gnu.org/software/epsilon>), that we could make more neutral, or even where we could move syntactic representation of content separated of content itself (like MVC) and move complexity from the compiler toward the editor (and just letting the compiler doing things like JIT, native code caching or on-the-fly optimization). > in the HTML, and all that. In this language you don't write <mitten> > if you are Swedish, <centre> if you are British, etc. *all* write > <center>, otherwise it doesn't work! Because HTML is not language-neutral. But if you think HTML (and more generally XML, and even more generally things based on XML like XMPP) is well made and really efficient, you have some problems. > Likewise, to quote in Usenet post we use >, to double quote, >>, and > so on; to mark where the signature starts we use --, because otherwise > highlighting/hiding of the quotes/signature doesn't work, because the > clients are looking for those specific chars! Yes, standards. But standards aren’t necessarily not language-neutral. Just like TCP/IP *is* a world wide standard and *is* language-neutral (since it’s binary, for performance and simplicity reasons). > In “Computer English”, the de facto standard is ' and ", and it > doesn't matter what books anyone read as kids. Yes it matters, because here we speak English, not a programming language that’s based on English. > Because we are not doing that *now*! 
All of us have moved to a common > culture which is common for practical reasons For causes, but not for reasons. Otherwise we would be speaking a Lojban with a more logical alphabet and base 12. > — it is not aesthetics or snobbism, it is reality — It is tradition. But when tradition stays too much time reality we have a problem. > and there is no reason whatsoever to fight it. Efficiency, readability, etc. all these things that help to increase our every-days freedom. >>> OK, that's a ridiculous example as it is extreme, while what we >>> discuss now is perhaps trivial (' or ) — but in principle it is the >>> same. The computer language is English, and as I showed — the man >>> pages for ls and emacs, as well as the RFC excerpt, as well as all >>> experience with mails and Usenet and programming culture — all show >>> that in “Computer English”, ' (not ) is correct. >> >> They are that way because they were written in the dark age of ten >> thousand code pages and never updated to Unicode. > > It doesn't matter. That's the way it is. Like the sentence I just > wrote. I don't care why the English word for “way” is “way”. Just as people don’t care about what’s an operating system, a cli, etc. > It just is, Yeah, it is magical. Just as people consider computers, you consider language. Except language is really a more general and important thing than just “computing”. Because the notion of “language” include many concept of “computing”. > and it is very, very unpractical and extremely arrogant for anyone to > say, I don't like it to be “way”, for no reason whatsoever save for > aesthetics (which isn't a consensus by the way) I like it to be “yaw” Esperantists, and Lojbanists, and all people working on language are doing that “arrogant” thing, and they proved it is a lot more practical than what people do by default —that to say: almost nothing. > - and the argument for changing, is that there are (of course!) 
> historical roots for the word “way” being “way” — if someone had > thought about it really hard (and exactly like me, today) he or she > would have decided the word for “way” should be “yaw” — it doesn't > make any sense! Indeed it doesn’t, and that’s a reason for changing: we do a lot of impractical things every day, and changing, “progressing”, allows us to gain more freedom. >> They exist *because* there was a certain technical >> limitation in the last fifty years or so. Since this >> limitation has been removed, there is no reason for >> them. > > They do not exist because there was a technical > limitation fifty years ago. They exist, today, because > they are useful, today! No, they’re useless and impractical; they always were, and they always will be. >> I believe users of the VGA text console are >> intelligent beings and respect their decision to >> suffer. > > Forget it. I have Gnus configured to transparently > replace your goofy chars with the correct ones. Thanks for the idea, I’m going to do the opposite. How did you do it? >> therefore, the Web must do all things books do, and >> then some. > > The web can already do that in principle but that > doesn't mean books, papers, libraries, and so on will > disappear. That's a horrible thought but luckily it > won't happen. Just as *calligraphy* didn’t disappear with the invention of printing. But since you need *a lifetime* to write a big book in calligraphy (let’s say, some documentation), and since there are a *looooooot* of more interesting things to do (like reading all sorts of the really interesting things human beings write all across the globe), we *all* just read printed books. 
For the same reasons, we will *almost* stop printing books (though, as with calligraphy, some will continue for the sake of the art, and of “snobbism and aesthetics”, as you like to say) as soon as printers stop being obsessed with money, editors with proprietary coercion, and computer makers with non-pluggable OLED screens (planned obsolescence and profit optimization) and eInk patents. >> If I have to read a printed document, every straight >> quote, every hyphen used in place of a dash, every >> uneven space, pulls me out of the flow. The only way >> for me to stop thinking about the characters is if >> they are exactly as in a book typeset by a skilled >> typesetter on a pre-computer-era press. > > Yes, this is only snobbism and aesthetics for the sake of it. All of this is designed for readability: to read better, to read quicker, to read more. That’s pragmatism. >>> when you program and write in English (like now), >>> don't you use the US keyboard layout? That's what I >>> do to get the brackets and the semicolon and all >>> that with no fuss - it is not that I use the Swedish >>> chars that much, anyway! (Which is again the whole >>> point.) And with the US layout, ' (and so on) are >>> easier to type than the chars you suggest. >> >> The difference between ' and AltGr+' is almost >> negligible for me. > > We don't have to "almost" that: ' is one key, AltGr+' > is two. But you press AltGr with the thumb, and the thumb is made to move without disturbing the rest of the hand (you know, to *grasp* objects, the thing monkeys and other primates can do and other mammals can’t), so when you use your thumb for a modifier it is biomechanically equivalent to pressing one key, not two. That’s why more modifiers should be near the thumb. >> I do understand we have engaged in a holy war not >> directly related to the original posters >> problem. Lets agree to disagree. > > The OP had a problem because he used the incorrect > chars. 
While the spellchecker still should cope, I > still haven't heard one argument that makes sense why > anyone should benefit from those goofy chars. Because they make text more readable and understandable. You can of course disagree and refuse to see the importance of details; that is your right. But it is our right to have our software work well for the rest of us, the way we want. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 948 bytes --] ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes [not found] ` <mailman.3799.1402971688.1147.help-gnu-emacs@gnu.org> @ 2014-06-17 2:41 ` Rusi 2014-06-17 3:05 ` Rusi 0 siblings, 1 reply; 41+ messages in thread From: Rusi @ 2014-06-17 2:41 UTC (permalink / raw) To: help-gnu-emacs On Tuesday, June 17, 2014 7:51:07 AM UTC+5:30, Garreau, Alexandre wrote: > On 2014-06-14 at 18:13, Emanuel Berg wrote: > > in the C code, > Let's bet how much time again C will stay around... before we move to > something more powerful (some interesting ideas: > <https://www.gnu.org/software/epsilon>), that we could make more Epsilon looks interesting but too theoretical, so it probably ends up making the opposite case to the one you want to make, Alexandre. Agda and Julia are two recent languages with strongly increasing popularity that would make your case better. Agda is the <<type-hackery lab>> of haskell. Julia aims to get the ease of use of scripting languages with the efficiency of C/FORTRAN, specifically for modern parallel hardware. Agda: http://wiki.portal.chalmers.se/agda/pmwiki.php?n=Docs.UnicodeInput Julia: http://iaindunning.com/2014/julia-unicode.html ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-17 2:41 ` Rusi @ 2014-06-17 3:05 ` Rusi 0 siblings, 0 replies; 41+ messages in thread From: Rusi @ 2014-06-17 3:05 UTC (permalink / raw) To: help-gnu-emacs Some immediate evidence that unicode is not exactly stable yet… I wrote: As a noob member of the «enthusiastically embrace unicode» camp The LEFT-POINTING DOUBLE ANGLE QUOTATION MARK (guillemet) stayed, probably because of this: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable However, when I wrote the guillemet here, GG messed it up: Agda is the <<type-hackery lab>> of haskell It didn't stay, I guess because of: Content-Type: text/plain; charset=ISO-8859-1 So evidently Google Groups is doing exactly the opposite of what it should be doing. It's trying very hard NOT to use UTF-8. [And now to force UTF-8 in this message, let me sign in devanagari: रुसि मोदि — Rusi Mody ] ^ permalink raw reply [flat|nested] 41+ messages in thread
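The two Content-Type headers quoted in the message above can be compared with a small check. This is only a sketch using Python's standard codecs; the helper name `can_encode` is made up for illustration. It shows that ISO-8859-1 does have code points for the guillemets, so the charset alone would not seem to explain the `<<` substitution, while Devanagari text cannot be represented in ISO-8859-1 at all, which is presumably why signing in Devanagari pushes the message to UTF-8.

```python
# Sketch: which of the strings from the message fit into which charset?
guillemet_text = "«enthusiastically embrace unicode»"
devanagari_text = "रुसि मोदि"


def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


for label, text in (("guillemets", guillemet_text), ("devanagari", devanagari_text)):
    for charset in ("iso-8859-1", "utf-8"):
        print(label, charset, can_encode(text, charset))
```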
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-14 11:14 ` Emanuel Berg 2014-06-14 14:51 ` Yuri Khan [not found] ` <mailman.3651.1402757512.1147.help-gnu-emacs@gnu.org> @ 2014-06-17 1:46 ` Garreau, Alexandre 2 siblings, 0 replies; 41+ messages in thread From: Garreau, Alexandre @ 2014-06-17 1:46 UTC (permalink / raw) To: Emanuel Berg; +Cc: help-gnu-emacs [-- Attachment #1: Type: text/plain, Size: 786 bytes --] On 2014-06-14 at 13:14, Emanuel Berg wrote: > Yuri Khan <yuri.v.khan@gmail.com> writes: > Yes, but when you program and write in English (like > now), don't you use the US keyboard layout? Using the US Dvorak keyboard layout is more efficient anyway, and easier to learn than the horrible Qwerty. > And with the US layout, ' (and so on) are easier to type than the > chars you suggest. The keyboard (layout) should adapt to you, not the opposite. You shouldn’t be the slave of your keyboard layout. And anyway, if you used the US Dvorak layout, you could just use the Programmer Dvorak layout for programming, where “computer English” symbols are really accessible, while keeping clean English symbols such as “”‘’… more accessible with the US Dvorak layout. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 948 bytes --] ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes [not found] ` <mailman.3627.1402724759.1147.help-gnu-emacs@gnu.org> 2014-06-14 11:14 ` Emanuel Berg @ 2014-06-15 2:48 ` Sergio Pokrovskij 2014-06-17 1:30 ` Garreau, Alexandre 1 sibling, 1 reply; 41+ messages in thread From: Sergio Pokrovskij @ 2014-06-15 2:48 UTC (permalink / raw) To: help-gnu-emacs I can't tell you how much I dislike the ugly quotes in Emacs Info, e.g. ╭──── │ `C-c C-a (`org-attach')' │ The dispatcher for commands related to the attachment system. ╰──── I always use paired quotes, but normally I use the ASCII apostrophe. I admit this causes problems for the speller with e.g. the Wikipedia convention about its representation of ''italics'', '''bold face''' etc. >>>>> "Yuri" == Yuri Khan skribis: [...] Yuri> Now let me tell you how I do curly quotes. Yuri> First, with my right thumb, I hold the AltGr Yuri> modifier. Then, I press k and l in sequence to get a Yuri> balanced pair of double curly quotes, or ; and ' for Yuri> single quotes (I customized my xkb configuration files to Yuri> get this but it works similarly with the out-of-the-box Yuri> config). In Emacs I use (on both Linux and MS Windows): C-c 6 to produce the English 66-99 pair “_” C-c 9 to produce the German 99-66 pair „_“ C-c " to produce the French angular pair «_» The point gets positioned in between: --8<---------------cut here---------------start------------->8--- (defun insert-66-99 () "Make a pair of 66-99 quotes and be positioned to type inside." (interactive) (insert "“”") (backward-char)) (global-set-key "\C-c6" 'insert-66-99) --8<---------------cut here---------------end--------------->8--- -- Sergio ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-15 2:48 ` Sergio Pokrovskij @ 2014-06-17 1:30 ` Garreau, Alexandre 0 siblings, 0 replies; 41+ messages in thread From: Garreau, Alexandre @ 2014-06-17 1:30 UTC (permalink / raw) To: help-gnu-emacs [-- Attachment #1: Type: text/plain, Size: 708 bytes --] On 2014-06-15 at 04:48, Sergio Pokrovskij wrote: > I can't tell you how much I dislike the ugly quotes in Emacs Info, It would be great if Emacs Info did as TeX does and replaced `' with “”. > skribis Let’s recall that some things work better for the universality of human language :) (actually I prefer Lojban (even if I can still criticize it too), but Esperanto is still better than any non-constructed language). > C-c " to produce the French angular pair «_» You forgot the thin non-breaking spaces that clean French typography uses inside French angle quotes, like this: « _ » (though most of the time normal non-breaking spaces are used, like this: « _ »). [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 948 bytes --] ^ permalink raw reply [flat|nested] 41+ messages in thread
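The two kinds of space mentioned in the message above are hard to tell apart by eye, so a short sketch may help. It assumes that the "fine unbreakable space" is U+202F NARROW NO-BREAK SPACE and the "normal" one is U+00A0 NO-BREAK SPACE; the function name `french_quote` is hypothetical and only serves the illustration.

```python
import unicodedata

NARROW_NBSP = "\u202f"  # assumed to be the "fine unbreakable space"
NBSP = "\u00a0"         # assumed to be the "normal unbreakable space"


def french_quote(text, space=NARROW_NBSP):
    """Wrap text in guillemets with a non-breaking space on each side."""
    return "«" + space + text + space + "»"


# List the code points of each variant so the invisible difference shows up.
for space in (NARROW_NBSP, NBSP):
    quoted = french_quote("x", space)
    print([f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in quoted])
```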
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-12 5:43 ` Yuri Khan 2014-06-12 12:51 ` Stefan Monnier @ 2014-06-12 16:58 ` Eli Zaretskii 1 sibling, 0 replies; 41+ messages in thread From: Eli Zaretskii @ 2014-06-12 16:58 UTC (permalink / raw) To: help-gnu-emacs > Date: Thu, 12 Jun 2014 12:43:24 +0700 > From: Yuri Khan <yuri.v.khan@gmail.com> > Cc: "help-gnu-emacs@gnu.org" <help-gnu-emacs@gnu.org> > > From a spell checker, in particular, I expect that it should > (perhaps with an optional switch) be able to flag as error any > spelling of “isn’t” where the character between n and t is not the > preferred apostrophe character U+2019. You cannot expect that from a speller. You should expect that from people who produce the dictionaries for the speller, because it's the dictionary files that tell the speller which characters can and cannot appear in a word, and which suffixes can and cannot be appended to a word for it to remain correctly spelled. Hunspell already supports all that, it's just your dictionary that doesn't. Look at the *.aff files to understand how the ' apostrophe works, and you will see why the speller is not the issue here. ^ permalink raw reply [flat|nested] 41+ messages in thread
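The point about dictionary files deciding which characters belong to a word can be illustrated with a toy model. The sketch below is not Hunspell's algorithm and does not read a real *.aff file; `split_words`, `normalize_apostrophes` and the three-word dictionary are all hypothetical, and the extra word characters only play the role that a WORDCHARS-style setting would.

```python
import re

ASCII_APOSTROPHE = "\u0027"
CURLY_APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK


def split_words(text, extra_wordchars=""):
    """Split text into words made of ASCII letters plus extra_wordchars."""
    char_class = "[A-Za-z" + re.escape(extra_wordchars) + "]+"
    return re.findall(char_class, text)


def normalize_apostrophes(word):
    """Map the curly apostrophe to the ASCII one before the lookup."""
    return word.replace(CURLY_APOSTROPHE, ASCII_APOSTROPHE)


# Hypothetical dictionary that only knows the ASCII spelling.
toy_dictionary = {"it", "isn't", "working"}

text = "it isn’t working"

# Without apostrophes as word characters, the contraction falls apart.
print(split_words(text))

# With both apostrophes allowed, the word survives and can be normalized.
words = split_words(text, ASCII_APOSTROPHE + CURLY_APOSTROPHE)
print([(w, normalize_apostrophes(w) in toy_dictionary) for w in words])

# The downside raised at the start of the thread: quotes stick to the word.
print(split_words("'isn't'", ASCII_APOSTROPHE))
```

In this toy model the trade-off is visible directly: treating the apostrophe as a word character keeps contractions intact, but it also glues surrounding single quotes onto the word.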
* Re: Getting Emacs to play nice with Hunspell and apostrophes [not found] ` <mailman.3473.1402551809.1147.help-gnu-emacs@gnu.org> @ 2014-06-14 1:35 ` Emanuel Berg 2014-06-14 2:38 ` Emanuel Berg 1 sibling, 0 replies; 41+ messages in thread From: Emanuel Berg @ 2014-06-14 1:35 UTC (permalink / raw) To: help-gnu-emacs Yuri Khan <yuri.v.khan@gmail.com> writes: > The fact that everybody uses " and ' and ` is a > historical artifact, a workaround of sorts, due to > the limitations of the mechanical typewriter. We need > not be affected by it any more. > > There was no possibility of including all the > required typographical characters or accented letters > into the printing ball, so both quotes (“ and ”) and > the diaeresis got conflated into a straight quote ", > both single quotes (‘ and ’) into a straight single > quote/apostrophe ', and the backtick ` and tilde ~ > were there to facilitate typing accented letters. > > This limitation then crept into computers, because > this way the character set could be encoded in 7 > bits. The computer keyboard was just modeled after > the typewriter keyboard, with a few extensions. > > Then the inevitable struck: computers expanded from > the US and UK into Germany, Sweden, Finland, France, > Canada, and then countries with non-Latin scripts > (Greek, Cyrillic, and CJK). And all of them wanted to > have dedicated code points for their characters, > e.g. type a single ä instead of [a, > backspace-no-delete, "]. > > For a good while, we lived in a nightmare of ten > thousand code pages. In Russia, you could receive an > email and see a jumble of utterly meaningless words > because the message could be re-encoded (or the > Content-Type charset= stripped or re-labeled) on any > of the intermediate servers; there existed programs > which were able to heuristically detect the chain of > re-encodings applied on the way and decode your > message for you. 
You could order a book in an > Internet shop, have them completely b0rk up the > encoding of the shipping address: > http://cdn.imagepush.to/in/625x2090/i/3/30/301/24.jpg > Then somebody at the postal system might decode the > characters and the package would still be delivered > at the intended address. > > Now that every widely used operating system supports > Unicode, we don’t have an excuse for clinging to > those workarounds of the past century. We are not > limited by the 7-bit ASCII encoding and can store > texts in their true form. We also are not constrained > by the typewriter keyboard, having input methods > based on Compose or Level3 allowing us to > conveniently enter all the necessary diverse > characters. On X11/GNU/Linux in particular it comes > bundled with the system; on Windows, one has to > install a third-party package. > > Much of the software has already evolved to support > Unicode. That which hasn’t, has to catch up. From a > spell checker, in particular, I expect that it should > (perhaps with an optional switch) be able to flag as > error any spelling of “isn’t” where the character > between n and t is not the preferred apostrophe > character U+2019. First, let me tell you I very much appreciated this post! We agree that ', ", and the rest of the non-Unicode chars that may (not) be used in more or less the same context - we agree that those are there (not there) for techno-historical reasons. Where we *don't* agree is that you think that, if I'm allowed to pseudo-quote you: - Today, now that there aren't any technical limitations, we should go for the more advanced chars. Here is where I say: Just because it is possible, doesn't mean it is desired if there is no gain. It is possible to change all the software in the world to be able to use those chars. But why? 
For the reasons you stated, on the Internet and Usenet and in computer culture generally, many, many people have come to use English, and the 7- (or 8-) bit chars have spread and become a de facto standard. So people's eyes and brains and fingers are trained to use those. We have all come together from different starting points. The UK and US people had to go the shortest way (as the pioneers, perhaps they earned it). The Swedes had to learn English. The Russians had to go somewhat further because Russian is farther from English than Swedish. And so on. So when we finally have something in common - why break it just because it is possible? With some computer languages like Java it is possible for me to program in Swedish, using the ä, å, and ö. But why would I want to do that? It would wreak havoc on my brain as the rest of the language would still be English. But more importantly, it would isolate my program from the rest of the world. I couldn't communicate about it (ask questions, tell people about it with the support of code snippets, etc.) and it couldn't be configured/extended by a non-Swedish-speaking person. So I'll just stick to C, in English. Just as I will stick to ' as that is the correct way (as I see it) to write in "Computer English". -- underground experts united: http://user.it.uu.se/~embe8573 ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes [not found] ` <mailman.3473.1402551809.1147.help-gnu-emacs@gnu.org> 2014-06-14 1:35 ` Emanuel Berg @ 2014-06-14 2:38 ` Emanuel Berg 2014-06-14 7:11 ` Yuri Khan [not found] ` <mailman.3630.1402729917.1147.help-gnu-emacs@gnu.org> 1 sibling, 2 replies; 41+ messages in thread From: Emanuel Berg @ 2014-06-14 2:38 UTC (permalink / raw) To: help-gnu-emacs Yuri Khan <yuri.v.khan@gmail.com> writes: > You could order a book in an Internet shop, have them > completely b0rk up the encoding of the shipping > address: > > http://cdn.imagepush.to/in/625x2090/i/3/30/301/24.jpg > > Then somebody at the postal system might decode the > characters and the package would still be delivered > at the intended address. Ha-ha, unbelievable! How did that happen? First you wrote in Russian at the Internet shop's web page - then it got like that because of them translating Unicode (?) to ISO-8859-1 (which is 8-bit, with the ASCII as its lower half) - ? Why didn't the Internet shop do it? Did they actually think that was a language or some transcription of Russian? How was it translated to Russian at the postal office? I can only make out the first line: Russia, Moscow. -- underground experts united: http://user.it.uu.se/~embe8573 ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-14 2:38 ` Emanuel Berg @ 2014-06-14 7:11 ` Yuri Khan [not found] ` <mailman.3630.1402729917.1147.help-gnu-emacs@gnu.org> 1 sibling, 0 replies; 41+ messages in thread From: Yuri Khan @ 2014-06-14 7:11 UTC (permalink / raw) To: Emanuel Berg; +Cc: help-gnu-emacs@gnu.org On Sat, Jun 14, 2014 at 9:38 AM, Emanuel Berg <embe8573@student.uu.se> wrote: > Yuri Khan <yuri.v.khan@gmail.com> writes: > >> You could order a book in an Internet shop, have them >> completely b0rk up the encoding of the shipping >> address: >> >> http://cdn.imagepush.to/in/625x2090/i/3/30/301/24.jpg >> >> Then somebody at the postal system might decode the >> characters and the package would still be delivered >> at the intended address. > > Ha-ha, unbelievable! How did that happen? First you > wrote in Russian at the Internet shop's web page - then > it got like that because of them translating Unicode > (?) to ISO-8859-1 (which is 8-bit, with the ASCII as > its lower half) - ? Why didn't the Internet shop do it? First I must say it’s not mine and likely not a common occurrence for the Russian Post which is nowadays notorious for its lack of customer orientedness. In technical terms, I can think of the following sequence of events: * The user comes to a website containing an order form. (The form contains a free input <textarea> for the street address and possibly an <input> for the recipient name, and a <select> for the country. The latter ensures that the word RUSSIE is printed in its legible form.) * The user enters her address and name into the web form, in Russian; also selects Russian Federation from the country dropdown. * The browser encodes the address in KOI8-R, one of the three code pages used in Russia. In this encoding, the string Москва (Moscow) has the following byte representation: ED CF D3 CB D7 C1. (The KOI8-R encoding was designed in such a way that it remains readable if the high bit is stripped: mOSKWA. 
Too bad the links were already 8-bit-clean at the time Harry Potter was published.) * The browser sends the form data to the web server, labeled as Content-Type: application/x-www-form-urlencoded; encoding=KOI8-r. (At that time, Unicode was not as ubiquitous as it is now; browsers operated in an encoding that best matched the user’s input.) * The web server passes the form data to the backend script (Perl CGI or possibly PHP running as a module). * The backend script disregards the encoding= parameter, reinterprets the string as if it were encoded in ISO-8859-1 (or possibly windows-1252, which is an extension of ISO-8859-1). The byte representation ED CF D3 CB D7 C1 decodes into íÏÓË×Á (small i with acute, capital I with diaeresis, capital O with acute, capital E with diaeresis, multiplication sign, capital A with acute). This string then gets stored in the database (which is likely configured to operate in ISO-8859-1 or windows-1252) and lives happily ever after. > Did they actually think that was a language or some > transcription of Russian? Most probably, at the time a human being at the merchant side got involved, the address was already mangled. They did not have the knowledge of Russian code pages, and decided to make a best reasonable effort — “send it as is and let those crazy Russians sort it out”. > How was it translated to > Russian at the postal office? I can only make out the > first line: Russia, Moscow. The package contains two pieces of information — the country name in French (RUSSIE) and the postal code 119415 — which get the package to the postal office 119415 at 14 Udaltsov street in Moscow, near the customer’s place of residence. (Postal codes are unique within Russia, the first three digits unambiguously identifying the city.) https://goo.gl/maps/4TK3D (pin at the post office building). The worker at the post office might be familiar with both the KOI8-R and Windows-1252 encoding tables, but that is highly unlikely. 
Alternatively, the worker might regard the mysteriously labeled package as a peculiar form of a substitution cypher puzzle. [Challenge Accepted] He takes a red pen and starts scribbling right on the package. * First, he notices that the two middle letters in the first word are identical, and guesses that this word must be Rossi[ya] (Russia). * This allows him to decode two letters of the next word, which can then be guessed as Moskva (Moscow); what else could it be? * Substituting the known letters into the customer’s first name gives “Св***а**” (Sv***a**), which the postal worker recognizes as Svetlana (a fairly common Russian feminine name, and the most common of those starting with Sv). (The last letter does not match because of grammatical case declension.) * This now gives enough information to decode and guess the street as pr. Vernadskogo and deliver the package to the Moscow State University dormitory at Vernadskogo, 37 (other marker at the map linked above), room 1817-1. Probably also lecture Svetlana that, until all web sites embrace Unicode, it’s safer to write your address in transliteration. Now, while this all makes for great war stories, it Should Not be necessary. Unicode should be used in all stages of Internet shop order processing, and addresses written in any local language should be deliverable without post office workers having to solve a challenge. ^ permalink raw reply [flat|nested] 41+ messages in thread
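The byte-level part of the story in the message above can be reproduced in a few lines. This is a minimal sketch that assumes only Python's standard `koi8_r` and `latin_1` codecs; the variable names are illustrative and do not come from the thread.

```python
# Sketch: replay the mangling of the word "Москва" (Moscow).
original = "Москва"

# The browser encodes the form field in KOI8-R.
koi8_bytes = original.encode("koi8_r")

# The backend ignores the label and decodes the bytes as ISO-8859-1.
mangled = koi8_bytes.decode("latin_1")

# Anyone who guesses both encodings can reverse the damage losslessly.
recovered = mangled.encode("latin_1").decode("koi8_r")

# KOI8-R design quirk: clearing the high bit leaves a Latin transliteration.
stripped = bytes(b & 0x7F for b in koi8_bytes).decode("ascii")

print(" ".join(f"{b:02X}" for b in koi8_bytes))
print(mangled)
print(recovered)
print(stripped)
```

Because ISO-8859-1 maps every byte value to some character, the wrong decoding step loses no information, which is why the address stays recoverable once the two encodings are identified.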
* Re: Getting Emacs to play nice with Hunspell and apostrophes [not found] ` <mailman.3630.1402729917.1147.help-gnu-emacs@gnu.org> @ 2014-06-14 11:20 ` Emanuel Berg 0 siblings, 0 replies; 41+ messages in thread From: Emanuel Berg @ 2014-06-14 11:20 UTC (permalink / raw) To: help-gnu-emacs Yuri Khan <yuri.v.khan@gmail.com> writes: > First I must say it’s not mine and likely not a > common occurrence for the Russian Post which is > nowadays notorious for its lack of customer > orientedness. That was my first thought, what cool guys you have working the mails! > Now, while this all makes for great war stories, it > Should Not be necessary. Unicode should be used in > all stages of Internet shop order processing, and > addresses written in any local language should be > deliverable without post office workers having to > solve a challenge. Of course I agree, however sending your MGU address to some company is sort of beyond this discussion. Of course there should be ways to communicate in Russian with computers. Yes, a very cool story! -- underground experts united: http://user.it.uu.se/~embe8573 ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes [not found] <mailman.3187.1402155569.1147.help-gnu-emacs@gnu.org> 2014-06-11 0:04 ` Emanuel Berg @ 2014-06-17 2:12 ` Rusi 2014-06-17 2:33 ` Garreau, Alexandre 1 sibling, 1 reply; 41+ messages in thread From: Rusi @ 2014-06-17 2:12 UTC (permalink / raw) To: help-gnu-emacs On Tuesday, June 17, 2014 7:12:11 AM UTC+5:30, Garreau, Alexandre wrote: > On 2014-06-14 at 16:51, Yuri Khan wrote: > > The GCC error messages in the en_US.utf8 locale, on the other hand, do > > use curly quotes. > Indeed, just because “computer English” is made for computers, not human > beings, who prefer to have readable text, just like it was before > typewriters. > > OK, I do not suggest that Perl should drop its backtick operator or > > that computer languages universally start using curly quotes for > > character and string literals (although that would make many languages > > more elegant by simplifying parsing). But how about we reserve all > > these artificial characters for computer languages, one of which > > English is not. > Having more language neutral programming languages would be cool, even > languages based on semantic interpretation of binary data that would > move the complexity of syntactic representation of its content from data > toward editor would be really more useful, clean, simple, egalitarian, > etc. Interesting thread that I missed… As a noob member of the «enthusiastically embrace unicode» camp: ironically, I was introduced to the possibility of using unicode by gmail tantalizingly showing me an अ [devanagari letter A]. Later on, however, I've found gmail too clever in how it transliterates, e.g., a into अ. Emacs is more predictable. So now I type into emacs and paste into gmail if necessary. So I'd like to express my thanks that emacs is doing unicode very well. And now that programming languages, the original forte of emacs, are beginning to get out of ASCII-hell, here are two of my blog posts. 
I started by writing http://blog.languager.org/2014/04/unicoded-python.html to express my wishlist (for python) for getting out of ASCII-prison and into what you call a more 'neutral' frame¹. I discovered later that Haskell is already doing some of this: http://blog.languager.org/2014/05/unicode-in-haskell-source.html [And a good deal more] And finally APL is making a resurgence: http://baruchel.hd.free.fr/apps/apl/ > > Otherwise, primarily, the material will be read by a human being, and > > only secondarily in a computer program. I wish for a future where the > > Web replaces the printed book, therefore, the Web must do all things > > books do, and then some. > I hope that by “the Web” you mean “the concept of the ensemble of linked > interpreted documents to read shared by the medium of computer networks > and read on computers interfaces”, not the poor current implementation > of it, which is still using obsolete and despotic client–server model > (<http://thewebmustdie.com/>, <http://secushare.org/>). > > Yes, LaTeX does a lot to produce a beautifully typeset printout from > > an ASCII source. This is not enough; I want that same beautiful > > typesetting on screen, in browser, in any page width I happen to have, > > in my favorite typeface and font size, without having to recompile the > > document. And at the same time, it does too much. It has to maintain, > > and document authors have to utilize, a multitude of workarounds that > > are caused by TeX not using Unicode internally. > Having something technically and typographically good like LaTeX, > semantic and interpreted like HTML and language-neutral like > markdown/any-binary-interpreted-format would be great. Yes, it's important that we start moving to xetex (luatex), where I can directly write α etc. rather than \alpha. 
One just has to multiply this one char by the 100s that occur in proofs to see why the latter is clunky, ugly, unreadable and bug-spreading compared to the former. PS [Travelling for a few days so may not respond to responses] ¹ Dare I say 'universal'? As math is the only language approaching universality known to humanity. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Getting Emacs to play nice with Hunspell and apostrophes 2014-06-17 2:12 ` Rusi @ 2014-06-17 2:33 ` Garreau, Alexandre 0 siblings, 0 replies; 41+ messages in thread From: Garreau, Alexandre @ 2014-06-17 2:33 UTC (permalink / raw) To: Rusi; +Cc: help-gnu-emacs [-- Attachment #1: Type: text/plain, Size: 1856 bytes --] On 2014-06-17 at 04:12, Rusi wrote: > On Tuesday, June 17, 2014 7:12:11 AM UTC+5:30, Garreau, Alexandre wrote: >> On 2014-06-14 at 16:51, Yuri Khan wrote: >>> Yes, LaTeX does a lot to produce a beautifully typeset printout from >>> an ASCII source. This is not enough; I want that same beautiful >>> typesetting on screen, in browser, in any page width I happen to have, >>> in my favorite typeface and font size, without having to recompile the >>> document. And at the same time, it does too much. It has to maintain, >>> and document authors have to utilize, a multitude of workarounds that >>> are caused by TeX not using Unicode internally. > >> Having something technically and typographically good like LaTeX, >> semantic and interpreted like HTML and language-neutral like >> markdown/any-binary-interpreted-format would be great. > > Yes its important that we start moving to XeteX (luatex) where I can > directly write α etc than \alpha. I know XeTeX, but I wasn’t thinking of it… And yet LaTeX is not fully language-neutral because of its command names (\emph, \textbf, \title, \section, etc.), isn’t interpreted, and isn’t reaaaaally semantic (since it’s only made to be compiled into a graphical thing). > ¹ Dare I say “universal”? As math is the only language approaching > universality known to humanity. Well, nothing is really universal (everything needs shared knowledge, thus a culture). Even math, when it isn’t based on the Latin or Greek language, stays based on occidental/Arabic/Indo-European culture and symbols. But we can artificially make universal things, just as we more or less did with Lojban, or TCP/IP, etc. 
So what we can do is just invent new pieces of culture based on the most universal things we can, while avoiding linguistic/geographic/gender/class cultural biases. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 948 bytes --] ^ permalink raw reply [flat|nested] 41+ messages in thread