* Hunspell for Japanese
@ 2018-02-17 13:53 Tak Kunihiro
2018-02-17 15:18 ` Eli Zaretskii
2018-02-18 5:31 ` Tak Kunihiro
0 siblings, 2 replies; 5+ messages in thread
From: Tak Kunihiro @ 2018-02-17 13:53 UTC (permalink / raw)
To: help-gnu-emacs; +Cc: 国広卓也
I want to use `hunspell' to spellcheck English words that are mixed
into Japanese text. When I call M-x ispell-word, the responses from
`aspell' and `hunspell' differ. The difference affects how underlines
are drawn in flyspell-mode: `hunspell' puts many unnecessary underlines
on Japanese phrases. So for now I have added the following to my
~/.emacs.d/inits.el.
(defun flyspell-ignore-non-ascii (beg end info)
  "Tell flyspell to ignore non ascii characters.
Call this on `flyspell-incorrect-hook'."
  (string-match "[^!-~]" (buffer-substring beg end)))
(add-hook 'flyspell-incorrect-hook 'flyspell-ignore-non-ascii)
Is it possible to make `hunspell' behave like `aspell'?
GNU Emacs 25.3.1 (x86_64-apple-darwin13.4.0, NS appkit-1265.21 Version 10.9.5 (Build 13F1911))
of 2017-09-19
##
## Aspell
##
$ which aspell
/opt/local/bin/aspell
$ Emacs -Q
M-: (insert "Emacsは日本ではイーマックスと呼ばれる")
C-a
M-: (setq ispell-program-name "aspell")
M-x ispell-word
C-x b *Messages*
> Starting new Ispell process /opt/local/bin/aspell with default dictionary...
> Checking spelling of EMACSは日本語ではイーマックスと呼ばれる...
> EMACSは日本語ではイーマックスと呼ばれる is correct
> You can run the command ‘ispell-word’ with M-$
##
## Hunspell
##
$ which hunspell
/opt/local/bin/hunspell
$ hunspell -D
...
/opt/local/share/hunspell/en_US
LOADED DICTIONARY:
/opt/local/share/hunspell/en_US.aff
/opt/local/share/hunspell/en_US.dic
Hunspell 1.6.2
$ Emacs -Q
M-: (insert "Emacsは日本ではイーマックスと呼ばれる")
C-a
M-: (setq ispell-program-name "hunspell")
M-x ispell-word
C-x b *Messages*
> Starting new Ispell process hunspell with default dictionary...
> Checking spelling of EMACSは日本語ではイーマックスと呼ばれる...
> ispell-word: Ispell and its process have different character maps
* Re: Hunspell for Japanese
From: Eli Zaretskii @ 2018-02-17 15:18 UTC (permalink / raw)
To: help-gnu-emacs
> From: Tak Kunihiro <tkk@misasa.okayama-u.ac.jp>
> Date: Sat, 17 Feb 2018 22:53:50 +0900
> Cc: 国広卓也 <tkk@misasa.okayama-u.ac.jp>
>
> I want to spellcheck English phrases that are mixed in Japanese
> phrases by `hunspell'. When I call M-x ispell-word, responses from `aspell' and
> `hunspell' differ. The difference results in how underlines are drawn in
> flyspell-mode. The `hunspell' gives many unnecessary underlines on Japanese phrases.
If your dictionary is for English, why do you expect flyspell-mode to
work correctly with words in another language? It can't do anything
sensible with such foreign words. The underlines flyspell-mode shows
in Japanese words when the dictionary is for English could be
anything; you should simply disregard any such underlines in
non-English words.
Can you tell why you pay attention to underlines in non-English words
in this situation?
> Is it possible to make `hunspell' behave like `aspell'?
They are very different programs, so they cannot behave the same.
> $ which hunspell
> /opt/local/bin/hunspell
> $ hunspell -D
> ...
> /opt/local/share/hunspell/en_US
> LOADED DICTIONARY:
> /opt/local/share/hunspell/en_US.aff
> /opt/local/share/hunspell/en_US.dic
> Hunspell 1.6.2
> $ Emacs -Q
> M-: (insert "Emacsは日本ではイーマックスと呼ばれる")
> C-a
> M-: (setq ispell-program-name "hunspell")
> M-x ispell-word
> C-x b *Messages*
>
> > Starting new Ispell process hunspell with default dictionary...
> > Checking spelling of EMACSは日本語ではイーマックスと呼ばれる...
> > ispell-word: Ispell and its process have different character maps
I see the same message. It is caused by Hunspell somehow considering
the string "は日本語ではイーマックスと呼ばれる" as more than one word,
and it therefore returns 3 misspellings, which then trigger the above
cryptic error message.
But once again, you've set up flyspell-mode to work in English, so you
shouldn't pay attention to what it does with Japanese. For starters,
I believe the encoding Emacs uses is incorrect in that case, because
the en_US.aff file probably states that it wants a Latin-1 encoding,
not UTF-8. But even using UTF-8 will not help here, AFAIU.
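The encoding a Hunspell dictionary expects is declared on the SET line
of its .aff file, so the guess above can be checked from Emacs. The
following is a minimal sketch, not part of ispell.el: the function name
`my-hunspell-aff-encoding' is made up for illustration, and the path in
the usage line is the one printed by `hunspell -D' earlier in this
thread.

```elisp
(defun my-hunspell-aff-encoding (aff-file)
  "Return the encoding named on the SET line of AFF-FILE, or nil.
Hypothetical helper for illustration; not part of ispell.el."
  (when (file-readable-p aff-file)
    (with-temp-buffer
      ;; Read the bytes as-is so no decoding is attempted first.
      (insert-file-contents-literally aff-file)
      (goto-char (point-min))
      ;; A Hunspell affix file declares its encoding as "SET <name>".
      (when (re-search-forward "^SET[ \t]+\\([^ \t\r\n]+\\)" nil t)
        (match-string 1)))))

;; Usage, assuming the dictionary location reported by `hunspell -D':
;; M-: (my-hunspell-aff-encoding "/opt/local/share/hunspell/en_US.aff")
```

The helper returns nil if the file is unreadable or has no SET line, so
it is safe to evaluate on a system where the dictionary lives elsewhere.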
* Re: Hunspell for Japanese
From: Tak Kunihiro @ 2018-02-18 5:31 UTC (permalink / raw)
To: help-gnu-emacs, eliz; +Cc: tkk
Thank you for the reply.
I see. It is true that I should not expect either Aspell or Hunspell
to handle Japanese correctly when its task is to check English. It
was just luck that flyspell-mode with Aspell ignores Japanese words
and shows no underlines.
> Can you tell why you pay attention to underlines in non-English
> words in this situation?
When I write Japanese, English words such as `Emacs' are very often
mixed in. Thus I (and, I think, most Japanese users) run flyspell-mode
with an English dictionary all the time. I expect flyspell-mode to
ignore all Japanese words and check only English words, as LibreOffice
does.
With flyspell-mode and Hunspell, underlines are shown under many
Japanese phrases (though not all of them), and I cannot tell which
underlines correspond to misspelled English words. As mentioned
already, Aspell underlines only misspelled English words.
> But once again, you've set up flyspell-mode to work in English, so you
> shouldn't pay attention to what it does with Japanese.
I agree. I also see the problem with M-x ispell-buffer, and found a
solution.
(require 'ispell)  ; defines `ispell-skip-region-alist'
(defvar ispell-regexp-non-ascii "[^\000-\377]+"
  "Regular expression to match a non-ascii word.")
(add-to-list 'ispell-skip-region-alist (list ispell-regexp-non-ascii))
Once I accept this solution for M-x ispell-buffer, I would accept the
corresponding solution for flyspell-mode shown below.
(defun flyspell-skip-non-ascii (beg end info)
  "Tell flyspell to skip a non-ascii word.
Call this on `flyspell-incorrect-hook'."
  (string-match ispell-regexp-non-ascii (buffer-substring beg end)))
(add-hook 'flyspell-incorrect-hook 'flyspell-skip-non-ascii)
It took me a while to figure this out. I think that what M-x
ispell-buffer and flyspell-mode provide is fundamental functionality,
and it would be good to document this somewhere in Emacs, such as in
(info "(emacs) Spelling"). Can you give a suggestion?
* Re: Hunspell for Japanese
From: Eli Zaretskii @ 2018-02-18 15:59 UTC (permalink / raw)
To: help-gnu-emacs
> Date: Sun, 18 Feb 2018 14:31:56 +0900 (JST)
> Cc: tkk@misasa.okayama-u.ac.jp
> From: Tak Kunihiro <tkk@misasa.okayama-u.ac.jp>
>
> (defvar ispell-regexp-non-ascii "[^\000-\377]+"
> "Regular expression to match a non-ascii word.")
> (add-to-list 'ispell-skip-region-alist (list ispell-regexp-non-ascii))
>
> Once I accept this solution for M-x ispell-buffer, I would accept a
> solution for flyspell-mode as shown below.
>
> (defun flyspell-skip-non-ascii (beg end info)
> "Tell flyspell to skip a non-ascii word.
> Call this on `flyspell-incorrect-hook'."
> (string-match ispell-regexp-non-ascii (buffer-substring beg end)))
> (add-hook 'flyspell-incorrect-hook 'flyspell-skip-non-ascii)
>
> It took me a while to figure this out. I think that what M-x
> ispell-buffer and flyspell-mode provide is fundamental functionalities
> and it is good to be documented in somewhere in Emacs such for (info
> "(emacs) Spelling"). Can you give suggestion?
On the Wiki?
You see, the solution you propose has one significant disadvantage: it
will skip words used in English prose which are written using
non-ASCII characters. It's true that there aren't many of those, but
they do exist.
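One way to narrow the hook so that accented Latin words are still
checked is to skip only words that contain Japanese script. This is a
sketch, not something tested in this thread: the function name
`my-flyspell-skip-japanese' is hypothetical, and the script symbols
`han', `kana' and `cjk-misc' are assumed to be how `char-script-table'
classifies kanji, kana and CJK punctuation.

```elisp
(require 'seq)

(defun my-flyspell-skip-japanese (beg end _info)
  "Return non-nil if the text between BEG and END contains Japanese script.
Intended for `flyspell-incorrect-hook'; a non-nil return value tells
flyspell not to highlight the word.  Hypothetical helper."
  (seq-some (lambda (ch)
              ;; `char-script-table' maps each character to a script symbol.
              ;; The three symbols below are an assumption about that table.
              (memq (aref char-script-table ch) '(han kana cjk-misc)))
            (buffer-substring-no-properties beg end)))

(add-hook 'flyspell-incorrect-hook #'my-flyspell-skip-japanese)
```

Because the test is per script rather than "anything outside ASCII", a
word made only of Latin letters with diacritics would still be passed
to the spell checker.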
You could instead try using 2 dictionaries at the same time, one for
English, the other for Japanese. This will only work with Hunspell,
and only in Emacs 26 or later. Caveat: I never tried it with these
two languages, so I don't know whether this combination has some
subtle problems with that feature.
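A two-dictionary setup of this kind might look like the sketch below.
It is untested with this language pair: "ja_JP" is a hypothetical
dictionary name, used on the assumption that a Japanese Hunspell
dictionary with that name is installed where `hunspell -D' can find it.

```elisp
(with-eval-after-load 'ispell
  (setq ispell-program-name "hunspell")
  ;; "ja_JP" is a placeholder; substitute the name of a Japanese
  ;; dictionary actually listed by `hunspell -D', if one is installed.
  (setq ispell-dictionary "en_US,ja_JP")
  ;; The dictionary parameters must be initialized before a
  ;; multi-dictionary entry can be registered.
  (ispell-set-spellchecker-params)
  (ispell-hunspell-add-multi-dic "en_US,ja_JP"))
```

If no Japanese dictionary is available, Hunspell will fail to load the
second name, and the skip-hook approach remains the fallback.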
* Re: Hunspell for Japanese
From: Tak Kunihiro @ 2018-02-24 1:41 UTC (permalink / raw)
To: help-gnu-emacs, eliz; +Cc: tkk
> On the Wiki?
OK. I put the solution on EmacsWiki.
https://www.emacswiki.org/emacs/FlySpell#toc14