* bug#7668: ispell and dictionary encodings
@ 2010-12-17 18:30 Reuben Thomas
2010-12-20 11:31 ` Agustin Martin
From: Reuben Thomas @ 2010-12-17 18:30 UTC
To: 7668
I've just been puzzling my way through ispell.gz's dictionary encoding
code, after switching from aspell to hunspell in order to be able to
treat Unicode curly single quotes as normal intraword punctuation
(which it seems aspell cannot be persuaded to do, but that's another
story).
I noticed a feature of ispell-dictionary-base-alist, which I don't
understand: the last (7th) element of each dictionary definition is
called "Coding System", which seems to be the coding system of the
case character and non-case-character strings, but it is also passed
to the spelling program as the input encoding, which is wrong, since
the input encoding depends on the file to be checked.
I currently use the classic workaround of making up my own dictionary
definition which includes accented characters that I want to be able
to use in words (which is necessary anyway), and which specifies utf-8
as the coding system. This only works because I use utf-8 for all my
text files.
It seems, therefore, that the argument to follow
ispell-encoding8-command (which itself is mis-documented:
Command line option prefix to select UTF-8 if supported, nil otherwise.
If UTF-8 if supported by spellchecker and is selectable from the command line
this variable will contain \"--encoding=\" for aspell and \"-i \" for hunspell,
so UTF-8 or other mime charsets can be selected. That will be set for hunspell
>=1.1.6 or aspell >= 0.60 in `ispell-check-version'.
It is not just for selecting UTF-8; indeed, that's the irony: in the
default configuration it's used mostly to select 8-bit character sets!
And there are one or two other typos. How about (suitably rewrapped):
Command line option prefix to select coding system if supported, nil otherwise.
If the coding system is selectable from the command line
this variable will contain \"--encoding=\" for aspell and \"-i \" for hunspell,
so that the input encoding can be selected. That will be set for hunspell
>= 1.1.6 or aspell >= 0.60 in `ispell-check-version'.
Then, the following code in ispell-start-process:
;; If we are using recent aspell or hunspell, make sure we use the right encoding
;; for communication. ispell or older aspell/hunspell does not support this
(if ispell-encoding8-command
    (setq args
          (append args
                  (list
                   (concat ispell-encoding8-command
                           (symbol-name (ispell-get-coding-system)))))))
needs fixing: rather than using ispell-get-coding-system, it should
use a prefix of buffer-file-coding-system (without the suffix that
specifies the line ending).
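As a rough illustration of what this proposal would compute (not of how ispell.el currently behaves), the line-ending suffix can be dropped with `coding-system-base`. The helper name `my-ispell-buffer-encoding-name` is made up for this sketch and is not part of ispell.el; whether the resulting name is one the spellchecker accepts is a separate question that this sketch does not address.

```elisp
;; Hypothetical helper, not part of ispell.el: a sketch of the proposal only.
(defun my-ispell-buffer-encoding-name (&optional coding-system)
  "Return the name of CODING-SYSTEM without its end-of-line suffix.
CODING-SYSTEM defaults to the current `buffer-file-coding-system'."
  (symbol-name
   (coding-system-base (or coding-system buffer-file-coding-system))))

;; `utf-8-unix' and `utf-8-dos' share the base coding system `utf-8'.
(my-ispell-buffer-encoding-name 'utf-8-unix)
(my-ispell-buffer-encoding-name 'utf-8-dos)
```

The string returned here is what would be appended to ispell-encoding8-command under the proposal, in place of the dictionary's coding system.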
I'm sure I'm missing things here, but if what I've said above makes
any sense, I'd like to help refine it into a sensible proposal to
improve ispell.el.
--
http://rrt.sc3d.org
* bug#7668: ispell and dictionary encodings
From: Agustin Martin @ 2010-12-20 11:31 UTC
To: Reuben Thomas, 7668
On Fri, Dec 17, 2010 at 06:30:14PM +0000, Reuben Thomas wrote:
> I've just been puzzling my way through ispell.gz's dictionary encoding
> code, after switching from aspell to hunspell in order to be able to
> treat Unicode curly single quotes as normal intraword punctuation
> (which it seems aspell cannot be persuaded to do, but that's another
> story).
>
> I noticed a feature of ispell-dictionary-base-alist, which I don't
> understand: the last (7th) element of each dictionary definition is
> called "Coding System", which seems to be the coding system of the
> case character and non-case-character strings, but it is also passed
> to the spelling program as the input encoding, which is wrong, since
> the input encoding depends on the file to be checked.
That element represents the encoding that will be used for communication
with the dictionary. The case-character and non-case-character strings should
be in that same encoding.
> I currently use the classic workaround of making up my own dictionary
> definition which includes accented characters that I want to be able
> to use in words (which is necessary anyway), and which specifies utf-8
> as the coding system. This only works because I use utf-8 for all my
> text files.
If you are not going to use XEmacs, but only FSF Emacs, just use [:alpha:]
for the case-character and non-case-character strings along with utf-8. That
is already done automatically for aspell dictionaries, where it is easy to get
a list of installed dictionaries and additional info.
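A minimal sketch of such a definition, assuming hunspell with an installed dictionary called en_GB. The entry name "en_GB-curly" is invented for this example, and the field comments follow the element order documented for `ispell-dictionary-alist`; treat it as an illustration rather than a recommended setup.

```elisp
(require 'ispell)

;; "en_GB-curly" is a made-up name; ("-d" "en_GB") assumes a hunspell
;; dictionary of that name is installed.
(add-to-list 'ispell-local-dictionary-alist
             '("en_GB-curly"      ; DICTIONARY-NAME (hypothetical)
               "[[:alpha:]]"      ; CASECHARS
               "[^[:alpha:]]"     ; NOT-CASECHARS
               "['\u2019]"        ; OTHERCHARS: straight and curly apostrophe
               t                  ; MANY-OTHERCHARS-P
               ("-d" "en_GB")     ; ISPELL-ARGS
               nil                ; EXTENDED-CHARACTER-MODE
               utf-8))            ; CHARACTER-SET used to talk to the dictionary
```

With [:alpha:] there is no need to enumerate accented letters by hand, which is the part of the workaround that was "necessary anyway".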
> It seems, therefore, that the argument to follow
> ispell-encoding8-command (which itself is mis-documented:
>
> Command line option prefix to select UTF-8 if supported, nil otherwise.
> If UTF-8 if supported by spellchecker and is selectable from the command line
> this variable will contain \"--encoding=\" for aspell and \"-i \" for hunspell,
> so UTF-8 or other mime charsets can be selected. That will be set for hunspell
> >=1.1.6 or aspell >= 0.60 in `ispell-check-version'.
>
> It is not just for selecting UTF-8; indeed, that's the irony: in the
> default configuration it's used mostly to select 8-bit character sets!
> And there are one or two other typos. How about (suitably rewrapped):
>
> Command line option prefix to select coding system if supported, nil otherwise.
> If the coding system is selectable from the command line
> this variable will contain \"--encoding=\" for aspell and \"-i \" for hunspell,
> so that the input encoding can be selected. That will be set for hunspell
> >= 1.1.6 or aspell >= 0.60 in `ispell-check-version'.
Agreed, thanks
> Then, the following code in ispell-start-process:
>
> ;; If we are using recent aspell or hunspell, make sure we use the right encoding
> ;; for communication. ispell or older aspell/hunspell does not support this
> (if ispell-encoding8-command
>     (setq args
>           (append args
>                   (list
>                    (concat ispell-encoding8-command
>                            (symbol-name (ispell-get-coding-system)))))))
>
> needs fixing: rather than using ispell-get-coding-system, it should
> use a prefix of buffer-file-coding-system (without the suffix that
> specifies the line ending).
No, the current code is correct. It tells the spellchecker that
communication with the dictionary will use the coding system returned by
(ispell-get-coding-system). ispell.el does the internal conversions needed
for that in a different place, so everything is transparent to the user.
> I'm sure I'm missing things here, but if what I've said above makes
> any sense, I'd like to help refine it into a sensible proposal to
> improve ispell.el.
Thanks for looking into this. Will prepare a change with the
`ispell-encoding8-command' documentation fix.
Regards,
--
Agustin