From: Agustin Martin <agustin.martin@hispalinux.es>
To: Reuben Thomas <rrt@sc3d.org>, 7668@debbugs.gnu.org
Subject: bug#7668: ispell and dictionary encodings
Date: Mon, 20 Dec 2010 12:31:48 +0100
Message-ID: <20101220113148.GA12469@agmartin.aq.upm.es>
In-Reply-To: <AANLkTin26ZJupakNsWxgteT9A4TOGCZQtAc=H6OTGG7-@mail.gmail.com>
On Fri, Dec 17, 2010 at 06:30:14PM +0000, Reuben Thomas wrote:
> I've just been puzzling my way through ispell.el's dictionary encoding
> code, after switching from aspell to hunspell in order to be able to
> treat Unicode curly single quotes as normal intraword punctuation
> (which it seems aspell cannot be persuaded to do, but that's another
> story).
>
> I noticed a feature of ispell-dictionary-base-alist, which I don't
> understand: the last (7th) element of each dictionary definition is
> called "Coding System", which seems to be the coding system of the
> case character and non-case-character strings, but it is also passed
> to the spelling program as the input encoding, which is wrong, since
> the input encoding depends on the file to be checked.
That element represents the encoding that will be used for communication
with the dictionary. The case-character and non-case-character strings
should be in that same encoding.
> I currently use the classic workaround of making up my own dictionary
> definition which includes accented characters that I want to be able
> to use in words (which is necessary anyway), and which specifies utf-8
> as the coding system. This only works because I use utf-8 for all my
> text files.
If you are not going to use XEmacs, but only FSF Emacs, just use [:alpha:]
for the case-character and non-case-character strings along with utf-8. That
is already done automatically for aspell dictionaries, where it is easy to
get a list of installed dictionaries and additional info.
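A minimal sketch of such an entry, for FSF Emacs only. The dictionary name
"en_GB-custom", the "-d en_GB" argument and the extra apostrophe characters
are assumptions for illustration, not values taken from ispell.el:

```elisp
;; Hypothetical entry: name, dictionary argument and OTHERCHARS are
;; illustrative assumptions.
(add-to-list 'ispell-local-dictionary-alist
             '("en_GB-custom"      ; dictionary name (hypothetical)
               "[[:alpha:]]"       ; case-character regexp
               "[^[:alpha:]]"      ; non-case-character regexp
               "['’]"              ; other characters allowed inside words
               t                   ; several of those may appear in one word
               ("-d" "en_GB")      ; arguments for the spellchecker (assumed)
               nil                 ; extended character mode
               utf-8))             ; coding system used to talk to the dictionary
```

Since [:alpha:] matches any alphabetic character, the accented letters do
not need to be listed by hand.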
> It seems, therefore, that the argument to follow
> ispell-encoding8-command (which itself is mis-documented:
>
> Command line option prefix to select UTF-8 if supported, nil otherwise.
> If UTF-8 if supported by spellchecker and is selectable from the command line
> this variable will contain \"--encoding=\" for aspell and \"-i \" for hunspell,
> so UTF-8 or other mime charsets can be selected. That will be set for hunspell
> >=1.1.6 or aspell >= 0.60 in `ispell-check-version'.
>
> It is not just for selecting UTF-8; indeed, that's the irony: in the
> default configuration it's used mostly to select 8-bit character sets!
> And there are one or two other typos. How about (suitably rewrapped):
>
> Command line option prefix to select coding system if supported, nil otherwise.
> If the coding system is selectable from the command line
> this variable will contain \"--encoding=\" for aspell and \"-i \" for hunspell,
> so that the input encoding can be selected. That will be set for hunspell
> >= 1.1.6 or aspell >= 0.60 in `ispell-check-version'.
Agreed, thanks
> Then, the following code in ispell-start-process:
>
> ;; If we are using recent aspell or hunspell, make sure we use the right encoding
> ;; for communication. ispell or older aspell/hunspell does not support this
> (if ispell-encoding8-command
>     (setq args
>           (append args
>                   (list
>                    (concat ispell-encoding8-command
>                            (symbol-name (ispell-get-coding-system)))))))
>
> needs fixing: rather than using ispell-get-coding-system, it should
> use a prefix of buffer-file-coding-system (without the suffix that
> specifies the line ending).
No, the current code is correct. It tells the spellchecker that
communication with the dictionary will be done in the (ispell-get-coding-system)
coding system. ispell.el does the internal conversions needed for that in
a different place, so everything is transparent to the user.
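To illustrate the separation (a sketch only, not the code in ispell.el):
the option passed to the spellchecker is built from the dictionary's coding
system, and the text is converted to that same coding system before it is
sent, whatever encoding the file on disk uses. The "--encoding=" prefix and
iso-8859-1 below are assumed values.

```elisp
;; Assumed values for illustration; in ispell.el the prefix comes from
;; `ispell-encoding8-command' and the coding system from
;; (ispell-get-coding-system).
(let* ((prefix "--encoding=")
       (dict-coding 'iso-8859-1)
       ;; Argument handed to the spellchecker when it is started.
       (arg (concat prefix (symbol-name dict-coding)))
       ;; Buffer text converted to the dictionary's coding system,
       ;; independently of `buffer-file-coding-system'.
       (bytes (encode-coding-string "café" dict-coding)))
  (list arg bytes))
```

The file's own encoding never appears here, which is why passing
buffer-file-coding-system to the spellchecker is not needed.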
> I'm sure I'm missing things here, but if what I've said above makes
> any sense, I'd like to help refine it into a sensible proposal to
> improve ispell.el.
Thanks for looking into this. Will prepare a change with the
`ispell-encoding8-command' documentation fix.
Regards,
--
Agustin