From: Agustin Martin <agustin.martin@hispalinux.es>
To: emacs-devel@gnu.org
Subject: Re: Ispell and unibyte characters
Date: Fri, 13 Apr 2012 17:25:25 +0200
Message-ID: <20120413152525.GA14949@agmartin.aq.upm.es>
In-Reply-To: <83d37c4vw5.fsf@gnu.org>
On Thu, Apr 12, 2012 at 10:01:30PM +0300, Eli Zaretskii wrote:
> I wrote:
> > I am still dealing with an open issue here. Some languages have non 7bit
> > wordchars, like Catalan middledot, and it should be converted to UTF-8 if
> > default communication language is changed to UTF-8.
>
> Sorry, I don't understand: do you mean "non 8-bit wordchars"? I don't
> think 7 bits is assumed anywhere.
I mean wordchars that cannot be represented in a 7-bit encoding, like the
Catalan middledot (available in 8-bit latin1).
> Assuming you did mean 8-bit, then why not use UTF-8 for Catalan from
> the get-go? Only some languages can use single-byte encodings, and
> evidently Catalan is not one of them. For that matter, why shouldn't
> aspell and hunspell use UTF-8 by default (something I already asked)?
[...]
> I don't understand what are you trying to accomplish by encoding
> OTHERCHARS in UTF-8. What exactly is the problem with them being
> encoded in some 8-bit encoding? Please explain.
Imagine a fake entry in the general list, either in ispell.el or provided
through `ispell-base-dicts-override-alist' (no accented chars for simplicity)
  ("catala8"
   "[A-Za-z]" "[^A-Za-z]" "['\267-]" nil ("-B" "-d" "catalan") nil iso-8859-1)
Unless Emacs knows the encoding for \267 (middledot "·") it cannot decode it
properly. I prefer not to use UTF-8 here, because I want the entry to also be
useful for ispell (and also to be XEmacs compatible). The best approach here
seems to be to decode the otherchars regexp according to the provided
coding-system. I have noticed that there seems to be no need to encode the
resulting string in UTF-8; Emacs will know what to do with the decoded string.
I tested something like
  (dolist (adict ispell-dictionary-alist)
    (add-to-list 'tmp-dicts-alist
                 (list
                  (nth 0 adict)     ; dict name
                  "[[:alpha:]]"     ; casechars
                  "[^[:alpha:]]"    ; not-casechars
                  (if ispell-encoding8-command
                      ;; Decode 8bit otherchars if needed
                      (decode-coding-string (nth 3 adict) (nth 7 adict))
                    (nth 3 adict))  ; otherchars
                  (nth 4 adict)     ; many-otherchars-p
                  (nth 5 adict)     ; ispell-args
                  (nth 6 adict)     ; extended-character-mode
                  (if ispell-encoding8-command
                      'utf-8
                    (nth 7 adict)))))
and seems to work well.
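
A minimal sketch of just the decoding step, applied to the fake "catala8"
entry above. The names `my-decode-otherchars' and `my-catala8-entry' are
hypothetical and only for illustration; they are not part of ispell.el.
The sketch assumes the entry layout used above, with otherchars as element 3
and the coding system as element 7.

```elisp
;; Hypothetical helper, not part of ispell.el: decode the OTHERCHARS
;; field (element 3) of a dictionary entry using the coding system
;; stored in the same entry (element 7).
(defun my-decode-otherchars (entry)
  (decode-coding-string (nth 3 entry) (nth 7 entry)))

;; The fake entry from this message.  "\267" is a raw 8-bit byte here.
(defvar my-catala8-entry
  '("catala8" "[A-Za-z]" "[^A-Za-z]" "['\267-]" nil
    ("-B" "-d" "catalan") nil iso-8859-1))

;; After decoding, the byte \267 becomes the character U+00B7
;; (MIDDLE DOT), so this comparison should evaluate to t.
(string= (my-decode-otherchars my-catala8-entry) "['\u00b7-]")
```

The decoded string is an ordinary multibyte Emacs string, which is why no
explicit re-encoding to UTF-8 appears in the sketch.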
> I wrote:
> > but get a sgml-lexical-context error. Need to look more carefully, so this
> > will take longer.
I have tested further and this seems to be an unrelated problem. Some time
ago I already noticed some problems with flyspell.el and sgml mode (in
particular psgml), where I sometimes get the error
  sgml-lexical-context: Wrong type argument: stringp, nil
when running flyspell-buffer after enabling flyspell-mode. I am
also seeing something like
  Error in post-command-hook (flyspell-post-command-hook):
  (wrong-type-argument stringp nil)
when enabling flyspell-mode from the beginning of my sgml buffer. I cannot
reproduce this with emacs -Q and am still trying to find where it comes from.
Both problems were tested with emacs-snapshot_20120410.
For Debian I do not use sgml-lexical-context, but an improved version of the
old regexp, to try to keep things compatible with XEmacs. This seems to work
well and has some advantages over sgml-lexical-context:
1) It is compatible with XEmacs.
2) It is twice as fast as sgml-lexical-context when using flyspell-buffer.
3) It does not trigger the above error.
I am considering using this improved regexp instead of sgml-lexical-context
for the above reasons, but this is another issue.
--
Agustin