From: Agustin Martin <agustin.martin@hispalinux.es>
To: Reuben Thomas <rrt@sc3d.org>
Cc: 7781@debbugs.gnu.org
Subject: bug#7781: 23.2.91; ispell problem with hunspell and UTF-8 file
Date: Fri, 7 Jan 2011 14:14:03 +0100
Message-ID: <AANLkTikUy8-jH4Co8R6d0TbgR6d+kymEawXx0gV-gCEm@mail.gmail.com>
In-Reply-To: <87sjx9fula.fsf@sc3d.org>
2011/1/4 Reuben Thomas <rrt@sc3d.org>:
> With the following text, and using emacs -Q, I get the errors you can
> see in the messages log below when using hunspell to spell-check a UTF-8
> buffer with some extended characters in it.
>
> I did test this with emacs -Q, but the current session, in which I
> reproduced the problem and am now composing this bug report, was not
> started with -Q (this is so submitting the bug report works properly!).
>
> I am running a freshly bzr-pulled build of the emacs-23 branch.
Hi, Reuben,
I can also reproduce this with Emacs 23.2. After splitting the
original lines, I could narrow the problems down to two lines:
-- Cut here -- 8< ----- minimal.txt: utf-8
of out-of-copyright works. The Kindle may be a loss leader, but at £109
it’s still not cheap. Feedbooks, rather than integrating easily into
-- Cut here -- 8< ----- End of minimal.txt
In the first line, the currency symbol (£) seems to cause conversion
errors when iso-8859-1 is used, when it should have been ignored by
hunspell. I get tons of
UTF-8 encoding error. Missing continuation byte in 0. character position:
for that line when using
$ cat minimal.txt | hunspell -d en_US -a -i iso-8859-1
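To see why a UTF-8 file and an iso-8859-1 input setting do not mix, it may help to look at the bytes of the currency symbol. The Python sketch below only illustrates the encoding mismatch. It makes no claim about what hunspell does internally, and the variable names are made up for the illustration.

```python
# Illustration only: how the bytes of the pound sign read under two
# different encodings. This does not reproduce hunspell's own logic.
pound = "\u00a3"  # the pound sign from the first line of minimal.txt

utf8_bytes = pound.encode("utf-8")            # two bytes in UTF-8
as_latin1 = utf8_bytes.decode("iso-8859-1")   # same bytes read as iso-8859-1

print(utf8_bytes)       # b'\xc2\xa3'
print(len(as_latin1))   # 2: one character has become two

# The reverse mismatch: the single iso-8859-1 byte for the pound sign
# is not a valid UTF-8 sequence on its own.
latin1_bytes = pound.encode("iso-8859-1")
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)       # False
```

Either direction of mismatch leaves the reader with something other than a single "£", which is consistent with a checker complaining about the encoding of that line.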
In the second line, the unusual apostrophe (’) seems to confuse
hunspell when utf-8 is used. Comparing what aspell and hunspell give
for similar text, I get
$ cat minimal.txt | aspell --encoding=utf-8 -d en_US -a
& Feedbooks 6 22: Feed books, Feed-books, Feedback's, Feedbags, ...
$ cat minimal.txt | hunspell -d en_US -i utf-8 -a
& Feedbooks 8 24: Feed books, Feed-books, Feedback, Feedbags, ...
Do not worry about the first number; it is the number of suggestions.
However, the position given in the second number differs. It seems that
hunspell does not count that apostrophe as a single (multibyte)
character, but as three components (presumably its three UTF-8 bytes,
which would account for the difference of two).
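The two positions can be checked by hand. The Python sketch below is an illustration, not anything taken from aspell or hunspell: it computes the offset of "Feedbooks" in the second line of minimal.txt once in characters and once in UTF-8 bytes. The helper `byte_to_char_offset` is a hypothetical name, and it assumes the byte offset falls on a character boundary.

```python
# Illustration only: character offset vs. UTF-8 byte offset of
# "Feedbooks" in the second line of minimal.txt.
line = "it\u2019s still not cheap. Feedbooks, rather than integrating easily into"
word = "Feedbooks"

char_offset = line.index(word)                          # counts characters
byte_offset = len(line[:char_offset].encode("utf-8"))   # counts UTF-8 bytes


def byte_to_char_offset(text, offset):
    """Convert a UTF-8 byte offset into a character offset.

    Hypothetical helper; raises UnicodeDecodeError if the offset
    falls in the middle of a multibyte character.
    """
    return len(text.encode("utf-8")[:offset].decode("utf-8"))


print(char_offset)                              # 22
print(byte_offset)                              # 24
print(byte_to_char_offset(line, byte_offset))   # 22
```

The character offset matches the 22 reported by aspell and the byte offset matches the 24 reported by hunspell, since the apostrophe (U+2019) takes three bytes in UTF-8 but is one character.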
This looks to me like a hunspell bug. I found no reference to this
problem on the hunspell SourceForge site, but noticed that Hunspell
1.2.14 was released yesterday. I need to check whether it has any
related news.
--
Agustin