From: Reuben Thomas
Newsgroups: gmane.emacs.bugs
Subject: bug#7781: 23.2.91; ispell problem with hunspell and UTF-8 file
Date: Mon, 03 Jan 2011 23:14:41 +0000
Message-ID: <87sjx9fula.fsf@sc3d.org>
To: 7781@debbugs.gnu.org

With the following text, and using emacs -Q, I get the errors you can see in the messages log below when using hunspell to spell-check a UTF-8 buffer with some extended characters in it. I did test this with emacs -Q, but the current session, in which I reproduced the problem and am now composing this bug report, was not started with -Q (this is so submitting the bug report works properly!).

I am running a freshly bzr-pulled build of the emacs-23 branch.
Text follows:

----cut here----
---
title: Kindle 3 is a good first attempt
tags: computing, books
format: markdown
date: Mon, 03 Jan 2011 20:53:13 +0000
post-id: 2585181001
---

Giving my girlfriend a Kindle for Christmas was the carrot in a multi-pronged strategy to avoid needing more bookshelves (the stick being “I will start giving away your books” and my contribution being to archive books I’ve read (or return the many that aren’t even mine). This therefore required that I stocked it with books before she got her hands on it, which in turn was all the excuse I needed to play with the thing.

My lazy solution was simply to download all of [Feedbooks](http://www.feedbooks.com); I [wrote some scripts](http://rrt.sc3d.org/Software/Kindle/) to make this actually lazy, rather than brain-numbingly dull. In the process I found that while the Kindle is nice to hold and great to read, it struggles to cope with a large collection of books (even though the nearly 3,000 volumes of Feedbooks only half-filled its 4Gb memory), and is woeful as a research tool. And, of course, Amazon’s first-mover-evil surfaced early. Here are the problems I had:

1. Amazon’s own store doesn’t seem to contain free books. I think it’s poor form not to give people a straightforward choice of free editions of out-of-copyright works. The Kindle may be a loss leader, but at £109 it’s still not cheap. Feedbooks, rather than integrating easily into the Kindle, like, say, a 3rd-party software provider into Ubuntu’s Software Center, provide a catalogue which itself is in the form of a book, doesn’t automatically update, and offers a list ordered only by title. In other words, it’s useless; one is better off using the built-in web browser to search the online catalogue…

2. …or better, another browser, since the Kindle’s is woefully slow (and I don’t just mean the screen update). It’s just about usable, and hence useful in an emergency, but is no good as, for example, an online research tool to use in parallel with the books you have downloaded, although…

3. …offline search is awful too. With just the few ebooks that come loaded on the device, it was slow; with the thousands of books I loaded, it simply locked up the device, even when trying to search in the manual, presumably already indexed. The Kindle seems to index its contents in the background, but even now, over a week later, search doesn’t work. The only effective navigation is by a book’s table of contents, and, to choose which books to read, the user-definable collections, though…

4. …collections are a pain to set up for many books, as you have to select each book manually; there is no way I have found to select a range. (Fortunately, I was able to define collections programmatically, but this will be beyond most users.)

In summary, it’s a lovely device, but the software is rather toytown. Amazon could improve it (and indeed, the 3.0.3 firmware update, at the experimental stage when I checked, claims, vaguely, “performance improvements”), but given that their main interest is in selling books and Kindles, I’m not hopeful that it will happen before the next hardware iteration; whether it happens at all depends on competition, and there should be plenty of that, to go by the number of other ebook readers.
----cut here----

In GNU Emacs 23.2.91.3 (i686-pc-linux-gnu, GTK+ Version 2.22.0)
 of 2011-01-03 on mord
Windowing system distributor `The X.Org Foundation', version 11.0.10900000

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_GB.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Text

Minor modes in effect:
  longlines-mode: t
  buffer-face-mode: t
  flyspell-mode: t
  show-paren-mode: t
  savehist-mode: t
  minibuffer-electric-default-mode: t
  iswitchb-mode: t
  icomplete-mode: t
  global-auto-revert-mode: t
  desktop-save-mode: t
  smart-quotes-mode: t
  mouse-wheel-mode: t
  use-hard-newlines: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
M-x r e p o r t - e m h u n s p e l
l SPC i s p e l l SPC w i t h SPC h u
n s l e s p e
p e p e l l SPC
f a i l s C-g

M-x i s p e l l
SPC SPC SPC M-x i s p e

Recent messages:
Scanning for "hard" Perl constructions... done
Applying style hooks... done
Scanning for "hard" Perl constructions... done
Scanning for "hard" Perl constructions... done
Scanning for "hard" Perl constructions... done
Scanning for "hard" Perl constructions... done
Lazy desktop load complete
Quit
Spell-checking Kindle 3 is a good first attempt using hunspell with british+accs dictionary...
Spell-checking region using hunspell with british+accs dictionary...done
ispell-process-line: Ispell misalignment: word `Feedbooks' point 1363; probably incompatible versions

Load-path shadows:
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-style hides /usr/share/emacs/site-lisp/auctex/tex-style
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-buf hides /usr/share/emacs/site-lisp/auctex/tex-buf
/usr/local/share/emacs/23.2.91/site-lisp/auctex/context hides /usr/share/emacs/site-lisp/auctex/context
/usr/local/share/emacs/23.2.91/site-lisp/auctex/bib-cite hides /usr/share/emacs/site-lisp/auctex/bib-cite
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-fold hides /usr/share/emacs/site-lisp/auctex/tex-fold
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-jp hides /usr/share/emacs/site-lisp/auctex/tex-jp
/usr/local/share/emacs/23.2.91/site-lisp/auctex/context-nl hides /usr/share/emacs/site-lisp/auctex/context-nl
/usr/local/share/emacs/23.2.91/site-lisp/auctex/toolbar-x hides /usr/share/emacs/site-lisp/auctex/toolbar-x
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-mik hides /usr/share/emacs/site-lisp/auctex/tex-mik
/usr/local/share/emacs/23.2.91/site-lisp/auctex/context-en hides /usr/share/emacs/site-lisp/auctex/context-en
/usr/local/share/emacs/23.2.91/site-lisp/auctex/texmathp hides /usr/share/emacs/site-lisp/auctex/texmathp
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-info hides /usr/share/emacs/site-lisp/auctex/tex-info
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-fptex hides /usr/share/emacs/site-lisp/auctex/tex-fptex
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-font hides /usr/share/emacs/site-lisp/auctex/tex-font
/usr/local/share/emacs/23.2.91/site-lisp/auctex/latex hides /usr/share/emacs/site-lisp/auctex/latex
/usr/local/share/emacs/23.2.91/site-lisp/auctex/font-latex hides /usr/share/emacs/site-lisp/auctex/font-latex
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-bar hides /usr/share/emacs/site-lisp/auctex/tex-bar
/usr/local/share/emacs/23.2.91/site-lisp/auctex/multi-prompt hides /usr/share/emacs/site-lisp/auctex/multi-prompt
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex hides /usr/share/emacs/site-lisp/auctex/tex

Features:
(shadow sort mail-extr message sendmail ecomplete rfc822 mml mml-sec
password-cache mm-decode mm-bodies mm-encode mailcap mail-parse rfc2231
rfc2047 rfc2045 qp ietf-drums mailabbrev nnheader gnus-util netrc
time-date mm-util mail-prsvr gmm-utils wid-edit mailheader canlock sha1
hex-util hashcash mail-utils emacsbug preview prv-emacs byte-opt
warnings tex-buf noutline outline font-latex bytecomp byte-compile latex
tex-style tex nxml-uchnm rng-xsd xsd-regexp rng-cmpct rng-nxml rng-valid
rng-loc rng-uri rng-parse nxml-parse rng-match rng-dt rng-util rng-pttrn
nxml-ns nxml-mode nxml-outln nxml-rap nxml-util nxml-glyph nxml-enc
xmltok sgml-mode conf-mode newcomment make-mode vc-git cperl-mode
longlines face-remap filladapt flyspell auto-dictionary-autoloads
dictionary-autoloads js2-mode-autoloads package reporter completing-help
ff-paths uniquify paren savehist minibuf-eldef iswitchb icomplete
autorevert time cus-start cus-load desktop server change-mode advice
help-fns advice-preload php-mode derived etags cc-langs cl cl-19 cc-mode
cc-fonts cc-menus cc-cmds cc-styles cc-align cc-engine cc-vars cc-defs
speedbar sb-image ezimage dframe easymenu assoc lua-mode regexp-opt
comint ring whitespace etags-update smart-quotes edmacro kmacro ispell
ffap muse-autoloads emacs-goodies-el emacs-goodies-custom
emacs-goodies-loaddefs easy-mmode devhelp preview-latex tex-site
auto-loads tooltip ediff-hook vc-hooks lisp-float-type mwheel x-win
x-dnd font-setting tool-bar dnd fontset image fringe lisp-mode register
page menu-bar rfn-eshadow timer select scroll-bar mldrag mouse jit-lock
font-lock syntax facemenu font-core frame cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew
greek romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev loaddefs button
minibuffer faces cus-face files text-properties overlay md5 base64
format env code-pages mule custom widget hashtable-print-readable
backquote make-network-process dbusbind system-font-setting
font-render-setting gtk x-toolkit x multi-tty emacs)

-- 
http://rrt.sc3d.org/