From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Agustin Martin
Newsgroups: gmane.emacs.devel
Subject: Re: Ispell and unibyte characters
Date: Tue, 10 Apr 2012 21:08:03 +0200
Message-ID: <20120410190803.GA13517@agmartin.aq.upm.es>
References: <83aa3f2hgh.fsf@gnu.org> <20120326173912.GA22306@agmartin.aq.upm.es> <20120328191821.GA6266@agmartin.aq.upm.es>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="tKW2IUtsqtDRztdT"
X-Trace: dough.gmane.org 1334084908 20421 80.91.229.3 (10 Apr 2012 19:08:28 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Tue, 10 Apr 2012 19:08:28 +0000 (UTC)
To: emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Apr 10 21:08:27 2012
Return-path:
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1SHgQS-0006uM-BT for ged-emacs-devel@m.gmane.org; Tue, 10 Apr 2012 21:08:24 +0200
Original-Received: from localhost ([::1]:54266 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SHgQQ-0005O5-Uo for ged-emacs-devel@m.gmane.org; Tue, 10 Apr 2012 15:08:22 -0400
Original-Received: from eggs.gnu.org ([208.118.235.92]:48172) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SHgQN-0005Ng-QL for emacs-devel@gnu.org; Tue, 10 Apr 2012 15:08:21 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1SHgQH-0003er-BK for emacs-devel@gnu.org; Tue, 10 Apr 2012 15:08:19 -0400
Original-Received: from fibonacci.ccupm.upm.es ([138.100.198.70]:36348 helo=smtp.upm.es) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SHgQH-0003d0-27 for emacs-devel@gnu.org; Tue, 10 Apr 2012 15:08:13 -0400
Original-Received: from agmartin.aq.upm.es (Agmartin.aq.upm.es [138.100.41.131]) by smtp.upm.es (8.14.3/8.14.3/fibonacci-001) with ESMTP id
	q3AJ84Kc022109; Tue, 10 Apr 2012 21:08:04 +0200
Original-Received: by agmartin.aq.upm.es (Postfix, from userid 1000) id D10822031D; Tue, 10 Apr 2012 21:08:03 +0200 (CEST)
Mail-Followup-To: emacs-devel@gnu.org
Content-Disposition: inline
In-Reply-To: <20120328191821.GA6266@agmartin.aq.upm.es>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3)
X-Received-From: 138.100.198.70
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:149579
Archived-At:

--tKW2IUtsqtDRztdT
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed, Mar 28, 2012 at 09:18:21PM +0200, Agustin Martin wrote:
> On Mon, Mar 26, 2012 at 04:08:06PM -0400, Eli Zaretskii wrote:
> > > Date: Mon, 26 Mar 2012 19:39:12 +0200
> > > From: Agustin Martin
> > >
> > > Hi Eli,
> >
> > Thanks for responding, I was beginning to think that no one is
> > interested. In general, I find that ispell.el is in sore need of
> > modernization; at least that's my conclusion so far from playing with
> > hunspell (with which I want to replace my aging collection of Ispell
> > and its dictionaries that I use for many years).
> >
> > > At least for aspell ispell.el already uses utf8 as default communication
> > > encoding and [:alpha:] as CASECHARS (and ^[:alpha:] as NOT-CASECHARS).
> > > OTHERCHARS is guessed from aspell .dat file for given dictionary.
> >
> > The question is, why isn't this done for any modern speller. The only
> > one I know of that cannot handle UTF-8 is Ispell.
>
> I think the only real remaining reason is for XEmacs compatibility. AFAIK
> XEmacs does not support [:alpha:].
>
> I thought about filtering ispell-dictionary-base-alist when used from FSF
> Emacs, so it uses [:alpha:] and still keeps compatibility. I am currently a
> bit busy, but at some time I may try this for Debian and see what happens.

For the record, I am attaching what I am currently trying: post-processing the
global dictionary list while leaving local definitions in ~/.emacs unmodified.
This should also deal with [#11200: ispell.el sets incorrect encoding for the
default dictionary].

I would like to test this a bit more and commit if there are no problems.

-- 
Agustin

--tKW2IUtsqtDRztdT
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="ispell.el_alpha-regexp.2.diff"

--- ispell.el.orig 2012-04-10 20:02:51.422092761 +0200
+++ ispell.el 2012-04-10 20:18:27.464680054 +0200
@@ -783,6 +783,12 @@
 (make-obsolete-variable 'ispell-aspell-supports-utf8
                         'ispell-encoding8-command "23.1")

+(defvar ispell-emacs-alpha-regexp
+  (if (string-match "^[[:alpha:]]+$" "abcde")
+      "[[:alpha:]]"
+    nil)
+  "[[:alpha:]] if Emacs supports [:alpha:] regexp, nil
+otherwise (current XEmacs does not support it).")

 ;;; **********************************************************************
 ;;; The following are used by ispell, and should not be changed.
@@ -1179,8 +1185,7 @@
               (error nil))
            ispell-really-aspell
            ispell-encoding8-command
-           ;; XEmacs does not like [:alpha:] regexps.
-           (string-match "^[[:alpha:]]+$" "abcde"))
+           ispell-emacs-alpha-regexp)
       (unless ispell-aspell-dictionary-alist
         (ispell-find-aspell-dictionaries)))

@@ -1204,8 +1209,27 @@
                             ispell-dictionary-base-alist))
         (unless (assoc (car dict) all-dicts-alist)
           (add-to-list 'all-dicts-alist dict)))
-      (setq ispell-dictionary-alist all-dicts-alist))))
+      (setq ispell-dictionary-alist all-dicts-alist))
+    ;; If Emacs flavor supports [:alpha:] use it for global dicts. If
+    ;; spellchecker also supports UTF-8 via command-line option use it
+    ;; in communication. This does not affect definitions in ~/.emacs.
+    (if ispell-emacs-alpha-regexp
+        (let (tmp-dicts-alist)
+          (dolist (adict ispell-dictionary-alist)
+            (add-to-list 'tmp-dicts-alist
+                         (list
+                          (nth 0 adict)  ; dict name
+                          "[[:alpha:]]"  ; casechars
+                          "[^[:alpha:]]" ; not-casechars
+                          (nth 3 adict)  ; otherchars
+                          (nth 4 adict)  ; many-otherchars-p
+                          (nth 5 adict)  ; ispell-args
+                          (nth 6 adict)  ; extended-character-mode
+                          (if ispell-encoding8-command
+                              'utf-8
+                            (nth 7 adict)))))
+          (setq ispell-dictionary-alist tmp-dicts-alist)))))

 (defun ispell-valid-dictionary-list ()
   "Return a list of valid dictionaries.

--tKW2IUtsqtDRztdT--