From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Agustin Martin <agustin.martin@hispalinux.es>
Newsgroups: gmane.emacs.devel
Subject: Re: Ispell and unibyte characters
Date: Fri, 13 Apr 2012 18:38:23 +0200
Message-ID: <20120413163823.GA26947@agmartin.aq.upm.es>
References: <83aa3f2hgh.fsf@gnu.org>
	<20120326173912.GA22306@agmartin.aq.upm.es>
	<E1SCGD0-0001Dm-Tu@fencepost.gnu.org>
	<20120328191821.GA6266@agmartin.aq.upm.es>
	<20120410190803.GA13517@agmartin.aq.upm.es>
	<83ty0r5rmd.fsf@gnu.org>
	<20120412143657.GA18352@agmartin.aq.upm.es>
	<83d37c4vw5.fsf@gnu.org>
	<20120413152525.GA14949@agmartin.aq.upm.es>
	<83ehrr39wq.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
X-Trace: dough.gmane.org 1334335122 25426 80.91.229.3 (13 Apr 2012 16:38:42 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Fri, 13 Apr 2012 16:38:42 +0000 (UTC)
To: emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Fri Apr 13 18:38:42 2012
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1SIjWD-00044u-EX
	for ged-emacs-devel@m.gmane.org; Fri, 13 Apr 2012 18:38:41 +0200
Original-Received: from localhost ([::1]:43211 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1SIjWC-0003fi-Nx
	for ged-emacs-devel@m.gmane.org; Fri, 13 Apr 2012 12:38:40 -0400
Original-Received: from eggs.gnu.org ([208.118.235.92]:49179)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <agustin.martin@upm.es>) id 1SIjW5-0003fC-Bl
	for emacs-devel@gnu.org; Fri, 13 Apr 2012 12:38:39 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <agustin.martin@upm.es>) id 1SIjVy-0003V0-PU
	for emacs-devel@gnu.org; Fri, 13 Apr 2012 12:38:32 -0400
Original-Received: from fibonacci.ccupm.upm.es ([138.100.198.70]:37951 helo=smtp.upm.es)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <agustin.martin@upm.es>) id 1SIjVy-0003Ug-FC
	for emacs-devel@gnu.org; Fri, 13 Apr 2012 12:38:26 -0400
Original-Received: from agmartin.aq.upm.es (Agmartin.aq.upm.es [138.100.41.131])
	by smtp.upm.es (8.14.3/8.14.3/fibonacci-001) with ESMTP id
	q3DGcN3s016478; Fri, 13 Apr 2012 18:38:23 +0200
Original-Received: by agmartin.aq.upm.es (Postfix, from userid 1000)
	id B0ADEE52; Fri, 13 Apr 2012 18:38:23 +0200 (CEST)
Mail-Followup-To: emacs-devel@gnu.org
Content-Disposition: inline
In-Reply-To: <83ehrr39wq.fsf@gnu.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-MIME-Autoconverted: from 8bit to quoted-printable by smtp.upm.es id
	q3DGcN3s016478
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3)
X-Received-From: 138.100.198.70
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:149639
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/149639>

On Fri, Apr 13, 2012 at 06:53:57PM +0300, Eli Zaretskii wrote:
> > Date: Fri, 13 Apr 2012 17:25:25 +0200
> > From: Agustin Martin <agustin.martin@hispalinux.es>
> >=20
> > > I don't understand what are you trying to accomplish by encoding
> > > OTHERCHARS in UTF-8.  What exactly is the problem with them being
> > > encoded in some 8-bit encoding?  Please explain.
> >=20
> > Imagine a fake entry in the general list, either in ispell.el or prov=
ided
> > through `ispell-base-dicts-override-alist' (no accented chars for sim=
plicity)
> >=20
> > ("catala8"
> >      "[A-Za-z]" "[^A-Za-z]" "['\267-]" nil ("-B" "-d" "catalan") nil =
iso-8859-1)
> >=20
> > Unless emacs knows the encoding for \267 (middledot "=B7") it cannot =
decode it
> > properly. I prefer to not use UTF-8 here, because I want the entry to=
 also be
> > useful for ispell (and also be XEmacs incompatible). The best approac=
h here
> > seems to decode the otherchars regexp according to provided coding-sy=
stem.
> >=20
> > I have noticed that there seems to be no need to encode the resulting=
 string
> > in UTF-8, Emacs will know what to do with the decoded string.
> >=20
> > I tested something like
> >=20
> >  (dolist (adict ispell-dictionary-alist)
> >    (add-to-list 'tmp-dicts-alist
> >                 (list
> >                  (nth 0 adict)  ; dict name
> >                  "[[:alpha:]]"  ; casechars
> >                  "[^[:alpha:]]" ; not-casechars
> >                  (if ispell-encoding8-command
> >                      ;; Decode 8bit otherchars if needed
> >                      (decode-coding-string (nth 3 adict) (nth 7 adict))
> >                    (nth 3 adict)) ; otherchars
> >                  (nth 4 adict)  ; many-otherchars-p
> >                  (nth 5 adict)  ; ispell-args
> >                  (nth 6 adict)  ; extended-character-mode
> >                  (if ispell-encoding8-command
> >                      'utf-8
> >                    (nth 7 adict)))))
> >=20
> > and seems to work well.
>=20
> So you are taking the Catalan dictionary spec written for Ispell and
> convert it to a spec that could be used to support more characters by
> using UTF-8, is that right?  If so, I find this a bit kludgey. =20

I see it differently: I like the above approach because I find it much more
versatile for general definitions. This is not a matter of blindly reusing
ispell entries. In particular, I noticed this problem in Debian with the
Catalan spec written for aspell (created automatically from info provided by
the aspell-ca package).  That info is written that way so that it is also
useful for XEmacs, but with the above post-processing it can work much better
for Emacs.
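
To make the decoding step concrete, here is a minimal sketch applied to the
fake "catala8" entry quoted above. The names `my-decode-otherchars' and
`my-catala8-entry' are made up for illustration and are not part of
ispell.el; the only real function used is `decode-coding-string'.

```elisp
;; Hypothetical helper, not part of ispell.el: decode the OTHERCHARS
;; field (element 3) of a dictionary entry using the coding system
;; stored in the same entry (element 7).
(defun my-decode-otherchars (entry)
  "Return ENTRY's OTHERCHARS string decoded with ENTRY's coding system."
  (decode-coding-string (nth 3 entry) (nth 7 entry)))

;; The fake entry from the quoted text.  "\267" is a raw 8-bit byte,
;; so the OTHERCHARS string is unibyte until it is decoded.
(defvar my-catala8-entry
  '("catala8"
    "[A-Za-z]" "[^A-Za-z]" "['\267-]" nil ("-B" "-d" "catalan") nil iso-8859-1)
  "Illustrative dictionary entry, copied from the example above.")

;; After decoding, the third character is the middle dot (U+00B7).
(my-decode-otherchars my-catala8-entry)
```

The entry itself stays in its 8-bit form, so the same data remains usable
elsewhere; only the copy that Emacs works with is decoded.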

> How
> about having a completely separate spec instead?  More generally, why
> not separate ispell-dictionary-alist into 2 alists, one to be used
> with Ispell, the other to be used with aspell and hunspell?  I think
> this would be cleaner, don't you agree?

As a matter of fact, that is what we do in Debian, using info provided by
the ispell, aspell and hunspell dict maintainers. The difference is that the
provided info is supposed to be valid for both Emacs and XEmacs, so I find
post-processing as above very useful, because it helps to get the best
results for Emacs. The global dicts alist is built from

(dolist (dict (append found-dicts-alist
                      ispell-base-dicts-override-alist
                      ispell-dictionary-base-alist))

where the first entry found wins. `found-dicts-alist' holds the result of the
automatic search (currently used only for aspell) and has the highest priority;
`ispell-dictionary-base-alist' is the fallback alist and has the lowest
priority. Depending on the spellchecker,
`ispell-base-dicts-override-alist' is set to an alist corresponding to
ispell, aspell or hunspell dictionaries (they are handled independently).
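
The "first found wins" rule can be sketched roughly as follows. This is
only an illustration under simplifying assumptions: the function
`my-merge-dicts-first-wins' and the sample entries are hypothetical, and
the real code works on full dictionary entries, not two-element lists.

```elisp
;; Hypothetical helper, not the actual Debian or ispell.el code.
(defun my-merge-dicts-first-wins (&rest alists)
  "Merge ALISTS, keeping only the first entry found for each dict name.
Earlier alists have higher priority than later ones."
  (let (result)
    (dolist (dict (apply #'append alists))
      ;; Skip entries whose name was already provided by an
      ;; earlier (higher priority) alist.
      (unless (assoc (car dict) result)
        (push dict result)))
    (nreverse result)))

;; Made-up sample data: "catalan" appears in two lists, and the
;; entry from the first list is the one that is kept.
(my-merge-dicts-first-wins
 '(("catalan" found))
 '(("catalan" override) ("castellano" override))
 '(("english" base)))
```

With these sample lists the merged alist keeps the `found' entry for
"catalan" and takes "castellano" and "english" from the lower priority
lists.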

I do not think that maintaining separate hardcoded dict lists in ispell.el
for ispell, aspell and hunspell is worth the effort.

For hunspell, in the future I'd go for some sort of parsing mechanism like
the current one for aspell.

-- 
Agustin