From: Kenichi Handa <handa@m17n.org>
Cc: agustin.martin@hispalinux.es, lionel@mamane.lu,
	emacs-devel@gnu.org, k.stevens@ieee.org, 130397@bugs.debian.org
Subject: Re: Bug 130397
Date: Wed, 5 Jan 2005 11:00:11 +0900 (JST)
Message-ID: <200501050200.LAA12589@etlken.m17n.org>
In-Reply-To: <m1llb9p887.fsf-monnier+emacs@gnu.org> (message from Stefan on Tue, 04 Jan 2005 09:55:09 -0500)

In article <m1llb9p887.fsf-monnier+emacs@gnu.org>, Stefan <monnier@iro.umontreal.ca> writes:

>>  Hmmm, then how about the attached patch to the latest CVS
>>  emacs?  With that, all equivalent characters (e.g. a-grave in
>>  all latin-X) should be handled well.  This patch should also
>>  be applicable to Emacs 21.3, but it has not yet been tested
>>  in that version.

> Can someone explain to me why ispell.el needs those kinds of things?

> My vague understanding is that ispell.el needs to know which chars are part
> of a word and that in the past (pre-MULE), this had to be redefined for each
> and every language since the codes 128-255 could mean completely
> different things.

> Why can't ispell.el just use the `w' syntax to decide what is a word and
> then rely on the decoding/encoding to do the rest of the work?

> That would fix the problem where a word like "expérience" is checked as two
> words if the dictionary is "american".

That will cause another problem.  For instance, suppose we
have "español" in a buffer and the ispell dictionary is czech
(latin-2).  Because latin-2 has no n-tilde, "español" is
encoded into "espa?ol", and that causes the error "Ispell and
its process have different character maps" because ispell
returns results for two words, "espa" and "ol".

>>  + ;; Char-table that maps a Unicode character (charset:
>>  + ;; latin-iso8859-1, mule-unicode-0100-24ff) to
>>  + ;; a string in which all equivalent characters are listed.
>>  + 
>>  + (defconst ispell-unified-chars-table
>>  +   (let ((table (make-char-table 'ispell-unified-chars-table)))
>>  +     (map-char-table
>>  +      #'(lambda (c v)
>>  + 	 (if (and v (/= c v))
>>  + 	     (let ((unified (or (aref table v) (string v))))
>>  + 	       (aset table v (concat unified (string c))))))
>>  +      ucs-mule-8859-to-mule-unicode)
>>  +     table))

> All the elements of this table should be multibyte strings.
> For this, we may need to wrap the (string X) into
> (string-to-multibyte (string X))

As `c' and `v' are always multibyte characters, (string X)
always returns a multibyte string.
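
The idea behind the table in the quoted patch (one unified character
mapped to every charset-specific equivalent) can be sketched in
Python.  This is only an illustration under stated assumptions: a
(charset, byte) pair stands in for an Emacs 21 charset-specific
character, Python's codecs stand in for ucs-mule-8859-to-mule-unicode,
and the function name and charset list are hypothetical.

```python
from collections import defaultdict

# Assumption: these four codecs stand in for the Latin-N charsets
# that Emacs 21 kept as separate character sets.
CHARSETS = ["iso8859_1", "iso8859_2", "iso8859_3", "iso8859_15"]

def build_unified_table(charsets=CHARSETS):
    # Map each unified (Unicode) character to the list of
    # (charset, byte) pairs that represent it.
    table = defaultdict(list)
    for charset in charsets:
        for byte in range(0xA0, 0x100):
            try:
                unified = bytes([byte]).decode(charset)
            except UnicodeDecodeError:
                continue  # this byte is unassigned in the charset
            table[unified].append((charset, byte))
    return dict(table)

table = build_unified_table()
print(table["à"])  # every charset-specific a-grave in the list above
```

Looking a character up through such a table lets a-grave from any
Latin-N charset be treated as the same letter.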

>>  + 		(string-as-multibyte
>>  + 		 (mapconcat
>>  + 		  #'(lambda (c)
>>  + 		      (let ((unichar (aref ucs-mule-8859-to-mule-unicode c)))
>>  + 			(if unichar
>>  + 			    (aref ispell-unified-chars-table unichar)
>>  + 			  (string c))))
>>  + 		  str ""))))

> Do you expect the output of mapconcat to be unibyte and to contain
> emacs-mule encoding of multibyte chars?

No.  STR may be an ASCII-only string, in which case the
result of mapconcat is a unibyte ASCII-only string.  I'd
like to change it to a multibyte ASCII-only string to avoid
converting STR again and again in such a case.

---
Ken'ichi HANDA
handa@m17n.org
