* Same non-ASCII characters not 'equal'
From: Sebastian Tennant @ 2006-08-13 11:44 UTC (permalink / raw)
Hello all,
I'm trying to write a little vocab tester but I've stumbled upon some
strange behaviour I can't figure out.
For some reason the following code does not match strings containing
special characters (i.e., non-ASCII characters entered using an input
method):
  (with-temp-buffer
    (set-input-method 'turkish-postfix)
    (let ((dict (list '("glass" "bardak") '("house" "ev") '("girl" "kız")
                      '("child" "çocuk") '("little" "küçük") '("good" "iyi")
                      '("bad" "fena") '("horse" "at") '("this" "bu")))
          (input (read-from-minibuffer "? " nil nil nil nil nil t))
          match)
      (dolist (each dict (and match (message "Equal")))
        (when (member input each) (setq match t)))))
Take 'child' and 'çocuk' for instance. Because the (turkish-postfix)
input method is inherited in the minibuffer you have to type
'c h i 2 l d' to enter 'child' and a match is found, but when you
enter 'çocuk' by typing 'c , o c u k', no match is found. Could this
even be a bug?
sebyte
* Re: Same non-ASCII characters not 'equal'
From: James Cloos @ 2006-08-15 1:20 UTC (permalink / raw)
Cc: help-gnu-emacs
>>>>> "Sebastian" == Sebastian Tennant <sebyte@smolny.plus.com> writes:
Sebastian> Take 'child' and 'çocuk' for instance. Because the (turkish-postfix)
Sebastian> input method is inherited in the minibuffer you have to type
Sebastian> 'c h i 2 l d' to enter 'child' and a match is found, but when you
Sebastian> enter 'çocuk' by typing 'c , o c u k', no match is found. Could this
Sebastian> be a bug even?
Emacs versions other than the emacs-unicode-2 branch store each of the
iso-8859-x glyphsets separately. You are probably ending up with the
8859-1 (Latin 1) version of U+00E7 LATIN SMALL LETTER C WITH CEDILLA
in the elisp; using the turkish-postfix input method most likely uses
8859-9 (Latin 5).
One way to make latin1’s ç and latin5’s ç match is to use one or both
of unify-8859-on-decoding-mode and/or unify-8859-on-encoding-mode.
Or, make sure you use the same encoding to enter the elisp that your
users will use. There are commands to convert the current buffer to
a different encoding.
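As a rough illustration of that second suggestion (this is not code from the thread), a buffer can be marked for re-encoding from Lisp with `set-buffer-file-coding-system`, the function behind `C-x RET f`. The helper name `my-mark-buffer-utf-8` is made up for this sketch, and `utf-8` is the modern name for what Emacs 21 calls `mule-utf-8`.

```elisp
;; Hypothetical helper: the name is an assumption for illustration only.
(defun my-mark-buffer-utf-8 ()
  "Arrange for the current buffer to be written as UTF-8 on the next save."
  (interactive)
  ;; `set-buffer-file-coding-system' is the command run by `C-x RET f'.
  ;; On Emacs 21, pass `mule-utf-8' instead of `utf-8'.
  (set-buffer-file-coding-system 'utf-8))
```

Calling it in the buffer that holds the dictionary source and then saving should leave the file on disk in the same encoding the users type in, assuming they are also on UTF-8.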
Since I’ve moved almost exclusively to the unicode-2 branch, I don’t
remember the specifics of the unify-8859 modes, but they are documented
in info.
-JimC (who has been caught by this issue before)
--
James Cloos <cloos@jhcloos.com>
* Re: Same non-ASCII characters not 'equal'
From: Sebastian Tennant @ 2006-08-17 7:23 UTC (permalink / raw)
Quoth James Cloos <cloos@jhcloos.com>:
>>>>>> "Sebastian" == Sebastian Tennant <sebyte@smolny.plus.com> writes:
> Sebastian> Take 'child' and 'çocuk' for instance. Because the (turkish-postfix)
> Sebastian> input method is inherited in the minibuffer you have to type
> Sebastian> 'c h i 2 l d' to enter 'child' and a match is found, but when you
> Sebastian> enter 'çocuk' by typing 'c , o c u k', no match is found. Could this
> Sebastian> be a bug even?
>
> Emacs versions other than the emacs-unicode-2 branch store each of the
> iso-8859-x glyphsets separately. You are probably ending up with the
> 8859-1 (Latin 1) version of U+00E7 LATIN SMALL LETTER C WITH CEDILLA
> in the elisp; using the turkish-postfix input method most likely uses
> 8859-9 (Latin 5).
I don't think this is the problem, as I'm working with a Unicode
terminal, and the encodings used for read and write are mule-utf-8.
> One way to make latin1’s ç and latin5’s ç match is to use one or both
> of unify-8859-on-decoding-mode and/or unify-8859-on-encoding-mode.
I've tried setting these variables in the temporary buffer, without
success.
> Or, make sure you use the same encoding to enter the elisp that your
> users will use. There are commands to convert the current buffer to
> a different encoding.
Everything is mule-utf-8.
> Since I’ve moved almost exclusively to the unicode-2 branch, I don’t
> remember the specifics of the unify-8859 modes, but they are documented
> in info.
I'm not sure what you mean by unicode-2 branch
(emacs-version)
"GNU Emacs 21.4.1 (i486-pc-linux-gnu)
of 2006-05-15 on trouble, modified by Debian"
> -JimC (who has been caught by this issue before)
Thanks for your help Jim, but I'm still stuck :-(
I've managed to establish that the problem is caused by either the
read or the write to disk, or both. If the dictionary is defined in the
function, matches are found without a problem. It's only when the
dictionary is populated from disk that matches of non-ASCII characters
fail.
Can you think of anything else I can try?
Perhaps a few variable checks in the code, to help diagnose the
problem?
Sebastian
* Re: Same non-ASCII characters not 'equal'
From: James Cloos @ 2006-08-17 16:49 UTC (permalink / raw)
Cc: help-gnu-emacs
>>>>> "Sebastian" == Sebastian Tennant <sebyte@smolny.plus.com>
>>>>> writes:
>> You are probably ending up with the 8859-1 (Latin 1) version of
>> U+00E7 LATIN SMALL LETTER C WITH CEDILLA in the elisp; using the
>> turkish-postfix input method most likely uses 8859-9 (Latin 5).
Sebastian> I don't think this is the problem as I'm working with a
Sebastian> unicode terminal, and the encodings used for read and write
Sebastian> are mule-utf-8
It is still possible that turkish-postfix generates a latin5 ç rather
than a mule-utf-8 ç. But, no. I get the same buffer code when using
turkish-postfix as when using X’s “<Multi_key> <,> <c>”. (That w/o
anything interesting in ~/.emacs but with LANG=en_US-UTF8 run on a sid
box with emacs-snapshot-nox installed via apt.)
Sebastian> Everything is mule-utf-8.
Then my guess was a red herring. I don’t know what the problem is.
>> Since I’ve moved almost exclusively to the unicode-2 branch, I
>> don’t remember the specifics of the unify-8859 modes, but they are
>> documented in info.
Sebastian> I'm not sure what you mean by unicode-2 branch
Sebastian> (emacs-version) "GNU Emacs 21.4.1 (i486-pc-linux-gnu)
Sebastian> of 2006-05-15 on trouble, modified by Debian"
The unicode-2 branch is a branch of the Emacs CVS repository. You can
grab it from cvs by using:

  cvs -d :pserver:cvs.savannah.gnu.org:/cvsroot/emacs co -r emacs-unicode-2 emacs

instead of using:

  cvs -d :pserver:cvs.savannah.gnu.org:/cvsroot/emacs co emacs

which grabs the HEAD branch.
The HEAD branch is to be released as Emacs-22. The emacs-unicode-2
branch is likely to be the basis of the Emacs-23 release.
On Debian, you can get compiled snapshots of the HEAD branch by
installing emacs-snapshot, emacs-snapshot-nox or emacs-snapshot-gtk
rather than using emacs, emacs-nox, emacs21 or emacs21-nox. (On sid
what you are running would be emacs21 or emacs21-nox, as applicable.
What is emacs21 on sid *may* be just emacs on sarge. I’m not sure
about etch.) Emacs-snapshot *might* handle this better than emacs21
does. Or it might not. I’m confident that the unicode-2 branch,
however, will get it right. But on Debian you’ll have to compile it
yourself. (On Ubuntu, emacs-snapshot is certainly available for edgy
and, I *think*, for dapper; I’ve not tried anything older than that.)
Sebastian> I've managed to establish that the problem is caused by
Sebastian> either the read or write to disk, or both. If the
Sebastian> dictionary is defined in the function, matches are found
Sebastian> without a problem. It's only when the dictionary is
Sebastian> populated from disk when matches of non-ASCII characters
Sebastian> fail.
Try running (describe-char) with the point on the offending characters
in the buffer containing the data as read from disk. If they don’t
match what you get from (describe-char) on the freshly keyboard-input
characters then my guess was on the mark after all. Or at least in
the same ballpark ☺ — or the same football pitch, if you prefer.
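The same comparison can be made from Lisp rather than by eye. The sketch below is not from the thread: `my-char-report` is a hypothetical helper that lists each character of a string with its code and charset, using the built-in `char-charset`, so that two strings that only look alike can be compared entry by entry.

```elisp
;; Hypothetical helper: the name is an assumption for illustration only.
(defun my-char-report (string)
  "Return a list of (CHAR-AS-STRING CODE CHARSET), one entry per character."
  (mapcar (lambda (c)
            (list (char-to-string c) c (char-charset c)))
          string))

;; Compare a word as read from disk with the same word typed at the
;; keyboard; here both arguments are literals, so they trivially agree.
(equal (my-char-report "çocuk") (my-char-report "çocuk"))
```

If the report for the word read from disk differs from the report for the typed word in the code or charset column, the two strings will not be `equal` even though they display identically.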
If that is the case, I presume you need to set the coding-system for
reading in the dictionary data as mule-utf-8.
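A minimal sketch of what that might look like, assuming the dictionary is stored on disk as a single printed Lisp list (the thread does not show how the file is actually written or read). `my-read-dict` is a hypothetical name; `coding-system-for-read` and `insert-file-contents` are standard. On Emacs 21 the coding system would be `mule-utf-8` rather than `utf-8`.

```elisp
;; Hypothetical helper: name and on-disk format are assumptions.
(defun my-read-dict (file)
  "Return the Lisp object stored in FILE, decoding the file as UTF-8."
  (with-temp-buffer
    ;; Force the decoding instead of letting Emacs guess it.
    (let ((coding-system-for-read 'utf-8))
      (insert-file-contents file))
    (goto-char (point-min))
    (read (current-buffer))))
```

The matching write side would bind `coding-system-for-write` the same way, so that what goes to disk and what comes back use one encoding.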
-JimC
--
James Cloos <cloos@jhcloos.com>
* Re: Same non-ASCII characters not 'equal'
From: Sebastian Tennant @ 2006-08-21 12:25 UTC (permalink / raw)
Cc: help-gnu-emacs
Quoth James Cloos <cloos@jhcloos.com>:
> Then my guess was a red herring. I don’t know what the problem is.
>
> On debian, you can get a compile of snapshots of the HEAD branch by
> installing emacs-snapshot, emacs-snapshot-nox or emacs-snapshot-gtk
> rather than using emacs, emacs-nox, emacs21 or emacs21-nox. (On sid
> what you are running would be emacs21 or emacs21-nox, as applicable.
> What is emacs21 on sid *may* be just emacs on sarge. I’m not sure
> about etch. Emacs-snapshot *might* handle this better than emacs21
> does. Or it might not. I’m confident that the unicode-2 branch,
> however, will get it right. But on debian you’ll have to compile it
> yourself. (On ubuntu, emacs-snapshot is certainly available for edgy
> and — I *think* — for dapper; I’ve not tried anything older than that.)
I've installed emacs-snapshot-nox from sid on my predominately etch
box... and the problem is solved :-)
Thanks for your assistance Jim.
Sebastian
* Re: Same non-ASCII characters not 'equal'
From: Pascal Bourguignon @ 2006-08-13 16:56 UTC (permalink / raw)
Sebastian Tennant <sebyte@smolny.plus.com> writes:
> Hello all,
>
> I'm trying to write a little vocab tester but I've stumbled upon some
> strange behaviour I can't figure out.
>
> For some reason the following code does not match strings containing
> special characters (i.e., non-ASCII characters input using an input
> method)?
>
>   (with-temp-buffer
>     (set-input-method 'turkish-postfix)
>     (let ((dict (list '("glass" "bardak") '("house" "ev") '("girl" "kız")
>                       '("child" "çocuk") '("little" "küçük") '("good" "iyi")
>                       '("bad" "fena") '("horse" "at") '("this" "bu")))
>           (input (read-from-minibuffer "? " nil nil nil nil nil t))
>           match)
>       (dolist (each dict (and match (message "Equal")))
>         (when (member input each) (setq match t)))))
>
> Take 'child' and 'çocuk' for instance. Because the (turkish-postfix)
> input method is inherited in the minibuffer you have to type
> 'c h i 2 l d' to enter 'child' and a match is found, but when you
> enter 'çocuk' by typing 'c , o c u k', no match is found. Could this
> be a bug even?
It works for me.
--
__Pascal Bourguignon__ http://www.informatimago.com/
WARNING: This product warps space and time in its vicinity.
Thread overview: 7+ messages
2006-08-13 11:44 Same non-ASCII characters not 'equal' Sebastian Tennant
2006-08-15 1:20 ` James Cloos
2006-08-17 7:23 ` Sebastian Tennant
2006-08-17 16:49 ` James Cloos
2006-08-21 12:25 ` Sebastian Tennant
2006-08-21 13:15 ` James Cloos
[not found] <mailman.5138.1155469477.9609.help-gnu-emacs@gnu.org>
2006-08-13 16:56 ` Pascal Bourguignon