* bug#17412: 24.3; Unicode key events broken, not usable in input method
From: Stefan Dorn @ 2014-05-05 22:29 UTC
To: 17412
My keyboard layout includes Unicode keys like "ł", U+0142 (l with
stroke), and combining diacritics (U+0300 etc). I've been trying to use
them in quail layouts, e.g.:
  (quail-define-package
   "custom" "custom layout" "^" t
   "Proof-of-concept layout." nil t t nil nil nil nil nil nil nil t)
  (quail-define-rules ("ł" ?l))
The key is never passed to input-method-function, and so is just
inserted literally. (Typing Unicode keys directly works fine.)
Digging around in keyboard.c, I found that read_char() only passes
events with keycode < 256 (line 3050ff) to input-method-function:
  /* Pass this to the input method, if appropriate.  */
  if (INTEGERP (c)
      && ! NILP (Vinput_method_function)
      /* Don't run the input method within a key sequence,
         after the first event of the key sequence.  */
      && NILP (prev_event)
      && ' ' <= XINT (c) && XINT (c) < 256 && XINT (c) != 127)
Using read-key-sequence, Emacs seems to parse "ł" as [322] (322 is 0x142
in decimal). Disabling the condition in read_char() (so the key is
actually passed to quail) only seems to cause an infinite loop in quail
that I've not been able to diagnose yet.
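For illustration, the numeric part of the condition quoted above can be modelled as a standalone predicate. This is a minimal sketch, not the Emacs source: the name `in_input_method_range` is hypothetical, and the other parts of the real condition (INTEGERP, Vinput_method_function, prev_event) are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical helper: models only the range test from the excerpt,
   i.e. ' ' <= c && c < 256 && c != 127.  */
bool
in_input_method_range (int c)
{
  return ' ' <= c && c < 256 && c != 127;
}
```

Under this test an ASCII key such as "l" (108) is accepted, while the events [322] and [776] shown under "Recent messages" are rejected because they are 256 or above.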
[322] as key event seems strange to me. The XLib keycode for "ł" (as
reported by xev) is 0x1000142. Maybe Emacs cuts off the leading bit?
Interestingly, quail shows the key in the guidance screen just fine, i.e.:
  (quail-define-rules ("xł" ?l))
and typing "x" correctly suggests "xł" as a pattern; it's just impossible
to pass "ł" to quail and have it be parsed correctly.
In GNU Emacs 24.3.1 (x86_64-pc-linux-gnu, X toolkit)
of 2014-05-05 on scabeiathrax
Windowing system distributor `The X.Org Foundation', version 11.0.11500000
System Description: NAME=Gentoo
Configured using:
`configure '--prefix=/usr' '--build=x86_64-pc-linux-gnu'
'--host=x86_64-pc-linux-gnu' '--mandir=/usr/share/man'
'--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc'
'--localstatedir=/var/lib' '--libdir=/usr/lib64'
'--disable-silent-rules' '--disable-dependency-tracking'
'--program-suffix=-emacs-24' '--infodir=/usr/share/info/emacs-24'
'--localstatedir=/var'
'--enable-locallisppath=/etc/emacs:/usr/share/emacs/site-lisp'
'--with-crt-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.2/../../../../lib64'
'--with-gameuser=games' '--without-compress-info' '--without-hesiod'
'--without-kerberos' '--without-kerberos5' '--with-gpm' '--with-dbus'
'--without-gnutls' '--without-xml2' '--without-selinux'
'--without-wide-int' '--with-sound' '--with-x' '--without-ns'
'--without-gconf' '--without-gsettings' '--without-toolkit-scroll-bars'
'--with-gif' '--with-jpeg' '--with-png' '--with-rsvg' '--with-tiff'
'--with-xpm' '--without-imagemagick' '--with-xft' '--with-libotf'
'--with-m17n-flt' '--with-x-toolkit=lucid' '--with-xaw3d'
'GENTOO_PACKAGE=app-editors/emacs-24.3-r4'
'build_alias=x86_64-pc-linux-gnu' 'host_alias=x86_64-pc-linux-gnu'
'CFLAGS=-O2 -pipe -march=core2' 'LDFLAGS=-Wl,-O1 -Wl,--sort-common
-Wl,--hash-style=gnu -Wl,--as-needed' 'CPPFLAGS=''
Important settings:
value of $LANG: en_US.UTF-8
value of $XMODIFIERS: @im=ibus
locale-coding-system: utf-8-unix
default enable-multibyte-characters: t
Major mode: Lisp Interaction
Minor modes in effect:
tooltip-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
transient-mark-mode: t
Recent input:
( r e d a <backspace> <backspace> a d - k e y - s e
q u e n c e SPC " k e y : SPC " ) <left> <left> C-M-x
̈ C-M-x ł C-M-x e M-x b u g <tab> <backspace> <backspace>
<backspace> <backspace> <backspace> <backspace> <backspace>
<backspace> <backspace> <backspace> <backspace> <backspace>
<backspace> <backspace> r e p o <tab> r t <tab> <return>
Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
[776]
[322]
"e"
Making completion list...
Load-path shadows:
None found.
Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils help-mode easymenu time-date tooltip ediff-hook
vc-hooks lisp-float-type mwheel x-win x-dnd tool-bar dnd fontset image
regexp-opt fringe tabulated-list newcomment lisp-mode register page
menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core frame cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote make-network-process dbusbind dynamic-setting
font-render-setting x-toolkit x multi-tty emacs)

From: Stefan Monnier @ 2014-05-06 16:06 UTC
To: Stefan Dorn; +Cc: 17412
> Digging around in keyboard.c, I found that read_char() only passes
> events with keycode < 256 (line 3050ff) to input-method-function:
Indeed, this has been in the input-method design from the start.
I'd be interested to know why. Handa?
> [322] as key event seems strange to me. The XLib keycode for "ł" (as
> reported by xev) is 0x1000142. Maybe Emacs cuts off the leading bit?
322 = U+0142, so it's really not strange at all: Emacs uses
Unicode internally.
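The relation between the two numbers can be sketched as follows. The value xev reports is an X11 keysym, and X11 reserves keysyms of the form 0x01000000 + code point for characters addressed directly by their Unicode code point, so 0x1000142 corresponds to U+0142, which is 322 in decimal. The function name `keysym_to_ucs` is hypothetical, and the sketch covers only that directly encoded range, not the legacy keysym tables.

```c
#include <stdint.h>

/* Hypothetical helper: returns the Unicode code point for a keysym in
   the directly encoded range 0x01000100..0x0110FFFF, or -1 otherwise.
   Legacy keysyms (Latin-1, Latin-2, ...) are not handled here.  */
int32_t
keysym_to_ucs (uint32_t keysym)
{
  if (keysym >= 0x01000100u && keysym <= 0x0110FFFFu)
    return (int32_t) (keysym - 0x01000000u);
  return -1;
}
```

With this mapping, `keysym_to_ucs (0x1000142)` yields 322, the same number Emacs shows in the key event.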
Stefan

From: Stefan Dorn @ 2014-05-06 18:38 UTC
To: Stefan Monnier, 17412
>> Digging around in keyboard.c, I found that read_char() only passes
>> events with keycode < 256 (line 3050ff) to input-method-function:
>
> Indeed, this has been in the input-method design from the start.
> I'd be interested to know why. Handa?
I write a lot of linguistic analysis, and so added common IPA symbols
to my core keyboard layout, like ß, ł or æ. (I could type them through
an input method, but that would be slower and force me to use a
different typing method inside and outside of Emacs, which would slow
me down a lot.)
I recently set up a Cyrillic input method, and was surprised that I
could use ß in quail but not ł, simply because ß (U+00DF, 223) is below
the 256 threshold and ł (U+0142, 322) is not. Unfortunately, merely
turning off the conditional in read_char() is not enough to get it to work.
More importantly, I also have most combining diacritic characters
(U+0301 ff) on keys and use them a lot. Switching them to some
"similar looking punctuation -> diacritic" input method would be
seriously annoying due to lots of conflicts (quoting a letter vs
umlauting it etc).
Most search features in Emacs don't do Unicode normalization, so ä (a
with umlaut) and ä (a with combining diacritic umlaut) don't match. I
added some normalization hacks to isearch and just force-normalize the
buffer when I save it, but wanted a more universal and clean solution.
I thought I could just set up a "letter + combining diacritic" ->
"normalized character" input method to fix most of this, but again
arbitrarily can't use any of the diacritics in quail.
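The "letter + combining diacritic" -> "normalized character" idea can be sketched outside of quail as a small pair-composition pass over code points. This is only an illustration under stated assumptions: the names `compose_rule`, `rules` and `compose_pairs` are hypothetical, the table holds just four sample pairs, and real Unicode normalization (NFC) involves much more than pairwise lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative table only: a few (base, combining mark) pairs and
   their precomposed forms.  */
struct compose_rule { uint32_t base, mark, composed; };

static const struct compose_rule rules[] = {
  { 'a', 0x0308, 0x00E4 },  /* a + combining diaeresis -> U+00E4 */
  { 'o', 0x0308, 0x00F6 },  /* o + combining diaeresis -> U+00F6 */
  { 'u', 0x0308, 0x00FC },  /* u + combining diaeresis -> U+00FC */
  { 'e', 0x0301, 0x00E9 },  /* e + combining acute     -> U+00E9 */
};

/* Copies IN (N code points) to OUT, replacing each known pair with
   its precomposed character.  OUT must have room for N code points.
   Returns the number of code points written.  */
size_t
compose_pairs (const uint32_t *in, size_t n, uint32_t *out)
{
  size_t len = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t c = in[i];
      if (i + 1 < n)
        {
          for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++)
            {
              if (rules[r].base == c && rules[r].mark == in[i + 1])
                {
                  c = rules[r].composed;
                  i++;  /* consume the combining mark */
                  break;
                }
            }
        }
      out[len++] = c;
    }
  return len;
}
```

The combining marks in the table are all above 255, which is why an input method built on such rules never sees them under the current check in read_char().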
>> [322] as key event seems strange to me. The XLib keycode for "ł" (as
>> reported by xev) is 0x1000142. Maybe Emacs cuts off the leading bit?
>
> 322 = U+0142, so it's really not strange at all: Emacs uses
> Unicode internally.
Ah, cool.

From: Eli Zaretskii @ 2014-05-06 18:55 UTC
> From: Stefan Dorn <mail@muflax.com>
> Date: Tue, 6 May 2014 19:38:24 +0100
>
> Most search features in Emacs don't do Unicode normalization, so ä (a
> with umlaut) and ä (a with combining diacritic umlaut) don't match. I
> added some normalization hacks to isearch and just force-normalize the
> buffer when I save it, but wanted a more universal and clean solution.
>
> I thought I could just set up a "letter + combining diacritic" ->
> "normalized character" input method to fix most of this, but again
> arbitrarily can't use any of the diacritics in quail.
That's not how to add normalization support to Emacs search. It is
much better to define a case-table that maps each normalization
variant to a single canonical one, and then search functions will (or
at least should: I didn't actually try that) automatically do the
mapping for you, both in the search string and in the buffer/string
text you are searching through.

From: Stefan Monnier @ 2014-05-06 20:12 UTC
To: Eli Zaretskii; +Cc: Stefan Dorn, 17412
> That's not how to add normalization support to Emacs search. It is
> much better to define a case-table that maps each normalization
> variant to a single canonical one, and then search functions will (or
> at least should: I didn't actually try that) automatically do the
Can case-tables do such normalization? Last I checked, they work "one
char at a time" and can't handle multi-char mappings at all (neither as
input nor as output).
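A simplified model of the "one char at a time" limitation, for illustration only. It is not how Emacs implements case tables: `canon_char` and `equal_charwise` are hypothetical names, and the mapping here merely folds ASCII upper case. The point is that a mapping from one code point to one code point compares sequences position by position, so a single precomposed character can never match a two-code-point sequence.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a per-character canonical mapping: one code point in,
   one code point out.  Here it only folds ASCII upper case.  */
uint32_t
canon_char (uint32_t c)
{
  return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Compares two code point sequences position by position after
   mapping each character.  Sequences of different length can never
   match under such a scheme.  */
bool
equal_charwise (const uint32_t *a, size_t na,
                const uint32_t *b, size_t nb)
{
  if (na != nb)
    return false;
  for (size_t i = 0; i < na; i++)
    if (canon_char (a[i]) != canon_char (b[i]))
      return false;
  return true;
}
```

In this model "A" and "a" compare equal, but U+00E4 and the sequence "a" followed by U+0308 do not, whatever `canon_char` returns for the individual code points.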
Stefan

From: Daniel Colascione @ 2014-05-06 20:14 UTC
To: Stefan Monnier, Eli Zaretskii; +Cc: Stefan Dorn, 17412
On 05/06/2014 01:12 PM, Stefan Monnier wrote:
>> That's not how to add normalization support to Emacs search. It is
>> much better to define a case-table that maps each normalization
>> variant to a single canonical one, and then search functions will (or
>> at least should: I didn't actually try that) automatically do the
>
> Can case-tables do such normalization? Last I checked, they work "one
> char at a time" and can't handle multi-char mappings at all (neither as
> input nor as output).
So why not make them stateful?

From: Eli Zaretskii @ 2014-05-07 18:13 UTC
To: Stefan Monnier; +Cc: mail, 17412
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Stefan Dorn <mail@muflax.com>, 17412@debbugs.gnu.org
> Date: Tue, 06 May 2014 16:12:13 -0400
>
> > That's not how to add normalization support to Emacs search. It is
> > much better to define a case-table that maps each normalization
> > variant to a single canonical one, and then search functions will (or
> > at least should: I didn't actually try that) automatically do the
>
> Can case-tables do such normalization? Last I checked, they work "one
> char at a time" and can't handle multi-char mappings at all (neither as
> input nor as output).
I meant the canonical slot of the case-tables. Of course, doing what
I suggested will need some changes on the C level, but they are
straightforward, I think.

From: K. Handa @ 2014-05-12 23:23 UTC
To: Stefan Monnier; +Cc: mail, 17412
In article <jwviopiki5k.fsf-monnier+emacsbugs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
> > Digging around in keyboard.c, I found that read_char() only passes
> > events with keycode < 256 (line 3050ff) to input-method-function:
> Indeed, this has been in the input-method design from the start.
> I'd be interested to know why. Handa?
As far as I remember, the relevant code was written by RMS,
and I'm sorry but I don't remember what I discussed with RMS
at that time.
Perhaps we had expected that a user typed C as a character
if C >= 256, not as a key to input another character.
---
Kenichi Handa
handa@gnu.org

From: Stefan Monnier @ 2014-05-13 1:17 UTC
> Perhaps we had expected that a user typed C as a character
> if C >= 256, not as a key to input another character.
Sounds like it, indeed, but since we have decoded chars by the time we
get to input-event processing, it doesn't seem very useful to prevent
users from using non-ASCII keys for input-methods.
IOW, we should try and lift this restriction,
Stefan

From: K. Handa @ 2014-05-13 12:11 UTC
In article <jwvwqdqzdeh.fsf-monnier+emacsbugs@gnu.org>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
> > Perhaps we had expected that a user typed C as a character
> > if C >= 256, not as a key to input another character.
> Sounds like it, indeed, but since we have decoded chars by the time we
> get to input-event processing, it doesn't seem very useful to prevent
> users from using non-ASCII keys for input-methods.
> IOW, we should try and lift this restriction,
Yes, I agree.
---
Kenichi Handa
handa@gnu.org