How to recognize keyboard insertion?

unofficial mirror of emacs-devel@gnu.org 

* How to recognize keyboard insertion?
@ 2009-10-31 15:57 Eli Zaretskii
  2009-10-31 16:58 ` David De La Harpe Golden
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2009-10-31 15:57 UTC (permalink / raw)
  To: emacs-devel

Do we have infrastructure for detecting, inside one of the functions
that insert text into buffers, characters that were inserted via the
keyboard or keyboard macros?

Failing that, can I safely assume that self-insert-command and its
optimized variant in command_loop_1 are the only ways to insert
characters from keyboard and keyboard macros, and that
self-insert-command is only supposed to be invoked by characters typed
at the keyboard?

I'm asking because, in bidirectional editing, characters that are
mirrored at display time need to be mirrored at keyboard input time.
For example, when typing right-to-left text, the character `)' should
be mirrored so that what ends up in the buffer is `(', because what
the user means is to produce an open parenthesis.  (Displaying this
text will then mirror again, and display `)'; this last part already
works in the bidi Emacs I'm working on).

So I need to mirror characters typed at the keyboard, but not
characters yanked from the kill ring or pasted from X selections.  How
can I discern the first kind from the second?
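Here is a minimal sketch of one possible answer. It assumes a later Emacs (24.1 or newer), which has `post-self-insert-hook` and the `mirroring` and `bidi-class` character properties. That hook runs only after `self-insert-command`, which keyboard macros also go through, so yanked text and X selection pastes never reach it. The names `my-bidi-mirror-after-typing` and `my-bidi-mirror-typing-setup` are hypothetical. Judging the R2L context from the single preceding character is a deliberate simplification and is not the UAX#9 resolution.

```elisp
;; Hypothetical sketch for Emacs 24.1+: mirror a character only when it
;; arrives through `self-insert-command' (typing or a keyboard macro).
;; Yanks and selection pastes never run `post-self-insert-hook'.
(defun my-bidi-mirror-after-typing ()
  "Replace the just-typed character with its mirror in R2L context.
The context is guessed from the single character before it, which
is a simplification of UAX#9, not the real resolution."
  (let* ((typed (char-before))
         (prev (char-before (1- (point))))   ; nil at buffer start
         (mirror (and typed (get-char-code-property typed 'mirroring))))
    (when (and mirror prev
               ;; Strong right-to-left classes: Hebrew (R), Arabic (AL).
               (memq (get-char-code-property prev 'bidi-class) '(R AL)))
      (delete-char -1)
      (insert mirror))))

(defun my-bidi-mirror-typing-setup ()
  "Enable `my-bidi-mirror-after-typing' in the current buffer only."
  (add-hook 'post-self-insert-hook #'my-bidi-mirror-after-typing nil t))
```

With this enabled, typing `(' right after Hebrew letters stores `)' in the buffer, and display-time mirroring shows it as `('. A plain (insert "(") from Lisp, which is what yank ultimately does, is left untouched.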

TIA

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How to recognize keyboard insertion?
  2009-10-31 15:57 How to recognize keyboard insertion? Eli Zaretskii
@ 2009-10-31 16:58 ` David De La Harpe Golden
  2009-10-31 17:20   ` Eli Zaretskii
  0 siblings, 1 reply; 28+ messages in thread
From: David De La Harpe Golden @ 2009-10-31 16:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii wrote:

> I'm asking because, in bidirectional editing, characters that are
> mirrored at display time need to be mirrored at keyboard input time.
> For example, when typing right-to-left text, the character `)' should
> be mirrored so that what ends up in the buffer is `(', because what
> the user means is to produce an open parenthesis.  (Displaying this
> text will then mirror again, and display `)'; this last part already
> works in the bidi Emacs I'm working on).
> 

Do you?  I'm not really knowledgeable about RtL, but the reason I ask is
that when I switch on an Arabic OS-level keyboard layout, Shift-9
actually generates a ) parenright keysym and Shift-0 a ( parenleft,
which I think is then displayed mirrored, as per the last bit of your
post, in RtL contexts.

ثثثث(321)ثثث

You can see the transposition in /usr/share/X11/xkb/symbols on
typical gnu+linux distros

So the right place to do such keyboard mirroring intra-Emacs might be
in Quail, i.e. for when people are trying to work RtL only intra-Emacs
while still using a Western OS-level keymap.
And indeed, the transposition is shown in the commentary in
emacs/leim/quail/arabic.el,
though I think it's missing from the actual map at present
(possibly because Emacs lacks RtL until you're done!)

So, since one can assume that either the OS keymap or Quail will be
pre-mirroring in practice, you probably don't need to distinguish
keyboard vs. paste here.

N.B. I could be quite wrong here, not expert by any means.


* Re: How to recognize keyboard insertion?
  2009-10-31 16:58 ` David De La Harpe Golden
@ 2009-10-31 17:20   ` Eli Zaretskii
  2009-10-31 17:37     ` David De La Harpe Golden
                       ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Eli Zaretskii @ 2009-10-31 17:20 UTC (permalink / raw)
  To: David De La Harpe Golden; +Cc: emacs-devel

> Date: Sat, 31 Oct 2009 16:58:45 +0000
> From: David De La Harpe Golden <david@harpegolden.net>
> Cc: emacs-devel@gnu.org
> 
> Do you?  I'm not really knowledgeable about RtL, but the reason I ask is
> that when I switch on an Arabic OS-level keyboard layout, Shift-9
> actually generates a ) parenright keysym and Shift-0 a ( parenleft,
> which I think is then displayed mirrored, as per the last bit of your
> post, in RtL contexts.

But that is wrong: per the Unicode Bidirectional Algorithm
(a.k.a. UAX#9), a `(' should only be mirrored if its resolved
directionality is R:

 L4. A character is depicted by a mirrored glyph if and only if
    (a) the resolved directionality of that character is R, and
    (b) the Bidi_Mirrored property value of that character is true.

To simplify, this means that a `(' should be mirrored when surrounded
by strong R2L characters, but not when surrounded by Latin characters
or European digits.
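To make rule L4 concrete, here is a deliberately simplified sketch that assumes modern Emacs character properties (`bidi-class`, `mirroring`). It guesses a neutral's resolved direction only from the nearest strong character on each side, as a rough stand-in for UAX#9 rules N1/N2. It ignores digits, embeddings and the weak-type rules entirely, and then applies L4. All function names are hypothetical.

```elisp
(defun my-strong-direction (ch)
  "Return R or L for a strong character CH, nil for anything else."
  (let ((class (get-char-code-property ch 'bidi-class)))
    (cond ((memq class '(R AL)) 'R)
          ((eq class 'L) 'L))))

(defun my-resolved-direction (string index base)
  "Guess the direction of the neutral at INDEX in STRING.
If the nearest strong characters on both sides agree, use their
direction; otherwise fall back to BASE, the paragraph direction."
  (let ((before nil) (after nil)
        (i (1- index)) (j (1+ index)))
    (while (and (>= i 0) (not before))
      (setq before (my-strong-direction (aref string i))
            i (1- i)))
    (while (and (< j (length string)) (not after))
      (setq after (my-strong-direction (aref string j))
            j (1+ j)))
    (if (and before (eq before after)) before base)))

(defun my-display-glyph (string index base)
  "Character shown for STRING[INDEX] under a simplified rule L4:
mirror it only if it has a mirroring glyph and resolves to R."
  (let* ((ch (aref string index))
         (mirror (get-char-code-property ch 'mirroring)))
    (if (and mirror (eq (my-resolved-direction string index base) 'R))
        mirror
      ch)))
```

So `(' between two Hebrew letters is drawn as `)', while the same `(' between Latin letters stays `(' even in an R2L paragraph.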

What you describe above means that, when typing mixed Arabic and Latin
text, the user needs to switch back from Arabic when she types
mirrored characters, even if these characters are surrounded by
digits, for example.  In Emacs, this means that we would need to
switch away from the input method, even when typing characters whose
keys are not translated by the input method.  That sounds like a
nuisance.

Alternatively, we will need to mirror characters even if their
directionality is L, which is against UAX#9 and will cause incorrect
display in some not-so-rare cases.

For example, try typing "9*(4+5)" after switching to Arabic keyboard.
What do you get?

> So the right place to do such keyboard mirroring intra-Emacs might be
> in Quail, i.e. for when people are trying to work RtL only intra-Emacs
> while still using a Western OS-level keymap.

Quail cannot easily know the context: it can only mirror these
characters always, which is not right, since the display will mirror
them only if they are surrounded by strong R2L characters.


* Re: How to recognize keyboard insertion?
  2009-10-31 17:20   ` Eli Zaretskii
@ 2009-10-31 17:37     ` David De La Harpe Golden
  2009-10-31 17:43       ` David De La Harpe Golden
  2009-10-31 18:15       ` Eli Zaretskii
  2009-11-01  1:30     ` Jason Rumney
       [not found]     ` <837huac8gg.fsf@gnu.org>
  2 siblings, 2 replies; 28+ messages in thread
From: David De La Harpe Golden @ 2009-10-31 17:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel


Eli Zaretskii wrote:

> To simplify, this means that a `(' should be mirrored when surrounded
> by strong R2L characters, but not when surrounded by Latin characters
> or European digits.
> 

It IS only mirrored when surrounded by rtl characters; that was
what I included the nonsense string for.  When not so surrounded,
Shift-9 generates ")" /and it is shown as ")"/.   Arabic keyboards
still have ( printed on 9, so I guess they work majority-rtl (which
would make sense).

> Alternatively, we will need to mirror characters even if their
> directionality is L

I don't see why. Other apps don't.

> For example, try typing "9*(4+5)" after switching to Arabic keyboard.
> What do you get?
>


9*)4+5(

surrounded:

ثثث9*)4+5(ثثث
- but that was when I typed the expression as if LtR (i.e. hitting 9 
first),  I suspect an arabic person might type

ثثث(5+4)*9ثثث

- i.e. hitting ")" first when transcribing "9*(4+5)".  Then it just 
works I think as above (I'm including a screenshot from icedove just in 
case)

> Quail cannot easily know the context: it can only mirror these
> characters always, which is not right, since the display will mirror
> them only if they are surrounded by strong R2L characters.
> 
> 

I expect that's in fact what arabic users expect, though an actual 
arabic person might want to speak up...




[-- Attachment #2: rtlicedove1.png --]
[-- Type: image/png, Size: 68247 bytes --]


* Re: How to recognize keyboard insertion?
  2009-10-31 17:37     ` David De La Harpe Golden
@ 2009-10-31 17:43       ` David De La Harpe Golden
  2009-10-31 18:15       ` Eli Zaretskii
  1 sibling, 0 replies; 28+ messages in thread
From: David De La Harpe Golden @ 2009-10-31 17:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

David De La Harpe Golden wrote:

> ثثث(5+4)*9ثثث
> 
> - i.e. hitting ")" first when transcribing "9*(4+5)".

Just in case, there I meant hitting KEY with ")" physically printed on 
it, producing  "(" of course, then shown as ")" ...







* Re: How to recognize keyboard insertion?
  2009-10-31 17:37     ` David De La Harpe Golden
  2009-10-31 17:43       ` David De La Harpe Golden
@ 2009-10-31 18:15       ` Eli Zaretskii
  2009-10-31 19:26         ` David De La Harpe Golden
  2009-11-01  5:44         ` tomas
  1 sibling, 2 replies; 28+ messages in thread
From: Eli Zaretskii @ 2009-10-31 18:15 UTC (permalink / raw)
  To: David De La Harpe Golden; +Cc: emacs-devel

> Date: Sat, 31 Oct 2009 17:37:07 +0000
> From: David De La Harpe Golden <david@harpegolden.net>
> CC: emacs-devel@gnu.org
> 
> > To simplify, this means that a `(' should be mirrored when surrounded
> > by strong R2L characters, but not when surrounded by Latin characters
> > or European digits.
> > 
> 
> It IS only mirrored when surrounded by rtl characters; that was
> what I included the nonsense string for.  When not so surrounded,
> Shift-9 generates ")" /and it is shown as ")"/.

My understanding is that Shift-9 generates `(' or `)' depending on
whether the current keyboard is Latin or Arabic, not depending on the
characters surrounding the parenthesis.  All your examples show that
(and I see the same on my Windows box if I switch the keyboard to
Hebrew).  Do you agree?

> > For example, try typing "9*(4+5)" after switching to Arabic keyboard.
> > What do you get?
> >
> 
> 
> 9*)4+5(

Which is wrong, don't you think?

> surrounded:
> 
> ثثث9*)4+5(ثثث
> - but that was when I typed the expression as if LtR (i.e. hitting 9 
> first)

This is how digits and other mathematical expressions are typed in
bidirectional text.

> I suspect an arabic person might type
> 
> ثثث(5+4)*9ثثث
> 
> - i.e. hitting ")" first when transcribing "9*(4+5)".

Maybe if the digits are Arabic digits.  I don't know enough Arabic to
judge this example.  Hebrew uses European digits, and they are typed
left to right, exactly like in Latin scripts.  We could, of course,
tell users to switch off Hebrew input method when typing math, but
that's an annoyance, IMO.

> > Quail cannot easily know the context: it can only mirror these
> > characters always, which is not right, since the display will mirror
> > them only if they are surrounded by strong R2L characters.
> 
> I expect that's in fact what arabic users expect, though an actual 
> arabic person might want to speak up...

Maybe, I really don't know.


* Re: How to recognize keyboard insertion?
  2009-10-31 18:15       ` Eli Zaretskii
@ 2009-10-31 19:26         ` David De La Harpe Golden
  2009-10-31 20:01           ` Eli Zaretskii
  2009-11-01  3:40           ` Stephen J. Turnbull
  2009-11-01  5:44         ` tomas
  1 sibling, 2 replies; 28+ messages in thread
From: David De La Harpe Golden @ 2009-10-31 19:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii wrote:

> My understanding is that Shift-9 generates `(' or `)' depending on
> whether the current keyboard is Latin or Arabic, not depending on the
> characters surrounding the parenthesis.  All your examples show that
> (and I see the same on my Windows box if I switch the keyboard to
> Hebrew).  Do you agree?

Probably - Shift-9 generates  parenleft (#x28) or parenright (#x29) 
depending on current keyboard layout. #x28 is then displayed as ( in ltr 
context, or ) in rtl context.

> 
>>> For example, try typing "9*(4+5)" after switching to Arabic keyboard.
>>> What do you get?
>>>
>>
>> 9*)4+5(
> 
> Which is wrong, don't you think?
>

It's clearly not a valid arithmetical expression... It is, however, how
the mature bidi-capable apps I tried behave, for better or worse. I don't
think this is an area where Emacs, a bidi latecomer, should diverge from
established practice, especially not by default - rtl-native users
presumably by now expect to press the key labelled ")" to get "(" when
using their native keymap but in an ltr context. Maybe they regard that
as an annoyance, I dunno, or maybe it's a semantic-map feature, since
the same Shift-0 keypress still makes an opening paren (modern Hebrew
text, at least, seems to use () in running text too, not just in
arithmetic, much like English, judging by Hebrew Wikipedia articles).

But therefore it's not necessary to track whether the character was 
entered by keyboard unless you want to provide a further unusual "smart"
layer that doesn't work like typical bidi apps.

>> I suspect an arabic person might type
>>
>> ثثث(5+4)*9ثثث
>>
>> - i.e. hitting ")" first when transcribing "9*(4+5)".
> 
> Maybe if the digits are Arabic digits.  I don't know enough Arabic to
> judge this example. 

Note that western "arabic numerals" vs. eastern arabic numerals is 
apparently a matter of font+bidi display (again for better or worse)  - 
i.e. if I then copy just the expression above from within the rtl string 
and paste it into a ltr context, I get:

(5+4)*9


* Re: How to recognize keyboard insertion?
  2009-10-31 19:26         ` David De La Harpe Golden
@ 2009-10-31 20:01           ` Eli Zaretskii
  2009-10-31 20:42             ` David De La Harpe Golden
  2009-11-01  3:40           ` Stephen J. Turnbull
  1 sibling, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2009-10-31 20:01 UTC (permalink / raw)
  To: David De La Harpe Golden; +Cc: emacs-devel

> Date: Sat, 31 Oct 2009 19:26:02 +0000
> From: David De La Harpe Golden <david@harpegolden.net>
> CC: emacs-devel@gnu.org
> 
> But therefore it's not necessary to track whether the character was 
> entered by keyboard unless you want to provide a further unusual "smart"
> layer that doesn't work like typical bidi apps.

I'm not convinced, sorry.  There are mirrored characters that are not
part of the localized keyboards, at least.  They are also not
supported by most language-oriented input methods.  We still need to
DTRT with them, even if they are inserted as Unicode codepoints or in
some other way.
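To illustrate how many such characters exist, here is a small sketch, assuming Emacs 24.1 or later for the `mirroring` character property. It lists every character in a code-point range that has a Unicode mirroring glyph. Characters like these usually arrive through C-x 8 RET rather than `self-insert-command`. The function name is hypothetical.

```elisp
(defun my-mirrored-chars (from to)
  "Return the characters from FROM to TO (inclusive) that have a
Unicode Bidi_Mirroring_Glyph, in ascending order."
  (let (result)
    (dotimes (i (1+ (- to from)))
      (let ((ch (+ from i)))
        (when (get-char-code-property ch 'mirroring)
          (push ch result))))
    (nreverse result)))
```

For example, (my-mirrored-chars #x2200 #x22FF) scans the Mathematical Operators block, which includes pairs such as U+2264 and U+2265 that hardly any localized keyboard offers.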





* Re: How to recognize keyboard insertion?
  2009-10-31 20:01           ` Eli Zaretskii
@ 2009-10-31 20:42             ` David De La Harpe Golden
  2009-10-31 21:23               ` Eli Zaretskii
  0 siblings, 1 reply; 28+ messages in thread
From: David De La Harpe Golden @ 2009-10-31 20:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii wrote:
>> Date: Sat, 31 Oct 2009 19:26:02 +0000
>> From: David De La Harpe Golden <david@harpegolden.net>
>> CC: emacs-devel@gnu.org
>>
>> But therefore it's not necessary to track whether the character was 
>> entered by keyboard unless you want to provide a further unusual "smart"
>> layer that doesn't work like typical bidi apps.
> 
> I'm not convinced, sorry.  There are mirrored characters that are not
> part of the localized keyboards, at least.  They are also not
> supported by most language-oriented input methods.  We still need to
> DTRT with them, even if they are inserted as Unicode codepoints or in
> some other way.
> 
> 

Well, you're writing the code; I'm in an ltr area (not counting btt
standing stones). I was just pointing out how the existing crop of bidi
apps "handle" the issue (i.e. by not doing anything clever).

Maybe one way to handle it would be to make an (emacs level) input 
method autoswitcher, that swaps emacs input methods as the rtl/ltr 
context switches with point movement.  i.e. allow (but don't require)
rtl and ltr contexts to have different emacs input methods.
Then there could be variant emacs input methods with various 
transpositions suitable for use with various os-level keymaps.  And you 
don't have to be able to _record_ whether  an inserted character came 
from the keyboard.
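The autoswitcher idea could look roughly like this. This is a hypothetical sketch, not an existing Emacs feature. It assumes a `post-command-hook` entry and picks the input method from the bidi class of the character before point. "hebrew" is a real Quail method, but the two variables and both functions are made-up names, and keeping the current method next to neutral characters is an arbitrary choice.

```elisp
(defvar my-rtl-input-method "hebrew"
  "Input method to use where the text before point is R2L (assumption).")

(defvar my-ltr-input-method nil
  "Input method to use where the text before point is L2R; nil means none.")

(defun my-input-method-for-point ()
  "Choose an input method from the character before point."
  (let* ((ch (char-before))
         (class (and ch (get-char-code-property ch 'bidi-class))))
    (cond ((memq class '(R AL)) my-rtl-input-method)
          ((eq class 'L) my-ltr-input-method)
          ;; Neutral or start of buffer: keep whatever is active.
          (t current-input-method))))

(defun my-autoswitch-input-method ()
  "Switch input methods as point moves between R2L and L2R text."
  (let ((wanted (my-input-method-for-point)))
    (unless (equal wanted current-input-method)
      (if wanted
          (activate-input-method wanted)
        (deactivate-input-method)))))

;; Hypothetical per-buffer setup:
;; (add-hook 'post-command-hook #'my-autoswitch-input-method nil t)
```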


* Re: How to recognize keyboard insertion?
  2009-10-31 20:42             ` David De La Harpe Golden
@ 2009-10-31 21:23               ` Eli Zaretskii
  2009-10-31 21:49                 ` David De La Harpe Golden
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2009-10-31 21:23 UTC (permalink / raw)
  To: David De La Harpe Golden; +Cc: emacs-devel

> Date: Sat, 31 Oct 2009 20:42:48 +0000
> From: David De La Harpe Golden <david@harpegolden.net>
> CC: emacs-devel@gnu.org
> 
> Maybe one way to handle it would be to make an (emacs level) input 
> method autoswitcher, that swaps emacs input methods as the rtl/ltr 
> context switches with point movement.  i.e. allow (but don't require)
> rtl and ltr contexts to have different emacs input methods.

We can have input methods switched on and off depending on surrounding
characters, but how will this solve the problem that different methods
of inputting the same character behave differently with mirrored
characters?  A user can conceptually type a character either via an
Emacs input method or via the OS keyboard, in the same place, can't
she?

> Then there could be variant emacs input methods with various
> transpositions suitable for use with various os-level keymaps.

Are you saying that Emacs should have a way of knowing which OS-level
keyboard layout (or keyboard language, in Windows parlance) was used
to insert the character?  If so, how to do that?

Or are you saying that switching on a suitable input method, depending
on surrounding characters, will eliminate the need to know how the
character was inserted?  If so, please explain why you think so,
because I don't follow.

> And you don't have to be able to _record_ whether an inserted
> character came from the keyboard.

I don't need to record that, I just need to know that when the
character is inserted.  After it's inserted, this information is not
needed anymore, because display-time mirroring has enough information
to DTRT.
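If only the moment of insertion matters, one heuristic that can be checked at that moment (for example from an `after-change-functions` entry) is to look at `this-command` and `executing-kbd-macro`. Typed characters, including those replayed by a keyboard macro, run `self-insert-command`, while `yank` and mouse pastes run other commands. This is a sketch with a hypothetical function name. Commands that insert typed text some other way would be misclassified.

```elisp
(defun my-insertion-source ()
  "Guess where the text being inserted right now comes from.
Return `keyboard', `keyboard-macro', or nil for anything else
(yank, paste, Lisp code)."
  (cond ((not (eq this-command 'self-insert-command)) nil)
        (executing-kbd-macro 'keyboard-macro)
        (t 'keyboard)))
```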


* Re: How to recognize keyboard insertion?
  2009-10-31 21:23               ` Eli Zaretskii
@ 2009-10-31 21:49                 ` David De La Harpe Golden
  2009-11-01  3:44                   ` Eli Zaretskii
  0 siblings, 1 reply; 28+ messages in thread
From: David De La Harpe Golden @ 2009-10-31 21:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii wrote:


> Are you saying that Emacs should have a way of knowing which OS-level
> keyboard layout (or keyboard language, in Windows parlance) was used
> to insert the character?  If so, how to do that?
> 

Not as such (though asking the os what the current keyboard layout is 
should be possible on any reasonable os via the platform analog of 
XkbGetKeyboard() ?)

- A user with a hebrew os keyboard layout who liked auto-switching could 
define that in an ltr context, an emacs input method revpar* should be 
switched to. That input method would yield "(" when the os sends ")" to 
emacs. A user with a us os keyboard layout who wanted to use the hebrew 
emacs input method and also auto-switch could use "hebrew" and 
"hebrew-revpar" rtl and ltr input methods.

> Or are you saying that switching on a suitable input method, depending
> on surrounding characters, will eliminate the need to know how the
> character was inserted?  If so, please explain why you think so,
> because I don't follow.

Because it would always insert the appropriate character?



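*
;; A trivial Quail input method that swaps the two parentheses.
(require 'quail)
(quail-define-package
  "revpar" "revpar" ")(" t                 ; NAME LANGUAGE TITLE GUIDANCE
  "transpose inserted parens"              ; DOCSTRING
  nil nil nil nil nil nil nil nil nil nil  ; TRANSLATION-KEYS .. CONVERSION-KEYS
  t)                                       ; SIMPLE
(quail-define-rules
  ("(" ?\))                                ; key ( inserts )
  (")" ?\())                               ; key ) inserts (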





* Re: How to recognize keyboard insertion?
  2009-10-31 17:20   ` Eli Zaretskii
  2009-10-31 17:37     ` David De La Harpe Golden
@ 2009-11-01  1:30     ` Jason Rumney
  2009-11-01  4:02       ` Eli Zaretskii
       [not found]     ` <837huac8gg.fsf@gnu.org>
  2 siblings, 1 reply; 28+ messages in thread
From: Jason Rumney @ 2009-11-01  1:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, David De La Harpe Golden

Eli Zaretskii wrote:
> But that is wrong: per the Unicode Bidirectional Algorithm
> (a.k.a. UAX#9), a `(' should only be mirrored if its resolved
> directionality is R:
>   

I don't think you can do that mirroring on input, as the directionality 
will change as the user types (Assume below letters represent an Arabic 
or Hebrew character):

User types: ABCD(
Displayed as: )DCBA

User types: ABCD(4
Displayed as: 4)DCBA or (4DCBA? I suspect the first, as the user might 
type something other than a number next

User types: ABCD(4+5)
Displayed as: (4+5)DCBA regardless of how directionality of parens is 
interpreted.

User types: ABCD(4+5)*9
Displayed as: (4+5)*9DCBA Parens here must be LTR

I guess this is why the mirroring happens at keyboard driver level and 
applications do not try to do it correctly, because in practice doing it 
correctly results in text jumping around, confusing the user more than 
it confuses them to manually fix the problems of a dumb implementation. 
And there are always going to be ambiguous cases, where leaving the user 
to manually DTRT will be the only option.


* Re: How to recognize keyboard insertion?
  2009-10-31 19:26         ` David De La Harpe Golden
  2009-10-31 20:01           ` Eli Zaretskii
@ 2009-11-01  3:40           ` Stephen J. Turnbull
  2009-11-01  5:46             ` David De La Harpe Golden
  1 sibling, 1 reply; 28+ messages in thread
From: Stephen J. Turnbull @ 2009-11-01  3:40 UTC (permalink / raw)
  To: David De La Harpe Golden; +Cc: Eli Zaretskii, emacs-devel

David De La Harpe Golden writes:
 > Eli Zaretskii wrote:

 > >>> For example, try typing "9*(4+5)" after switching to Arabic keyboard.
 > >>> What do you get?
 > >>
 > >> 9*)4+5(
 > > 
 > > Which is wrong, don't you think?
 > 
 > It's clearly not a valid arithmetical expression... It is however how 
 > mature bidi capable apps I tried behave, for better or worse.

I think you should name the apps, so that people can judge for
themselves whether those are "generally high quality" implementations
if they have experience with them.  I'm only interested in bidi in an
academic sense, but I see an analogy to development of MUA features
for handling mailing list traffic.  Many "mature" MUAs impose
substantial user pain because they don't recognize the RFC 2369
List-Post header as a signal to prefer to reply to list, although that
header was standardized in 1998, and making this the default would
essentially eliminate all demand for Reply-To munging.  (E.g.,
Thunderbird 3 finally got this feature in the "Reply" button but it is
still not bound in the key shortcuts.)

It's possible that (like reply to list) the current audience of Emacs
would prefer to learn context-dependent typing idioms for mirrored
characters in bidi rather than be able to use the same "logical"
sequence of keystrokes for "9*(4+5)" regardless of context.  OTOH,
that may be a barrier to reaching a new audience.

An advanced algorithm certainly should be the default in betas.


* Re: How to recognize keyboard insertion?
  2009-10-31 21:49                 ` David De La Harpe Golden
@ 2009-11-01  3:44                   ` Eli Zaretskii
  2009-11-01  5:24                     ` David De La Harpe Golden
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2009-11-01  3:44 UTC (permalink / raw)
  To: David De La Harpe Golden; +Cc: emacs-devel

> Date: Sat, 31 Oct 2009 21:49:20 +0000
> From: David De La Harpe Golden <david@harpegolden.net>
> CC: emacs-devel@gnu.org
> 
> - A user with a hebrew os keyboard layout who liked auto-switching could 
> define that in an ltr context, an emacs input method revpar* should be 
> switched to. That input method would yield "(" when the os sends ")" to 
> emacs. A user with a us os keyboard layout who wanted to use the hebrew 
> emacs input method and also auto-switch could use "hebrew" and 
> "hebrew-revpar" rtl and ltr input methods.

OK, but knowing whether to mirror or not requires information about
whether a given keyboard already mirrors characters.  Can this be
found somewhere, or queried at run time?





* Re: How to recognize keyboard insertion?
  2009-11-01  1:30     ` Jason Rumney
@ 2009-11-01  4:02       ` Eli Zaretskii
  2009-11-01  5:25         ` Stephen J. Turnbull
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2009-11-01  4:02 UTC (permalink / raw)
  To: Jason Rumney; +Cc: emacs-devel, david

> Date: Sun, 01 Nov 2009 09:30:35 +0800
> From: Jason Rumney <jasonr@gnu.org>
> CC: David De La Harpe Golden <david@harpegolden.net>, 
>  emacs-devel@gnu.org
> 
> User types: ABCD(
> Displayed as: )DCBA

Yes.

> User types: ABCD(4
> Displayed as: 4)DCBA or (4DCBA? I suspect the first, as the user might 
> type something other than a number next

The first, yes.

> User types: ABCD(4+5)
> Displayed as: (4+5)DCBA regardless of how directionality of parens is 
> interpreted.

Yes.  But I don't understand the ``regardless'' part.  If you want to
know the resolved directionality of each paren, I can tell you what
the current algorithm does (what UAX#9 requires).

> User types: ABCD(4+5)*9
> Displayed as: (4+5)*9DCBA Parens here must be LTR

No, it's displayed as 9*(4+5)DCBA.

> I guess this is why the mirroring happens at keyboard driver level and 
> applications do not try to do it correctly, because in practice doing it 
> correctly results in text jumping around, confusing the user more than 
> it confuses them to manually fix the problems of a dumb implementation. 

Sorry, I don't understand: what jumping around are we talking about,
and how is mirroring related to that?





* Re: How to recognize keyboard insertion?
  2009-11-01  3:44                   ` Eli Zaretskii
@ 2009-11-01  5:24                     ` David De La Harpe Golden
  2009-11-01 19:59                       ` Eli Zaretskii
  0 siblings, 1 reply; 28+ messages in thread
From: David De La Harpe Golden @ 2009-11-01  5:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii wrote:

> OK, but knowing whether to mirror or not requires information about
> whether a given keyboard already mirrors characters.  Can this be
> found somewhere, or queried at run time?

[Only if you want to do it automatically, if it was a user preference 
the user would just be setting ltr and rtl current input methods as 
desired? Or maybe it would be more intuitive to use just one input 
method, e.g. hebrew-parenjuggle, expanding the input method layer to 
support rtl/ltr context sensitive definitions for individual rules]

It is possible to find out if the current os keyboard layout is us or 
hebrew or whatever which could be coupled with prior information that it 
is standard for certain layouts to mirror.  I don't know exhaustively 
which ones do, though a lot could probably be extracted by inspection of 
the xkb database.
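On X, the "find out the current layout" half could be sketched as follows. It assumes the `setxkbmap -query` CLI is installed. That command prints a "layout:" line but does not say which group is active, so this sketch just takes the first layout. The list of paren-swapping layouts is an unverified assumption based on this thread ("ara" from the Arabic test above, "il" from the Hebrew report), and every name here is hypothetical.

```elisp
(defvar my-paren-swapping-layouts '("ara" "il")
  "XKB layouts assumed to swap ( and ); taken from this thread, unverified.")

(defun my-xkb-layouts (query-output)
  "Return the layouts listed on the \"layout:\" line of QUERY-OUTPUT."
  (when (string-match "^layout:[ \t]*\\(.*\\)$" query-output)
    (split-string (match-string 1 query-output) "," t "[ \t]+")))

(defun my-current-layout-swaps-parens-p (&optional query-output)
  "Non-nil if the first XKB layout is in `my-paren-swapping-layouts'.
QUERY-OUTPUT defaults to the output of running setxkbmap -query."
  (let* ((out (or query-output
                  (shell-command-to-string "setxkbmap -query")))
         (layouts (my-xkb-layouts out)))
    (and (member (car layouts) my-paren-swapping-layouts) t)))
```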

If OTOH you wanted to find out whether the code a keypress returns under 
the current os layout actually corresponds to the glyph printed on the 
keyboard, you can't really - Only the user knows that at present as 
current keyboards don't really inform the computer what glyphs they have 
physically printed on them AFAIK (though it would certainly be 
technically feasible for a keyboard to e.g. say "Hi, I am physically a 
standard british qwerty 105 key keyboard" to a computer with some 
well-defined wire protocol, I don't think typical PC ones do).


* Re: How to recognize keyboard insertion?
  2009-11-01  4:02       ` Eli Zaretskii
@ 2009-11-01  5:25         ` Stephen J. Turnbull
  2009-11-01 13:59           ` David De La Harpe Golden
  2009-11-01 19:57           ` Eli Zaretskii
  0 siblings, 2 replies; 28+ messages in thread
From: Stephen J. Turnbull @ 2009-11-01  5:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: david, emacs-devel, Jason Rumney

Eli Zaretskii writes:

 > > User types: ABCD(4+5)*9
 > > Displayed as: (4+5)*9DCBA Parens here must be LTR
 > 
 > No, it's displayed as 9*(4+5)DCBA.

That seems weird to me.  From my (probably imperfect) understanding of
UAX#9 I would expect the following sequence of displays starting with
an empty buffer (notation: uppercase letters are RTL, lowercase
letters and digits are LTR, -!- is point):

-!-
-!-A
-!-BA
-!-CBA
-!-DCBA
-!-)DCBA
4-!-)DCBA
4+-!-)DCBA
4+5-!-)DCBA      [1]
-!-(4+5)DCBA     <-- point jumps
(4+5)*-!-DCBA    <-- point jumps again
(4+5)*9-!-DCBA

I gather you're saying the correct interpretation of UAX#9 is
(starting from [1]):

4+5-!-)DCBA      [1]
-!-(4+5)DCBA     <-- point jumps
-!-*(4+5)DCBA
9-!-*(4+5)DCBA






* Re: How to recognize keyboard insertion?
  2009-10-31 18:15       ` Eli Zaretskii
  2009-10-31 19:26         ` David De La Harpe Golden
@ 2009-11-01  5:44         ` tomas
  2009-11-01 18:48           ` Eli Zaretskii
  1 sibling, 1 reply; 28+ messages in thread
From: tomas @ 2009-11-01  5:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, David De La Harpe Golden


On Sat, Oct 31, 2009 at 08:15:58PM +0200, Eli Zaretskii wrote:
> > Date: Sat, 31 Oct 2009 17:37:07 +0000
> > From: David De La Harpe Golden <david@harpegolden.net>

[...]

> > ثثث9*)4+5(ثثث
> > - but that was when I typed the expression as if LtR (i.e. hitting 9 
> > first)
> 
> This is how digits and other mathematical expressions are typed in
> bidirectional text.
> 
> > I suspect an arabic person might type
> > 
> > ثثث(5+4)*9ثثث
> > 
> > - i.e. hitting ")" first when transcribing "9*(4+5)".
> 
> Maybe if the digits are Arabic digits.  I don't know enough Arabic to
> judge this example [...]

I don't either, but FWIW, I can say that although Arabic uses different
glyphs to represent digits, the writing direction for numerals is the same
as in Latin and Hebrew, i.e. the most significant digit is on the left.

Regards
-- tomás





* Re: How to recognize keyboard insertion?
  2009-11-01  3:40           ` Stephen J. Turnbull
@ 2009-11-01  5:46             ` David De La Harpe Golden
  0 siblings, 0 replies; 28+ messages in thread
From: David De La Harpe Golden @ 2009-11-01  5:46 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: Eli Zaretskii, emacs-devel

Stephen J. Turnbull wrote:

> I think you should name the apps, so that people can judge for
> themselves whether those are "generally high quality" implementations
> if they have experience with them.

Actually, I think they just break down to GTK+/Pango-based, Qt-based,
and OpenOffice.  Unlike GTK+/Pango, Qt and OpenOffice don't sometimes
magically display Eastern Arabic numerals, but otherwise they seem
similar in my (cursory) tests.


* Re: How to recognize keyboard insertion?
  2009-11-01  5:25         ` Stephen J. Turnbull
@ 2009-11-01 13:59           ` David De La Harpe Golden
  2009-11-01 19:57           ` Eli Zaretskii
  1 sibling, 0 replies; 28+ messages in thread
From: David De La Harpe Golden @ 2009-11-01 13:59 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: Eli Zaretskii, Jason Rumney, emacs-devel

Turning on arabic layout and using keypresses "ABCD(5+4)*96EF" i.e. 
pressing a then b then c then d then Shift-0 (so generating code for
"(" not ")", remembering the arabic layout has them reversed) then 5 
then + then 4 then Shift-9 (so code ")" not "(") then * then 9 then 6 
then e then f. And now using "(" and ")" to indicate  displayed 
orientation not underlying code:

-!- is cursor:

icedove / kmail / openoffice all do:

-!-
-!-A
-!-BA
-!-CBA
-!-DCBA
DCBA(-!-
5-!-)DCBA
5)DCBA+-!-
4-!-+5)DCBA
4+5)DCBA)-!-
4+5)DCBA)*-!-
9-!-*(4+5)DCBA
96-!-*(4+5)DCBA
-!-E96*(4+5)DCBA
-!-FE96*(4+5)DCBA

* Re: How to recognize keyboard insertion?
  2009-11-01  5:44         ` tomas
@ 2009-11-01 18:48           ` Eli Zaretskii
  2009-11-01 20:09             ` David De La Harpe Golden
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2009-11-01 18:48 UTC (permalink / raw)
  To: tomas; +Cc: emacs-devel, david

> Date: Sun, 1 Nov 2009 06:44:44 +0100
> From: tomas@tuxteam.de
> Cc: David De La Harpe Golden <david@harpegolden.net>, emacs-devel@gnu.org
> 
> > > I suspect an arabic person might type
> > > 
> > > ثثث(5+4)*9ثثث
> > > 
> > > - i.e. hitting ")" first when transcribing "9*(4+5)".
> > 
> > Maybe if the digits are Arabic digits.  I don't know enough Arabic to
> > judge this example [...]
> 
> I don't either, but FWIW, I can say that although Arabic uses different
> glyphs to represent digits, the write direction for numerals is the same
> as in Latin and Hebrew, i.e. most significant digit to the left.

Maybe so, but UAX#9 treats European digits and Arabic digits
differently.  They have different bidirectional properties.
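
The distinction is visible in the Unicode Character Database. The Python sketch below uses only the standard `unicodedata` module (whose data follows the Unicode version bundled with the interpreter, not necessarily the 2009 tables) to collect the bidi classes of the three digit ranges that come up in this thread:

```python
import unicodedata

# Digit ranges mentioned in this thread, keyed by a description and
# mapped to the code point of their digit zero.
DIGIT_RANGES = {
    "European (ASCII) digits U+0030..U+0039": 0x0030,
    "Arabic-Indic digits U+0660..U+0669": 0x0660,
    "Extended Arabic-Indic digits U+06F0..U+06F9": 0x06F0,
}


def digit_bidi_classes():
    """Return {range description: set of bidi classes of its ten digits}."""
    return {
        label: {unicodedata.bidirectional(chr(zero + d)) for d in range(10)}
        for label, zero in DIGIT_RANGES.items()
    }


if __name__ == "__main__":
    for label, classes in digit_bidi_classes().items():
        print(f"{label}: {', '.join(sorted(classes))}")
```

With current data the ASCII digits come out as EN (European Number), U+0660..U+0669 as AN (Arabic Number), and the U+06F0..U+06F9 range (used for Persian and Urdu) as EN again. Rule W2 of UAX#9 additionally changes an EN to AN when the closest preceding strong character is an Arabic letter (class AL), so ASCII digits typed after Arabic text are then treated like Arabic digits.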

* Re: How to recognize keyboard insertion?
  2009-11-01  5:25         ` Stephen J. Turnbull
  2009-11-01 13:59           ` David De La Harpe Golden
@ 2009-11-01 19:57           ` Eli Zaretskii
  1 sibling, 0 replies; 28+ messages in thread
From: Eli Zaretskii @ 2009-11-01 19:57 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: david, emacs-devel, jasonr

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Cc: Jason Rumney <jasonr@gnu.org>,
>     emacs-devel@gnu.org,
>     david@harpegolden.net
> Date: Sun, 01 Nov 2009 14:25:42 +0900
> 
> Eli Zaretskii writes:
> 
>  > > User types: ABCD(4+5)*9
>  > > Displayed as: (4+5)*9DCBA Parens here must be LTR
>  > 
>  > No, it's displayed as 9*(4+5)DCBA.
> 
> That seems weird to me.  From my (probably imperfect) understanding of
> UAX#9 I would expect the following sequence of displays starting with
> an empty buffer (notation: uppercase letters are RTL, lowercase
> letters and digits are LTR, -!- is point):
> 
> -!-
> -!-A
> -!-BA
> -!-CBA
> -!-DCBA
> -!-)DCBA
> 4-!-)DCBA
> 4+-!-)DCBA
> 4+5-!-)DCBA      [1]
> -!-(4+5)DCBA     <-- point jumps
> (4+5)*-!-DCBA    <-- point jumps again
> (4+5)*9-!-DCBA

(When you say "point jumps", you actually mean "cursor jumps", right?
Because point does not jump at all, it always is after the last
character typed on each of the above lines.)

> I gather you're saying the correct interpretation of UAX#9 is
> (starting from [1]):
> 
> 4+5-!-)DCBA      [1]
> -!-(4+5)DCBA     <-- point jumps
> -!-*(4+5)DCBA
> 9-!-*(4+5)DCBA

Not exactly.  I didn't say anything about point or cursor location.
UAX#9 does not specify where to put the cursor and how it should move
during text insertion, and different implementations do it differently
for various reasons, some valid, some less so.  (There are two equally
``correct'' locations of the cursor, because buffer position changes
non-linearly with screen position, and "between two adjacent
characters" is no longer well defined.)

I didn't yet implement in Emacs anything beyond basic logical-order
cursor motion, whereby C-f moves to the next character in the logical
order.  I expect some quite heated debates regarding this, when the
time comes.  But for now I'm deliberately ignoring this issue, because
it's not a fundamental one.  It's a usability and UI issue, and all I
care about at this point is to provide enough infrastructure to implement
any behavior we will want (and probably more than one) when the time
comes.

Coming back to the example, cursor motion is not important here.
Assume that this text comes from a file, where you have ABCD(4+5)*9 in
logical order.  The way this will be displayed depends on the
properties of the characters.  The key reason for the fact that * and
9 are to the left of the (4+5) is that (, ), and + are all "neutral"
characters, in UAX#9 parlance, while * is a "weak separator"
character.  That, and the fact that numbers get higher resolved levels
than the surrounding text, see 3.3.5 in UAX#9.  That's why "*9" is not
rendered to the right of "(4+5)".
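
To make this rendering concrete, here is a minimal Python sketch of only the last two steps of the algorithm: rule L2 (reordering by embedding level) and rule L4 (mirroring characters at odd levels). The embedding levels are an assumption, resolved by hand for this example with uppercase Latin letters standing in for right-to-left letters: the letters and the neutrals (, ) and * end up at level 1, the numbers 4+5 and 9 at level 2. The resolution rules themselves are not implemented here.

```python
import unicodedata

# Logical (buffer) order of the example.  Uppercase Latin letters stand in
# for right-to-left letters, as in the rest of this thread.
LOGICAL = "ABCD(4+5)*9"

# ASSUMPTION: embedding levels resolved by hand for a right-to-left
# paragraph (level 1).  "4+5" and "9" are numbers, so they get level 2.
LEVELS = [1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2]

# Tiny mirroring table covering only the characters used in this example.
MIRROR = {"(": ")", ")": "("}


def reorder(levels):
    """Rule L2: return logical indices in visual (left-to-right) order."""
    order = list(range(len(levels)))
    highest = max(levels)
    lowest_odd = min(levels) | 1  # lowest odd level at or above the minimum
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                # Find the end of this run of characters at >= level.
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = order[i:j][::-1]
                i = j
            else:
                i += 1
    return order


def render(text, levels):
    """Apply L2 reordering, then L4 mirroring of characters at odd levels."""
    glyphs = []
    for idx in reorder(levels):
        ch = text[idx]
        if levels[idx] % 2 == 1:
            ch = MIRROR.get(ch, ch)
        glyphs.append(ch)
    return "".join(glyphs)


if __name__ == "__main__":
    for ch in "(+)*9":
        print(repr(ch), unicodedata.bidirectional(ch))
    print(render(LOGICAL, LEVELS))  # 9*(4+5)DCBA
    print(reorder(LEVELS))          # [10, 9, 8, 5, 6, 7, 4, 3, 2, 1, 0]
```

The visual-to-logical index list shows the non-linear mapping between buffer positions and screen columns that makes cursor placement ambiguous. Note that in the Unicode data bundled with recent Python versions, + reports class ES (European Separator) while (, ) and * report ON (Other Neutral).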

* Re: How to recognize keyboard insertion?
  2009-11-01  5:24                     ` David De La Harpe Golden
@ 2009-11-01 19:59                       ` Eli Zaretskii
  2009-11-01 20:19                         ` David De La Harpe Golden
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2009-11-01 19:59 UTC (permalink / raw)
  To: David De La Harpe Golden; +Cc: emacs-devel

> Date: Sun, 01 Nov 2009 05:24:44 +0000
> From: David De La Harpe Golden <david@harpegolden.net>
> CC: emacs-devel@gnu.org
> 
> If OTOH you wanted to find out whether the code a keypress returns under 
> the current os layout actually corresponds to the glyph printed on the 
> keyboard, you can't really

Isn't there some API to get the key's symbol, rather than the
character it produced?

* Re: How to recognize keyboard insertion?
  2009-11-01 18:48           ` Eli Zaretskii
@ 2009-11-01 20:09             ` David De La Harpe Golden
  2009-11-02  5:03               ` tomas
  0 siblings, 1 reply; 28+ messages in thread
From: David De La Harpe Golden @ 2009-11-01 20:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: tomas, emacs-devel

Eli Zaretskii wrote:

> Maybe so, but UAX#9 treats European digits and Arabic digits
> differently.  They have different bidirectional properties.

The eastern arabic digits U+06F0 to U+06F9 and usual western digits 
U+0030 to U+0039 do have separate code point ranges.  Apparently
the former code points are less used - the arabic keyboard layout 
returns the western codes for the 0-9 keypresses.

gtk+/pango seems to be choosing to merely _display_ the latter with the 
glyphs of the former depending on surrounding language.
Based on a post dug up here: http://markmail.org/message/72on34u7nupadioh
they might be availing of a "higher level protocol" freedom granted
to them. Or something.
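
The kind of display-only substitution described above can be sketched in Python as follows. Both the rule for "surrounding language" (here: the closest preceding strong character is an Arabic letter, borrowed from UAX#9 rule W2) and the target range U+0660..U+0669 are assumptions for illustration; this is not pango's actual logic, and `display_digits` is a hypothetical name.

```python
import unicodedata

# ASSUMPTION: the substituted glyphs are the Arabic-Indic digits U+0660..U+0669.
# Which set a real renderer picks, and how it decides, is not stated here.
ARABIC_INDIC_ZERO = 0x0660


def display_digits(text: str) -> str:
    """Return the string to draw for `text`.

    ASCII digits are replaced by Arabic-Indic digits when the closest
    preceding strong character is an Arabic letter (bidi class AL).  The
    stored text is never modified; this is a display-time mapping only.
    """
    out = []
    last_strong = None
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in ("L", "R", "AL"):
            last_strong = cls
        if "0" <= ch <= "9" and last_strong == "AL":
            out.append(chr(ARABIC_INDIC_ZERO + int(ch)))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    buffer_text = "\u062b\u062b\u062b 9*(4+5)"  # Arabic letters, then ASCII digits
    print(display_digits(buffer_text))
    print(display_digits("abc 9*(4+5)"))      # unchanged: last strong is L
```

The buffer keeps the codes the keyboard produced (U+0030..U+0039); only the drawn string differs.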

* Re: How to recognize keyboard insertion?
  2009-11-01 19:59                       ` Eli Zaretskii
@ 2009-11-01 20:19                         ` David De La Harpe Golden
  0 siblings, 0 replies; 28+ messages in thread
From: David De La Harpe Golden @ 2009-11-01 20:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii wrote:
>> Date: Sun, 01 Nov 2009 05:24:44 +0000
>> From: David De La Harpe Golden <david@harpegolden.net>
>> CC: emacs-devel@gnu.org
>>
>> If OTOH you wanted to find out whether the code a keypress returns under 
>> the current os layout actually corresponds to the glyph printed on the 
>> keyboard, you can't really
> 
> Isn't there some API to get the key's symbol, rather than the
> character it produced?

There are only properties of the logical os keyboard layout available, 
not the physical keyboard. If I set my os keyboard layout to "US", 
there's presently no way for the computer to interrogate my keyboard to 
find out I really have "£" printed above 3 not "#" despite my os 
keyboard layout setting. It'll just have to take my word for it that 
I've got a US keyboard.  Keyboards just don't say "I am physically 
british layout" down the wire to the computer.  They easily could and 
probably should, but don't.

* Re: How to recognize keyboard insertion?
  2009-11-01 20:09             ` David De La Harpe Golden
@ 2009-11-02  5:03               ` tomas
  0 siblings, 0 replies; 28+ messages in thread
From: tomas @ 2009-11-02  5:03 UTC (permalink / raw)
  To: David De La Harpe Golden; +Cc: Eli Zaretskii, tomas, emacs-devel

On Sun, Nov 01, 2009 at 08:09:19PM +0000, David De La Harpe Golden wrote:
> Eli Zaretskii wrote:
>
>
>> Maybe so, but UAX#9 treats European digits and Arabic digits
>> differently.  They have different bidirectional properties.

Thanks for the info. Every day something new, I guess :-)

> The eastern arabic digits U+06F0 to U+06F9 and usual western digits U+0030 
> to U+0039 do have separate code point ranges.  Apparently
> the former code points are less used - the arabic keyboard layout returns 
> the western codes for the 0-9 keypresses.
>
> gtk+/pango seems to be choosing to merely _display_ the latter with the 
> glyphs of the former depending on surrounding language.
[...]

Yes, that's how I imagined things work, from my limited experience.
Seems my view was a bit naïve.

Thanks for the insights
-- tomás

* Re: How to recognize keyboard insertion?
       [not found]     ` <837huac8gg.fsf@gnu.org>
@ 2009-11-02 14:49       ` Ehud Karni
  2009-11-02 19:02         ` Eli Zaretskii
  0 siblings, 1 reply; 28+ messages in thread
From: Ehud Karni @ 2009-11-02 14:49 UTC (permalink / raw)
  To: eliz; +Cc: emacs-devel

On Sun, 01 Nov 2009 22:10:23 Eli Zaretskii wrote:
>
> Ehud, I'd appreciate your opinion on this matter.

I read all the messages in the thread from the beginning.

First, I want to remind you that UAX#9 only deals with converting logical
order to visual order, and not with how to create the "logical" text.

I think that we should separate the ordering for display (your code)
from getting the input (input method or keyboard layout).

It seems that both Microsoft and the Xorg developers decided to use
mirroring for Hebrew keyboard (see /usr/share/X11/xkb/symbols/il).

So if the user uses an external "input method" (i.e. keyboard map)
the 4 pairs - () [] {} <>, are already mirrored.  If she prefers to
use an Emacs internal input method (like I use with my hebeng.el)
the mirroring should be an option.
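
A minimal Python sketch of such an option, for an input method implemented inside Emacs rather than by the OS keyboard map. The function name, its parameters, and the way right-to-left context is detected (here simply passed in as a flag) are all hypothetical:

```python
# The four bracket pairs listed above.  ASSUMPTION: a real input method would
# probably take partners from the Unicode Bidi_Mirroring_Glyph data instead.
BRACKET_PARTNER = {"(": ")", ")": "(", "[": "]", "]": "[",
                   "{": "}", "}": "{", "<": ">", ">": "<"}


def input_char(ch: str, paragraph_rtl: bool, mirror_on_input: bool = True) -> str:
    """Hypothetical filter: the character an Emacs-internal input method
    would insert for the typed character `ch`.

    Mirroring happens only when the user option `mirror_on_input` is on and
    the text being typed is right-to-left.
    """
    if mirror_on_input and paragraph_rtl:
        return BRACKET_PARTNER.get(ch, ch)
    return ch


if __name__ == "__main__":
    typed = ")4+5("
    print("".join(input_char(c, paragraph_rtl=True) for c in typed))          # (4+5)
    print("".join(input_char(c, True, mirror_on_input=False) for c in typed))  # )4+5(
```

Keeping the option separate from the display code matches the point that display ordering and input are independent concerns.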

Ehud

BTW. The keying of the RTL text and arithmetic expression discussed
     previously on the thread, Typing (from left to right):
     "A B C D ( 4 + 5 ) * 9"  results in:  9*)4+5(DCBA
     That is because of parens mirroring (at the keyboard).

--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry

* Re: How to recognize keyboard insertion?
  2009-11-02 14:49       ` Ehud Karni
@ 2009-11-02 19:02         ` Eli Zaretskii
  0 siblings, 0 replies; 28+ messages in thread
From: Eli Zaretskii @ 2009-11-02 19:02 UTC (permalink / raw)
  To: ehud; +Cc: emacs-devel

> Date: Mon, 2 Nov 2009 16:49:46 +0200
> From: "Ehud Karni" <ehud@unix.mvs.co.il>
> Cc: emacs-devel@gnu.org
> 
> It seems that both Microsoft and the Xorg developers decided to use
> mirroring for Hebrew keyboard (see /usr/share/X11/xkb/symbols/il).
> 
> So if the user uses an external "input method" (i.e. keyboard map)
> the 4 pairs - () [] {} <>, are already mirrored.  If she prefers to
> use an Emacs internal input method (like I use with my hebeng.el)
> the mirroring should be an option.

I tend to agree.  The automatic mirroring is not 100% correct, but
it's probably right 80% of the time, and the rest could be fixed by
introducing a command to mirror the character at point.
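
Such a command could be as simple as the Python sketch below, which works on a plain string and a 0-based point rather than on an Emacs buffer. The function name is hypothetical. Python's `unicodedata.mirrored` only reports whether a character has the Bidi_Mirrored property, so the replacement partner comes from a small hand-written table:

```python
import unicodedata

# ASSUMPTION: partners for a few common mirrored characters only.
MIRROR_PARTNER = {"(": ")", ")": "(", "[": "]", "]": "[",
                  "{": "}", "}": "{", "<": ">", ">": "<"}


def mirror_char_at_point(text: str, point: int) -> str:
    """Return `text` with the character after `point` (0-based) replaced by
    its mirror image, or `text` unchanged if there is nothing to mirror."""
    if not 0 <= point < len(text):
        return text
    ch = text[point]
    if not unicodedata.mirrored(ch) or ch not in MIRROR_PARTNER:
        return text
    return text[:point] + MIRROR_PARTNER[ch] + text[point + 1:]
```

For example, `mirror_char_at_point("ab)c", 2)` gives `"ab(c"`. Mirrored characters missing from the table (such as U+2264) are left alone rather than guessed at.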

> BTW. The keying of the RTL text and arithmetic expression discussed
>      previously on the thread, Typing (from left to right):
>      "A B C D ( 4 + 5 ) * 9"  results in:  9*)4+5(DCBA
>      That is because of parens mirroring (at the keyboard).

Yes, that's exactly one manifestation of why the automatic mirroring
is wrong: it assumes too much about the application which will get
this input.

Thread overview: 28+ messages
2009-10-31 15:57 How to recognize keyboard insertion? Eli Zaretskii
2009-10-31 16:58 ` David De La Harpe Golden
2009-10-31 17:20   ` Eli Zaretskii
2009-10-31 17:37     ` David De La Harpe Golden
2009-10-31 17:43       ` David De La Harpe Golden
2009-10-31 18:15       ` Eli Zaretskii
2009-10-31 19:26         ` David De La Harpe Golden
2009-10-31 20:01           ` Eli Zaretskii
2009-10-31 20:42             ` David De La Harpe Golden
2009-10-31 21:23               ` Eli Zaretskii
2009-10-31 21:49                 ` David De La Harpe Golden
2009-11-01  3:44                   ` Eli Zaretskii
2009-11-01  5:24                     ` David De La Harpe Golden
2009-11-01 19:59                       ` Eli Zaretskii
2009-11-01 20:19                         ` David De La Harpe Golden
2009-11-01  3:40           ` Stephen J. Turnbull
2009-11-01  5:46             ` David De La Harpe Golden
2009-11-01  5:44         ` tomas
2009-11-01 18:48           ` Eli Zaretskii
2009-11-01 20:09             ` David De La Harpe Golden
2009-11-02  5:03               ` tomas
2009-11-01  1:30     ` Jason Rumney
2009-11-01  4:02       ` Eli Zaretskii
2009-11-01  5:25         ` Stephen J. Turnbull
2009-11-01 13:59           ` David De La Harpe Golden
2009-11-01 19:57           ` Eli Zaretskii
     [not found]     ` <837huac8gg.fsf@gnu.org>
2009-11-02 14:49       ` Ehud Karni
2009-11-02 19:02         ` Eli Zaretskii