bidi and shaping problems in describe-input-method

unofficial mirror of emacs-devel@gnu.org 

* bidi and shaping problems in describe-input-method
@ 2012-03-06 22:17 Mohsen BANAN
  2012-03-07  4:05 ` Eli Zaretskii
  2012-03-08  4:30 ` Miles Bader
  0 siblings, 2 replies; 38+ messages in thread
From: Mohsen BANAN @ 2012-03-06 22:17 UTC (permalink / raw)
  To: emacs-devel

There are two minor problems in
describe-input-method which I think we can easily
fix.

The first problem is bidi related:

Try:
(describe-input-method 'arabic)
and then try:
(describe-input-method 'hebrew)

In the case of 'arabic note how the entire
keyboard is flipped to the right.

The second problem is shaping related:

Inside of a cell on the keyboard layout, when
there are two characters that can be joined, they
are joined -- by default. They should not be.

Consider for example غإ which should have been غ‌إ
instead.

The fix could be to insert a ZERO WIDTH NON-JOINER,
e.g. with (ucs-insert 8204), between the two
characters in each cell.
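
A minimal sketch of that idea, assuming a hypothetical helper
`my-cell-with-zwnj' (quail.el has no such function) that builds the
text of one key cell. Whether the two letters really stay isolated
still depends on the font and the shaping engine in use:

```elisp
;; Hypothetical helper for illustration; quail.el has no such function.
(defun my-cell-with-zwnj (lower upper)
  "Return LOWER and UPPER joined by U+200C ZERO WIDTH NON-JOINER.
LOWER and UPPER may each be a character or a string."
  (concat (if (stringp lower) lower (string lower))
          (string #x200C)               ; 8204 in decimal
          (if (stringp upper) upper (string upper))))

;; ARABIC LETTER GHAIN and ARABIC LETTER ALEF WITH HAMZA BELOW,
;; separated by the non-joiner.
(my-cell-with-zwnj #x63A #x625)
```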

I can help with this and work with the maintainer of
describe-input-method to make the above changes.

Good input-method help and documentation for bidi
languages will be important when Emacs 24 comes out.

Thanks.

...Mohsen

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: bidi and shaping problems in describe-input-method
  2012-03-06 22:17 bidi and shaping problems in describe-input-method Mohsen BANAN
@ 2012-03-07  4:05 ` Eli Zaretskii
  2012-03-07 18:49   ` Eli Zaretskii
  2012-03-07 21:32   ` Mohsen BANAN
  2012-03-08  4:30 ` Miles Bader
  1 sibling, 2 replies; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-07  4:05 UTC (permalink / raw)
  To: Mohsen BANAN; +Cc: emacs-devel

> From: Mohsen BANAN <list-general@mohsen.1.banan.byname.net>
> Date: Tue, 06 Mar 2012 14:17:41 -0800
> 
> Try:
> (describe-input-method 'arabic)
> and then try:
> (describe-input-method 'hebrew)
> 
> In the case of 'arabic note how the entire
> keyboard is flipped to the right.

That's easy to fix.

> The second problem is shaping related:
> 
> Inside of a cell on the keyboard layout, when
> there are two characters that can be joined, they
> are joined -- by default. They should not be.

How can one know when they should be joined and when not?




* Re: bidi and shaping problems in describe-input-method
  2012-03-07  4:05 ` Eli Zaretskii
@ 2012-03-07 18:49   ` Eli Zaretskii
  2012-03-07 21:32   ` Mohsen BANAN
  1 sibling, 0 replies; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-07 18:49 UTC (permalink / raw)
  To: list-general; +Cc: emacs-devel

> Date: Wed, 07 Mar 2012 06:05:19 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org
> 
> > From: Mohsen BANAN <list-general@mohsen.1.banan.byname.net>
> > Date: Tue, 06 Mar 2012 14:17:41 -0800
> > 
> > Try:
> > (describe-input-method 'arabic)
> > and then try:
> > (describe-input-method 'hebrew)
> > 
> > In the case of 'arabic note how the entire
> > keyboard is flipped to the right.
> 
> That's easy to fix.

Done.




* Re: bidi and shaping problems in describe-input-method
  2012-03-07  4:05 ` Eli Zaretskii
  2012-03-07 18:49   ` Eli Zaretskii
@ 2012-03-07 21:32   ` Mohsen BANAN
  2012-03-08 15:30     ` Kenichi Handa
  1 sibling, 1 reply; 38+ messages in thread
From: Mohsen BANAN @ 2012-03-07 21:32 UTC (permalink / raw)
  To: emacs-devel

>>>>> On Wed, 07 Mar 2012 06:05:19 +0200, Eli Zaretskii <eliz@gnu.org> said:
  >> From: Mohsen BANAN <list-general@mohsen.1.banan.byname.net>

  Mohsen> In the case of 'arabic note how the entire
  Mohsen> keyboard is flipped to the right.

  Eli> That's easy to fix.

Great! Thanks for having taken care of that.

  Mohsen> The second problem is shaping related:
  Mohsen> 
  Mohsen> Inside of a cell on the keyboard layout, when
  Mohsen> there are two characters that can be joined, they
  Mohsen> are joined -- by default. They should not be.

  Eli> How can one know when they should be joined and when not?

I think the simple answer is: always isolated -- never joined.

For Persian and Arabic I am sure that they should
never be joined -- always isolated. 

For other shaped languages, it is hard to imagine
an input method designer would ever want them joined.

For non-shaped languages (e.g., latin keyboards)
the insertion of a zero width non-joiner between
lower and upper case is harmless and invisible.

So, the simplest fix (and perhaps
the-right-thing-to-do) is to ALWAYS insert a
(ucs-insert 8204)‌ -- zero width non-joiner --
between the two characters in each and every
keyboard cell.

Thanks,

...Mohsen


* Re: bidi and shaping problems in describe-input-method
  2012-03-06 22:17 bidi and shaping problems in describe-input-method Mohsen BANAN
  2012-03-07  4:05 ` Eli Zaretskii
@ 2012-03-08  4:30 ` Miles Bader
  1 sibling, 0 replies; 38+ messages in thread
From: Miles Bader @ 2012-03-08  4:30 UTC (permalink / raw)
  To: Mohsen BANAN; +Cc: emacs-devel

Incidentally, how does one enable `quail-show-keyboard-layout' support for a
given input method?

I think it would be super useful for e.g. `korean-hangul', but 

   (activate-input-method 'korean-hangul)
   (quail-show-keyboard-layout)

yields nonsense...

Thanks,

-miles

-- 
Sabbath, n. A weekly festival having its origin in the fact that God made the
world in six days and was arrested on the seventh.




* Re: bidi and shaping problems in describe-input-method
  2012-03-07 21:32   ` Mohsen BANAN
@ 2012-03-08 15:30     ` Kenichi Handa
  2012-03-08 18:24       ` Eli Zaretskii
  2012-03-08 18:30       ` Eli Zaretskii
  0 siblings, 2 replies; 38+ messages in thread
From: Kenichi Handa @ 2012-03-08 15:30 UTC (permalink / raw)
  To: Mohsen BANAN; +Cc: eliz, emacs-devel

In article <yx262eg9jxk.fsf@mohsen.1.banan.byname.net>, Mohsen BANAN <list-general@mohsen.1.banan.byname.net> writes:

  Mohsen> The second problem is shaping related:
  Mohsen> 
  Mohsen> Inside of a cell on the keyboard layout, when
  Mohsen> there are two characters that can be joined, they
  Mohsen> are joined -- by default. They should not be.

  Eli> How can one know when they should be joined and when not?

> I think the simple answer is: always isolated -- never joined.

> For Persian and Arabic I am sure that they should
> never be joined -- always isolated. 

Sure.

> For other shaped languages, it is hard to imagine
> an input method designer would ever want them joined.

I agree.

> For non-shaped languages (e.g., latin keyboards)
> the insertion of a zero width non-joiner between
> lower and upper case is harmless and invisible.

> So, the simplest fix (and perhaps
> the-right-thing-to-do) is to ALWAYS insert a
> (ucs-insert 8204)‌ -- zero width non-joiner --
> between the two characters in each and every
> keyboard cell.

If we insert something unconditionally, I think inserting
(propertize " " 'invisible t) is safer.  It should work on
tty terminal too.

By the way, for this bug:

  Mohsen> In the case of 'arabic note how the entire
  Mohsen> keyboard is flipped to the right.

just setting bidi-paragraph-direction to 'left-to-right is
not enough, because keyboard cells in a row are still
re-ordered.  For this, the easiest fix is to set
bidi-display-reordering to nil.  But, then we can't use
actual Arabic and Hebrew words in the docstrings of those
input methods.  What we want is to disable bidi reordering
only for the keyboard layout part.  Eli, do you have any
good ideas?
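
For reference, a minimal sketch of the two buffer-wide settings
discussed above, applied to a scratch buffer that stands in for the
*Help* buffer.  The function and buffer names are made up for
illustration; this is not the change made to quail.el:

```elisp
;; Illustration only; names are hypothetical.
(defun my-bidi-settings-demo ()
  "Return a buffer with both bidi settings applied buffer-locally."
  (let ((buf (get-buffer-create "*bidi-demo*")))
    (with-current-buffer buf
      (erase-buffer)
      ;; Both variables become buffer-local when set.
      ;; Force the base direction of every paragraph to left-to-right ...
      (setq bidi-paragraph-direction 'left-to-right)
      ;; ... and turn reordering off completely.  This also affects any
      ;; Arabic or Hebrew words elsewhere in the same buffer.
      (setq bidi-display-reordering nil)
      ;; Three Hebrew letters to look at: ALEF, BET, GIMEL.
      (insert #x5D0 #x5D1 #x5D2 "\n"))
    buf))

;; Interactive use: (pop-to-buffer (my-bidi-settings-demo))
```

Because the second setting applies to the whole buffer, it cannot be
limited to the keyboard layout alone, which is the limitation raised
here.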

---
Kenichi Handa
handa@m17n.org


* Re: bidi and shaping problems in describe-input-method
  2012-03-08 15:30     ` Kenichi Handa
@ 2012-03-08 18:24       ` Eli Zaretskii
  2012-03-08 23:48         ` Kenichi Handa
  2012-03-08 18:30       ` Eli Zaretskii
  1 sibling, 1 reply; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-08 18:24 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: list-general, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org, eliz@gnu.org
> Date: Fri, 09 Mar 2012 00:30:25 +0900
> 
> By the way, for this bug:
> 
>   Mohsen> In the case of 'arabic note how the entire
>   Mohsen> keyboard is flipped to the right.
> 
> just setting bidi-paragraph-direction to 'left-to-right is
> not enough, because keyboard cells in a row are still
> re-ordered.

Right.  I didn't notice it because I don't read Arabic.

> For this, the easiest fix is to set bidi-display-reordering to nil.
> But, then we can't use actual Arabic and Hebrew words in the
> docstrings of those input methods.  What we want is to disable bidi
> reordering only for the keyboard layout part.  Eli, do you have
> any good ideas?

Revision 107535 is the best I can do.  I'll let Mohsen judge if it's
good enough.




* Re: bidi and shaping problems in describe-input-method
  2012-03-08 15:30     ` Kenichi Handa
  2012-03-08 18:24       ` Eli Zaretskii
@ 2012-03-08 18:30       ` Eli Zaretskii
  2012-03-08 18:53         ` Eli Zaretskii
  2012-03-08 23:19         ` Kenichi Handa
  1 sibling, 2 replies; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-08 18:30 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: list-general, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org, eliz@gnu.org
> Date: Fri, 09 Mar 2012 00:30:25 +0900
> 
> > So, the simplest fix (and perhaps
> > the-right-thing-to-do) is to ALWAYS insert a
> > (ucs-insert 8204)‌ -- zero width non-joiner --
> > between the two characters in each and every
> > keyboard cell.
> 
> If we insert something unconditionally, I think inserting
> (propertize " " 'invisible t) is safer.

Unfortunately, this doesn't work: invisible characters are not handed
to the shaping engine, they are silently skipped by the display
engine.  So the characters are still joined.

We need something smarter here.  I'll let you and Mohsen find the
solution to this one.





* Re: bidi and shaping problems in describe-input-method
  2012-03-08 18:30       ` Eli Zaretskii
@ 2012-03-08 18:53         ` Eli Zaretskii
  2012-03-08 23:27           ` Kenichi Handa
  2012-03-08 23:19         ` Kenichi Handa
  1 sibling, 1 reply; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-08 18:53 UTC (permalink / raw)
  To: handa; +Cc: list-general, emacs-devel

> Date: Thu, 08 Mar 2012 20:30:51 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: list-general@mohsen.1.banan.byname.net, emacs-devel@gnu.org
> 
> > If we insert something unconditionally, I think inserting
> > (propertize " " 'invisible t) is safer.
> 
> Unfortunately, this doesn't work: invisible characters are not handed
> to the shaping engine, they are silently skipped by the display
> engine.  So the characters are still joined.
> 
> We need something smarter here.

One obvious possibility is to turn off auto-composition-mode.  But
when I tried that, unexpected characters showed up in some cells,
e.g. in the T cell and in the G cell.  I guess some characters shown in
the Arabic keyboard layout do need auto-composition-mode?




* Re: bidi and shaping problems in describe-input-method
  2012-03-08 18:30       ` Eli Zaretskii
  2012-03-08 18:53         ` Eli Zaretskii
@ 2012-03-08 23:19         ` Kenichi Handa
  2012-03-09  8:15           ` Eli Zaretskii
  1 sibling, 1 reply; 38+ messages in thread
From: Kenichi Handa @ 2012-03-08 23:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: list-general, emacs-devel

In article <83haxz0wtg.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > If we insert something unconditionally, I think inserting
> > (propertize " " 'invisible t) is safer.

> Unfortunately, this doesn't work: invisible characters are not handed
> to the shaping engine, they are silently skipped by the display
> engine.  So the characters are still joined.

No, the shaping engine checks buffer/string contents.  So,
if there's a space between A and B, the rule for shaping AB
sequence is not activated.  Please try these two:

(insert #x642 #x64C)
(insert #x642 (propertize " " 'invisible t) #x64C)
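
To compare the two forms side by side, they can be put on separate
lines of a scratch buffer.  This is only a convenience sketch; the
function and buffer names are made up:

```elisp
;; Hypothetical test harness for the two expressions above.
(defun my-shaping-test ()
  "Fill *shaping-test* with both variants, one per line, and return it."
  (let ((buf (get-buffer-create "*shaping-test*")))
    (with-current-buffer buf
      (erase-buffer)
      ;; QAF immediately followed by DAMMATAN.
      (insert "plain:     " #x642 #x64C "\n")
      ;; The same pair with an invisible space in between.
      (insert "invisible: " #x642 (propertize " " 'invisible t) #x64C "\n"))
    buf))

;; Interactive use: (pop-to-buffer (my-shaping-test))
```

What the second line looks like depends on the font and the shaping
engine in use.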

---
Kenichi Handa
handa@m17n.org




* Re: bidi and shaping problems in describe-input-method
  2012-03-08 18:53         ` Eli Zaretskii
@ 2012-03-08 23:27           ` Kenichi Handa
  0 siblings, 0 replies; 38+ messages in thread
From: Kenichi Handa @ 2012-03-08 23:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: list-general, emacs-devel

In article <83fwdi2ac3.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> One obvious possibility is to turn off auto-composition-mode.  But
> when I tried that, unexpected characters showed up in some cells,
> e.g. in the T cell and in the G cell.  I guess some characters shown in
> the Arabic keyboard layout do need auto-composition-mode?

Yes.  Some keys insert two characters that are composed into
one glyph.

---
Kenichi Handa
handa@m17n.org




* Re: bidi and shaping problems in describe-input-method
  2012-03-08 18:24       ` Eli Zaretskii
@ 2012-03-08 23:48         ` Kenichi Handa
  2012-03-09  8:11           ` Eli Zaretskii
  2012-03-09  8:17           ` Eli Zaretskii
  0 siblings, 2 replies; 38+ messages in thread
From: Kenichi Handa @ 2012-03-08 23:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: list-general, emacs-devel

In article <83ipif0x46.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > just setting bidi-paragraph-direction to 'left-to-right is
> > not enough, because keyboard cells in a row are still
> > re-ordered.

> Right.  I didn't notice it because I don't read Arabic.

That re-ordering happens for the Hebrew input method too. :-p

> > For this, the easiest fix is to set bidi-display-reordering to nil.
> > But, then we can't use actual Arabic and Hebrew words in the
> > docstrings of those input methods.  What we want is to disable bidi
> > reordering only for the keyboard layout part.  Eli, do you have
> > any good ideas?

> Revision 107535 is the best I can do.  I'll let Mohsen judge if it's
> good enough.

If possible, I'd like to avoid inserting LRM
unconditionally.  Is it possible to have this kind of
function?

(defun quail-help-require-LRM (char)
   (or (eq (get-char-code-property char 'bidi-class) 'L)
       ...))

Then, we can use it in quail-insert-kbd-layout as below:

   (if (quail-help-require-LRM (if (stringp lower) (aref lower 0) lower))
       (insert #x200e))
   (insert lower)
   (if (quail-help-require-LRM (if (stringp upper) (aref upper 0) upper))
       (insert #x200e))
   (insert upper)
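
As background for such a predicate, the `bidi-class' property can be
inspected directly.  The sketch below only reports the class of a few
sample characters; it does not fill in the elided part of the
function above, and the helper name is made up:

```elisp
;; Hypothetical helper: pair each character with its Unicode bidi class.
(defun my-bidi-classes (chars)
  "Return an alist mapping each character in CHARS to its bidi class."
  (mapcar (lambda (c) (cons c (get-char-code-property c 'bidi-class)))
          chars))

;; Latin a, digit 1, HEBREW LETTER ALEF, ARABIC LETTER QAF
(my-bidi-classes (list ?a ?1 #x5D0 #x642))
;; => ((97 . L) (49 . EN) (1488 . R) (1602 . AL))
```

Strong left-to-right characters report L, Hebrew letters R and Arabic
letters AL, while digits and punctuation have weak or neutral classes.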

---
Kenichi Handa
handa@m17n.org




* Re: bidi and shaping problems in describe-input-method
  2012-03-08 23:48         ` Kenichi Handa
@ 2012-03-09  8:11           ` Eli Zaretskii
  2012-03-09 14:03             ` Kenichi Handa
  2012-03-09  8:17           ` Eli Zaretskii
  1 sibling, 1 reply; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-09  8:11 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: list-general, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: list-general@mohsen.1.banan.byname.net, emacs-devel@gnu.org
> Date: Fri, 09 Mar 2012 08:48:35 +0900
> 
> If possible, I'd like to avoid inserting LRM unconditionally.

Why?  They are invisible, so they are not displayed at all.

> Is it possible to have this kind of function?
> 
> (defun quail-help-require-LRM (char)
>    (or (eq (get-char-code-property char 'bidi-class) 'L)
>        ...))

It's possible, but why bother?  And with this function you will insert
the LRM for many characters that don't need that, like punctuation,
numbers, etc.

Also, `lower' and `upper' could be strings, in which case you need a
more complex test.




* Re: bidi and shaping problems in describe-input-method
  2012-03-08 23:19         ` Kenichi Handa
@ 2012-03-09  8:15           ` Eli Zaretskii
  2012-03-09  9:01             ` Juanma Barranquero
  2012-03-09 13:54             ` Kenichi Handa
  0 siblings, 2 replies; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-09  8:15 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: list-general, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: list-general@mohsen.1.banan.byname.net, emacs-devel@gnu.org
> Date: Fri, 09 Mar 2012 08:19:20 +0900
> 
> No, the shaping engine checks buffer/string contents.  So,
> if there's a space between A and B, the rule for shaping AB
> sequence is not activated.  Please try these two:
> 
> (insert #x642 #x64C)
> (insert #x642 (propertize " " 'invisible t) #x64C)

This looks exactly identical to me (on MS-Windows), except that the
second one causes annoying behavior of cursor motion around the
inserted text.

Does it work for you on GNU/Linux?  If so, does it work for you to
change quail-insert-kbd-layout to use this trick in order to separate
the `lower' from the `upper' in the key cells?  I tried that on my
machine, and it didn't have the desired effect.


* Re: bidi and shaping problems in describe-input-method
  2012-03-08 23:48         ` Kenichi Handa
  2012-03-09  8:11           ` Eli Zaretskii
@ 2012-03-09  8:17           ` Eli Zaretskii
  1 sibling, 0 replies; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-09  8:17 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: list-general, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: list-general@mohsen.1.banan.byname.net, emacs-devel@gnu.org
> Date: Fri, 09 Mar 2012 08:48:35 +0900
> 
> In article <83ipif0x46.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > > just setting bidi-paragraph-direction to 'left-to-right is
> > > not enough, because keyboard cells in a row are still
> > > re-ordered.
> 
> > Right.  I didn't notice it because I don't read Arabic.
> 
> That re-ordering happens for the Hebrew input method too. :-p

But it's all but impossible to notice it there, because it only
happens for a few key cells in the middle of a boring keyboard layout.
You cannot see it unless you move the cursor past those cells.




* Re: bidi and shaping problems in describe-input-method
  2012-03-09  8:15           ` Eli Zaretskii
@ 2012-03-09  9:01             ` Juanma Barranquero
  2012-03-09  9:45               ` Eli Zaretskii
  2012-03-09 13:54             ` Kenichi Handa
  1 sibling, 1 reply; 38+ messages in thread
From: Juanma Barranquero @ 2012-03-09  9:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, list-general, Kenichi Handa


On Fri, Mar 9, 2012 at 09:15, Eli Zaretskii <eliz@gnu.org> wrote:

> This looks exactly identical to me (on MS-Windows), except that the
> second one causes annoying behavior of cursor motion around the
> inserted text.

It does not look identical to me on W7. See attached image.

    Juanma

[-- Attachment #2: bug.png --]
[-- Type: image/png, Size: 2205 bytes --]


* Re: bidi and shaping problems in describe-input-method
  2012-03-09  9:01             ` Juanma Barranquero
@ 2012-03-09  9:45               ` Eli Zaretskii
  2012-03-09 10:02                 ` Eli Zaretskii
  2012-03-09 11:19                 ` Juanma Barranquero
  0 siblings, 2 replies; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-09  9:45 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: emacs-devel, list-general, handa

> From: Juanma Barranquero <lekktu@gmail.com>
> Date: Fri, 9 Mar 2012 10:01:31 +0100
> Cc: Kenichi Handa <handa@m17n.org>, list-general@mohsen.1.banan.byname.net, 
> 	emacs-devel@gnu.org
> 
> > This looks exactly identical to me (on MS-Windows), except that the
> > second one causes annoying behavior of cursor motion around the
> > inserted text.
> 
> It does not look identical to me on W7. See attached image.

What font is used on your machine to render the #x64C character?  On
my machine it is this:

  uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1 (#x2F2)

If the font doesn't explain that, then perhaps what I see is a bug in
the version of Uniscribe on XP.

Btw, at least on the screenshot you sent, the display of #x64C is
incorrect.  Compare with what you see when you type "C-u C-x =" for
that character.




* Re: bidi and shaping problems in describe-input-method
  2012-03-09  9:45               ` Eli Zaretskii
@ 2012-03-09 10:02                 ` Eli Zaretskii
  2012-03-09 14:11                   ` Kenichi Handa
  2012-03-09 11:19                 ` Juanma Barranquero
  1 sibling, 1 reply; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-09 10:02 UTC (permalink / raw)
  To: handa; +Cc: lekktu, list-general, emacs-devel

> Date: Fri, 09 Mar 2012 11:45:46 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org, list-general@mohsen.1.banan.byname.net, handa@m17n.org
> 
> Btw, at least on the screenshot you sent, the display of #x64C is
> incorrect.  Compare with what you see when you type "C-u C-x =" for
> that character.

The display in "C-u C-x =" is generated by this snippet (from
descr-text.el):

  (insert (char-code-property-description 'decomposition '(#x64C)))

Somehow, using this produces a correct display of the character
(albeit enclosed in quotes) without any problems.  Perhaps Handa-san
could explain what kind of magic the above does, as compared to simply
inserting the same character into the buffer.  The only sign of magic
I see is this:

  (char-code-property-description 'decomposition '(#x64C))

   => #("'ٌ'" 1 2 (composition ((1 . "	ٌ	"))))

So the string produced by char-code-property-description has the
`composition' text property on the character we want to display.
The value of the text property, in case you wonder, is this:

   ((1 . "\t\x64C\t"))

But how this countermands the problems is a mystery to me; the
ELisp manual says about the value of this property:

  `composition'
       This text property is used to display a sequence of characters as a
       single glyph composed from components.  But the value of the
       property itself is completely internal to Emacs and should not be
       manipulated directly by, for instance, `put-text-property'.

(A.k.a.: "this is need-to-know only, and you don't need to know".)

Anyway, maybe we could use something like this in generating the
keyboard layouts by quail.el.


* Re: bidi and shaping problems in describe-input-method
  2012-03-09  9:45               ` Eli Zaretskii
  2012-03-09 10:02                 ` Eli Zaretskii
@ 2012-03-09 11:19                 ` Juanma Barranquero
  2012-03-09 11:41                   ` Eli Zaretskii
  1 sibling, 1 reply; 38+ messages in thread
From: Juanma Barranquero @ 2012-03-09 11:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: handa, list-general, emacs-devel

On Fri, Mar 9, 2012 at 10:45, Eli Zaretskii <eliz@gnu.org> wrote:

> What font is used on your machine to render the #x64C character?

- In the first case:

             position: 212 of 267 (79%), column: 20
            character: ق‎ (displayed as ق‎) (codepoint 1602, #o3102, #x642)
    preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x0642
               syntax: w 	which means: word
             category: .:Base, R:Right-to-left (strong), b:Arabic
          buffer code: #xD9 #x82
            file code: #xD9 #x82 (encoded by coding system nil)
              display: composed to form "قٌ" (see below)

Composed with the following character(s) "ٌ" using this font:
  uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
by these glyphs:
  [0 1 1602 981 8 1 8 12 4 nil]
  [0 1 1602 754 0 2 6 12 4 [1 -1 0]]

Character code properties: customize what to show
  name: ARABIC LETTER QAF
  general-category: Lo (Letter, Other)
  decomposition: (1602) ('ق')

There are text properties here:
  fontified            t

- In the second one:

             position: 265 of 267 (99%), column: 50
            character: ق‎ (displayed as ق‎) (codepoint 1602, #o3102, #x642)
    preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x0642
               syntax: w 	which means: word
             category: .:Base, R:Right-to-left (strong), b:Arabic
          buffer code: #xD9 #x82
            file code: #xD9 #x82 (encoded by coding system nil)
              display: by this font (glyph code)
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1 (#x3D5)

Character code properties: customize what to show
  name: ARABIC LETTER QAF
  general-category: Lo (Letter, Other)
  decomposition: (1602) ('ق')

There are text properties here:
  fontified            t

    Juanma




* Re: bidi and shaping problems in describe-input-method
  2012-03-09 11:19                 ` Juanma Barranquero
@ 2012-03-09 11:41                   ` Eli Zaretskii
  2012-03-09 14:56                     ` Juanma Barranquero
  0 siblings, 1 reply; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-09 11:41 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: handa, list-general, emacs-devel

> From: Juanma Barranquero <lekktu@gmail.com>
> Date: Fri, 9 Mar 2012 12:19:41 +0100
> Cc: emacs-devel@gnu.org, list-general@mohsen.1.banan.byname.net, 
> 	handa@m17n.org
> 
> On Fri, Mar 9, 2012 at 10:45, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > What font is used on your machine to render the #x64C character?
> 
> - In the first case:
> 
>              position: 212 of 267 (79%), column: 20
>             character: ق‎ (displayed as ق‎) (codepoint 1602, #o3102, #x642)
>     preferred charset: unicode (Unicode (ISO10646))
> code point in charset: 0x0642
>                syntax: w 	which means: word
>              category: .:Base, R:Right-to-left (strong), b:Arabic
>           buffer code: #xD9 #x82
>             file code: #xD9 #x82 (encoded by coding system nil)
>               display: composed to form "قٌ" (see below)
> 
> Composed with the following character(s) "ٌ" using this font:
>   uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
> by these glyphs:
>   [0 1 1602 981 8 1 8 12 4 nil]
>   [0 1 1602 754 0 2 6 12 4 [1 -1 0]]
> 
> Character code properties: customize what to show
>   name: ARABIC LETTER QAF
>   general-category: Lo (Letter, Other)
>   decomposition: (1602) ('ق')
> 
> There are text properties here:
>   fontified            t
> 
> - In the second one:
> 
>              position: 265 of 267 (99%), column: 50
>             character: ق‎ (displayed as ق‎) (codepoint 1602, #o3102, #x642)
>     preferred charset: unicode (Unicode (ISO10646))
> code point in charset: 0x0642
>                syntax: w 	which means: word
>              category: .:Base, R:Right-to-left (strong), b:Arabic
>           buffer code: #xD9 #x82
>             file code: #xD9 #x82 (encoded by coding system nil)
>               display: by this font (glyph code)
>     uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1 (#x3D5)
> 
> Character code properties: customize what to show
>   name: ARABIC LETTER QAF
>   general-category: Lo (Letter, Other)
>   decomposition: (1602) ('ق')

I asked about the #x64C character (1612), not about #x642.  The latter
is displayed just fine here, it's the former that causes some kind of
trouble.

Thanks.





* Re: bidi and shaping problems in describe-input-method
  2012-03-09  8:15           ` Eli Zaretskii
  2012-03-09  9:01             ` Juanma Barranquero
@ 2012-03-09 13:54             ` Kenichi Handa
  2012-03-09 16:15               ` Eli Zaretskii
  1 sibling, 1 reply; 38+ messages in thread
From: Kenichi Handa @ 2012-03-09 13:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: list-general, emacs-devel

In article <83boo61972.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > (insert #x642 #x64C)
> > (insert #x642 (propertize " " 'invisible t) #x64C)

> This looks exactly identical to me (on MS-Windows), except that the
> second one causes annoying behavior of cursor motion around the
> inserted text.

> Does it work for you on GNU/Linux?

Yes.

> If so, does it work for you to change
> quail-insert-kbd-layout to use this trick in order to
> separate the `lower' from the `upper' in the key cells?

Yes.  But it depends on the font selected for arabic and the
shaping engine for that font.  Some shapers display a glyph
for an independent combining character with dotted circle (if
the width of the glyph is zero).

A better result is obtained with this:

(insert #x642 (compose-string "\x64C" 0 1 "\t\x64C\t"))

I tried it with 4 fonts on GNU/Linux and all were ok.

> I tried that on my machine, and it didn't have the desired
> effect.

Please try the above.

---
Kenichi Handa
handa@m17n.org




* Re: bidi and shaping problems in describe-input-method
  2012-03-09  8:11           ` Eli Zaretskii
@ 2012-03-09 14:03             ` Kenichi Handa
  2012-03-09 16:12               ` Eli Zaretskii
  0 siblings, 1 reply; 38+ messages in thread
From: Kenichi Handa @ 2012-03-09 14:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: list-general, emacs-devel

In article <83d38m19dk.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > If possible, I'd like to avoid inserting LRM unconditionally.

> Why?  They are invisible, so they are not displayed at all.

In general, it's smarter to use LRM only where necessary.
And when one cuts and pastes the keyboard layout (or some part of
it) of l2r characters, one will be surprised by the LRM
characters.

> > Is it possible to have this kind of function?
> > 
> > (defun quail-help-require-LRM (char)
> >    (or (eq (get-char-code-property char 'bidi-class) 'L)
> >        ...))

> It's possible, but why bother?  And with this function you will insert
> the LRM for many characters that don't need that, like punctuation,
> numbers, etc.

??? I want a function that returns t only for a character
that requires a preceding LRM in the keyboard layout.

> Also, `lower' and `upper' could be strings, in which case you need a
> more complex test.

We can give (if (stringp lower) (aref lower 0) lower) to that
function.

---
Kenichi Handa
handa@m17n.org





* Re: bidi and shaping problems in describe-input-method
  2012-03-09 10:02                 ` Eli Zaretskii
@ 2012-03-09 14:11                   ` Kenichi Handa
  0 siblings, 0 replies; 38+ messages in thread
From: Kenichi Handa @ 2012-03-09 14:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lekktu, list-general, emacs-devel

In article <83399i149j.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
>   (insert (char-code-property-description 'decomposition '(#x64C)))

> Somehow, using this produces a correct display of the character
> (albeit enclosed in quotes) without any problems.  Perhaps Handa-san
> could explain what kind of magic the above does, as compared to simply
> inserting the same character into the buffer.  The only sign of magic
> I see is this:

>   (char-code-property-description 'decomposition '(#x64C))

>    => #("'ٌ'" 1 2 (composition ((1 . "	ٌ	"))))

Yes, that function inserts a static composition that uses
this magic (excerpt from the docstring of compose-region):

------------------------------------------------------------
If it is a string, the elements are alternate characters.  In
this case, TAB element has a special meaning.  If the first
character is TAB, the glyphs are displayed with left padding space
so that no pixel overlaps with the previous column.  If the last
character is TAB, the glyphs are displayed with right padding
space so that no pixel overlaps with the following column.
------------------------------------------------------------

And if there's a static composition, the automatic (dynamic)
composition is suppressed.
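
A minimal sketch of how such a padded static composition could be
built for a single combining character.  The helper name
`my-pad-combining-char' is hypothetical and is not part of quail.el;
only `compose-string' and `get-char-code-property' are existing
functions, and the test on the general category is an assumption
made for illustration.

```elisp
;; Hypothetical helper, for illustration only (not quail.el code).
(defun my-pad-combining-char (char)
  "Return CHAR as a one-character string.
If CHAR is a non-spacing mark (general category Mn), attach a
static composition whose components are TAB CHAR TAB, so that the
glyph gets left and right padding."
  (let ((str (string char)))
    (if (eq (get-char-code-property char 'general-category) 'Mn)
        ;; COMPONENTS is a string here; a leading and a trailing TAB
        ;; request left and right padding respectively.
        (compose-string str 0 1 (string ?\t char ?\t))
      str)))

;; Usage: ARABIC LETTER QAF followed by a padded ARABIC DAMMATAN.
;; (insert #x642 (my-pad-combining-char #x64C))
```

Characters that are not non-spacing marks are returned as plain
strings, so the helper can be applied to every label in a cell.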

> Anyway, maybe we could use something like this in generating the
> keyboard layouts by quail.el.

I agree.  I'll commit a proper fix soon.

---
Kenichi Handa
handa@m17n.org




* Re: bidi and shaping problems in describe-input-method
  2012-03-09 11:41                   ` Eli Zaretskii
@ 2012-03-09 14:56                     ` Juanma Barranquero
  0 siblings, 0 replies; 38+ messages in thread
From: Juanma Barranquero @ 2012-03-09 14:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: handa, list-general, emacs-devel

On Fri, Mar 9, 2012 at 12:41, Eli Zaretskii <eliz@gnu.org> wrote:

> I asked about the #x64C character (1612), not about #x642.

Yes, sorry. In the second case, describe-char says that it is a space:

              position: 266 of 267 (99%), column: 51
             character: SPC (displayed as SPC) (codepoint 32, #o40, #x20)
     preferred charset: ascii (ASCII (ISO646 IRV))
 code point in charset: 0x20
                syntax:   	which means: whitespace
              category: .:Base, a:ASCII, l:Latin
           buffer code: #x20
             file code: #x20 (encoded by coding system nil)
               display: by this font (glyph code)
     uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso8859-1 (#x03)

 Character code properties: customize what to show
   name: SPACE
   general-category: Zs (Separator, Space)
   decomposition: (32) (' ')

 There are text properties here:
   fontified            t
   invisible            t



If inserted alone:

             position: 285 of 289 (98%), column: 0
            character: ٌ (displayed as ٌ) (codepoint 1612, #o3114, #x64c)
    preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x064C
               syntax: w 	which means: word
             category: b:Arabic
          buffer code: #xD9 #x8C
            file code: #xD9 #x8C (encoded by coding system nil)
              display: composed to form "ٌ" (see below)

Composed using this font:
  uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
by these glyphs:
  [0 0 1612 2673 8 0 8 12 4 nil]
  [0 0 1612 754 0 2 6 12 4 nil]

Character code properties: customize what to show
  name: ARABIC DAMMATAN
  general-category: Mn (Mark, Nonspacing)
  decomposition: (1612) ('ٌ')

There are text properties here:
  fontified            t




* Re: bidi and shaping problems in describe-input-method
  2012-03-09 14:03             ` Kenichi Handa
@ 2012-03-09 16:12               ` Eli Zaretskii
  2012-03-10  2:55                 ` Kenichi Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-09 16:12 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: list-general, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: list-general@mohsen.1.banan.byname.net, emacs-devel@gnu.org
> Date: Fri, 09 Mar 2012 23:03:32 +0900
> 
> In article <83d38m19dk.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > > If possible, I'd like to avoid inserting LRM unconditionally.
> 
> > Why?  They are invisible, so they are not displayed at all.
> 
> In general, it's smarter to use LRM only where necessary.

Testing whether they are necessary is a problem in itself.  You can
easily avoid inserting the marks for strong L2R characters, but they
are the minority.  Most of the characters are not in that category.
And of course keyboard layouts include such characters.

> > > (defun quail-help-require-LRM (char)
> > >    (or (eq (get-char-code-property char 'bidi-class) 'L)
> > >        ...))
> 
> > It's possible, but why bother?  And with this function you will insert
> > the LRM for many characters that don't need that, like punctuation,
> > numbers, etc.
> 
> ??? I want a function that returns t only for a character
> that requires a preceding LRM in the keyboard layout.

Yes, I understand that.  But the test you are suggesting, i.e. avoid
the LRM only for characters whose bidi-class is L, will not catch
numbers, punctuation, and other non-L characters.

> > Also, `lower' and `upper' could be strings, in which case you need a
> > more complex test.
> 
> We can give (if (stringp lower) (aref lower 0) lower) to that
> function.

But that doesn't DTRT.  Here's an example where it will fail: ".A".

AFAIK, the only reliable way of telling whether a given string will be
reordered is to actually reorder it, and then compare with the
logical-order original.  That's a nuisance, and also the results may
well depend on the characters before and after the string in the
buffer, so you need to know the context in advance, which you normally
don't.

I tried also a different solution: enclose each row of the keyboard
layout in an L2R override embedding, LRO..PDF.  This inserts only 2
control characters per row, and doesn't insert them inside the
keyboard cells, so it is cleaner, I think.  But using this means that
no key description in the layout can be a string that requires
reordering individually.  (By contrast, inserting an LRM between the
lower and the upper key still allows each description to be
reordered.)  Can we live with such a restriction?  I don't know enough
about Quail to tell.
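
A rough sketch of the per-row override described above, assuming a
row of the layout is available as a single string.  The names
`my-lro', `my-pdf' and `my-wrap-row-in-lro' are hypothetical and are
not taken from quail.el.

```elisp
;; Hypothetical helpers, for illustration only (not quail.el code).
(defconst my-lro (string #x202D)
  "LEFT-TO-RIGHT OVERRIDE (LRO) as a one-character string.")
(defconst my-pdf (string #x202C)
  "POP DIRECTIONAL FORMATTING (PDF) as a one-character string.")

(defun my-wrap-row-in-lro (row)
  "Return the string ROW enclosed in LRO..PDF.
Only two control characters are added per row, and none of them
ends up inside a key cell."
  (concat my-lro row my-pdf))

;; Usage: (insert (my-wrap-row-in-lro "| aA | bB |") "\n")
```

The price is the restriction mentioned above: inside the override
every character is treated as strong L2R, so a label that is itself
an R2L string is shown in logical order rather than reordered.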


* Re: bidi and shaping problems in describe-input-method
  2012-03-09 13:54             ` Kenichi Handa
@ 2012-03-09 16:15               ` Eli Zaretskii
  0 siblings, 0 replies; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-09 16:15 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: list-general, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: list-general@mohsen.1.banan.byname.net, emacs-devel@gnu.org
> Date: Fri, 09 Mar 2012 22:54:53 +0900
> 
> In article <83boo61972.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > > (insert #x642 #x64C)
> > > (insert #x642 (propertize " " 'invisible t) #x64C)
> 
> > This looks exactly identical to me (on MS-Windows), except that the
> > second one causes annoying behavior of cursor motion around the
> > inserted text.
> 
> > Does it work for you on GNU/Linux?
> 
> Yes.
> 
> > If so, does it work for you to change
> > quail-insert-kbd-layout to use this trick in order to
> > separate the `lower' from the `upper' in the key cells?
> 
> Yes.  But it depends on the font selected for arabic and the
> shaping engine for that font.  Some shapers display a glyph
> for an independent combining character with dotted circle (if
> the width of the glyph is zero).

It looks like Uniscribe on Windows, or at least its version supplied
with XP, doesn't live in peace with zero-width combining characters,
which is why I don't see the effect of inserting an invisible space.

> The better result is done by this:
> 
> (insert #x642 (compose-string "\x64C" 0 1 "\t\x64C\t"))

Yes, that's what "C-u C-x =" does, and it works for me as well.




* Re: bidi and shaping problems in describe-input-method
  2012-03-09 16:12               ` Eli Zaretskii
@ 2012-03-10  2:55                 ` Kenichi Handa
  2012-03-10 10:27                   ` Eli Zaretskii
  0 siblings, 1 reply; 38+ messages in thread
From: Kenichi Handa @ 2012-03-10  2:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: list-general, emacs-devel

In article <83pqclzrb5.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > In general, it's smarter to use LRM only where necessary.

> Testing whether they are necessary is a problem in itself.  You can
> easily avoid inserting the marks for strong L2R characters, but they
> are the minority.  Most of the characters are not in that category.
> And of course keyboard layouts include such characters.

> > > > (defun quail-help-require-LRM (char)
> > > >    (or (eq (get-char-code-property char 'bidi-class) 'L)
> > > >        ...))
> > 
> > > It's possible, but why bother?  And with this function you will insert
> > > the LRM for many characters that don't need that, like punctuation,
> > > numbers, etc.
> > 
> > > ??? I want a function that returns t only for a character
> > > that requires a preceding LRM in the keyboard layout.

> Yes, I understand that.  But the test you are suggesting, i.e. avoid
> the LRM only for characters whose bidi-class is L, will not catch
> numbers, punctuation, and other non-L characters.

The function body I wrote is just an idea, not a complete
solution, and of course checking against L is clearly
a bug.  At least we must check against R (and AL).
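
A sketch of the corrected idea, assuming it is enough to look at the
first character of a label and to treat only the strong R2L classes
R and AL as requiring an LRM.  The function name is hypothetical, and
a first-character test like this is only an approximation.

```elisp
;; Hypothetical helper, for illustration only (not quail.el code).
(defun my-quail-help-strong-r2l-p (label)
  "Return t if LABEL starts with a strong right-to-left character.
LABEL is a character or a string.  Only the bidi classes R and AL
are considered; an empty string yields nil."
  (let ((char (cond ((characterp label) label)
                    ((and (stringp label) (> (length label) 0))
                     (aref label 0)))))
    (and char
         (memq (get-char-code-property char 'bidi-class) '(R AL))
         t)))

;; Usage:
;; (my-quail-help-strong-r2l-p ?a)      ; Latin letter, class L
;; (my-quail-help-strong-r2l-p #x5D0)   ; HEBREW LETTER ALEF, class R
;; (my-quail-help-strong-r2l-p #x642)   ; ARABIC LETTER QAF, class AL
```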

> > > Also, `lower' and `upper' could be strings, in which case you need a
> > > more complex test.
> > 
> > > We can give (if (stringp lower) (aref lower 0) lower) to that
> > function.

> But that doesn't DTRT.  Here's an example where it will fail: ".A".

Why?  Keyboard cells in the keyboard layout typically have
this form (L is for the lower key, U is for the upper (shifted) key):

... | LU | LU | ...

What we want is to display the left LU to the left of the
right LU, and display each L (character or string) to the
right of the corresponding U.

Even if the L (of the left LU) is ".A", we don't need LRM
for it.  We have to insert LRM only before a character that
may reorder the previous characters, and after a character that
may reorder the following character.  Isn't that right?

> AFAIK, the only reliable way of telling whether a given string will be
> reordered is to actually reorder it, and then compare with the
> logical-order original.  That's a nuisance, and also the results may
> well depend on the characters before and after the string in the
> buffer, so you need to know the context in advance, which you normally
> don't.

> I tried also a different solution: enclose each row of the keyboard
> layout in an L2R override embedding, LRO..PDF.  This inserts only 2
> control characters per row, and doesn't insert them inside the
> keyboard cells, so it is cleaner, I think.  But using this means that
> no key description in the layout can be a string that requires
> reordering individually.  (By contrast, inserting an LRM between the
> lower and the upper key still allows each description to be
> reordered.)  Can we live with such a restriction?  I don't know enough
> about Quail to tell.

As it's possible to assign a string to a key, there will be
cases where the characters in the string must be
reordered.  In the above case, if L is the Hebrew word "שלום", it
must be reordered.  But, even if we surround that word with
LRE and PDF, the word itself is reordered correctly, right?

---
Kenichi Handa
handa@m17n.org





* Re: bidi and shaping problems in describe-input-method
  2012-03-10  2:55                 ` Kenichi Handa
@ 2012-03-10 10:27                   ` Eli Zaretskii
  2012-03-12  7:47                     ` Kenichi Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-10 10:27 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: list-general, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Sat, 10 Mar 2012 11:55:54 +0900
> Cc: list-general@mohsen.1.banan.byname.net, emacs-devel@gnu.org
> 
> The function body I wrote is just an idea, not a complete
> solution, and of course checking against L is clearly
> a bug.  At least we must check against R (and AL).
> 
> > > > Also, `lower' and `upper' could be strings, in which case you need a
> > > > more complex test.
> > > 
> > > > We can give (if (stringp lower) (aref lower 0) lower) to that
> > > function.
> 
> > But that doesn't DTRT.  Here's an example where it will fail: ".A".
> 
> Why?

I was explaining why testing for L is not TRT.

> ... | LU | LU | ...
> 
> What we want is to display the left LU to the left of the
> right LU, and display each L (character or string) to the
> right of the corresponding U.
> 
> Even if the L (of the left LU) is ".A", we don't need LRM
> for it.  We have to insert LRM only before a character that
> may reorder the previous characters, and after a character that
> may reorder the following character.  Isn't that right?

You are describing what bidi-string-mark-left-to-right does, I
believe.  Note that it will still insert LRM in some cases where it is
not strictly needed.

> > I tried also a different solution: enclose each row of the keyboard
> > layout in an L2R override embedding, LRO..PDF.  This inserts only 2
> > control characters per row, and doesn't insert them inside the
> > keyboard cells, so it is cleaner, I think.  But using this means that
> > no key description in the layout can be a string that requires
> > reordering individually.  (By contrast, inserting an LRM between the
> > lower and the upper key still allows each description to be
> > reordered.)  Can we live with such a restriction?  I don't know enough
> > about Quail to tell.
> 
> As it's possible to assign a string to a key, there will be
> cases where the characters in the string must be
> reordered.  In the above case, if L is the Hebrew word "שלום", it
> must be reordered.  But, even if we surround that word with
> LRE and PDF, the word itself is reordered correctly, right?

Yes.  But surrounding each `lower' and `upper' key labels in the
layout with LRE..PDF inserts even more bidirectional control
characters than just inserting LRM.  By contrast, using LRO..PDF
around the whole row of keys inserts just 2 such characters, so if it
were not for the need to reorder the individual key labels, LRO..PDF
would be a better alternative.  I mentioned it because it does exactly
what you originally asked for: it effectively disables
bidi-display-reordering inside the embedded text, while still leaving
the rest of the buffer reordered as usual.





* Re: bidi and shaping problems in describe-input-method
  2012-03-10 10:27                   ` Eli Zaretskii
@ 2012-03-12  7:47                     ` Kenichi Handa
  2012-03-12 17:42                       ` Eli Zaretskii
  2012-03-13  5:46                       ` Mohsen BANAN
  0 siblings, 2 replies; 38+ messages in thread
From: Kenichi Handa @ 2012-03-12  7:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: list-general, emacs-devel

In article <8362eczr73.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> Yes.  But surrounding each `lower' and `upper' key labels in the
> layout with LRE..PDF inserts even more bidirectional control
> characters than just inserting LRM.  By contrast, using LRO..PDF
> around the whole row of keys inserts just 2 such characters, so if it
> were not for the need to reorder the individual key labels, LRO..PDF
> would be a better alternative.  I mentioned it because it does exactly
> what you originally asked for: it effectively disables
> bidi-display-reordering inside the embedded text, while still leaving
> the rest of the buffer reordered as usual.

I mixed up LRE and LRO, sorry.  Anyway, if LRO..PDF
works, it is surely better than many LRMs.  I've just
installed a proper change including the magic of
compose-string.  Please try the latest code.

---
Kenichi Handa
handa@m17n.org




* Re: bidi and shaping problems in describe-input-method
  2012-03-12  7:47                     ` Kenichi Handa
@ 2012-03-12 17:42                       ` Eli Zaretskii
  2012-03-13  0:58                         ` Kenichi Handa
  2012-03-13  5:46                       ` Mohsen BANAN
  1 sibling, 1 reply; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-12 17:42 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: list-general, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: list-general@mohsen.1.banan.byname.net, emacs-devel@gnu.org
> Date: Mon, 12 Mar 2012 16:47:11 +0900
> 
> In article <8362eczr73.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Yes.  But surrounding each `lower' and `upper' key labels in the
> > layout with LRE..PDF inserts even more bidirectional control
> > characters than just inserting LRM.  By contrast, using LRO..PDF
> > around the whole row of keys inserts just 2 such characters, so if it
> > were not for the need to reorder the individual key labels, LRO..PDF
> > would be a better alternative.  I mentioned it because it does exactly
> > what you originally asked for: it effectively disables
> > bidi-display-reordering inside the embedded text, while still leaving
> > the rest of the buffer reordered as usual.
> 
> I mixed up LRE and LRO, sorry.  Anyway, if LRO..PDF
> works, it is surely better than many LRMs.  I've just
> installed a proper change including the magic of
> compose-string.  Please try the latest code.

It works fine for me, thanks.

However, using LRO..PDF means that no label on a key can use a string
that needs to be reordered.  That's because the LRO overrides the
bidirectional properties of all the following characters to be strong
L.  If we can live with this limitation, I agree that this is better.
But I think you said earlier that such a restriction is more than we
can bear.




* Re: bidi and shaping problems in describe-input-method
  2012-03-12 17:42                       ` Eli Zaretskii
@ 2012-03-13  0:58                         ` Kenichi Handa
  2012-03-13  3:58                           ` Eli Zaretskii
  0 siblings, 1 reply; 38+ messages in thread
From: Kenichi Handa @ 2012-03-13  0:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: list-general, emacs-devel

In article <8362e9yaum.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> However, using LRO..PDF means that no label on a key can use a string
> that needs to be reordered.  That's because the LRO overrides the
> bidirectional properties of all the following characters to be strong
> L.  

Ahh, ummm, that's not good.  I'm still misunderstanding LRO. :-(

> If we can live with this limitation, I agree that this is better.
> But I think you said earlier that such a restriction is more than we
> can bear.

What we need is to display (only capital letters are Hebrew):
    ... | HEB REW | ABC DEF | ...
as
    ... | BEH WER | CBA FED | ...

If none of LRO..PDF, LRE..PDF work, and if there's no easy
way to determine when to insert LRM, the only way is to
insert LRMs unconditionally.

---
Kenichi Handa
handa@m17n.org




* Re: bidi and shaping problems in describe-input-method
  2012-03-13  0:58                         ` Kenichi Handa
@ 2012-03-13  3:58                           ` Eli Zaretskii
  2012-03-22  4:26                             ` Kenichi Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-13  3:58 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: list-general, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: list-general@mohsen.1.banan.byname.net, emacs-devel@gnu.org
> Date: Tue, 13 Mar 2012 09:58:46 +0900
> 
> What we need is to display (only capital letters are Hebrew):
>     ... | HEB REW | ABC DEF | ...
> as
>     ... | BEH WER | CBA FED | ...

Right.

> If none of LRO..PDF, LRE..PDF work, and if there's no easy
> way to determine when to insert LRM, the only way is to
> insert LRMs unconditionally.

You can use bidi-string-mark-left-to-right, I think, which will
refrain from inserting the LRM characters where possible.
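
A small sketch of how that could look for one key cell, assuming the
labels are strings and relying on the documented behavior of
`bidi-string-mark-left-to-right' (it returns its argument as is
unless the string contains an R2L character, in which case an LRM is
appended).  The helper name `my-format-key-cell' is hypothetical and
this is not the actual quail.el change.

```elisp
;; Hypothetical helper, for illustration only (not quail.el code).
(defun my-format-key-cell (lower upper)
  "Return the text of one key cell showing LOWER followed by UPPER.
LOWER and UPPER are strings.  Each one is passed through
`bidi-string-mark-left-to-right', so an LRM is added only after a
label that contains right-to-left characters."
  (concat (bidi-string-mark-left-to-right lower)
          (bidi-string-mark-left-to-right upper)))

;; Usage:
;; (my-format-key-cell "a" "A")                          ; no LRM added
;; (my-format-key-cell (string #x5D0) (string #x5D1))    ; LRM after each
```

Because each label is marked separately, a label that is itself an
R2L string is still reordered on its own.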




* Re: bidi and shaping problems in describe-input-method
  2012-03-12  7:47                     ` Kenichi Handa
  2012-03-12 17:42                       ` Eli Zaretskii
@ 2012-03-13  5:46                       ` Mohsen BANAN
  1 sibling, 0 replies; 38+ messages in thread
From: Mohsen BANAN @ 2012-03-13  5:46 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: Eli Zaretskii, list-general, emacs-devel

>>>>> On Mon, 12 Mar 2012 16:47:11 +0900, Kenichi Handa <handa@m17n.org> said:

  Kenichi> I've just installed a proper change
  Kenichi> including the magic of compose-string.
  Kenichi> Please try the latest code.

I tried the latest code and both shaping and bidi
describe-input-method problems are properly fixed
for both persian and arabic keyboards.

Thanks Kenichi. Thanks Eli.

...Mohsen


* Re: bidi and shaping problems in describe-input-method
  2012-03-13  3:58                           ` Eli Zaretskii
@ 2012-03-22  4:26                             ` Kenichi Handa
  2012-03-22 17:23                               ` Eli Zaretskii
  2012-03-22 21:59                               ` Mohsen BANAN
  0 siblings, 2 replies; 38+ messages in thread
From: Kenichi Handa @ 2012-03-22  4:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: list-general, emacs-devel

In article <83ty1tw3rs.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > If none of LRO..PDF, LRE..PDF work, and if there's no easy
> > way to determine when to insert LRM, the only way is to
> > insert LRMs unconditionally.

> You can use bidi-string-mark-left-to-right, I think, which will
> refrain from inserting the LRM characters where possible.

I see.  I've just committed a change to use
bidi-string-mark-left-to-right.

Mohsen, could you please try again with the latest code?

---
Kenichi Handa
handa@m17n.org




* Re: bidi and shaping problems in describe-input-method
  2012-03-22  4:26                             ` Kenichi Handa
@ 2012-03-22 17:23                               ` Eli Zaretskii
  2012-03-23  1:41                                 ` Kenichi Handa
  2012-03-22 21:59                               ` Mohsen BANAN
  1 sibling, 1 reply; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-22 17:23 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: list-general, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: list-general@mohsen.1.banan.byname.net, emacs-devel@gnu.org
> Date: Thu, 22 Mar 2012 13:26:32 +0900
> 
> In article <83ty1tw3rs.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > > If none of LRO..PDF, LRE..PDF work, and if there's no easy
> > > way to determine when to insert LRM, the only way is to
> > > insert LRMs unconditionally.
> 
> > You can use bidi-string-mark-left-to-right, I think, which will
> > refrain from inserting the LRM characters where possible.
> 
> I see.  I've just committed a change to use
> bidi-string-mark-left-to-right.

Looks good to me (but Mohsen should tell).

Btw, there's some strange problem in displaying one label of the
hebrew-biblical-tiro input method: the character u+05ba (inserted by
Shift-5 key) is displayed as a blank rectangle.  It looks like my
fonts have no glyph for this character, but then why don't we display
this like any other glyphless character: as a hex code inside a small
rectangle?  That's what I get if I insert this character into a
buffer, but somehow the way we display it in the keyboard layout (and
in the "C-u C-x =" display under "decomposition") behaves differently.
Why is that?


* Re: bidi and shaping problems in describe-input-method
  2012-03-22  4:26                             ` Kenichi Handa
  2012-03-22 17:23                               ` Eli Zaretskii
@ 2012-03-22 21:59                               ` Mohsen BANAN
  1 sibling, 0 replies; 38+ messages in thread
From: Mohsen BANAN @ 2012-03-22 21:59 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: Eli Zaretskii, list-general, emacs-devel



>>>>> On Thu, 22 Mar 2012 13:26:32 +0900, Kenichi Handa <handa@m17n.org> said:

  Kenichi> In article <83ty1tw3rs.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

  >> You can use bidi-string-mark-left-to-right, I think, which will
  >> refrain from inserting the LRM characters where possible.

  Kenichi> I see.  I've just committed a change to use
  Kenichi> bidi-string-mark-left-to-right.

  Kenichi> Mohsen, could you please try again with the latest code?

I tried describe-input-method for arabic and
persian input methods.

All 3 looked correct.

Thanks.

...Mohsen






* Re: bidi and shaping problems in describe-input-method
  2012-03-22 17:23                               ` Eli Zaretskii
@ 2012-03-23  1:41                                 ` Kenichi Handa
  2012-03-23 10:12                                   ` Eli Zaretskii
  0 siblings, 1 reply; 38+ messages in thread
From: Kenichi Handa @ 2012-03-23  1:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: list-general, emacs-devel

In article <83fwd0wnwl.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> Btw, there's some strange problem in displaying one label of the
> hebrew-biblical-tiro input method: the character u+05ba (inserted by
> Shift-5 key) is displayed as a blank rectangle.  It looks like my
> fonts have no glyph for this character, but then why don't we display
> this like any other glyphless character: as a hex code inside a small
> rectangle?  That's what I get if I insert this character into a
> buffer, but somehow the way we display it in the keyboard layout (and
> in the "C-u C-x =" display under "decomposition") behaves differently.
> Why is that?

As that character is a non-spacing modifier, we display it
with a static composition, and a glyph in a static
composition is displayed as a blank rectangle if no font is
available.  This is because a hex code would make the resulting
display of a composition (several glyphs may occupy a single
column) unreadable.

It may be possible to change the current code to display a hex
code if a composition contains just one glyph and
that glyph has no font, but that may have to wait for 24.2.

---
Kenichi Handa
handa@m17n.org


* Re: bidi and shaping problems in describe-input-method
  2012-03-23  1:41                                 ` Kenichi Handa
@ 2012-03-23 10:12                                   ` Eli Zaretskii
  0 siblings, 0 replies; 38+ messages in thread
From: Eli Zaretskii @ 2012-03-23 10:12 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: list-general, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: list-general@mohsen.1.banan.byname.net, emacs-devel@gnu.org
> Date: Fri, 23 Mar 2012 10:41:07 +0900
> 
> In article <83fwd0wnwl.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Btw, there's some strange problem in displaying one label of the
> > hebrew-biblical-tiro input method: the character u+05ba (inserted by
> > Shift-5 key) is displayed as a blank rectangle.  It looks like my
> > fonts have no glyph for this character, but then why don't we display
> > this like any other glyphless character: as a hex code inside a small
> > rectangle?  That's what I get if I insert this character into a
> > buffer, but somehow the way we display it in the keyboard layout (and
> > in the "C-u C-x =" display under "decomposition") behaves differently.
> > Why is that?
> 
> As that character is a non-spacing modifier, we display it
> with a static composition, and a glyph in a static
> composition is displayed as a blank rectangle if no font is
> available.  This is because a hex code would make the resulting
> display of a composition (several glyphs may occupy a single
> column) unreadable.
> 
> It may be possible to change the current code to display a hex
> code if a composition contains just one glyph and
> that glyph has no font, but that may have to wait for 24.2.

Fair enough.  I filed a bug report about this, so it doesn't get
forgotten.




Thread overview: 38+ messages
2012-03-06 22:17 bidi and shaping problems in describe-input-method Mohsen BANAN
2012-03-07  4:05 ` Eli Zaretskii
2012-03-07 18:49   ` Eli Zaretskii
2012-03-07 21:32   ` Mohsen BANAN
2012-03-08 15:30     ` Kenichi Handa
2012-03-08 18:24       ` Eli Zaretskii
2012-03-08 23:48         ` Kenichi Handa
2012-03-09  8:11           ` Eli Zaretskii
2012-03-09 14:03             ` Kenichi Handa
2012-03-09 16:12               ` Eli Zaretskii
2012-03-10  2:55                 ` Kenichi Handa
2012-03-10 10:27                   ` Eli Zaretskii
2012-03-12  7:47                     ` Kenichi Handa
2012-03-12 17:42                       ` Eli Zaretskii
2012-03-13  0:58                         ` Kenichi Handa
2012-03-13  3:58                           ` Eli Zaretskii
2012-03-22  4:26                             ` Kenichi Handa
2012-03-22 17:23                               ` Eli Zaretskii
2012-03-23  1:41                                 ` Kenichi Handa
2012-03-23 10:12                                   ` Eli Zaretskii
2012-03-22 21:59                               ` Mohsen BANAN
2012-03-13  5:46                       ` Mohsen BANAN
2012-03-09  8:17           ` Eli Zaretskii
2012-03-08 18:30       ` Eli Zaretskii
2012-03-08 18:53         ` Eli Zaretskii
2012-03-08 23:27           ` Kenichi Handa
2012-03-08 23:19         ` Kenichi Handa
2012-03-09  8:15           ` Eli Zaretskii
2012-03-09  9:01             ` Juanma Barranquero
2012-03-09  9:45               ` Eli Zaretskii
2012-03-09 10:02                 ` Eli Zaretskii
2012-03-09 14:11                   ` Kenichi Handa
2012-03-09 11:19                 ` Juanma Barranquero
2012-03-09 11:41                   ` Eli Zaretskii
2012-03-09 14:56                     ` Juanma Barranquero
2012-03-09 13:54             ` Kenichi Handa
2012-03-09 16:15               ` Eli Zaretskii
2012-03-08  4:30 ` Miles Bader
