Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.


* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
       [not found] <E1QrEHF-0003qX-I0@vcs.savannah.gnu.org>
@ 2011-08-11  2:14 ` Stefan Monnier
  2011-08-11  3:02   ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Monnier @ 2011-08-11  2:14 UTC (permalink / raw)
  To: Chong Yidong; +Cc: emacs-devel

> +** New function `string-mark-left-to-right' appends a Unicode LRM
> +(left-to-right mark) character to a string if it terminates in
> +right-to-left script.  This is useful when the buffer has overall
> +left-to-right paragraph direction and you need to insert a string
> +whose contents (and directionality) are not known in advance.

This is too low-level a description I think.  It's understandable by
people who understand what LRM does, but I think we should try and make
it clearer.  Same for its docstring, of course.
Maybe something along the lines of

  "Add whatever is necessary for STRING to make sure its content is not
  reordered with surrounding text"

Though I think this assumes that the surrounding text is L2R, in which
case the description should also say so.


        Stefan



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-11  2:14 ` [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs Stefan Monnier
@ 2011-08-11  3:02   ` Eli Zaretskii
  2011-08-11  4:48     ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-11  3:02 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: cyd, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Wed, 10 Aug 2011 22:14:28 -0400
> Cc: emacs-devel@gnu.org
> 
> > +** New function `string-mark-left-to-right' appends a Unicode LRM
> > +(left-to-right mark) character to a string if it terminates in
> > +right-to-left script.

This algorithm (which the code implements) is wrong: the unwanted
reordering can happen even if the string does not end in a strong R
character.  It could end in a series of weak characters, if the strong
character preceding that is R, for example.

The precise definition of the necessary conditions is complicated.
That is why I suggested to test _all_ the characters for being strong
R.  Why wasn't that implemented?  It might catch more strings that
need this, but at least it won't miss any.

If we really want only a 100% accurate solution, I will need to code
something non-trivial.  Let me know.
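
To make the failure case concrete, here is a minimal sketch (not the
committed code) that compares an end-of-string test with a whole-string
test.  The names `my-ends-in-rtl-p' and `my-contains-rtl-p' are
hypothetical, and only the classes R and AL are checked, as in the
description quoted above.

```elisp
(defun my-ends-in-rtl-p (str)
  "Return non-nil if the last character of STR has bidi class R or AL."
  (and (> (length str) 0)
       (memq (get-char-code-property (aref str (1- (length str)))
                                     'bidi-class)
             '(R AL))))

(defun my-contains-rtl-p (str)
  "Return non-nil if any character of STR has bidi class R or AL."
  (let ((found nil)
        (i 0))
    (while (and (not found) (< i (length str)))
      (setq found (memq (get-char-code-property (aref str i) 'bidi-class)
                        '(R AL))
            i (1+ i)))
    found))

;; Four Hebrew letters followed by the tail "<1>" (an arbitrary example).
(let ((name (concat (string #x05e9 #x05dc #x05d5 #x05dd) "<1>")))
  (list (my-ends-in-rtl-p name)      ; nil: the last character is ">"
        (my-contains-rtl-p name)))   ; non-nil: strong R characters found
```

The end-of-string test reports nothing for such a name, even though the
strong character preceding the trailing "<1>" is R, which is exactly the
case described above.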

>                           This is useful when the buffer has overall
> > +left-to-right paragraph direction and you need to insert a string
> > +whose contents (and directionality) are not known in advance.
> 
> This is too low-level a description I think.  It's understandable by
> people who understand what LRM does

I think even people who know about that won't realize the purpose.

> Maybe something along the lines of
> 
>   "Add whatever is necessary for STRING to make sure its content is not
>   reordered with surrounding text"

This is also incorrect or at least inaccurate.  The problem is with
reordering the text following the offending string, not surrounding
it, nor with reordering the string content itself.


* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-11  3:02   ` Eli Zaretskii
@ 2011-08-11  4:48     ` Eli Zaretskii
  2011-08-11 19:01       ` Chong Yidong
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-11  4:48 UTC (permalink / raw)
  To: monnier, cyd; +Cc: emacs-devel

> Date: Thu, 11 Aug 2011 06:02:39 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: cyd@stupidchicken.com, emacs-devel@gnu.org
> 
> > From: Stefan Monnier <monnier@iro.umontreal.ca>
> > Date: Wed, 10 Aug 2011 22:14:28 -0400
> > Cc: emacs-devel@gnu.org
> > 
> > > +** New function `string-mark-left-to-right' appends a Unicode LRM
> > > +(left-to-right mark) character to a string if it terminates in
> > > +right-to-left script.
> 
> This algorithm (which the code implements) is wrong: the unwanted
> reordering can happen even if the string does not end in a strong R
> character.  It could end in a series of weak characters, if the strong
> character preceding that is R, for example.

And since buffer-menu.el was already modified to use this function, it
is easy to see how this algorithm fails: make a buffer whose name is
made of all R2L characters with the "<1>" tail appended, then type
"C-x C-b" and watch the messed-up display.  The original code treated
this case correctly.
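
The recipe above can be sketched in Lisp as follows.  The buffer name
(four Hebrew alefs plus "<1>") is an arbitrary example, not one taken
from the original report.

```elisp
;; Build a name made of R2L characters followed by the "<1>" tail.
(let ((name (concat (make-string 4 #x05d0) "<1>")))
  (get-buffer-create name)
  ;; Equivalent to typing C-x C-b: display the buffer menu.
  (list-buffers))
```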




* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-11  4:48     ` Eli Zaretskii
@ 2011-08-11 19:01       ` Chong Yidong
  2011-08-12  7:21         ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Chong Yidong @ 2011-08-11 19:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> This algorithm (which the code implements) is wrong: the unwanted
>> reordering can happen even if the string does not end in a strong R
>> character.  It could end in a series of weak characters, if the strong
>> character preceding that is R, for example.

OK, so basically we have to scan the entire string like this, right?

  (let ((len (length str))
	(n 0)
	rtl-found)
    (while (and (not rtl-found) (< n len))
      (setq rtl-found (memq (get-char-code-property
			     (aref str n) 'bidi-class) '(R AL))
	    n (1+ n)))
    (if rtl-found
	(concat str (propertize (string ?\x200e) 'invisible t))
      str)))

> And since buffer-menu.el was already modified to use this function, it
> is easy to see how this algorithm fails: make a buffer whose name is
> made of all R2L characters with the "<1>" tail appended, then type
> "C-x C-b" and watch the messed-up display.  The original code treated
> this case correctly.

Actually, it looks as though the <1> is not treated properly, even with
the old code.  If I do

    (rename-buffer (concat "السّلام عليكم"
			   "<1>"))

then the buffer is displayed as 1>[some Arabic text]>, both in the
mode-line and in the buffer menu.  I guess the code that appends the
"<n>" needs to use string-mark-left-to-right as well.

For extra hilarity, get rid of the newline in this code fragment and
watch as the redisplayed Emacs Lisp code turns into gibberish...




* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-11 19:01       ` Chong Yidong
@ 2011-08-12  7:21         ` Eli Zaretskii
  2011-08-12 15:47           ` Chong Yidong
  2011-08-13  7:00           ` Kenichi Handa
  0 siblings, 2 replies; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-12  7:21 UTC (permalink / raw)
  To: Chong Yidong; +Cc: monnier, emacs-devel

> From: Chong Yidong <cyd@stupidchicken.com>
> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Thu, 11 Aug 2011 15:01:16 -0400
> 
> OK, so basically we have to scan the entire string like this, right?

Yes.

Btw, is there a way to regex-search for a character by its bidi
category?  That would make the code more elegant, and probably quite
a bit faster as well.

>   (let ((len (length str))
> 	(n 0)
> 	rtl-found)
>     (while (and (not rtl-found) (< n len))
>       (setq rtl-found (memq (get-char-code-property
> 			     (aref str n) 'bidi-class) '(R AL))
                                                       ^^^^^^^
Make that '(R AL RLO), since the RLO character overrides the
bidirectional properties of all the following characters with R.
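
RLO has to be listed separately because it is a formatting control
whose own class is neither R nor AL.  A quick check, assuming the
Unicode property tables are available (as they are in a normally built
Emacs):

```elisp
(get-char-code-property #x05d0 'bidi-class)  ; HEBREW LETTER ALEF       => R
(get-char-code-property #x0627 'bidi-class)  ; ARABIC LETTER ALEF       => AL
(get-char-code-property #x202e 'bidi-class)  ; RIGHT-TO-LEFT OVERRIDE   => RLO
```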

> > And since buffer-menu.el was already modified to use this function, it
> > is easy to see how this algorithm fails: make a buffer whose name is
> > made of all R2L characters with the "<1>" tail appended, then type
> > "C-x C-b" and watch the messed-up display.  The original code treated
> > this case correctly.
> 
> Actually, it looks as though the <1> is not treated properly, even with
> the old code.  If I do
> 
>     (rename-buffer (concat "السّلام عليكم"
> 			   "<1>"))
> 
> then the buffer is displayed as 1>[some Arabic text]>, both in the
> mode-line and in the buffer menu.

I didn't mean the display of the buffer name itself; that is a
separate issue, as discussed here:

  https://lists.gnu.org/archive/html/emacs-devel/2011-06/msg00712.html

I meant the buffer menu layout: with the current bzr tip, the buffer
size was displayed to the left of the buffer name, whereas the
previous code displayed it to the right of the name, as with buffer
names without R2L characters.

> For extra hilarity, get rid of the newline in this code fragment and
> watch as the redisplayed Emacs Lisp code turns into gibberish...

That's one of the problems that are not yet solved in Emacs: how to
display code with comments and strings in R2L languages without
messing up the display and without giving up the reordering of the
strings and comments.





* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-12  7:21         ` Eli Zaretskii
@ 2011-08-12 15:47           ` Chong Yidong
  2011-08-12 15:54             ` Eli Zaretskii
  2011-08-13  7:00           ` Kenichi Handa
  1 sibling, 1 reply; 22+ messages in thread
From: Chong Yidong @ 2011-08-12 15:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Btw, is there a way to regex-search for a character by its bidi
> category?  That would make the code more elegant, and probably quite
> a bit faster as well.

I don't think so, and anyway such a regex-search would still have to go
through the functions of mule-cmds.el.  A better optimization might be
to provide the equivalents of `next-property-change' etc. for char code
properties.
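
No such equivalent exists in Emacs at this point.  As a rough sketch of
what the interface might look like for strings, under the hypothetical
name `my-next-char-code-property-change':

```elisp
(defun my-next-char-code-property-change (str pos prop)
  "Return the next index after POS in STR where PROP changes, or nil.
PROP is a char code property such as `bidi-class'."
  (let ((len (length str))
        (start (get-char-code-property (aref str pos) prop))
        (i (1+ pos)))
    ;; Skip characters whose property value equals the one at POS.
    (while (and (< i len)
                (equal (get-char-code-property (aref str i) prop) start))
      (setq i (1+ i)))
    (and (< i len) i)))

;; The run "abc" ends at the space, whose bidi class is WS.
(my-next-char-code-property-change "abc 123" 0 'bidi-class)  ; => 3
```

This version still calls `get-char-code-property' once per character,
so it only illustrates the interface, not the optimization itself.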

In practice, testing indicates string-mark-left-to-right has acceptable
speed for its present uses in buffer menu and tabulated list mode.
Unless a real problem shows up (e.g. when we apply the fix to Gnus),
let's revisit this issue later.

>> 			     (aref str n) 'bidi-class) '(R AL))
>                                                        ^^^^^^^
> Make that '(R AL RLO), since the RLO character overrides the
> bidirectional properties of all the following characters with R.

OK, committed along with doc changes.

>> then the buffer is displayed as 1>[some Arabic text]>, both in the
>> mode-line and in the buffer menu.
>
> I didn't mean the display of the buffer name itself; that is a
> separate issue, as discussed here:

Was there any resolution on that thread?


* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-12 15:47           ` Chong Yidong
@ 2011-08-12 15:54             ` Eli Zaretskii
  2011-08-12 16:00               ` Chong Yidong
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-12 15:54 UTC (permalink / raw)
  To: Chong Yidong; +Cc: monnier, emacs-devel

> From: Chong Yidong <cyd@stupidchicken.com>
> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Fri, 12 Aug 2011 11:47:00 -0400
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Btw, is there a way to regex-search for a character by its bidi
> > category?  That would make the code more elegant, and probably quite
> > a bit faster as well.
> 
> I don't think so, and anyway such a regex-search would still have to go
> through the functions of mule-cmds.el.

Which ones?

>  A better optimization might be to provide the equivalents of
> `next-property-change' etc. for char code properties.

That's almost trivial, but we are in feature freeze, right?

> >> then the buffer is displayed as 1>[some Arabic text]>, both in the
> >> mode-line and in the buffer menu.
> >
> > I didn't mean the display of the buffer name itself; that is a
> > separate issue, as discussed here:
> 
> Was there any resolution on that thread?

I was under the impression that Someone(TM) volunteered to write a
function that returns buffer names decorated with the directional
control characters to make them display reasonably.  But maybe I was
mistaken.




* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-12 15:54             ` Eli Zaretskii
@ 2011-08-12 16:00               ` Chong Yidong
  2011-08-12 17:25                 ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Chong Yidong @ 2011-08-12 16:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> I don't think so, and anyway such a regex-search would still have to go
>> through the functions of mule-cmds.el.
>
> Which ones?

get-char-code-property, presumably (unless you want to replicate all the
char-code table logic in C).

>>  A better optimization might be to provide the equivalents of
>> `next-property-change' etc. for char code properties.
>
> That's almost trivial, but we are in feature freeze, right?

Right, so I would put this off unless it is demonstrably needed.

>> Was there any resolution on that thread?
>
> I was under the impression that Someone(TM) volunteered to write a
> function that returns buffer names decorated with the directional
> control characters to make them display reasonably.  But maybe I was
> mistaken.

I can take a look.




* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-12 16:00               ` Chong Yidong
@ 2011-08-12 17:25                 ` Eli Zaretskii
  0 siblings, 0 replies; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-12 17:25 UTC (permalink / raw)
  To: Chong Yidong; +Cc: monnier, emacs-devel

> From: Chong Yidong <cyd@stupidchicken.com>
> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Fri, 12 Aug 2011 12:00:45 -0400
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> I don't think so, and anyway such a regex-search would still have to go
> >> through the functions of mule-cmds.el.
> >
> > Which ones?
> 
> get-char-code-property, presumably (unless you want to replicate all the
> char-code table logic in C).

That char-code table logic already exists at the C level: how do you think bidi.c
knows the bidirectional properties of each character it encounters?

> > I was under the impression that Someone(TM) volunteered to write a
> > function that returns buffer names decorated with the directional
> > control characters to make them display reasonably.  But maybe I was
> > mistaken.
> 
> I can take a look.

Thanks.




* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-12  7:21         ` Eli Zaretskii
  2011-08-12 15:47           ` Chong Yidong
@ 2011-08-13  7:00           ` Kenichi Handa
  2011-08-13  7:11             ` Eli Zaretskii
  1 sibling, 1 reply; 22+ messages in thread
From: Kenichi Handa @ 2011-08-13  7:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: cyd, monnier, emacs-devel

In article <8362m3xfgz.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> Btw, is there a way to regex-search for a character by its bidi
> category?  That would make the code more elegant, and probably quite
> a bit faster as well.

You can do that as follows:

(1) Generate a special category table.

(defvar special-category-table-for-bidi
  (let ((category-table (make-category-table))
	(uniprop-table (unicode-property-table-internal 'bidi-class)))
    (define-category ?r "Bidi class R, AL, or RLO" category-table)
    (map-char-table
     #'(lambda (key val)
	 (if (memq val '(R AL RLO))
	     (modify-category-entry key ?r category-table)))
     uniprop-table)
    category-table))

(2) Check whether a string or buffer contains special bidi-related characters.

;; For string...
(defun check-special-bidi-character (str)
  (with-category-table special-category-table-for-bidi
    (string-match "\\cr" str)))

(check-special-bidi-character "abc") => nil
(check-special-bidi-character "abc א") => 4

---
Kenichi Handa
handa@m17n.org




* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-13  7:00           ` Kenichi Handa
@ 2011-08-13  7:11             ` Eli Zaretskii
  2011-08-13  7:42               ` Kenichi Handa
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-13  7:11 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: cyd, monnier, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: cyd@stupidchicken.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Sat, 13 Aug 2011 16:00:40 +0900
> 
> ;; For string...
> (defun check-special-bidi-character (str)
>   (with-category-table special-category-table-for-bidi
>     (string-match "\\cr" str)))
> 
> (check-special-bidi-character "abc") => nil
> (check-special-bidi-character "abc א")‎ => 4

Thanks!  I think we should have a few of such category-tables in Emacs
by default.





* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-13  7:11             ` Eli Zaretskii
@ 2011-08-13  7:42               ` Kenichi Handa
  2011-08-13 13:53                 ` Stefan Monnier
  2011-08-16  7:44                 ` Eli Zaretskii
  0 siblings, 2 replies; 22+ messages in thread
From: Kenichi Handa @ 2011-08-13  7:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: cyd, monnier, emacs-devel

In article <83hb5lwzt2.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > From: Kenichi Handa <handa@m17n.org>
> > Cc: cyd@stupidchicken.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> > Date: Sat, 13 Aug 2011 16:00:40 +0900
> > 
> > ;; For string...
> > (defun check-special-bidi-character (str)
> >   (with-category-table special-category-table-for-bidi
> >     (string-match "\\cr" str)))
> > 
> > (check-special-bidi-character "abc") => nil
> > (check-special-bidi-character "abc א")‎ => 4

> Thanks!  I think we should have a few of such category-tables in Emacs
> by default.

As categories are not exclusive (i.e. one character can have
multiple categories), I think you need just one
category-table.  In it, each character has a category
uniquely corresponding to its bidi class (L, AL, etc.); in
addition, some characters have a category whose meaning
is, for instance, "one of R, AL, or RLO".

For instance, if you define a category ?R as bidi class R,
and define a category ?r as one of (R, AL, or RLO), the
character `א' has two categories ?R and ?r, which means

(with-category-table special-category-table-for-bidi
  (cons (string-match "\\cR" "א")  (string-match "\\cr" "א")))
  => (0 . 0)

As we can define 95 different categories in a single
category table, I think the number of categories is
sufficient.

---
Kenichi Handa
handa@m17n.org




* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-13  7:42               ` Kenichi Handa
@ 2011-08-13 13:53                 ` Stefan Monnier
  2011-08-14 16:21                   ` Chong Yidong
  2011-08-16  7:44                 ` Eli Zaretskii
  1 sibling, 1 reply; 22+ messages in thread
From: Stefan Monnier @ 2011-08-13 13:53 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: Eli Zaretskii, cyd, emacs-devel

> As categories are not exclusive (i.e. one character can have
> multiple categories), I think you need just one
> category-table.  In it, each character has a category
> uniquely corresponding to its bidi class (L, AL, etc.); in
> addition, some characters have a category whose meaning
> is, for instance, "one of R, AL, or RLO".

We could also add a new special regexp construct, similar to \c but for
Unicode properties.
The advantage being that it lets us use the Unicode tables without
having to build a new category table that keeps another copy of it, and
without having to use "(with-category-table special-category-table-for-bidi".


        Stefan




* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-13 13:53                 ` Stefan Monnier
@ 2011-08-14 16:21                   ` Chong Yidong
  0 siblings, 0 replies; 22+ messages in thread
From: Chong Yidong @ 2011-08-14 16:21 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel, Kenichi Handa

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> We could also add a new special regexp construct, similar to \c but
> for Unicode properties.  The advantage being that it lets us use the
> Unicode tables without having to build a new category table that keeps
> another copy of it, and without having to use "(with-category-table
> special-category-table-for-bidi".

We can put it in the standard category table.  This works:

=== modified file 'lisp/international/characters.el'
*** lisp/international/characters.el	2011-07-06 22:43:48 +0000
--- lisp/international/characters.el	2011-08-14 02:21:19 +0000
***************
*** 114,119 ****
--- 114,123 ----
  Base characters (Unicode General Category L,N,P,S,Zs)")
  (define-category ?^ "Combining
  Combining diacritic or mark (Unicode General Category M)")
+ 
+ ;; RTL scripts
+ (define-category ?R "Right-to-left
+ Characters with R, AL, or RLO bidi type.")
  \f
  ;;; Setting syntax and category.
  
***************
*** 478,483 ****
--- 482,494 ----
  		  (modify-category-entry x category))
  	      chars)))))
  
+ ;; RTL scripts.
+ 
+ (map-char-table (lambda (key val)
+ 		  (if (memq val '(R AL RLO))
+ 		      (modify-category-entry key ?R)))
+ 		(unicode-property-table-internal 'bidi-class))
+ 
  ;; Latin
  
  (modify-category-entry '(#x80 . #x024F) ?l)

=== modified file 'lisp/subr.el'
*** lisp/subr.el	2011-08-12 15:43:30 +0000
--- lisp/subr.el	2011-08-13 15:42:16 +0000
***************
*** 3553,3568 ****
  If STR contains no RTL characters, return STR."
    (unless (stringp str)
      (signal 'wrong-type-argument (list 'stringp str)))
!   (let ((len (length str))
! 	(n 0)
! 	rtl-found)
!     (while (and (not rtl-found) (< n len))
!       (setq rtl-found (memq (get-char-code-property
! 			     (aref str n) 'bidi-class) '(R AL RLO))
! 	    n (1+ n)))
!     (if rtl-found
! 	(concat str (propertize (string ?\x200e) 'invisible t))
!       str)))
  \f
  ;;;; invisibility specs
  
--- 3553,3561 ----
  If STR contains no RTL characters, return STR."
    (unless (stringp str)
      (signal 'wrong-type-argument (list 'stringp str)))
!   (if (string-match "\\cR" str)
!       (concat str (propertize (string ?\x200e) 'invisible t))
!     str))
  \f
  ;;;; invisibility specs
  





* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-13  7:42               ` Kenichi Handa
  2011-08-13 13:53                 ` Stefan Monnier
@ 2011-08-16  7:44                 ` Eli Zaretskii
  2011-08-16 23:57                   ` Kenichi Handa
  1 sibling, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-16  7:44 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: cyd, monnier, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: cyd@stupidchicken.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Sat, 13 Aug 2011 16:42:43 +0900
> 
> > > (defun check-special-bidi-character (str)
> > >   (with-category-table special-category-table-for-bidi
> > >     (string-match "\\cr" str)))
> > > 
> > > (check-special-bidi-character "abc") => nil
> > > (check-special-bidi-character "abc א")‎ => 4
> 
> > Thanks!  I think we should have a few of such category-tables in Emacs
> > by default.
> 
> As categories are not exclusive (i.e. one character can have
> multiple categories), I think you need just one
> category-table.

Would it be a good idea to add such categories to the standard
category table?  IOW, why do we need a special category table to
search for these characters?





* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-16  7:44                 ` Eli Zaretskii
@ 2011-08-16 23:57                   ` Kenichi Handa
  2011-08-17  5:49                     ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Kenichi Handa @ 2011-08-16 23:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: cyd, monnier, emacs-devel

In article <83r54lu7f4.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > > > (defun check-special-bidi-character (str)
> > > >   (with-category-table special-category-table-for-bidi
> > > >     (string-match "\\cr" str)))
> > > > 
> > > > (check-special-bidi-character "abc") => nil
> > > > (check-special-bidi-character "abc א")‎ => 4
> > 
> > > Thanks!  I think we should have a few of such category-tables in Emacs
> > > by default.
> > 
> > As categories are not exclusive (i.e. one character can have
> > multiple categories), I think you need just one
> > category-table.

> Would it be a good idea to add such categories to the standard
> category table?  IOW, why do we need a special category table to
> search for these characters?

We can define at most 95 categories in one table, and, in
the standard category table, we already defined 41
categories.

For bidi, we need at least 18 categories (there are 18 bidi
classes) and a few more for combinations.  Adding all of
them to the standard category table makes the remaining
category space less than half of the whole space.  So, I
think we should be careful.

In addition, adding them to the standard category table means
we can't select a proper category mnemonic character.

---
Kenichi Handa
handa@m17n.org


* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-16 23:57                   ` Kenichi Handa
@ 2011-08-17  5:49                     ` Eli Zaretskii
  2011-08-17  7:21                       ` Kenichi Handa
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-17  5:49 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: cyd, monnier, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: cyd@stupidchicken.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Wed, 17 Aug 2011 08:57:49 +0900
> 
> > Would it be a good idea to add such categories to the standard
> > category table?  IOW, why do we need a special category table to
> > search for these characters?
> 
> We can define at most 95 categories in one table, and, in
> the standard category table, we already defined 41
> categories.
> 
> For bidi, we need at least 18 categories (there are 18 bidi
> classes) and a few more for combinations.  Adding all of
> them to the standard category table makes the remaining
> category space less than half of the whole space.  So, I
> think we should be careful.

I didn't mean to add each bidi type as a separate category (there are
19 of them, btw).  I did mean to carefully define the most frequently
needed categories, like the one which started this discussion, and add
only those.  The gain would be that we won't need to use
with-category-table around code which needs to search for characters
by their bidi types, and we will be able to combine bidi-related
categories with other standard categories in the same regular
expression.

One possible set of categories is just the 3 bidi categories defined
by UAX#9: Strong, Weak, and Neutral.  We'd probably need to split the
first one in two, depending on directionality, so Strong_R, Strong_L,
Weak, and Neutral would be my initial guess.

However, we should gather more experience before we decide.
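
As a starting point for such experiments, the four-way split could be
tried out in a private category table without touching the standard
one.  This is only a sketch: the names `my-bidi-strength-table' and
`my-bidi-strength-match' are hypothetical, the mnemonics ?R ?L ?W ?N are
arbitrary, and the grouping of the 19 types into Strong/Weak/Neutral is
my reading of the UAX#9 table (any class not listed is left
uncategorized).

```elisp
(defvar my-bidi-strength-table
  (let ((table (make-category-table)))
    (define-category ?R "Strong right-to-left" table)
    (define-category ?L "Strong left-to-right" table)
    (define-category ?W "Weak" table)
    (define-category ?N "Neutral" table)
    (map-char-table
     (lambda (key val)
       ;; Pick one category per bidi class; nil means "leave it alone".
       (let ((cat (cond ((memq val '(R AL RLE RLO)) ?R)
                        ((memq val '(L LRE LRO)) ?L)
                        ((memq val '(PDF EN ES ET AN CS NSM BN)) ?W)
                        ((memq val '(B S WS ON)) ?N))))
         (when cat
           (modify-category-entry key cat table))))
     (unicode-property-table-internal 'bidi-class))
    table)
  "Private category table with Strong_R, Strong_L, Weak and Neutral.")

(defun my-bidi-strength-match (regexp str)
  "Like `string-match', but with `my-bidi-strength-table' in effect."
  (with-category-table my-bidi-strength-table
    (string-match regexp str)))

(my-bidi-strength-match "\\cR" "abc \u05d0")  ; => 4
```

Putting the same four categories into the standard table would remove
the need for the `with-category-table' wrapper, which is the gain
described above.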

> In addition, adding them to the standard category table means
> we can't select a proper category mnemonic character.

?? We can use any one that is currently unused, no?  Those that are
used are shown by describe-categories, right?  Or am I missing
something?


* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-17  5:49                     ` Eli Zaretskii
@ 2011-08-17  7:21                       ` Kenichi Handa
  2011-08-17  9:15                         ` Eli Zaretskii
  2011-08-17 21:12                         ` Chong Yidong
  0 siblings, 2 replies; 22+ messages in thread
From: Kenichi Handa @ 2011-08-17  7:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: cyd, monnier, emacs-devel

In article <834o1gtwne.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> I didn't mean to add each bidi type as a separate category (there are
> 19 of them, btw).

Oops, sorry I mis-counted them.

> I did mean to carefully define the most frequently
> needed categories, like the one which started this discussion, and add
> only those.  The gain would be that we won't need to use
> with-category-table around code which needs to search for characters
> by their bidi types, and we will be able to combine bidi-related
> categories with other standard categories in the same regular
> expression.

> One possible set of categories is just the 3 bidi categories defined
> by UAX#9: Strong, Weak, and Neutral.  We'd probably need to split the
> first one in two, depending on directionality, so Strong_R, Strong_L,
> Weak, and Neutral would be my initial guess.

Ah, I see.  It may be ok to add just a few categories to the
standard categories table.

> However, we should gather more experience before we decide.

> > In addition, adding them to the standard category table means
> > we can't select a proper category mnemonic character.

> ?? We can use any one that is currently unused, no?  Those that are
> used are shown by describe-categories, right?

Yes.  I just thought that it's difficult to find proper
mnemonics for all 19 bidi classes among the unused ones.

By the way, Stefan's suggestion of extending regexp is also
worth considering (though I have no idea what kind of format
we can use for them).

One more tip: It may be a little bit faster to use a
bidi-specific category table with with-category-table
because, in most cases, we can find a category set for a
specific character faster.  In a bidi-specific category
table, most characters (e.g. all han characters) will have
the same category set and thus the set is recorded for a
group of characters.

---
Kenichi Handa
handa@m17n.org


* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-17  7:21                       ` Kenichi Handa
@ 2011-08-17  9:15                         ` Eli Zaretskii
  2011-08-18  2:13                           ` Kenichi Handa
  2011-08-17 21:12                         ` Chong Yidong
  1 sibling, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-17  9:15 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: cyd, monnier, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Wed, 17 Aug 2011 16:21:44 +0900
> Cc: cyd@stupidchicken.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> 
> By the way, Stefan's suggestion of extending regexps is also
> worth considering (though I have no idea what kind of syntax
> we could use for it).

If the important categories are part of the standard category-table,
then I don't see any advantages to Stefan's proposal.  The underlying
implementation will be the same: access to uniprop tables.

> One more tip: It may be a little bit faster to use a
> bidi-specific category table with with-category-table
> because, in most cases, we can find a category set for a
> specific character faster.  In a bidi-specific category
> table, most characters (e.g. all Han characters) will have
> the same category set and thus the set is recorded for a
> group of characters.

You mean, because CHAR_TABLE_REF will be faster?




* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-17  9:15                         ` Eli Zaretskii
@ 2011-08-18  2:13                           ` Kenichi Handa
  0 siblings, 0 replies; 22+ messages in thread
From: Kenichi Handa @ 2011-08-18  2:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: cyd, monnier, emacs-devel

In article <83y5yss8j5.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > One more tip: It may be a little bit faster to use a
> > bidi-specific category table with with-category-table
> > because, in most cases, we can find a category set for a
> > specific character faster.  In a bidi-specific category
> > table, most characters (e.g. all Han characters) will have
> > the same category set and thus the set is recorded for a
> > group of characters.

> You mean, because CHAR_TABLE_REF will be faster?

Yes.

---
Kenichi Handa
handa@m17n.org




* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-17  7:21                       ` Kenichi Handa
  2011-08-17  9:15                         ` Eli Zaretskii
@ 2011-08-17 21:12                         ` Chong Yidong
  2011-08-18  7:09                           ` Eli Zaretskii
  1 sibling, 1 reply; 22+ messages in thread
From: Chong Yidong @ 2011-08-17 21:12 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: Eli Zaretskii, monnier, emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> One more tip: It may be a little bit faster to use a bidi-specific
> category table with with-category-table because, in most cases, we can
> find a category set for a specific character faster.  In a
> bidi-specific category table, most characters (e.g. all Han
> characters) will have the same category set and thus the set is
> recorded for a group of characters.

Currently, we are using this only for string-mark-left-to-right, and
performance does not seem to be a problem for that usage.

Also, we only need the "strong R" category.  We could add the "strong L"
category for symmetry, but I don't see the need to add the other bidi
categories until they are called for.

So, I propose adding

 ?L - Strong-L bidi types (L, LRE, LRO)
 ?R - Strong-R bidi types (R, AL, RLE, RLO)

to the standard category table.

Sound fine?
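
To make the proposed grouping concrete, here is a minimal sketch that
maps a character's UAX#9 bidi class onto the two proposed categories.
It is written in Python with the unicodedata module, not as Emacs
category-table code; proposed_category is a hypothetical name, and
only the class lists are taken from the proposal above.

```python
import unicodedata

# Class groupings copied from the proposal; everything else is illustrative.
STRONG_L = {"L", "LRE", "LRO"}
STRONG_R = {"R", "AL", "RLE", "RLO"}

def proposed_category(ch):
    """Return "L" or "R" for strong characters, None for all others."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in STRONG_L:
        return "L"
    if bidi in STRONG_R:
        return "R"
    # Weak and neutral classes get no category under this proposal.
    return None
```

Under this grouping, Latin letters fall in ?L, Hebrew and Arabic
letters fall in ?R, and digits, whitespace and punctuation are left
uncategorized.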




* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-17 21:12                         ` Chong Yidong
@ 2011-08-18  7:09                           ` Eli Zaretskii
  0 siblings, 0 replies; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-18  7:09 UTC (permalink / raw)
  To: Chong Yidong; +Cc: emacs-devel, monnier, handa

> From: Chong Yidong <cyd@stupidchicken.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, monnier@iro.umontreal.ca,
>         emacs-devel@gnu.org
> Date: Wed, 17 Aug 2011 17:12:16 -0400
> 
> Also, we only need the "strong R" category.  We could add the "strong L"
> category for symmetry, but I don't see the need to add the other bidi
> categories until they are called for.

We will need the weak and neutral categories (or at least a single
category for both of them) if we ever want to become more accurate
about the need for placing LRM/RLM to fix the display.  That's because
the display becomes "messed-up" when a strong R character is followed
by weak (e.g. digits) or neutral (whitespace, punctuation) characters.

But it is okay to add that later, if you don't want to do that now.
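
For illustration only, the more accurate check could look roughly like
the sketch below: skip trailing weak and neutral characters, and append
an LRM when the last strong character is strong R.  This is Python
using unicodedata, not the Emacs implementation, and
mark_left_to_right is a hypothetical name.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
STRONG_L = {"L", "LRE", "LRO"}
STRONG_R = {"R", "AL", "RLE", "RLO"}

def mark_left_to_right(s):
    """Append LRM if the last strong character of S is right-to-left."""
    for ch in reversed(s):
        bidi = unicodedata.bidirectional(ch)
        if bidi in STRONG_R:
            # Trailing digits or punctuation after strong R text would
            # otherwise be reordered together with it.
            return s + LRM
        if bidi in STRONG_L:
            return s
        # Weak or neutral character: keep scanning backwards.
    return s
```

Since LRM itself has bidi class L, applying the function twice leaves
the string unchanged after the first call.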

> So, I propose adding
> 
>  ?L - Strong-L bidi types (L, LRE, LRO)
>  ?R - Strong-R bidi types (R, AL, RLE, RLO)
> 
> to the standard category table.
> 
> Sound fine?

Yep.

Thanks.



