* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
       [not found] <E1QrEHF-0003qX-I0@vcs.savannah.gnu.org>
@ 2011-08-11 2:14 ` Stefan Monnier
  2011-08-11 3:02   ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Monnier @ 2011-08-11 2:14 UTC
To: Chong Yidong; +Cc: emacs-devel

> +** New function `string-mark-left-to-right' appends a Unicode LRM
> +(left-to-right mark) character to a string if it terminates in
> +right-to-left script.  This is useful when the buffer has overall
> +left-to-right paragraph direction and you need to insert a string
> +whose contents (and directionality) are not known in advance.

This is too low-level a description I think.  It's understandable by
people who understand what LRM does, but I think we should try and make
it clearer.  Same for its docstring, of course.
Maybe something along the lines of

   "Add whatever is necessary for STRING to make sure its content is not
   reordered with surrounding text"

Though I think this assumes that the surrounding text is L2R, in which
case the description should also say so.


        Stefan

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-11 2:14 ` [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs Stefan Monnier
@ 2011-08-11 3:02 ` Eli Zaretskii
  2011-08-11 4:48   ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-11 3:02 UTC
To: Stefan Monnier; +Cc: cyd, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Wed, 10 Aug 2011 22:14:28 -0400
> Cc: emacs-devel@gnu.org
>
> > +** New function `string-mark-left-to-right' appends a Unicode LRM
> > +(left-to-right mark) character to a string if it terminates in
> > +right-to-left script.

This algorithm (which the code implements) is wrong: the unwanted
reordering can happen even if the string does not end in a strong R
character.  It could end in a series of weak characters, if the strong
character preceding that is R, for example.

The precise definition of the necessary conditions is complicated.
That is why I suggested to test _all_ the characters for being strong
R.  Why wasn't that implemented?  It might catch more strings that
need this, but at least it won't miss any.

If we really want only a 100% accurate solution, I will need to code
something non-trivial.  Let me know.

> > This is useful when the buffer has overall
> > +left-to-right paragraph direction and you need to insert a string
> > +whose contents (and directionality) are not known in advance.
>
> This is too low-level a description I think.  It's understandable by
> people who understand what LRM does

I think even people who know about that won't realize the purpose.

> Maybe something along the lines of
>
>    "Add whatever is necessary for STRING to make sure its content is not
>    reordered with surrounding text"

This is also incorrect or at least inaccurate.  The problem is with
reordering the text following the offending string, not surrounding
it, nor with reordering the string content itself.

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-11 3:02 ` Eli Zaretskii
@ 2011-08-11 4:48   ` Eli Zaretskii
  2011-08-11 19:01     ` Chong Yidong
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-11 4:48 UTC
To: monnier, cyd; +Cc: emacs-devel

> Date: Thu, 11 Aug 2011 06:02:39 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: cyd@stupidchicken.com, emacs-devel@gnu.org
>
> > From: Stefan Monnier <monnier@iro.umontreal.ca>
> > Date: Wed, 10 Aug 2011 22:14:28 -0400
> > Cc: emacs-devel@gnu.org
> >
> > > +** New function `string-mark-left-to-right' appends a Unicode LRM
> > > +(left-to-right mark) character to a string if it terminates in
> > > +right-to-left script.
>
> This algorithm (which the code implements) is wrong: the unwanted
> reordering can happen even if the string does not end in a strong R
> character.  It could end in a series of weak characters, if the strong
> character preceding that is R, for example.

And since buffer-menu.el was already modified to use this function, it
is easy to see how this algorithm fails: make a buffer whose name is
made of all R2L characters with the "<1>" tail appended, then type
"C-x C-b" and watch the messed-up display.  The original code treated
this case correctly.

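The failure case described above can be inspected directly by listing the bidi class of each character in such a name. The following is only a sketch: `my-bidi-classes' and `my-ends-in-strong-rtl-p' are hypothetical helper names (not part of Emacs), and the second one merely imitates the "terminates in right-to-left script" test as the NEWS entry describes it, not the committed code.

```elisp
;; Sketch only: both helpers are hypothetical names, not Emacs functions.
(defun my-bidi-classes (str)
  "Return the list of Unicode bidi classes of the characters in STR."
  (mapcar (lambda (ch) (get-char-code-property ch 'bidi-class))
          str))

(defun my-ends-in-strong-rtl-p (str)
  "Return non-nil if the last character of STR has bidi class R or AL."
  (and (> (length str) 0)
       (memq (get-char-code-property (aref str (1- (length str)))
                                     'bidi-class)
             '(R AL))))

;; Two Hebrew letters followed by the uniquifying "<1>" tail.
(my-bidi-classes "\u05D0\u05D1<1>")         ; => (R R ON EN ON)
(my-ends-in-strong-rtl-p "\u05D0\u05D1<1>") ; => nil
```

The tail consists of neutral and weak characters only, so a test that looks at the last character alone does not fire even though the name is otherwise entirely R2L.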
* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-11 4:48 ` Eli Zaretskii
@ 2011-08-11 19:01   ` Chong Yidong
  2011-08-12 7:21     ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Chong Yidong @ 2011-08-11 19:01 UTC
To: Eli Zaretskii; +Cc: monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> This algorithm (which the code implements) is wrong: the unwanted
>> reordering can happen even if the string does not end in a strong R
>> character.  It could end in a series of weak characters, if the strong
>> character preceding that is R, for example.

OK, so basically we have to scan the entire string like this, right?

  (let ((len (length str))
        (n 0)
        rtl-found)
    (while (and (not rtl-found) (< n len))
      (setq rtl-found (memq (get-char-code-property
                             (aref str n) 'bidi-class) '(R AL))
            n (1+ n)))
    (if rtl-found
        (concat str (propertize (string ?\x200e) 'invisible t))
      str))

> And since buffer-menu.el was already modified to use this function, it
> is easy to see how this algorithm fails: make a buffer whose name is
> made of all R2L characters with the "<1>" tail appended, then type
> "C-x C-b" and watch the messed-up display.  The original code treated
> this case correctly.

Actually, it looks as though the <1> is not treated properly, even with
the old code.  If I do

  (rename-buffer (concat "السّلام عليكم"
                         "<1>"))

then the buffer is displayed as 1>[some Arabic text]>, both in the
mode-line and in the buffer menu.  I guess the code that appends the
"<n>" needs to use string-mark-left-to-right as well.

For extra hilarity, get rid of the newline in this code fragment and
watch as the redisplayed Emacs Lisp code turns into gibberish...

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-11 19:01 ` Chong Yidong
@ 2011-08-12 7:21   ` Eli Zaretskii
  2011-08-12 15:47     ` Chong Yidong
  2011-08-13 7:00     ` Kenichi Handa
  0 siblings, 2 replies; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-12 7:21 UTC
To: Chong Yidong; +Cc: monnier, emacs-devel

> From: Chong Yidong <cyd@stupidchicken.com>
> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Thu, 11 Aug 2011 15:01:16 -0400
>
> OK, so basically we have to scan the entire string like this, right?

Yes.

Btw, is there a way to regex-search for a character by its bidi
category?  That would make the code more elegant, and probably quite
faster as well.

> (let ((len (length str))
>       (n 0)
>       rtl-found)
>   (while (and (not rtl-found) (< n len))
>     (setq rtl-found (memq (get-char-code-property
>                            (aref str n) 'bidi-class) '(R AL))
                                                       ^^^^^^^
Make that '(R AL RLO), since the RLO character overrides the
bidirectional properties of all the following characters with R.

> > And since buffer-menu.el was already modified to use this function, it
> > is easy to see how this algorithm fails: make a buffer whose name is
> > made of all R2L characters with the "<1>" tail appended, then type
> > "C-x C-b" and watch the messed-up display.  The original code treated
> > this case correctly.
>
> Actually, it looks as though the <1> is not treated properly, even with
> the old code.  If I do
>
>   (rename-buffer (concat "السّلام عليكم"
>                          "<1>"))
>
> then the buffer is displayed as 1>[some Arabic text]>, both in the
> mode-line and in the buffer menu.

I didn't mean the display of the buffer name itself; that is a
separate issue, as discussed here:

  https://lists.gnu.org/archive/html/emacs-devel/2011-06/msg00712.html

I meant the buffer menu layout: with the current bzr tip, the buffer
size was displayed to the left of the buffer name, whereas the
previous code displayed it to the right of the name, as with buffer
names without R2L characters.

> For extra hilarity, get rid of the newline in this code fragment and
> watch as the redisplayed Emacs Lisp code turns into gibberish...

That's one of the problems that are not yet solved in Emacs: how to
display code with comments and strings in R2L languages without
messing up the display and without giving up the reordering of the
strings and comments.

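The whole-string scan with RLO added to the list can be sketched as a pair of standalone functions. This is an illustration under assumptions, not the code that was committed: `my-string-has-rtl-p' and `my-mark-left-to-right' are hypothetical names, and the appended character is the same invisible LRM (U+200E) used in the fragment quoted above.

```elisp
;; Sketch only: hypothetical names, not the committed implementation.
(defun my-string-has-rtl-p (str)
  "Return t if any character of STR has bidi class R, AL, or RLO."
  (catch 'found
    (mapc (lambda (ch)
            (when (memq (get-char-code-property ch 'bidi-class)
                        '(R AL RLO))
              (throw 'found t)))
          str)
    nil))

(defun my-mark-left-to-right (str)
  "Return STR with an invisible LRM appended if it contains RTL characters.
If STR contains no such character, return STR itself."
  (if (my-string-has-rtl-p str)
      (concat str (propertize (string #x200e) 'invisible t))
    str))

(my-string-has-rtl-p "abc")          ; => nil
(my-string-has-rtl-p "\u05D0<1>")    ; => t
(my-string-has-rtl-p "\u202Eab")     ; => t, U+202E is the RLO character
```

Scanning every character errs on the side of adding the mark too often, which matches the trade-off stated earlier in the thread: it may catch more strings than strictly needed, but it should not miss any.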
* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-12 7:21 ` Eli Zaretskii
@ 2011-08-12 15:47   ` Chong Yidong
  2011-08-12 15:54     ` Eli Zaretskii
  2011-08-13 7:00   ` Kenichi Handa
  1 sibling, 1 reply; 22+ messages in thread
From: Chong Yidong @ 2011-08-12 15:47 UTC
To: Eli Zaretskii; +Cc: monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Btw, is there a way to regex-search for a character by its bidi
> category?  That would make the code more elegant, and probably quite
> faster as well.

I don't think so, and anyway such a regex-search would still have to go
through the functions of mule-cmds.el.

A better optimization might be to provide the equivalents of
`next-property-change' etc. for char code properties.

In practice, testing indicates string-mark-left-to-right has acceptable
speed for its present uses in buffer menu and tabulated list mode.
Unless a real problem shows up (e.g. when we apply the fix to Gnus),
let's revisit this issue later.

>>                            (aref str n) 'bidi-class) '(R AL))
>                                                       ^^^^^^^
> Make that '(R AL RLO), since the RLO character overrides the
> bidirectional properties of all the following characters with R.

OK, committed along with doc changes.

>> then the buffer is displayed as 1>[some Arabic text]>, both in the
>> mode-line and in the buffer menu.
>
> I didn't mean the display of the buffer name itself; that is a
> separate issue, as discussed here:

Was there any resolution on that thread?

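An equivalent of `next-property-change' for char code properties could look roughly like the following for strings. This is a purely hypothetical sketch: the function name and signature are assumptions, no such function is proposed verbatim in this thread, and values are compared with `eq', which is adequate for symbol-valued properties such as `bidi-class'.

```elisp
;; Sketch only: hypothetical helper, not an existing Emacs function.
(defun my-next-char-code-property-change (str pos prop)
  "Return the first index after POS in STR where PROP changes.
PROP is a char code property such as `bidi-class'.  Return nil if
the value of PROP stays the same up to the end of STR."
  (let ((len (length str))
        (val (get-char-code-property (aref str pos) prop))
        (i (1+ pos)))
    (while (and (< i len)
                (eq (get-char-code-property (aref str i) prop) val))
      (setq i (1+ i)))
    (and (< i len) i)))

;; "ab" (class L), two Hebrew letters (class R), then "<" (class ON).
(my-next-char-code-property-change "ab\u05D0\u05D1<" 0 'bidi-class) ; => 2
(my-next-char-code-property-change "ab\u05D0\u05D1<" 2 'bidi-class) ; => 4
(my-next-char-code-property-change "ab\u05D0\u05D1<" 4 'bidi-class) ; => nil
```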
* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-12 15:47 ` Chong Yidong
@ 2011-08-12 15:54   ` Eli Zaretskii
  2011-08-12 16:00     ` Chong Yidong
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-12 15:54 UTC
To: Chong Yidong; +Cc: monnier, emacs-devel

> From: Chong Yidong <cyd@stupidchicken.com>
> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Fri, 12 Aug 2011 11:47:00 -0400
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > Btw, is there a way to regex-search for a character by its bidi
> > category?  That would make the code more elegant, and probably quite
> > faster as well.
>
> I don't think so, and anyway such a regex-search would still have to go
> through the functions of mule-cmds.el.

Which ones?

> A better optimization might be to provide the equivalents of
> `next-property-change' etc. for char code properties.

That's almost trivial, but we are in feature freeze, right?

> >> then the buffer is displayed as 1>[some Arabic text]>, both in the
> >> mode-line and in the buffer menu.
> >
> > I didn't mean the display of the buffer name itself; that is a
> > separate issue, as discussed here:
>
> Was there any resolution on that thread?

I was under the impression that Someone(TM) volunteered to write a
function that returns buffer names decorated with the directional
control characters to make them display reasonably.  But maybe I was
mistaken.

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-12 15:54 ` Eli Zaretskii
@ 2011-08-12 16:00   ` Chong Yidong
  2011-08-12 17:25     ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Chong Yidong @ 2011-08-12 16:00 UTC
To: Eli Zaretskii; +Cc: monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> I don't think so, and anyway such a regex-search would still have to go
>> through the functions of mule-cmds.el.
>
> Which ones?

get-char-code-property, presumably (unless you want to replicate all the
char-code table logic in C).

>> A better optimization might be to provide the equivalents of
>> `next-property-change' etc. for char code properties.
>
> That's almost trivial, but we are in feature freeze, right?

Right, so I would put this off unless it is demonstrably needed.

>> Was there any resolution on that thread?
>
> I was under the impression that Someone(TM) volunteered to write a
> function that returns buffer names decorated with the directional
> control characters to make them display reasonably.  But maybe I was
> mistaken.

I can take a look.

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-12 16:00 ` Chong Yidong
@ 2011-08-12 17:25   ` Eli Zaretskii
  0 siblings, 0 replies; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-12 17:25 UTC
To: Chong Yidong; +Cc: monnier, emacs-devel

> From: Chong Yidong <cyd@stupidchicken.com>
> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Fri, 12 Aug 2011 12:00:45 -0400
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> >> I don't think so, and anyway such a regex-search would still have to go
> >> through the functions of mule-cmds.el.
> >
> > Which ones?
>
> get-char-code-property, presumably (unless you want to replicate all the
> char-code table logic in C).

That char-code already exists on the C level: how do you think bidi.c
knows the bidirectional properties of each character it encounters?

> > I was under the impression that Someone(TM) volunteered to write a
> > function that returns buffer names decorated with the directional
> > control characters to make them display reasonably.  But maybe I was
> > mistaken.
>
> I can take a look.

Thanks.

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-12 7:21 ` Eli Zaretskii
  2011-08-12 15:47   ` Chong Yidong
@ 2011-08-13 7:00   ` Kenichi Handa
  2011-08-13 7:11     ` Eli Zaretskii
  1 sibling, 1 reply; 22+ messages in thread
From: Kenichi Handa @ 2011-08-13 7:00 UTC
To: Eli Zaretskii; +Cc: cyd, monnier, emacs-devel

In article <8362m3xfgz.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> Btw, is there a way to regex-search for a character by its bidi
> category?  That would make the code more elegant, and probably quite
> faster as well.

You can do that as follows:

(1) Generate a special category table.

(defvar special-category-table-for-bidi
  (let ((category-table (make-category-table))
        (uniprop-table (unicode-property-table-internal 'bidi-class)))
    (define-category ?r "Bidi class R, AL, or RLO" category-table)
    (map-char-table
     #'(lambda (key val)
         (if (memq val '(R AL RLO))
             (modify-category-entry key ?r category-table)))
     uniprop-table)
    category-table))

(2) Check if a string or buffer contains special bidi-related
characters.

;; For string...
(defun check-special-bidi-character (str)
  (with-category-table special-category-table-for-bidi
    (string-match "\\cr" str)))

(check-special-bidi-character "abc") => nil
(check-special-bidi-character "abc א") => 4

---
Kenichi Handa
handa@m17n.org

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-13 7:00 ` Kenichi Handa
@ 2011-08-13 7:11   ` Eli Zaretskii
  2011-08-13 7:42     ` Kenichi Handa
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-13 7:11 UTC
To: Kenichi Handa; +Cc: cyd, monnier, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: cyd@stupidchicken.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Sat, 13 Aug 2011 16:00:40 +0900
>
> ;; For string...
> (defun check-special-bidi-character (str)
>   (with-category-table special-category-table-for-bidi
>     (string-match "\\cr" str)))
>
> (check-special-bidi-character "abc") => nil
> (check-special-bidi-character "abc א") => 4

Thanks!  I think we should have a few of such category-tables in Emacs
by default.

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-13 7:11 ` Eli Zaretskii
@ 2011-08-13 7:42   ` Kenichi Handa
  2011-08-13 13:53     ` Stefan Monnier
  2011-08-16 7:44     ` Eli Zaretskii
  0 siblings, 2 replies; 22+ messages in thread
From: Kenichi Handa @ 2011-08-13 7:42 UTC
To: Eli Zaretskii; +Cc: cyd, monnier, emacs-devel

In article <83hb5lwzt2.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > From: Kenichi Handa <handa@m17n.org>
> > Cc: cyd@stupidchicken.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> > Date: Sat, 13 Aug 2011 16:00:40 +0900
> >
> > ;; For string...
> > (defun check-special-bidi-character (str)
> >   (with-category-table special-category-table-for-bidi
> >     (string-match "\\cr" str)))
> >
> > (check-special-bidi-character "abc") => nil
> > (check-special-bidi-character "abc א") => 4

> Thanks!  I think we should have a few of such category-tables in Emacs
> by default.

As categories are not exclusive (i.e. one character can have
multiple categories), I think you need just one
category-table.  In that table, each character has a category
uniquely corresponding to its bidi class (L, AL, etc.); in
addition, some characters have a category whose meaning
is, for instance, (one of R, AL, or RLO).

For instance, if you define a category ?R as bidi class R,
and define a category ?r as one of (R, AL, or RLO), the
character `א' has two categories ?R and ?r, which means

(with-category-table special-category-table-for-bidi
  (cons (string-match "\\cR" "א")
        (string-match "\\cr" "א")))
=> (0 . 0)

As we can define 95 different categories in a single
category table, I think the number of categories is
sufficient.

---
Kenichi Handa
handa@m17n.org

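A single table carrying both a per-class category and a combined category, as described above, might be built along these lines. This is a sketch under assumptions: `my-bidi-category-table' is a hypothetical name, and only one per-class category (?R for class R) is defined here to keep the example short.

```elisp
;; Sketch only: `my-bidi-category-table' is a hypothetical name.
(defvar my-bidi-category-table
  (let ((table (make-category-table))
        (classes (unicode-property-table-internal 'bidi-class)))
    ;; One category for exactly bidi class R ...
    (define-category ?R "Bidi class R" table)
    ;; ... and one for the combination R, AL, or RLO.
    (define-category ?r "Bidi class R, AL, or RLO" table)
    (map-char-table
     (lambda (key val)
       ;; KEY is a character or a (FROM . TO) range; both are
       ;; accepted by `modify-category-entry'.
       (when (eq val 'R)
         (modify-category-entry key ?R table))
       (when (memq val '(R AL RLO))
         (modify-category-entry key ?r table)))
     classes)
    table)
  "Category table with ?R (bidi class R) and ?r (R, AL, or RLO).")

;; U+05D0 (Hebrew alef) has class R; U+0633 (Arabic seen) has class AL.
(with-category-table my-bidi-category-table
  (list (string-match "\\cR" "\u05D0")
        (string-match "\\cr" "\u05D0")
        (string-match "\\cR" "\u0633")
        (string-match "\\cr" "\u0633")))
;; => (0 0 nil 0)
```

Because the Arabic letter belongs only to the combined category, a regexp can distinguish "exactly R" from "any strong R-like class" within the same table.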
* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-13 7:42 ` Kenichi Handa
@ 2011-08-13 13:53   ` Stefan Monnier
  2011-08-14 16:21     ` Chong Yidong
  2011-08-16 7:44   ` Eli Zaretskii
  1 sibling, 1 reply; 22+ messages in thread
From: Stefan Monnier @ 2011-08-13 13:53 UTC
To: Kenichi Handa; +Cc: Eli Zaretskii, cyd, emacs-devel

> As categories are not exclusive (i.e. one character can have
> multiple categories), I think you need just one
> category-table.  In which, each character has a category
> uniquely corresponding to a bidi class (L, AL, etc), in
> addition, all some character has a category whose meaning
> is, for instance (one of R, AL, or RLO).

We could also add a new special regexp construct, similar to \c but for
Unicode properties.  The advantage being that it lets us use the
Unicode tables without having to build a new category table that keeps
another copy of it, and without having to use "(with-category-table
special-category-table-for-bidi".


        Stefan

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-13 13:53 ` Stefan Monnier
@ 2011-08-14 16:21   ` Chong Yidong
  0 siblings, 0 replies; 22+ messages in thread
From: Chong Yidong @ 2011-08-14 16:21 UTC
To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel, Kenichi Handa

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> We could also add a new special regexp construct, similar to \c but
> for Unicode properties.  The advantage being that it lets us use the
> Unicode tables without having to build a new category table that keeps
> another copy of it, and without having to use "(with-category-table
> special-category-table-for-bidi".

We can put it in the standard category table.  This works:


=== modified file 'lisp/international/characters.el'
*** lisp/international/characters.el 2011-07-06 22:43:48 +0000
--- lisp/international/characters.el 2011-08-14 02:21:19 +0000
***************
*** 114,119 ****
--- 114,123 ----
  Base characters (Unicode General Category L,N,P,S,Zs)")
  (define-category ?^ "Combining
  Combining diacritic or mark (Unicode General Category M)")
+
+ ;; RTL scripts
+ (define-category ?R "Right-to-left
+ Characters with R, AL, or RLO bidi type.")
  \f
  ;;; Setting syntax and category.

***************
*** 478,483 ****
--- 482,494 ----
              (modify-category-entry x category))
            chars)))))

+ ;; RTL scripts.
+
+ (map-char-table (lambda (key val)
+                   (if (memq val '(R AL RLO))
+                       (modify-category-entry key ?R)))
+                 (unicode-property-table-internal 'bidi-class))
+
  ;; Latin

  (modify-category-entry '(#x80 . #x024F) ?l)

=== modified file 'lisp/subr.el'
*** lisp/subr.el 2011-08-12 15:43:30 +0000
--- lisp/subr.el 2011-08-13 15:42:16 +0000
***************
*** 3553,3568 ****
  If STR contains no RTL characters, return STR."
    (unless (stringp str)
      (signal 'wrong-type-argument (list 'stringp str)))
!   (let ((len (length str))
!         (n 0)
!         rtl-found)
!     (while (and (not rtl-found) (< n len))
!       (setq rtl-found (memq (get-char-code-property
!                              (aref str n) 'bidi-class) '(R AL RLO))
!             n (1+ n)))
!     (if rtl-found
!         (concat str (propertize (string ?\x200e) 'invisible t))
!       str)))
  \f
  ;;;; invisibility specs

--- 3553,3561 ----
  If STR contains no RTL characters, return STR."
    (unless (stringp str)
      (signal 'wrong-type-argument (list 'stringp str)))
!   (if (string-match "\\cR" str)
!       (concat str (propertize (string ?\x200e) 'invisible t))
!     str))
  \f
  ;;;; invisibility specs

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-13 7:42 ` Kenichi Handa
  2011-08-13 13:53   ` Stefan Monnier
@ 2011-08-16 7:44   ` Eli Zaretskii
  2011-08-16 23:57     ` Kenichi Handa
  1 sibling, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-16 7:44 UTC
To: Kenichi Handa; +Cc: cyd, monnier, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: cyd@stupidchicken.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Sat, 13 Aug 2011 16:42:43 +0900
>
> > > (defun check-special-bidi-character (str)
> > >   (with-category-table special-category-table-for-bidi
> > >     (string-match "\\cr" str)))
> > >
> > > (check-special-bidi-character "abc") => nil
> > > (check-special-bidi-character "abc א") => 4
>
> > Thanks!  I think we should have a few of such category-tables in Emacs
> > by default.
>
> As categories are not exclusive (i.e. one character can have
> multiple categories), I think you need just one
> category-table.

Would it be a good idea to add such categories to the standard
category table?  IOW, why do we need a special category table to
search for these characters?

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-16 7:44 ` Eli Zaretskii
@ 2011-08-16 23:57   ` Kenichi Handa
  2011-08-17 5:49     ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Kenichi Handa @ 2011-08-16 23:57 UTC
To: Eli Zaretskii; +Cc: cyd, monnier, emacs-devel

In article <83r54lu7f4.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > > > (defun check-special-bidi-character (str)
> > > >   (with-category-table special-category-table-for-bidi
> > > >     (string-match "\\cr" str)))
> > > >
> > > > (check-special-bidi-character "abc") => nil
> > > > (check-special-bidi-character "abc א") => 4
> >
> > > Thanks!  I think we should have a few of such category-tables in Emacs
> > > by default.
> >
> > As categories are not exclusive (i.e. one character can have
> > multiple categories), I think you need just one
> > category-table.

> Would it be a good idea to add such categories to the standard
> category table?  IOW, why do we need a special category table to
> search for these characters?

We can define at most 95 categories in one table, and, in
the standard category table, we already defined 41
categories.

For bidi, we need at least 18 categories (there are 18 bidi
classes) and a few more for combinations.  Adding all of
them to the standard category table makes the remaining
category space less than half of the whole space.  So, I
think we should be careful.

In addition, adding them to the standard category table means
we can't select a proper category mnemonic character.

---
Kenichi Handa
handa@m17n.org

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-16 23:57 ` Kenichi Handa
@ 2011-08-17 5:49   ` Eli Zaretskii
  2011-08-17 7:21     ` Kenichi Handa
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-17 5:49 UTC
To: Kenichi Handa; +Cc: cyd, monnier, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: cyd@stupidchicken.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Wed, 17 Aug 2011 08:57:49 +0900
>
> > Would it be a good idea to add such categories to the standard
> > category table?  IOW, why do we need a special category table to
> > search for these characters?
>
> We can define at most 95 categories in one table, and, in
> the standard category table, we already defined 41
> categories.
>
> For bidi, we need at least 18 categories (there are 18 bidi
> classes) and a few more for combinations.  Adding all of
> them to the standard category table makes the remaining
> category space less than half of the whole space.  So, I
> think we should be careful.

I didn't mean to add each bidi type as a separate category (there are
19 of them, btw).  I did mean to carefully define the most frequently
needed categories, like the one which started this discussion, and add
only those.  The gain would be that we won't need to use
with-category-table around code which needs to search for characters
by their bidi types, and we will be able to combine bidi-related
categories with other standard categories in the same regular
expression.

One possible set of categories is just the 3 bidi categories defined
by UAX#9: Strong, Weak, and Neutral.  We'd probably need to split the
first one in two, depending on directionality, so Strong_R, Strong_L,
Weak, and Neutral would be my initial guess.

However, we should gather more experience before we decide.

> In addition, adding them to the standard category table means
> we can't select a proper category mnemonic character.

??  We can use any one that is currently unused, no?  Those that are
used are shown by describe-categories, right?  Or am I missing
something?

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-17 5:49 ` Eli Zaretskii
@ 2011-08-17 7:21   ` Kenichi Handa
  2011-08-17 9:15     ` Eli Zaretskii
  2011-08-17 21:12     ` Chong Yidong
  0 siblings, 2 replies; 22+ messages in thread
From: Kenichi Handa @ 2011-08-17 7:21 UTC
To: Eli Zaretskii; +Cc: cyd, monnier, emacs-devel

In article <834o1gtwne.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> I didn't mean to add each bidi type as a separate category (there are
> 19 of them, btw).

Oops, sorry, I miscounted them.

> I did mean to carefully define the most frequently
> needed categories, like the one which started this discussion, and add
> only those.  The gain would be that we won't need to use
> with-category-table around code which needs to search for characters
> by their bidi types, and we will be able to combine bidi-related
> categories with other standard categories in the same regular
> expression.

> One possible set of categories is just the 3 bidi categories defined
> by UAX#9: Strong, Weak, and Neutral.  We'd probably need to split the
> first one in two, depending on directionality, so Strong_R, Strong_L,
> Weak, and Neutral would be my initial guess.

Ah, I see.  It may be ok to add just a few categories to the
standard category table.

> However, we should gather more experience before we decide.

> > In addition, adding them to the standard category table means
> > we can't select a proper category mnemonic character.

> ??  We can use any one that is currently unused, no?  Those that are
> used are shown by describe-categories, right?

Yes.  I just thought that it's difficult to find proper
mnemonics for all 19 bidi classes among the unused ones.

By the way, Stefan's suggestion of extending regexp is also
worth considering (though I have no idea what kind of format
we can use for them).

One more tip: It may be a little bit faster to use a
bidi-specific category table with with-category-table
because, in most cases, we can find a category set for a
specific character faster.  In a bidi-specific category
table, most characters (e.g. all han characters) will have
the same category set and thus the set is recorded for a
group of characters.

---
Kenichi Handa
handa@m17n.org

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-17 7:21 ` Kenichi Handa
@ 2011-08-17 9:15   ` Eli Zaretskii
  2011-08-18 2:13     ` Kenichi Handa
  2011-08-17 21:12   ` Chong Yidong
  1 sibling, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2011-08-17 9:15 UTC
To: Kenichi Handa; +Cc: cyd, monnier, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Wed, 17 Aug 2011 16:21:44 +0900
> Cc: cyd@stupidchicken.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
>
> By the way, Stefan's suggestion of extending regexp is also
> worth considering (though I have no idea what kind of format
> we can use for them).

If the important categories are part of the standard category-table,
then I don't see any advantages to Stefan's proposal.  The underlying
implementation will be the same: access to uniprop tables.

> One more tip: It may be a little bit faster to use a
> bidi-specific category table with with-category-table
> because, in most cases, we can find a category set for a
> specific character faster.  In a bidi-specific category
> table, most characters (e.g. all han characters) will have
> the same category set and thus the set is recorded for a
> group of characters.

You mean, because CHAR_TABLE_REF will be faster?

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
  2011-08-17 9:15 ` Eli Zaretskii
@ 2011-08-18 2:13   ` Kenichi Handa
  0 siblings, 0 replies; 22+ messages in thread
From: Kenichi Handa @ 2011-08-18 2:13 UTC
To: Eli Zaretskii; +Cc: cyd, monnier, emacs-devel

In article <83y5yss8j5.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > One more tip: It may be a little bit faster to use a
> > bidi-specific category table with with-category-table
> > because, in most cases, we can find a category set for a
> > specific character faster.  In a bidi-specific category
> > table, most characters (e.g. all han characters) will have
> > the same category set and thus the set is recorded for a
> > group of characters.

> You mean, because CHAR_TABLE_REF will be faster?

Yes.

---
Kenichi Handa
handa@m17n.org

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
From: Chong Yidong @ 2011-08-17 21:12 UTC
To: Kenichi Handa; +Cc: Eli Zaretskii, monnier, emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> One more tip: It may be a little bit faster to use a bidi-specific
> category table with with-category-table because, in most cases, we can
> find a category set for a specific character faster. In a
> bidi-specific category table, most characters (e.g. all han
> characters) will have the same category set and thus the set is
> recorded for a group of characters.

Currently, we are using this only for string-mark-left-to-right, and
performance does not seem to be a problem for that usage.

Also, we only need the "strong R" category. We could add the "strong L"
category for symmetry, but I don't see the need to add the other bidi
categories until they are called for.

So, I propose adding

  ?L - Strong-L bidi types (L, LRE, LRO)
  ?R - Strong-R bidi types (R, AL, RLE, RLO)

to the standard category table.

Sound fine?

* Re: [Emacs-diffs] /srv/bzr/emacs/trunk r105429: New function `string-mark-left-to-right' for handling LRMs.
From: Eli Zaretskii @ 2011-08-18 7:09 UTC
To: Chong Yidong; +Cc: emacs-devel, monnier, handa

> From: Chong Yidong <cyd@stupidchicken.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, monnier@iro.umontreal.ca,
> emacs-devel@gnu.org
> Date: Wed, 17 Aug 2011 17:12:16 -0400
>
> Also, we only need the "strong R" category. We could add the "strong L"
> category for symmetry, but I don't see the need to add the other bidi
> categories until they are called for.

We will need the weak and neutral categories (or at least a single
category for both of them) if we ever want to become more accurate
about the need for placing LRM/RLM to fix the display. That's because
the display becomes "messed-up" when a strong R character is followed
by weak (e.g. digits) or neutral (whitespace, punctuation) characters.

But it is okay to add that later, if you don't want to do that now.

> So, I propose adding
>
>   ?L - Strong-L bidi types (L, LRE, LRO)
>   ?R - Strong-R bidi types (R, AL, RLE, RLO)
>
> to the standard category table.
>
> Sound fine?

Yep. Thanks.

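For readers outside Emacs, the grouping agreed on above can be sketched in Python, whose standard `unicodedata` module exposes the same Unicode bidi classes. This is only an illustration under stated assumptions, not the Emacs implementation: the names `bidi_category` and `mark_left_to_right` are hypothetical, and the test used is the conservative one suggested earlier in the thread (append an LRM if any character in the string is strong R), not an exact implementation of the bidi reordering rules.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

# Bidi classes grouped as in the proposal above (hypothetical Python names).
STRONG_L = {"L", "LRE", "LRO"}        # proposed category ?L
STRONG_R = {"R", "AL", "RLE", "RLO"}  # proposed category ?R


def bidi_category(ch):
    """Return "L", "R", or None (weak/neutral/other) for one character."""
    cls = unicodedata.bidirectional(ch)
    if cls in STRONG_L:
        return "L"
    if cls in STRONG_R:
        return "R"
    return None


def mark_left_to_right(s):
    """Append an LRM if S contains any strong-R character.

    Conservative check: it may mark strings that do not strictly need
    it, but it does not miss a strong R followed by weak or neutral
    characters (digits, whitespace, punctuation).
    """
    if any(bidi_category(ch) == "R" for ch in s):
        return s + LRM
    return s
```

A string such as Hebrew letters followed by a space and digits ends in weak characters, so a check of only the last character would skip it; the scan over all characters still appends the mark.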