From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Composing Hebrew diacriticals
Date: Fri, 14 May 2010 13:02:13 +0300
Message-ID: <837hn64x96.fsf@gnu.org>
References: <83mxwlw2c0.fsf@gnu.org> <83eihojc1z.fsf@gnu.org> <83pr12pfw6.fsf@gnu.org> <83fx1xowfj.fsf@gnu.org>
To: Kenichi Handa, Jason Rumney
Cc: yair.f.lists@gmail.com, emacs-devel@gnu.org

> From: Kenichi Handa
> Cc: eliz@gnu.org, yair.f.lists@gmail.com, emacs-devel@gnu.org
> Date: Fri, 14 May 2010 17:10:33 +0900
>
> I've just committed a fix.
>
> Eli, please check the comments of set_iterator_to_next, and
> verify that I'm doing the right thing.

It looks okay at a first glance, thank you!

In the HELLO buffer, the RLM character is not composed with the
following parenthesis, though.  Is this a separate problem?

I will work on the issues you raised in the comments.  For now, I have
just one response: in this fragment from set_iterator_to_next:

          /* Update IT's char/byte positions to point the first
             character of the next grapheme cluster, or to the
             character visually after the current composition.  */
    #if 0
          /* Is it ok to do this directly?  */
          IT_CHARPOS (*it) += it->cmp_it.nchars;
          IT_BYTEPOS (*it) += it->cmp_it.nbytes;
    #else
          /* Or do we have to call bidi_get_next_char_visually
             repeatedly (perhaps not to confuse some internal
             state of bidi_it)?  At least we must do this if we
             have consumed all grapheme clusters in the current
             composition because the next character will be in the
             different bidi level.  */
          for (i = 0; i < it->cmp_it.nchars; i++)
            bidi_get_next_char_visually (&it->bidi_it);

the "#else" part is doing TRT.
You cannot jump to a different place in the buffer/string behind the
back of bidi_get_next_char_visually, because that would violate the
integrity of its internal cache, which must correspond to the
buffer/string positions 1:1.

> I have not yet committed proper codes for Hebrew
> composition.  I'm now testing with this simple version.
>
> (let ((pattern "[\u05D0-\u05F2][\u0591-\u05BF\u05C1-\u05C5\u05C7]+"))
>   (set-char-table-range
>    composition-function-table '(#x591 . #x5C7)
>    (list (vector pattern 1 'font-shape-gstring)
>          ["[\u0591-\u05C7]" 0 font-shape-gstring]))
>   (set-char-table-range
>    composition-function-table #x5C0 nil)
>   (set-char-table-range
>    composition-function-table #x5C6 nil))

Could you please look at the message I posted in
http://lists.gnu.org/archive/html/emacs-devel/2010-05/msg00251.html?
I still see the infloop, with the current trunk, even when
bidi-display-reordering is set to nil, after I type BET and DAGESH, as
described in that message.  What kind of problems in the information
that Uniscribe returns to Emacs could cause such a loop?

If I type a different diacritical after BET, like PATAH, there's no
infloop, but the display is incorrect: I see both the isolated PATAH
and the composed BET+PATAH after it.

Jason, could you help me with this?  It looks like some
Uniscribe-specific issue.

TIA
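
To illustrate the cache-integrity point about the "#else" branch, here
is a minimal toy sketch in C.  It is not Emacs code: struct toy_it and
all the toy_* functions are hypothetical stand-ins, and the "cache" is
just an array of visited positions in logical order, which is far
simpler than what the real bidi iterator keeps.  It only shows why
stepping one character at a time keeps the cache in sync with the
position, while bumping the position directly does not.

```c
#include <assert.h>

#define TOY_CACHE_MAX 64

/* Hypothetical stand-in for an iterator with an internal cache.
   This is NOT the Emacs struct bidi_it.  */
struct toy_it
{
  int charpos;                  /* position the iterator claims to be at */
  int cache[TOY_CACHE_MAX];     /* positions visited so far, in order */
  int cache_len;                /* number of valid cache entries */
};

/* Start iterating at START; the cache records the starting position.  */
static void
toy_init (struct toy_it *it, int start)
{
  it->charpos = start;
  it->cache[0] = start;
  it->cache_len = 1;
}

/* Advance by exactly one character and record the new position.  */
static void
toy_next (struct toy_it *it)
{
  assert (it->cache_len < TOY_CACHE_MAX);
  it->charpos++;
  it->cache[it->cache_len++] = it->charpos;
}

/* Return 1 if the cache corresponds to positions 1:1: entries are
   contiguous and the last one equals the current position.  */
static int
toy_cache_consistent (const struct toy_it *it)
{
  for (int i = 1; i < it->cache_len; i++)
    if (it->cache[i] != it->cache[i - 1] + 1)
      return 0;
  return it->cache[it->cache_len - 1] == it->charpos;
}

/* Analogue of the "#else" branch: skip NCHARS by stepping.  */
static void
toy_skip_stepwise (struct toy_it *it, int nchars)
{
  for (int i = 0; i < nchars; i++)
    toy_next (it);
}

/* Analogue of the "#if 0" branch: skip NCHARS behind the cache's back.  */
static void
toy_skip_direct (struct toy_it *it, int nchars)
{
  it->charpos += nchars;
}
```

In this toy, toy_skip_stepwise leaves toy_cache_consistent true, while
toy_skip_direct reaches the same position with a cache that still ends
at the old one.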