From: Kenichi Handa
To: Eli Zaretskii
Cc: juri@jurta.org, emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: find-composition still depends on the composition property
Date: Thu, 23 Oct 2008 10:18:22 +0900

Eli Zaretskii writes:

> Thanks, but Emacs still does not get this quite right.  For example,
> in the following line:
>   אבגדה12345
> Which mixes Hebrew letters with digits, M-f stops at the first digit,
> whereas in this line:
>   abcde12345
> it does not.  The latter behavior is correct, the former is not.  (I'm
> ashamed to admit that even MS Word gets it right.)

> I understand that the way for fixing this would be to install more
> entries in word-combining-categories, but more infrastructure seems to
> be missing, since right now no characters have the "Hebrew" category,
> for example (at least judging by the output of describe-categories).

Then what to do is:

(1-1) assign the category "6" (digit) to "0123456789".
(1-2) define a category, say "D", and assign it to all characters
      that should have no word boundary between them and digits.
(1-3) add (?D . ?6) and (?6 . ?D) to word-combining-categories.

Another way is:

(2-1) modify word_boundary_p to handle a negative category mnemonic
      in word-*-categories, to catch a character that doesn't have
      the specified category.
(2-2) assign the category "6" (digit) to "0123456789".
(2-3) define a category, say "X", and assign it to all characters
      that should have a word boundary between them and digits.
(2-4) add ((- ?X) . ?6) and (?6 . (- ?X)) to
      word-combining-categories.

Or,

(3-1) make a `common' script and classify digits, etc. into it.
(3-2) modify word_boundary_p not to distinguish `common' from any
      other script.
(3-3) define a category, say "X", and assign it to all characters
      that should have a word boundary between them and digits.
(3-4) add (?X . ?6) and (?6 . ?X) to word-separating-categories.

> By the way, I'd suggest to move the legend generated by
> describe-categories to the beginning of the buffer, because the buffer
> is huge and it does not say anywhere at the beginning that there's a
> legend at the end.  Without the legend, the buffer looks like a large
> pile of gibberish.

The legend is longer than 40 lines.  If we put that at the head, it
will occupy the whole first page, which I think is not that good.
Saying something like "See the end of the buffer for the legend." on
the first line, with "legend" clickable, would be good.  What do you
think?

> And another wish: can we have word-combining-categories and
> word-separating-categories display their elements with human-readable
> letters, not as their ASCII codes?  (Quick: what letter is code 94?)

How about modifying word_boundary_p to accept a mnemonic string
(instead of a mnemonic character) in those variables?  Then we can
specify multiple categories in the string to catch a character that
has any one of them.

---
Kenichi Handa
handa@ni.aist.go.jp
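
For illustration, steps (1-1) to (1-3) above might look roughly like
the following in Emacs Lisp.  This is an untested sketch, not a
proposed patch: the mnemonic ?D is only the example name used above,
and limiting it to the Hebrew letters U+05D0..U+05EA is an assumption
made here to match the example line in the quoted report.  Which
characters should really get the category is left open.

```elisp
;; Untested sketch of steps (1-1) to (1-3).  The mnemonic ?D and the
;; Hebrew letter range are assumptions chosen for illustration only.
(let ((table (standard-category-table)))
  ;; (1-1) Give the ASCII digits the "digit" category ?6.
  (modify-category-entry '(?0 . ?9) ?6 table)

  ;; (1-2) Define ?D unless the mnemonic is already taken, then assign
  ;; it to characters that should not be split from adjacent digits.
  (unless (category-docstring ?D table)
    (define-category ?D "No word boundary next to digits" table))
  (modify-category-entry '(#x05D0 . #x05EA) ?D table))

;; (1-3) Declare that there is no word boundary between a ?D character
;; and a ?6 character, in either order.
(add-to-list 'word-combining-categories '(?D . ?6))
(add-to-list 'word-combining-categories '(?6 . ?D))
```

The standard category table is passed explicitly so that the result
does not depend on which buffer is current when the forms are
evaluated.  Whether M-f then moves over the whole of the Hebrew
example line depends on how word_boundary_p consults these variables
in the Emacs version at hand.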