From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.devel
Subject: Re: find-composition still depends on the composition property
Date: Thu, 23 Oct 2008 10:18:22 +0900
Message-ID: <E1KsoqE-0006VL-Ty@etlken.m17n.org>
References: <f7ccd24b0808290646r7ce000aet3aa6af5a1315b9d3@mail.gmail.com>
	<E1KbQ3Z-0004XB-Ck@etlken.m17n.org> <87tzbh7kd9.fsf@jurta.org>
	<E1KroXF-0005mK-07@etlken.m17n.org> <87tzb5ikrw.fsf@jurta.org>
	<E1KsSMH-0005T2-Ek@etlken.m17n.org>
	<E1KsWHz-00079W-SZ@etlken.m17n.org> <uwsg0e837.fsf@gnu.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Trace: ger.gmane.org 1224724788 31113 80.91.229.12 (23 Oct 2008 01:19:48 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 23 Oct 2008 01:19:48 +0000 (UTC)
Cc: juri@jurta.org, emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Oct 23 03:20:49 2008
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1Ksosa-0002Pm-UY
	for ged-emacs-devel@m.gmane.org; Thu, 23 Oct 2008 03:20:49 +0200
Original-Received: from localhost ([127.0.0.1]:50829 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1KsorV-0003oM-Bg
	for ged-emacs-devel@m.gmane.org; Wed, 22 Oct 2008 21:19:41 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1KsoqM-0003Ph-U7
	for emacs-devel@gnu.org; Wed, 22 Oct 2008 21:18:30 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1KsoqL-0003Or-5L
	for emacs-devel@gnu.org; Wed, 22 Oct 2008 21:18:30 -0400
Original-Received: from [199.232.76.173] (port=42037 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1KsoqK-0003Ok-Vf
	for emacs-devel@gnu.org; Wed, 22 Oct 2008 21:18:29 -0400
Original-Received: from mx1.aist.go.jp ([150.29.246.133]:55723)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <handa@m17n.org>)
	id 1KsoqI-0002bj-IQ; Wed, 22 Oct 2008 21:18:27 -0400
Original-Received: from rqsmtp1.aist.go.jp (rqsmtp1.aist.go.jp [150.29.254.115])
	by mx1.aist.go.jp  with ESMTP id m9N1IOUi014888;
	Thu, 23 Oct 2008 10:18:24 +0900 (JST) env-from (handa@m17n.org)
Original-Received: from smtp1.aist.go.jp
	by rqsmtp1.aist.go.jp  with ESMTP id m9N1INdx000453;
	Thu, 23 Oct 2008 10:18:23 +0900 (JST) env-from (handa@m17n.org)
Original-Received: by smtp1.aist.go.jp  with ESMTP id m9N1IMi8020614;
	Thu, 23 Oct 2008 10:18:23 +0900 (JST) env-from (handa@m17n.org)
Original-Received: from handa by etlken.m17n.org with local (Exim 4.69)
	(envelope-from <handa@m17n.org>)
	id 1KsoqE-0006VL-Ty; Thu, 23 Oct 2008 10:18:22 +0900
In-reply-to: <uwsg0e837.fsf@gnu.org> (message from Eli Zaretskii on Wed, 22
	Oct 2008 21:35:40 +0200)
X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.aist.go.jp id
	m9N1IOUi014888
X-detected-operating-system: by monty-python.gnu.org: Solaris 9
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:104881
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/104881>

In article <uwsg0e837.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> Thanks, but Emacs still does not get this quite right.  For example,
> in the following line:

>   אבגדה12345

> which mixes Hebrew letters with digits, M-f stops at the first digit,
> whereas in this line:

>   abcde12345

> it does not.  The latter behavior is correct, the former is not.  (I'm
> ashamed to admit that even MS Word gets it right.)

> I understand that the way for fixing this would be to install more
> entries in word-combining-categories, but more infrastructure seems to
> be missing, since right now no characters have the "Hebrew" category,
> for example (at least judging by the output of describe-categories).

Then one way to fix it is:

(1-1) assign the category "6" (digit) to "0123456789".
(1-2) define a category, say "D", and assign it to all
characters that should have no word boundary before or
after a digit.
(1-3) add (?D . ?6) and (?6 . ?D) to word-combining-categories.
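
The first approach needs no C changes, so it can be tried
directly.  A minimal sketch, assuming the standard category
table is in effect; the Hebrew range below is only an
illustrative choice of characters for the hypothetical
category "D":

```elisp
;; Option (1): let "D" characters and digits combine into one word.
;; Assumption: ?6 is the predefined "digit" category and ?D is free.
;; `define-category' signals an error for an already defined
;; category, so define each one only if it is missing.
(unless (category-docstring ?6)
  (define-category ?6 "digit"))
(unless (category-docstring ?D)
  (define-category ?D "no word boundary next to digits"))

;; (1-1) Put the ASCII digits into category ?6.  This changes the
;; category table of the current buffer (normally the standard one).
(modify-category-entry '(?0 . ?9) ?6)

;; (1-2) Illustrative choice only: the Hebrew letters alef..tav.
(modify-category-entry '(#x05D0 . #x05EA) ?D)

;; (1-3) No word boundary between a ?D character and a digit, in
;; either order, even though they belong to different scripts.
(add-to-list 'word-combining-categories '(?D . ?6))
(add-to-list 'word-combining-categories '(?6 . ?D))
```

After evaluating this, M-f at the start of the Hebrew example
line above should move past the digits as well.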

Another way is:

(2-1) modify word_boundary_p to accept a negated category
mnemonic in word-*-categories, which matches any character
that does not have the specified category.
(2-2) assign the category "6" (digit) to "0123456789".
(2-3) define a category, say "X", and assign it to all
characters that should have a word boundary before or
after a digit.
(2-4) add ((- ?X) . ?6) and (?6 . (- ?X)) to
word-combining-categories.
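
Step (2-1) is a C change, so here is only a rough Emacs Lisp
model of the decision word_boundary_p would make once negated
mnemonics are allowed.  The names my-category-spec-p and
my-word-boundary-p are hypothetical, and the model assumes the
current rule: between characters of the same script there is a
boundary only if word-separating-categories matches, and
between different scripts there is a boundary unless
word-combining-categories matches.

```elisp
(require 'cl-lib)

;; (2-2) Digits get category ?6 ("digit"); define it only if missing.
(unless (category-docstring ?6)
  (define-category ?6 "digit"))
(modify-category-entry '(?0 . ?9) ?6)

(defun my-category-spec-p (spec char)
  "Return non-nil if CHAR matches SPEC.
SPEC is nil (any character), a category mnemonic such as ?6, or
the proposed negated form (- CAT), meaning CHAR lacks CAT."
  (let ((set (char-category-set char)))
    (cond ((null spec) t)
          ((characterp spec) (aref set spec))
          ((and (consp spec) (eq (car spec) '-))
           (not (aref set (cadr spec)))))))

(defun my-word-boundary-p (c1 c2)
  "Model of word_boundary_p with (- CAT) support; not the real C code."
  (let* ((same-script (eq (aref char-script-table c1)
                          (aref char-script-table c2)))
         (rules (if same-script
                    word-separating-categories
                  word-combining-categories))
         ;; By default there is a boundary only between scripts.
         (default (not same-script)))
    (if (cl-some (lambda (rule)
                   (and (consp rule)
                        (my-category-spec-p (car rule) c1)
                        (my-category-spec-p (cdr rule) c2)))
                 rules)
        (not default)
      default)))
```

With the rules from (2-4) let-bound, for example
(let ((word-combining-categories '(((- ?X) . ?6) (?6 . (- ?X)))))
  (my-word-boundary-p #x05D0 ?1))
the model should report no boundary between alef and "1", as
long as alef has not been given the hypothetical category X.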

A third way is:

(3-1) create a `common' script and classify digits, etc. into it.
(3-2) modify word_boundary_p so that it treats `common' as the
same script as any other script.
(3-3) define a category, say "X", and assign it to all
characters that should have a word boundary before or
after a digit.
(3-4) add (?X . ?6) and (?6 . ?X) to
word-separating-categories.
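
The third approach changes how scripts are compared.  The model
below (hypothetical names; it does not touch char-script-table)
treats ASCII digits as belonging to a `common' script and lets
`common' match any script, so Hebrew letters followed by digits
fall under word-separating-categories, where the default is no
boundary.

```elisp
(require 'cl-lib)

(defun my-char-script (char)
  "Script of CHAR, with ASCII digits moved to a hypothetical `common'.
A real patch would do this in `char-script-table' itself (3-1)."
  (if (and (>= char ?0) (<= char ?9))
      'common
    (aref char-script-table char)))

(defun my-same-script-p (c1 c2)
  "Step (3-2): `common' counts as the same script as any other."
  (let ((s1 (my-char-script c1))
        (s2 (my-char-script c2)))
    (or (eq s1 s2) (eq s1 'common) (eq s2 'common))))

(defun my-common-word-boundary-p (c1 c2 separating combining)
  "Model of word_boundary_p using SEPARATING and COMBINING rules.
Each rule is (CAT1 . CAT2); nil matches any character."
  (let* ((same (my-same-script-p c1 c2))
         (set1 (char-category-set c1))
         (set2 (char-category-set c2))
         (matched (cl-some
                   (lambda (rule)
                     (and (or (null (car rule)) (aref set1 (car rule)))
                          (or (null (cdr rule)) (aref set2 (cdr rule)))))
                   (if same separating combining))))
    ;; Same script: boundary only if a separating rule matches.
    ;; Different scripts: boundary unless a combining rule matches.
    (if same (and matched t) (not matched))))
```

With the rule from (3-4), a call such as
(my-common-word-boundary-p C ?1 '((?X . ?6)) nil) would report a
boundary only for characters C carrying the hypothetical
category X, assuming the digits carry ?6 as in the other
approaches.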

> By the way, I'd suggest moving the legend generated by
> describe-categories to the beginning of the buffer, because the buffer
> is huge and it does not say anywhere at the beginning that there's a
> legend at the end.  Without the legend, the buffer looks like a large
> pile of gibberish.

The legend is more than 40 lines long.  If we put it at the
top, it will fill the whole first screen, which I don't
think is good.  Instead, a first line saying something like
"See the end of the buffer for the legend.", with "legend"
clickable, would work well.  What do you think?
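
A rough sketch of such a first line using the standard button
API.  my-insert-legend-link is a hypothetical helper, and its
LEGEND-REGEXP argument stands for whatever text actually begins
the legend in the describe-categories buffer:

```elisp
(require 'button)

(defun my-insert-legend-link (legend-regexp)
  "Insert a first line whose word \"legend\" jumps to LEGEND-REGEXP.
Hypothetical helper: LEGEND-REGEXP must match the start of the legend."
  (let ((inhibit-read-only t)
        (target (save-excursion
                  (goto-char (point-min))
                  (when (re-search-forward legend-regexp nil t)
                    ;; A marker keeps pointing at the legend even
                    ;; after text is inserted above it.
                    (copy-marker (match-beginning 0))))))
    (when target
      (save-excursion
        (goto-char (point-min))
        (insert "See the end of the buffer for the ")
        (insert-text-button
         "legend"
         'legend-pos target
         'follow-link t
         'help-echo "mouse-2, RET: jump to the legend"
         ;; The target is stored on the button itself, so the
         ;; action needs no closure.
         'action (lambda (button)
                   (goto-char (button-get button 'legend-pos))))
        (insert ".\n\n")))))
```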

> And another wish: can we have word-combining-categories and
> word-separating-categories display their elements with human-readable
> letters, not as their ASCII codes?  (Quick: what letter is code 94?)

How about modifying word_boundary_p to accept a mnemonic
string (instead of a single mnemonic character) in those
variables?  Then we could list several categories in one
string to match a character that has any of them.
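
To make the string idea concrete, here is a small Emacs Lisp
model (hypothetical helper names, not an existing API) of how a
mnemonic-string element could be matched, plus a helper that
prints such lists readably, e.g. code 94 as ?^:

```elisp
(require 'cl-lib)

(defun my-category-spec-match-p (spec char)
  "Return non-nil if CHAR matches SPEC.
SPEC is nil (any character), a mnemonic character such as ?6, or
the proposed mnemonic string such as \"DX\", meaning any of them."
  (let ((set (char-category-set char)))
    (cond ((null spec) t)
          ((characterp spec) (aref set spec))
          ((stringp spec) (cl-some (lambda (cat) (aref set cat)) spec)))))

(defun my-category-spec-to-string (spec)
  "Render SPEC readably: ?^ instead of 94, strings quoted, nil as nil."
  (cond ((null spec) "nil")
        ((characterp spec) (format "?%c" spec))
        (t (format "%S" spec))))

(defun my-describe-category-pairs (pairs)
  "Return PAIRS, e.g. `word-combining-categories', as readable strings."
  (mapcar (lambda (pair)
            (format "(%s . %s)"
                    (my-category-spec-to-string (car pair))
                    (my-category-spec-to-string (cdr pair))))
          pairs))
```

For example, (my-describe-category-pairs '((94 . nil) ("DX" . ?6)))
returns ("(?^ . nil)" "(\"DX\" . ?6)").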

---
Kenichi Handa
handa@ni.aist.go.jp