From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Miles Bader <miles@lsi.nec.co.jp>
Newsgroups: gmane.emacs.devel
Subject: Re: announcing thaiword.el?
Date: Tue, 29 Mar 2005 17:35:15 +0900
Message-ID: <buo1x9yvoho.fsf@mctpc71.ucom.lsi.nec.co.jp>
References: <E1DEiWs-0003pT-Qq@fencepost.gnu.org>
	<20050325.081838.163323532.wl@gnu.org>
	<m1u0mzvldv.fsf-monnier+emacs@gnu.org>
	<20050325.232613.73792307.wl@gnu.org>
	<200503260106.KAA20718@etlken.m17n.org>
	<E1DFOpv-0000BQ-UI@fencepost.gnu.org>
	<200503280047.JAA25472@etlken.m17n.org>
Reply-To: Miles Bader <miles@gnu.org>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: sea.gmane.org 1112085507 23909 80.91.229.2 (29 Mar 2005 08:38:27 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Tue, 29 Mar 2005 08:38:27 +0000 (UTC)
Cc: emacs-devel@gnu.org, rms@gnu.org, monnier@iro.umontreal.ca
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Mar 29 10:38:23 2005
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Original-Received: from lists.gnu.org ([199.232.76.165])
	by ciao.gmane.org with esmtp (Exim 4.43)
	id 1DGCEV-0000MU-PR
	for ged-emacs-devel@m.gmane.org; Tue, 29 Mar 2005 10:37:56 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1DGCUl-0005Tf-I3
	for ged-emacs-devel@m.gmane.org; Tue, 29 Mar 2005 03:54:43 -0500
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1DGCTl-0005CN-9I
	for emacs-devel@gnu.org; Tue, 29 Mar 2005 03:53:42 -0500
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1DGCTb-00057Z-SO
	for emacs-devel@gnu.org; Tue, 29 Mar 2005 03:53:35 -0500
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1DGCTa-00056Y-Cr
	for emacs-devel@gnu.org; Tue, 29 Mar 2005 03:53:30 -0500
Original-Received: from [210.143.35.51] (helo=tyo201.gate.nec.co.jp)
	by monty-python.gnu.org with esmtp (Exim 4.34)
	id 1DGCCF-0000PA-Sv; Tue, 29 Mar 2005 03:35:36 -0500
Original-Received: from mailgate4.nec.co.jp (mailgate53.nec.co.jp [10.7.69.184])
	by tyo201.gate.nec.co.jp (8.11.7/3.7W01080315) with ESMTP id
	j2T8ZKO01754; Tue, 29 Mar 2005 17:35:20 +0900 (JST)
Original-Received: (from root@localhost) by mailgate4.nec.co.jp
	(8.11.7/3.7W-MAILGATE-NEC)
	id j2T8ZJf09317; Tue, 29 Mar 2005 17:35:19 +0900 (JST)
Original-Received: from edsgm01.lsi.nec.co.jp ([10.50.208.11]) by mailsv5.nec.co.jp
	(8.11.7/3.7W-MAILSV4-NEC) with ESMTP
	id j2T8ZJn16737; Tue, 29 Mar 2005 17:35:19 +0900 (JST)
Original-Received: from mcsss2.ucom.lsi.nec.co.jp (localhost [127.0.0.1])
	by edsgm01.lsi.nec.co.jp (8.12.10/8.12.10) with ESMTP id j2T8ZHbE016284;
	Tue, 29 Mar 2005 17:35:17 +0900 (JST)
Original-Received: from mctpc71 (mctpc71.ucom.lsi.nec.co.jp [10.30.118.121])
	by mcsss2.ucom.lsi.nec.co.jp (8.12.10/8.12.8/EDcg v2.01-mc/1046780839)
	with ESMTP id j2T8ZGKt020070; Tue, 29 Mar 2005 17:35:16 +0900 (JST)
Original-Received: by mctpc71 (Postfix, from userid 31295)
	id D1AAB2A; Tue, 29 Mar 2005 17:35:15 +0900 (JST)
Original-To: Kenichi Handa <handa@m17n.org>
System-Type: i686-pc-linux-gnu
Blat: Foop
In-Reply-To: <200503280047.JAA25472@etlken.m17n.org> (Kenichi Handa's
	message of "Mon, 28 Mar 2005 09:47:09 +0900 (JST)")
Original-Lines: 31
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:35290
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:35290

On Mon, 28 Mar 2005 09:47:09 +0900 (JST), Kenichi Handa <handa@m17n.org> wrote:
> To handle the regular expression "\\b" and "\\B" correctly
> for Thai, we need a bigger change in regex.c.  For the
> moment, I have no idea how to do that.

Current extensions to "word syntax", using `word-separating-categories'
etc., seem to do the correct thing with regexps.[*]  Perhaps some
extension to that mechanism would work.

For instance, what if entries in `word-separating-categories' could carry an
optional predicate function?  In addition to the current (CAT1 . CAT2)
format, allow (CAT1 CAT2 PREDICATE-FUN), and consider the entry to match
only if PREDICATE-FUN (called with some appropriate arguments) also
returns true.
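To make the proposal concrete, here is a rough Python model of how a
boundary test might consult such a list.  It is only a sketch under stated
assumptions: `categories' is a toy stand-in for `char-category-set' ('C'
for CJK ideographs, 'a' for ASCII letters), the (text, pos) predicate
signature is a hypothetical choice for the "appropriate arguments", and the
model ignores syntax classes and the other rules the real word-boundary
code in Emacs applies.

```python
from typing import Callable, Sequence, Set, Tuple, Union

# Hypothetical predicate signature: called with the text and the index of
# the second character of the pair being tested; True means "boundary here".
Predicate = Callable[[str, int], bool]
# Either the existing (CAT1 . CAT2) form or the proposed (CAT1 CAT2 PREDICATE-FUN).
Entry = Union[Tuple[str, str], Tuple[str, str, Predicate]]


def categories(ch: str) -> Set[str]:
    """Toy stand-in for `char-category-set': 'C' = CJK ideograph, 'a' = ASCII letter."""
    cats: Set[str] = set()
    if "\u4e00" <= ch <= "\u9fff":
        cats.add("C")
    if ch.isascii() and ch.isalpha():
        cats.add("a")
    return cats


def separated(text: str, pos: int, entries: Sequence[Entry]) -> bool:
    """Is there a word boundary between text[pos - 1] and text[pos]?"""
    cats1, cats2 = categories(text[pos - 1]), categories(text[pos])
    for entry in entries:
        if entry[0] in cats1 and entry[1] in cats2:
            # A two-element entry matches on categories alone; a
            # three-element entry also needs its predicate to agree.
            if len(entry) == 2 or entry[2](text, pos):
                return True
    return False


print(separated("日本語", 1, [("C", "C")]))                      # True
print(separated("日本語", 1, [("C", "C", lambda t, p: p == 2)]))  # False
```

The point of the design is that old-style entries keep working unchanged;
the predicate is only consulted once the cheap category test has already
matched.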

Then for a language like Thai, where establishing word boundaries inside
runs of undelimited text requires more complicated tests, you could use a
"degenerate" entry in `word-separating-categories' with CAT1 and CAT2 the
same, plus a predicate attached to do the more complicated test.  I
suppose that would slow down word matching whenever the predicate is
called, but that would only happen for text where it is appropriate.
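A degenerate Thai entry might look roughly like the following Python
model.  Everything here is hypothetical: 't' stands in for a Thai
character category, THAI_WORDS is a two-word toy dictionary, and
`thai_boundary' uses naive greedy longest-match segmentation, which is not
necessarily what thaiword.el or any real Thai segmenter does.

```python
from typing import Callable, Sequence, Set, Tuple, Union

Predicate = Callable[[str, int], bool]
Entry = Union[Tuple[str, str], Tuple[str, str, Predicate]]

# Hypothetical toy dictionary; a real segmenter needs a full word list.
THAI_WORDS = {"ภาษา", "ไทย"}


def is_thai(ch: str) -> bool:
    return "\u0e00" <= ch <= "\u0e7f"


def categories(ch: str) -> Set[str]:
    """Toy stand-in for `char-category-set': only 't' (Thai) is modeled."""
    return {"t"} if is_thai(ch) else set()


def thai_breaks(run: str) -> Set[int]:
    """Greedy longest-match segmentation; returns offsets of internal breaks."""
    longest = max(len(w) for w in THAI_WORDS)
    breaks: Set[int] = set()
    i = 0
    while i < len(run):
        for n in range(min(longest, len(run) - i), 0, -1):
            # Fall back to a single character when nothing in the dictionary matches.
            if n == 1 or run[i:i + n] in THAI_WORDS:
                i += n
                break
        if i < len(run):
            breaks.add(i)
    return breaks


def thai_boundary(text: str, pos: int) -> bool:
    """Predicate for the degenerate ("t", "t") entry: re-segment the Thai run around pos."""
    start = pos - 1
    while start > 0 and is_thai(text[start - 1]):
        start -= 1
    end = pos + 1
    while end < len(text) and is_thai(text[end]):
        end += 1
    return (pos - start) in thai_breaks(text[start:end])


def separated(text: str, pos: int, entries: Sequence[Entry]) -> bool:
    """Word boundary between text[pos - 1] and text[pos]?"""
    cats1, cats2 = categories(text[pos - 1]), categories(text[pos])
    for entry in entries:
        if entry[0] in cats1 and entry[1] in cats2:
            if len(entry) == 2 or entry[2](text, pos):
                return True
    return False


ENTRIES = [("t", "t", thai_boundary)]
text = "ภาษาไทย"  # "ภาษา" + "ไทย"
print([p for p in range(1, len(text)) if separated(text, p, ENTRIES)])  # [4]
```

Note that the predicate is only ever called for a Thai/Thai pair, which
is where the slowdown would be confined; this sketch re-segments the whole
run on every call, so a real version would presumably want to cache.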

-Miles

[*] I was surprised that this is true, and I don't understand why from
    my quick look at regex.c :-/ ... But my simple tests seem to show
    that it does really work.  E.g., I can add '(?C . ?C) to
    `word-separating-categories', and then a regexp search will suddenly
    start considering every single kanji character as a standalone word.
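    The observed effect can be mimicked with a small Python model.  It
    only illustrates the behavior; it says nothing about how regex.c
    achieves it.  `categories', `separated' and `words' are hypothetical
    names, and 'C' here is a toy approximation (U+4E00..U+9FFF) standing
    in for category ?C.

```python
from typing import List, Sequence, Set, Tuple


def categories(ch: str) -> Set[str]:
    """Toy stand-in for `char-category-set': 'C' = CJK ideograph, 'a' = ASCII letter."""
    cats: Set[str] = set()
    if "\u4e00" <= ch <= "\u9fff":
        cats.add("C")
    if ch.isascii() and ch.isalpha():
        cats.add("a")
    return cats


def separated(c1: str, c2: str, entries: Sequence[Tuple[str, str]]) -> bool:
    """Plain (CAT1 . CAT2) entries only, as in the experiment above."""
    return any(a in categories(c1) and b in categories(c2) for a, b in entries)


def words(text: str, entries: Sequence[Tuple[str, str]]) -> List[str]:
    """Split text into words; characters with no category act as non-word chars."""
    out: List[str] = []
    cur = ""
    for i, ch in enumerate(text):
        if not categories(ch):
            if cur:
                out.append(cur)
            cur = ""
            continue
        if cur and separated(text[i - 1], ch, entries):
            out.append(cur)
            cur = ""
        cur += ch
    if cur:
        out.append(cur)
    return out


print(words("日本語 text", []))            # ['日本語', 'text']
print(words("日本語 text", [("C", "C")]))  # ['日', '本', '語', 'text']
```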
-- 
Do not taunt Happy Fun Ball.