From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: announcing thaiword.el?
Date: Tue, 29 Mar 2005 18:02:51 +0900 (JST)
Message-ID: <200503290902.SAA29414@etlken.m17n.org>
References: <20050325.081838.163323532.wl@gnu.org> <20050325.232613.73792307.wl@gnu.org> <200503260106.KAA20718@etlken.m17n.org> <200503280047.JAA25472@etlken.m17n.org>
To: Miles Bader
Cc: emacs-devel@gnu.org, rms@gnu.org, monnier@iro.umontreal.ca
In-reply-to: (message from Miles Bader on Tue, 29 Mar 2005 17:35:15 +0900)

Miles Bader writes:

> On Mon, 28 Mar 2005 09:47:09 +0900 (JST), Kenichi Handa wrote:
>> To handle the regular expression "\\b" and "\\B" correctly
>> for Thai, we need a bigger change in regex.c.  For the
>> moment, I have no idea how to do that.

> Current extensions to "word syntax", using `word-separating-categories'
> etc., seem to do the correct thing with regexps.[*]  Perhaps some
> extension to that mechanism would work.

> For instance, what if entries in `word-separating-categories' could have an
> optional predicate function -- in addition to the current (CAT1 . CAT2)
> format, allow (CAT1 CAT2 PREDICATE-FUN), and only consider the entry to
> match if PREDICATE-FUN (with some appropriate args) also returns true?

The problem is that the innermost function, re_match_2_internal,
doesn't know about the original buffer or Lisp string.
So, to make PREDICATE-FUN work, we must generate a Lisp string
each time, and that will be extremely slow.  And first of all,
is re_match_2_internal a safe place to call a Lisp function?

> [*] I was surprised that this is true, and I don't understand why from
> my quick look at regex.c :-/ ...  But my simple tests seem to show
> that it does really work.  E.g., I can add '(?C . ?C) to
> `word-separating-categories', and then a regexp search will suddenly
> start considering every single kanji character as a standalone word.

I spent a fairly long time making it work. :-p

re_match_2_internal calls the macro WORD_BOUNDARY_P at the proper
places.  It is also used in scan_words (syntax.c).

---
Ken'ichi HANDA
handa@m17n.org
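
The category-pair check discussed above can be illustrated with a
small standalone sketch.  This is not the code in regex.c or syntax.c:
the enum, the toy classifier, and the names category_of, is_word_char
and word_boundary_between are all hypothetical, and the code-point
ranges are rough stand-ins for real Emacs categories.  It only shows
the idea that two adjacent word characters are split when their
categories match a listed (CAT1 . CAT2) pair.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical categories, standing in for Emacs character categories. */
enum category { CAT_OTHER, CAT_LATIN, CAT_KANJI, CAT_KANA };

/* One (CAT1 . CAT2) entry: a boundary exists between a character of
   category `before` and a following character of category `after`. */
struct cat_pair {
    enum category before;
    enum category after;
};

/* Toy classifier over Unicode code points (assumed ranges, not Emacs's). */
static enum category category_of(unsigned int c)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        return CAT_LATIN;
    if (c >= 0x4E00 && c <= 0x9FFF)
        return CAT_KANJI;
    if (c >= 0x3040 && c <= 0x30FF)
        return CAT_KANA;
    return CAT_OTHER;
}

/* In this sketch, anything with a known category is a word character. */
static bool is_word_char(unsigned int c)
{
    return category_of(c) != CAT_OTHER;
}

/* Return true if a word boundary lies between adjacent characters
   c1 and c2, given a table of separating category pairs. */
static bool word_boundary_between(unsigned int c1, unsigned int c2,
                                  const struct cat_pair *pairs,
                                  size_t npairs)
{
    bool w1 = is_word_char(c1);
    bool w2 = is_word_char(c2);

    if (w1 != w2)
        return true;    /* ordinary word / non-word edge */
    if (!w1)
        return false;   /* both non-word: no boundary here */

    /* Both are word characters: consult the separating pairs. */
    enum category k1 = category_of(c1);
    enum category k2 = category_of(c2);
    for (size_t i = 0; i < npairs; i++) {
        if (pairs[i].before == k1 && pairs[i].after == k2)
            return true;
    }
    return false;
}
```

With a table containing only { CAT_KANJI, CAT_KANJI }, every pair of
adjacent kanji is reported as a boundary, which loosely mirrors the
'(?C . ?C) experiment quoted above.  Note that the check needs only the
two characters and a static table, so it never has to call back into
Lisp; a PREDICATE-FUN would not have that property.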