From: Kenichi Handa <handa@m17n.org>
Cc: emacs-devel@gnu.org, rms@gnu.org, monnier@iro.umontreal.ca
Subject: Re: announcing thaiword.el?
Date: Tue, 29 Mar 2005 18:02:51 +0900 (JST)
Message-ID: <200503290902.SAA29414@etlken.m17n.org>
In-Reply-To: <buo1x9yvoho.fsf@mctpc71.ucom.lsi.nec.co.jp> (message from Miles Bader on Tue, 29 Mar 2005 17:35:15 +0900)
In article <buo1x9yvoho.fsf@mctpc71.ucom.lsi.nec.co.jp>, Miles Bader <miles@lsi.nec.co.jp> writes:
> On Mon, 28 Mar 2005 09:47:09 +0900 (JST), Kenichi Handa <handa@m17n.org> wrote:
>> To handle the regular expression "\\b" and "\\B" correctly
>> for Thai, we need a bigger change in regex.c. For the
>> moment, I have no idea how to do that.
> Current extensions to "word syntax", using `word-separating-categories'
> etc., seem to do the correct thing with regexps.[*] Perhaps some
> extension to that mechanism would work.
> For instance, what if entries in `word-separating-categories' could have an
> optional predicate function -- in addition to the current (CAT1 . CAT2)
> format, allow (CAT1 CAT2 PREDICATE-FUN), and only consider the entry to
> match if PREDICATE-FUN (with some appropriate args) also returns true?
The problem is that the innermost function,
re_match_2_internal, doesn't know about the original buffer
or Lisp string. So, to make PREDICATE-FUN work, we would have
to generate a Lisp string each time, and that would be
extremely slow. More fundamentally, is re_match_2_internal a
safe place to call a Lisp function?
> [*] I was surprised that this is true, and I don't understand why from
> my quick look at regex.c :-/ ... But my simple tests seem to show
> that it does really work. E.g., I can add '(?C . ?C) to
> `word-separating-categories', and then a regexp search will suddenly
> start considering every single kanji character as a standalone word.
I spent a fairly long time making it work. :-p
re_match_2_internal calls the macro WORD_BOUNDARY_P at the
proper places. The same macro is also used in scan_words (syntax.c).
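
To illustrate the idea behind a category-pair boundary test, here is a minimal C sketch. It is not the actual WORD_BOUNDARY_P macro from Emacs: the names `category_pair`, `char_category`, and `word_boundary_between` are hypothetical, the classifier is a toy, and the sketch ignores charsets, scripts, and `word-combining-categories'. It only shows how a list of (CAT1 . CAT2) entries can decide whether two adjacent characters are separated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical category tags standing in for Emacs category mnemonics. */
enum category { CAT_LATIN, CAT_KANJI };

/* One (CAT1 . CAT2) entry: a word boundary exists between a character of
   category `before` and a following character of category `after`. */
struct category_pair { enum category before, after; };

/* Toy classifier (assumption): code points in the CJK Unified Ideographs
   block count as kanji; everything else is treated as Latin. */
static enum category char_category(unsigned int c)
{
    return (c >= 0x4E00 && c <= 0x9FFF) ? CAT_KANJI : CAT_LATIN;
}

/* Return true if some entry in `pairs` separates c1 from the following c2. */
static bool word_boundary_between(unsigned int c1, unsigned int c2,
                                  const struct category_pair *pairs,
                                  size_t npairs)
{
    enum category k1 = char_category(c1);
    enum category k2 = char_category(c2);

    for (size_t i = 0; i < npairs; i++)
        if (pairs[i].before == k1 && pairs[i].after == k2)
            return true;
    return false;
}
```

With a single entry pairing kanji with kanji (the analogue of adding '(?C . ?C)), every pair of adjacent kanji is reported as separated, while two Latin letters are not. Because the check only needs the two characters and a small table, it can run inside a low-level matcher without calling back into Lisp.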
---
Ken'ichi HANDA
handa@m17n.org