From: Kenichi Handa <handa@m17n.org>
Cc: emacs-devel@gnu.org, rms@gnu.org, monnier@iro.umontreal.ca
Subject: Re: announcing thaiword.el?
Date: Tue, 29 Mar 2005 18:02:51 +0900 (JST)
Message-ID: <200503290902.SAA29414@etlken.m17n.org>
In-Reply-To: <buo1x9yvoho.fsf@mctpc71.ucom.lsi.nec.co.jp> (message from Miles Bader on Tue, 29 Mar 2005 17:35:15 +0900)

In article <buo1x9yvoho.fsf@mctpc71.ucom.lsi.nec.co.jp>, Miles Bader <miles@lsi.nec.co.jp> writes:

> On Mon, 28 Mar 2005 09:47:09 +0900 (JST), Kenichi Handa <handa@m17n.org> wrote:
>>  To handle the regular expression "\\b" and "\\B" correctly
>>  for Thai, we need a bigger change in regex.c.  For the
>>  moment, I have no idea how to do that.

> Current extensions to "word syntax", using `word-separating-categories'
> etc., seem to do the correct thing with regexps.[*]  Perhaps some
> extension to that mechanism would work.

> For instance, what if entries in `word-separating-categories' could have an
> optional predicate function -- in addition to the current (CAT1 . CAT2)
> format, allow (CAT1 CAT2 PREDICATE-FUN), and only consider the entry to
> match if PREDICATE-FUN (with some appropriate args) also returns true?

The problem is that the innermost function,
re_match_2_internal, doesn't know about the original buffer
or Lisp string.  So, to make PREDICATE-FUN work, we would have
to generate a Lisp string each time, which would be extremely
slow.  And, first of all, is re_match_2_internal a safe place
to call a Lisp function?

> [*] I was surprised that this is true, and I don't understand why from
>     my quick look at regex.c :-/ ... But my simple tests seem to show
>     that it does really work.  E.g., I can add '(?C . ?C) to
>     `word-separating-categories', and then a regexp search will suddenly
>     start considering every single kanji character as a standalone word.

I spent a fairly long time making it work. :-p
re_match_2_internal calls the macro WORD_BOUNDARY_P at the
proper places.  The same macro is also used in scan_words (syntax.c).
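
To illustrate the idea only, here is a much simplified, self-contained
sketch of a category-pair boundary test.  It is not the actual Emacs
code: the names toy_categories, toy_word_boundary_p and struct cat_pair,
the category bits, and the code point ranges are all made up for this
example.  It assumes both characters are already word constituents and
asks only whether a (CAT1 . CAT2) entry separates them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical category bits.  Real Emacs looks categories up in a
   category table keyed by mnemonic characters such as ?C. */
enum {
    CAT_LATIN    = 1u << 0,
    CAT_KANJI    = 1u << 1,
    CAT_HIRAGANA = 1u << 2
};

/* One entry in the style of (CAT1 . CAT2): a word boundary lies
   between a character in `before` and a following one in `after`. */
struct cat_pair {
    unsigned before;
    unsigned after;
};

/* Toy classifier by code point range (an assumption for this sketch). */
static unsigned toy_categories(unsigned c)
{
    if (c < 0x80)
        return CAT_LATIN;
    if (c >= 0x3040 && c <= 0x309F)
        return CAT_HIRAGANA;
    if (c >= 0x4E00 && c <= 0x9FFF)
        return CAT_KANJI;
    return 0;
}

/* Return true if some entry in `pairs` separates c1 from c2.
   Needs only the two characters, not the buffer or string. */
static bool toy_word_boundary_p(unsigned c1, unsigned c2,
                                const struct cat_pair *pairs, size_t n)
{
    unsigned cats1 = toy_categories(c1);
    unsigned cats2 = toy_categories(c2);

    for (size_t i = 0; i < n; i++)
        if ((cats1 & pairs[i].before) && (cats2 & pairs[i].after))
            return true;
    return false;
}
```

With a single { CAT_KANJI, CAT_KANJI } entry, two adjacent kanji are
reported as separated, which corresponds to the effect Miles saw after
adding '(?C . ?C).  A test of this shape needs only the two characters,
which is why it can run inside the matcher, whereas a Lisp predicate
would need the original buffer or string.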

---
Ken'ichi HANDA
handa@m17n.org

Thread overview: 22+ messages
2005-03-24  7:29 announcing thaiword.el? Werner LEMBERG
2005-03-24 19:41 ` Eli Zaretskii
2005-03-24 21:11   ` Werner LEMBERG
2005-03-30  7:16     ` Werner LEMBERG
2005-03-30 11:34       ` Kenichi Handa
2005-03-25  6:42 ` Richard Stallman
2005-03-25  7:18   ` Werner LEMBERG
2005-03-25 14:44     ` Stefan Monnier
2005-03-25 22:26       ` Werner LEMBERG
2005-03-26  1:06         ` Kenichi Handa
2005-03-26 15:21           ` Stefan Monnier
2005-03-29  8:10             ` Kenichi Handa
2005-03-27  3:53           ` Richard Stallman
2005-03-28  0:47             ` Kenichi Handa
2005-03-28 22:53               ` Richard Stallman
2005-03-29  7:25                 ` Kim F. Storm
2005-03-29 11:35                   ` Kenichi Handa
2005-03-29  5:44               ` Juri Linkov
2005-03-29  8:35               ` Miles Bader
2005-03-29  9:02                 ` Kenichi Handa [this message]
2005-03-29 10:14                   ` Miles Bader
2005-03-29 11:29                     ` Kenichi Handa