From: Eli Zaretskii <eliz@gnu.org>
To: help-gnu-emacs@gnu.org
Subject: Re: Regexp capturing unicode characters
Date: Thu, 01 Aug 2024 18:34:15 +0300
Message-ID: <86h6c4w93c.fsf@gnu.org>
In-Reply-To: <uLWDqkkLlZSrOa36xkHb88GyVQc7GLDYNOT1xuiXWNPSSe13R9Z-LsIH-htVRVl6jBulj12Nbb209Bh5BfuFQJODhlEppLEnBo1_bGHqVds=@protonmail.com> (message from Heime on Thu, 01 Aug 2024 13:43:20 +0000)

> Date: Thu, 01 Aug 2024 13:43:20 +0000
> From: Heime <heimeborgia@protonmail.com>
> Cc: help-gnu-emacs@gnu.org
> 
> > Why do you need that? Don't you know which characters you'd like to
> > match?
> 
> No, because language insertion in Emacs depends upon the user.  But I want
> to match foreign language characters mostly.

If by "foreign language characters" you mean letters and digits, then
[:alnum:] is what you want, as I already suggested.  This covers all
the characters that are either letters or digits, in all the
languages.
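
A minimal sketch of such a match, assuming the text is available as a
Lisp string.  The helper name `my-alnum-string-p' is hypothetical and
not part of Emacs; `string-match-p' and the [:alnum:] class are the
only Emacs facilities used:

```elisp
;; Hypothetical helper, for illustration only.
(defun my-alnum-string-p (str)
  "Return non-nil if STR is non-empty and has only letters and digits.
The test uses the [:alnum:] class, so letters and digits of
non-Latin scripts count as well."
  (string-match-p "\\`[[:alnum:]]+\\'" str))

;; Accented Latin, Cyrillic, and mixed ASCII letters/digits all match.
(my-alnum-string-p "naïve")
(my-alnum-string-p "Привет")
(my-alnum-string-p "abc123")
;; The comma and the space are neither letters nor digits: no match.
(my-alnum-string-p "hello, world")
```

The anchors \` and \' make the regexp cover the whole string; without
them a single letter anywhere in the string would count as a match.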

> > > Is there a way to show the characters that are members of each class ?
> > 
> > No, but you can check each character whether it matches a class.
> 
> What is the function name for doing that ?

string-match-p if you have a string or looking-at-p if you have it in
the buffer.
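
A rough sketch of both checks.  The two function names and the CLASS
argument (a class name such as "alpha", without brackets or colons)
are hypothetical conveniences, not existing Emacs functions:

```elisp
;; Hypothetical helpers, for illustration only.
(defun my-char-in-class-p (char class)
  "Return non-nil if CHAR matches the character class named CLASS.
CLASS is a string such as \"alpha\" or \"digit\"."
  (string-match-p (format "[[:%s:]]" class) (char-to-string char)))

(defun my-char-at-point-in-class-p (class)
  "Return non-nil if the character after point matches CLASS."
  (looking-at-p (format "[[:%s:]]" class)))

;; String case: is the character a letter?
(my-char-in-class-p ?é "alpha")
(my-char-in-class-p ?5 "alpha")

;; Buffer case: test the character after point.
(with-temp-buffer
  (insert "x1")
  (goto-char (point-min))
  (my-char-at-point-in-class-p "alpha"))
```

Both `string-match-p' and `looking-at-p' leave the match data alone,
which makes them suitable for this kind of yes/no test.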

> Can one scan the buffer and list the matched character classes ?

Character classes overlap, so I'm not sure what kind of function you
want, and I don't think we have it anyway.  It's usually the other way
around: the author of a Lisp program knows in advance what kinds of
characters the program needs to match, and uses a regexp which will do
the job.
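
To illustrate the overlap, here is a sketch that tests one character
against every class name and collects the ones that match.  This is
not an existing Emacs function; `my-char-class-names' and
`my-classes-of-char' are hypothetical names made up for the example:

```elisp
;; Hypothetical names, for illustration only.
(defconst my-char-class-names
  '("alnum" "alpha" "ascii" "blank" "cntrl" "digit" "graph" "lower"
    "multibyte" "nonascii" "print" "punct" "space" "unibyte" "upper"
    "word" "xdigit")
  "Names of the regexp character classes to test against.")

(defun my-classes-of-char (char)
  "Return the list of class names in `my-char-class-names' matching CHAR.
[:space:] and [:word:] are decided by the current syntax table."
  (let ((case-fold-search nil)   ; keep [:upper:] and [:lower:] distinct
        (str (char-to-string char))
        (result nil))
    (dolist (class my-char-class-names)
      (when (string-match-p (format "[[:%s:]]" class) str)
        (push class result)))
    (nreverse result)))

;; A single ASCII letter already belongs to several classes at once.
(my-classes-of-char ?a)
```

Because a character usually lands in several classes, such a list
says little by itself; it is mostly useful for exploring what a
given class covers.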

> > > Thought that [:multibyte:] captured the Unicode characters. But even when
> > > I applied (set-buffer-multibyte t) to the buffer, I did not get matches.
> > 
> > Don't use [:multibyte:], it is hardly ever the right thing nowadays.
> 
> Can we update the manual with useful information such as with [:multibyte:] please.

The useful information is already there (including a cross-reference
to a detailed description of what "multibyte" means).  I just
translated it into simpler terms, based on what you told us about the
job you want to do, to save you from the need to read that if you
don't want to.

> > > Does [:word:] mean word in the English language only ?
> > 
> > 
> > No, it means characters that have the word syntax. IOW, which
> > characters match depends on the major mode's syntax table. If you are
> > classifying characters from human-readable text, [:word:] is not the
> > right thing to use.
> 
> Can one show the syntax table ?  For me, just "word syntax" does not
> give enough information.  Perhaps give more explanation in the manual.

The manual already does that: there's a cross-reference in the
description of [:word:] which leads to the node "Syntax Class Table",
which explains syntax tables in detail.
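
To look at the syntax table itself, `M-x describe-syntax' (C-h s)
displays the one in effect in the current buffer.  The sketch below
shows the dependence on the syntax table from Lisp; the helper
`my-word-char-p' and the modified table are made up for the example:

```elisp
;; Hypothetical helper, for illustration only.
(defun my-word-char-p (char)
  "Return non-nil if CHAR has word syntax in the current syntax table."
  (eq (char-syntax char) ?w))

;; In the standard syntax table, `a' is a word constituent but `_'
;; is a symbol constituent, so [:word:] does not match it.
(with-syntax-table (standard-syntax-table)
  (list (my-word-char-p ?a)
        (my-word-char-p ?_)
        (string-match-p "[[:word:]]" "_")))

;; A table that gives `_' word syntax changes the answer for the
;; very same character.
(let ((table (make-syntax-table)))   ; inherits from the standard table
  (modify-syntax-entry ?_ "w" table)
  (with-syntax-table table
    (list (my-word-char-p ?_)
          (string-match-p "[[:word:]]" "_"))))
```

This is why [:word:] is a poor fit for classifying human-readable
text: the result changes with the mode, not with the character.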


