From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.help
Subject: Re: Regexp capturing unicode characters
Date: Thu, 01 Aug 2024 15:10:57 +0300
Message-ID: <86le1gwii6.fsf@gnu.org>
References: <dsvxyTSPY2IeArhvS10w_f4j9Hiw3A1eCZCdlBBOIvjH37zyHj8dKii8j5fTodda-SST4ecImQ7L_CE37hVNws5Tzf0Sz_-2TCGfdqALx7k=@protonmail.com>
 <865xskygar.fsf@gnu.org>
 <2wHi4S9MruOl3ZOkpjKnin3CJxnVnomMkaIdhl-i3OF7AYEda3X-7-1ijhWUrLZ22JwOMXQu5ntZ3FFBuAlmhkpMxgXFbhZ-sS_XMmCrE4g=@protonmail.com>
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="6171"; mail-complaints-to="usenet@ciao.gmane.io"
To: help-gnu-emacs@gnu.org
Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Thu Aug 01 14:11:44 2024
Return-path: <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org>
Envelope-to: geh-help-gnu-emacs@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org>)
	id 1sZUel-0001Oy-R0
	for geh-help-gnu-emacs@m.gmane-mx.org; Thu, 01 Aug 2024 14:11:44 +0200
Original-Received: from localhost ([::1] helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <help-gnu-emacs-bounces@gnu.org>)
	id 1sZUeA-000203-Ms; Thu, 01 Aug 2024 08:11:06 -0400
Original-Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>) id 1sZUe8-0001zZ-6N
 for help-gnu-emacs@gnu.org; Thu, 01 Aug 2024 08:11:04 -0400
Original-Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>) id 1sZUe7-0001cv-RH
 for help-gnu-emacs@gnu.org; Thu, 01 Aug 2024 08:11:03 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=fJPd2W1Q5VU3eBJK76zABRCaGpacSxkVs0xnybHfFL8=; b=fbXxCLBM/Bpm
 FxaiGa/pzNmvtd64BgGSZv7a1kGdTL8s/KxUJb3MTptjkfOlFfBA1K5kK2vXQ5mG7HyL13JcbJ+ED
 85MAKn47GRDRKuSxk9qc3KJFMoY5gyYAJZAAqt6AWMGGl3ytX3tu//Hv5UuaFYVW7g2b9E5V3t8+5
 ZKHwY+w9qGPrXixzhSWnp6vvoF3JsZNdCFgIcd6KHUXXO5h7RBZHIW//x02TggeOfNbAMKt4RamqC
 qoExg/x3hi7i2kOfACS0WaUbGcSQNBMWCMg9HxxhuzOzl8FyjaIAY4WN6ZYhEhl/OMcbZX3ex8FKf
 eemI3f76flEfjfCti80ajA==;
In-Reply-To: <2wHi4S9MruOl3ZOkpjKnin3CJxnVnomMkaIdhl-i3OF7AYEda3X-7-1ijhWUrLZ22JwOMXQu5ntZ3FFBuAlmhkpMxgXFbhZ-sS_XMmCrE4g=@protonmail.com>
 (message from Heime on Thu, 01 Aug 2024 11:26:40 +0000)
X-BeenThere: help-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/help-gnu-emacs>,
 <mailto:help-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/help-gnu-emacs>
List-Post: <mailto:help-gnu-emacs@gnu.org>
List-Help: <mailto:help-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/help-gnu-emacs>,
 <mailto:help-gnu-emacs-request@gnu.org?subject=subscribe>
Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org
Original-Sender: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org
Xref: news.gmane.io gmane.emacs.help:147482
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/147482>

> Date: Thu, 01 Aug 2024 11:26:40 +0000
> From: Heime <heimeborgia@protonmail.com>
> Cc: help-gnu-emacs@gnu.org
> 
> On Thursday, August 1st, 2024 at 5:15 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > > Date: Wed, 31 Jul 2024 21:24:46 +0000
> > > From: Heime heimeborgia@protonmail.com
> > > 
> > > I am using unicode characters in my elisp code (e.g. foreign language symbols in icelandic
> > > and spanish).
> > > 
> > > Is the regexp [[:word:]] appropriate to capture them ?
> > 
> > 
> > No. [[:word:]] matches characters that have the word syntax, so which
> > characters match depends on the major mode. My suggestion is to use
> > either [[:alnum:]] or [[:alpha:]] instead, depending on whether you
> > want or don't want to match digit characters.
> > 
> > The meaning of each character class is documented in the "Char
> > Classes" node of the ELisp Reference manual; I suggest reading it and
> > choosing the most appropriate one for your needs.
> 
> It is difficult to determine the actual characters from a character class.

Why do you need that?  Don't you know which characters you'd like to
match?

> Is there a way to show the characters that are members of each class ?

No, but you can check whether each character matches a class.
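
A minimal sketch of such a per-character check, assuming a helper
named my-char-in-class-p (a hypothetical name, not something Emacs
provides).  It builds a one-character regexp from the class name and
tests it with string-match-p:

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-char-in-class-p (char class)
  "Return non-nil if CHAR matches the regexp character class CLASS.
CLASS is the class name without brackets, e.g. \"alpha\" or \"alnum\"."
  (string-match-p (format "\\`[[:%s:]]\\'" class)
                  (char-to-string char)))

(my-char-in-class-p ?\u00f0 "alpha")   ; U+00F0, Icelandic eth => 0 (match)
(my-char-in-class-p ?\u00f1 "alpha")   ; U+00F1, Spanish n-tilde => 0 (match)
(my-char-in-class-p ?7 "alpha")        ; a digit => nil
(my-char-in-class-p ?7 "alnum")        ; => 0 (match)
```

A return value of 0 is the match position, so it counts as non-nil.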

> Thought that [:multibyte:] captured the unicode characters.  But even when
> I applied (set-buffer-multibyte t) to the buffer, I did not get matches.

Don't use [:multibyte:], it is hardly ever the right thing nowadays.

> Does [:word:] mean word in the english language only ?

No, it means characters that have the word _syntax_.  IOW, which
characters match depends on the major mode's syntax table.  If you are
classifying characters from human-readable text, [:word:] is not the
right thing to use.
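
To illustrate the dependence on the syntax table, here is a small
sketch that matches the same one-character string against [[:word:]]
under two tables.  The variable name my-table and the choice of the
apostrophe are only assumptions for the example:

```elisp
(with-temp-buffer
  ;; A fresh table that inherits from the standard syntax table.
  (let ((my-table (make-syntax-table)))
    ;; Give the apostrophe word syntax in this table only.
    (modify-syntax-entry ?' "w" my-table)
    (list
     ;; The standard table treats the apostrophe as punctuation.
     (with-syntax-table (standard-syntax-table)
       (string-match-p "[[:word:]]" "'"))
     ;; Under my-table the same character has word syntax.
     (with-syntax-table my-table
       (string-match-p "[[:word:]]" "'")))))
;; => (nil 0)
```

Major modes install their own syntax tables, so the same regexp can
give different results in different buffers.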