From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: Regexp capturing unicode characters
Date: Thu, 01 Aug 2024 08:15:40 +0300
Message-ID: <865xskygar.fsf@gnu.org>
To: help-gnu-emacs@gnu.org
> Date: Wed, 31 Jul 2024 21:24:46 +0000
> From: Heime
>
> I am using Unicode characters in my Elisp code (e.g. letters from
> Icelandic and Spanish).
>
> Is the regexp [[:word:]] appropriate to capture them?

No.  [[:word:]] matches characters that have word syntax, so which
characters match depends on the major mode (more precisely, on the
syntax table in effect in the current buffer).

My suggestion is to use either [[:alnum:]] or [[:alpha:]] instead,
depending on whether or not you want to match digits as well.

The meaning of each character class is documented in the "Char
Classes" node of the ELisp Reference manual.  I suggest reading it and
choosing the class most appropriate for your needs.
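A minimal sketch of the difference described above, assuming a reasonably recent Emacs reading this code as UTF-8.  The sample strings and the temporary syntax table are illustrative only and do not come from the thread: [[:alpha:]] and [[:alnum:]] classify non-ASCII characters by their Unicode properties, whereas [[:word:]] consults the syntax table of the current buffer, so changing that table changes the result.

```elisp
;; [[:alpha:]] and [[:alnum:]] classify non-ASCII characters by their
;; Unicode general category, so the result does not depend on the mode.
(string-match-p "\\`[[:alpha:]]+\\'" "Þórður")   ; => 0   (all letters)
(string-match-p "\\`[[:alpha:]]+\\'" "año2024")  ; => nil (digits are not letters)
(string-match-p "\\`[[:alnum:]]+\\'" "año2024")  ; => 0   (letters and digits)

;; [[:word:]] instead asks the current buffer's syntax table.  Here a
;; hypothetical temporary table gives ñ punctuation syntax, so the same
;; kind of string no longer matches.
(with-temp-buffer
  (let ((table (make-syntax-table)))   ; inherits from the standard table
    (modify-syntax-entry ?ñ "." table) ; make ñ punctuation in this table only
    (with-syntax-table table
      (string-match-p "\\`[[:word:]]+\\'" "año"))))  ; => nil
```

A major mode that assigns a non-word syntax to some letter would have the same effect on [[:word:]] as the temporary table here, which is why the class-based alternatives are the more predictable choice for matching letters.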