From: Heime
Newsgroups: gmane.emacs.help
Subject: Re: Regexp capturing unicode characters
Date: Thu, 01 Aug 2024 17:06:26 +0000
To: Eli Zaretskii
Cc: help-gnu-emacs@gnu.org

On Friday, August 2nd, 2024 at 3:34 AM, Eli Zaretskii wrote:

> > Date: Thu, 01 Aug 2024 13:43:20 +0000
> > From: Heime heimeborgia@protonmail.com
> > Cc: help-gnu-emacs@gnu.org
> >
> > > Why do you need that? Don't you know which characters you'd like to
> > > match?
> >
> > No, because language insertion in emacs depends upon the user.
> > But I want to match foreign language characters mostly.
>
> If by "foreign language characters" you mean letters and digits, then
> [:alnum:] is what you want, as I already suggested. This covers all
> the characters that are either letters or digits, in all the
> languages.
>
> > > > Is there a way to show the characters that are members of each class ?
> > >
> > > No, but you can check each character whether it matches a class.
> >
> > What is the function name for doing that ?
>
> string-match-p if you have a string or looking-at-p if you have it in
> the buffer.
>
> > Can one scan the buffer and list the matched character classes ?
>
> Character classes overlap, so I'm not sure what kind of function you
> want, and I don't think we have it anyway. It's usually the other way
> around: the author of a Lisp program knows in advance what kinds of
> characters the program needs to match, and uses a regexp which will do
> the job.

I want the regexp to allow for the possibility that the user wrote some
comment in a language other than English. Otherwise the regexp would
simply skip that text. And your suggestion has been [:alpha:] and
[:alnum:].

> > > > Thought that [:multibyte:] captured the unicode characters. But even when
> > > > I applied (set-buffer-multibyte t) to the buffer, I did not get matches.
> > >
> > > Don't use [:multibyte:], it is hardly ever the right thing nowadays.
> >
> > Can we update the manual with useful information such as with [:multibyte:] please.
>
> The useful information is already there (including a cross-reference
> to a detailed description of what "multibyte" means). I just
> translated it into simpler terms, based on what you told about the job
> you want to do, to save you from the need to read that if you don't
> want to.

I meant adding a mention that [:multibyte:] is not used much nowadays.

> > > > Does [:word:] mean word in the english language only ?
> > >
> > > No, it means characters that have the word syntax. IOW, which
> > > character match depends on the major mode's syntax table. If you are
> > > classifying characters from human-readable text, [:word:] is not the
> > > right thing to use.
>
> > Can one show the syntax table ? For me it is just word syntax table does
> > not give me enough information. Perhaps give more explanation in the manual.
>
> The manual already does that: there's a cross-reference in the
> description of [:word:] which leads to the node "Syntax Class Table",
> which explains syntax tables in detail.
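
Following the string-match-p suggestion above, here is a minimal sketch of
how one might list which character classes a single character matches. The
names my-class-names and my-char-classes are hypothetical (they are not part
of Emacs), the list of classes is only a sample, and the result for [:word:]
depends on the syntax table of whatever buffer is current when it runs.

```elisp
;; Sketch only: `my-class-names' and `my-char-classes' are made-up names
;; for illustration, not functions or variables provided by Emacs.
(require 'seq)

(defvar my-class-names
  '("alpha" "alnum" "digit" "word" "punct" "space"
    "upper" "lower" "ascii" "nonascii" "multibyte")
  "A sample of regexp character class names to test a character against.")

(defun my-char-classes (char)
  "Return the names in `my-class-names' whose class matches CHAR."
  (let ((str (char-to-string char))
        ;; Keep [:upper:] and [:lower:] case-sensitive.
        (case-fold-search nil))
    (seq-filter (lambda (name)
                  ;; Builds a regexp such as "[[:alpha:]]" and tests it
                  ;; against the one-character string.
                  (string-match-p (format "[[:%s:]]" name) str))
                my-class-names)))
```

For example, (my-char-classes ?é) should include "alpha" and "alnum", while
(my-char-classes ?7) should include "digit" but not "alpha". Since the
classes overlap, a character usually comes back with several names. To look
at the syntax table that [:word:] consults, C-h s (describe-syntax) displays
the one in the current buffer.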