Newsgroups: gmane.emacs.help
To: help-gnu-emacs@gnu.org
Subject: Re: [External] : Re: Regexp for matching control character, say, FORM FEED. (Was: Re: The `^L' appeared in built-in help.)
Date: Thu, 22 Jul 2021 10:06:43 +0200
Message-ID: <20210722080643.GC11096@tuxteam.de>

On Thu, Jul 22, 2021 at 09:13:31AM +0800, Hongyi Zhao wrote:

[...]

> I want to know whether there are some similar regexp patterns in Emacs
> as the ones used by grep, say, $'\014' or $'\f'.

To offer another perspective on the (correct) answers by Emanuel and
Drew, remember that a regular expression is, basically, a string where
each character is interpreted as "itself", unless it is a "regexp
special" character [1].
So, for example, searching for the regular expression "a" will find all
"a"s in your text, because the character a isn't "regexp special".

Now, ASCII control characters are all *not* "regexp special", so you
only have to find a way to express them within a string. How to do
that is described in the Emacs Lisp manual where it talks about the
"String Type" [2] (especially the subnode "Non-ASCII Characters in
Strings", which leads you to "Character Type" [3]). The notations
"\f", "\^L" and "\C-L" (all of them equivalent), which were all
mentioned in this thread, are covered in a subnode of the latter [4].

These notations carry some historical baggage, so don't expect too
much logic from them. For example, why ^L? Because form feed is at
code point 12 (decimal) in the ASCII table and L is at 76, a
difference of 64. What happens is that the "^" "subtracts 64 from the
character code", or more precisely clears bit 6 (the bit worth 64) of
its binary representation. So ^M would be "carriage return"
(77 - 64 = 13), and so on. Just have a look at the ASCII table.

Then "\f" comes from the C string literal representation. It's meant
to be mnemonic ("f" for "form feed"; similarly "\n" for "line feed",
aka "new line", "\b" for "backspace", "\a" for "alert", i.e. the bell,
and so on).

The references below lead you to more alternative representations,
like short hex "\x0C", short Unicode hex "\u000C" and long Unicode hex
"\U0000000C"; there are also (mostly historical) octals such as
"\014", etc. You can even put the Unicode /names/ in there, using the
"\N{...}" notation, so your ^L can be written "\N{FORM FEED (FF)}"
(yes, the "(FF)" in parentheses is part of it: the Unicode Consortium
put it in there. Life is like that).

If you want to explore those Unicode names, type C-x 8 RET
(insert-char) and you can autocomplete your way among them.
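
The caret arithmetic described above is easy to check in Emacs Lisp
itself. Below is a minimal sketch, assuming only the range @ through _
(codes 64 to 95), where "clear bit 6" and "subtract 64" coincide; the
helpers my-caret-char and my-caret-name are hypothetical names made up
for illustration, not part of Emacs.

```elisp
;; Caret notation: ^X is the character whose code is X's code with
;; bit 6 (value 64) cleared.  This sketch only handles X in the range
;; ?@ .. ?_ (codes 64..95), where clearing bit 6 equals subtracting 64.
;; Both helpers are hypothetical, for illustration only.

(defun my-caret-char (x)
  "Return the control character written as ^X (X between ?@ and ?_)."
  (unless (<= ?@ x ?_)
    (error "No caret notation for %c in this sketch" x))
  (logand x (lognot 64)))            ; clear bit 6

(defun my-caret-name (code)
  "Return the caret notation string for control CODE (0..31)."
  (unless (<= 0 code 31)
    (error "Not an ASCII control code: %d" code))
  (format "^%c" (logior code 64)))   ; set bit 6 again

(my-caret-char ?L)               ; => 12, form feed
(my-caret-char ?M)               ; => 13, carriage return
(- ?L 64)                        ; => 12, same thing by subtraction
(= (my-caret-char ?L) ?\^L ?\f)  ; => t, the reader agrees
(my-caret-name 12)               ; => "^L"
(my-caret-name 10)               ; => "^J", line feed
```

Going the other way (my-caret-name) just sets bit 6 again, which is
why the ASCII table lines up control codes 0..31 with @, A..Z, [, \,
], ^ and _.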
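
To close the loop on the original question, here is a sketch showing
that the string notations above all read as the same one-character
string, and that such a string works unchanged as a regexp. The
variable my-ff-variants and the function my-find-page-breaks are
hypothetical names for illustration; string=, string-match,
re-search-forward and match-beginning are standard Emacs Lisp.

```elisp
;; Each literal below is read as the same one-character string
;; containing FORM FEED (code 12).  `my-ff-variants' is a hypothetical
;; name used only for this illustration.
(defvar my-ff-variants
  (list "\f"          ; C-style mnemonic escape
        "\^L"         ; caret notation
        "\C-L"        ; control-modifier notation
        "\014"        ; octal, as in grep's $'\014'
        "\x0C"        ; short hex
        "\u000C"      ; 4-digit Unicode hex
        "\U0000000C") ; 8-digit Unicode hex
  "Different ways of writing a string that contains only a form feed.")

(mapcar (lambda (s) (string= s "\f")) my-ff-variants)
;; => (t t t t t t t)

;; Form feed is not regexp special, so any of those strings is also a
;; regexp matching one literal form feed.
(string-match "\^L" "page one\fpage two")  ; => 8

(defun my-find-page-breaks ()
  "Return positions of form feeds that sit alone on their own line.
Hypothetical helper, for illustration only."
  (let (positions)
    (save-excursion
      (goto-char (point-min))
      ;; "^" and "$" are regexp special; the form feed between them is not.
      (while (re-search-forward "^\f$" nil t)
        (push (match-beginning 0) positions)))
    (nreverse positions)))

(with-temp-buffer
  (insert "first page\n\f\nsecond page\n\f\nthird page\n")
  (my-find-page-breaks))
;; => (12 26)
```

Emacs's own page commands (forward-page and friends) rely on the same
idea: the variable page-delimiter defaults to "^\014", i.e. a form feed
at the beginning of a line.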

Hope this gives you a rough map of that landscape :-)

Cheers

[1] Emacs Lisp reference manual, "Syntax of Regular Expressions":
    https://www.gnu.org/software/emacs/manual/html_node/elisp/Syntax-of-Regexps.html
[2] Emacs Lisp reference manual, "String Type" and its subnodes:
    https://www.gnu.org/software/emacs/manual/html_node/elisp/String-Type.html
[3] Emacs Lisp reference manual, "Character Type":
    https://www.gnu.org/software/emacs/manual/html_node/elisp/Character-Type.html
[4] Emacs Lisp reference manual, "Control-Character Syntax":
    https://www.gnu.org/software/emacs/manual/html_node/elisp/Ctl_002dChar-Syntax.html

- tomás