From: Hongyi Zhao <hongyi.zhao@gmail.com>
To: tomas@tuxteam.de
Cc: help-gnu-emacs <help-gnu-emacs@gnu.org>
Subject: Re: [External] : Re: Regexp for matching control character, say, FORM FEED. (Was: Re: The `^L' appeared in built-in help.)
Date: Thu, 22 Jul 2021 17:45:36 +0800
Message-ID: <CAGP6POJdNh4i1TkOGACGD7081-Q6JOuBeqbQUduVDHvYM6ESFw@mail.gmail.com>
In-Reply-To: <20210722080643.GC11096@tuxteam.de>
On Thu, Jul 22, 2021 at 4:06 PM <tomas@tuxteam.de> wrote:
>
> On Thu, Jul 22, 2021 at 09:13:31AM +0800, Hongyi Zhao wrote:
>
> [...]
>
> > I want to know whether there are some similar regexp patterns in Emacs
> > as the ones used by grep, say, $'\014' or $'\f'.
>
> To offer some other perspective on the (correct) answers by Emanuel and
> Drew, remember that a regular expression is, basically, a string
> where each character is interpreted as "itself", unless it is a "regexp
> special" character [1]. So, for example searching for the regular expression
> "a" will find all "a"s in your text, because the character a isn't a
> "regexp special".
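The "itself unless special" rule can be tried out in a few lines. This is only a sketch in Python, whose `re` syntax differs from Emacs regexps in several details (grouping, for one), so take it as an analogy and not as a description of Emacs behaviour; the sample text and the variable names are made up for the illustration.

```python
import re

# Made-up sample text for the illustration.
text = "a.b.a"

# "a" is not regexp-special, so it matches only itself.
plain_hits = re.findall("a", text)

# "." is regexp-special (any character except newline),
# so here it matches every character of the sample.
special_hits = re.findall(".", text)

# Escaping the dot turns it back into a literal character.
escaped_hits = re.findall(r"\.", text)

print(plain_hits, special_hits, escaped_hits)
```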
>
> Now ASCII control characters are all *not* "regexp special", so you only
> have to find a way to express them within a string. How to do that is described
> in the Emacs Lisp manual where it talks about "string type" [2] (especially
> the subnode "Non-ASCII Characters in Strings"), which leads you to "character
> type" [3]. The special forms "\f", "\^L" or "\C-L" (all of them equivalent),
> all of which were discussed here, are treated in a subnode of the above [4].
> This notation carries some historical baggage, so don't expect too much
> logic from it.
>
> For example, why ^L? Because form feed is at position 12 (in decimal) in the
> ASCII table, and L at position 76, the difference being 64.
$ man ascii |egrep ' L$'
014 12 0C FF '\f' (form feed) 114 76 4C L
> What happens is that the "^" "subtracts 64 from the character code", or more precisely
> clears bit 6 (the bit with value 64, counting from bit 0) of its binary representation.
$ man ascii |egrep ' \^$'
036 30 1E RS (record separator) 136 94 5E ^
If so, the RS should be represented by ^^ in a self-consistent way :-)
> So ^M would be "carriage return" and so on. Just have a look at the ASCII table.
$ man ascii |egrep ' M$'
015 13 0D CR '\r' (carriage ret) 115 77 4D M
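The arithmetic behind the caret notation can be checked with a small sketch. It is written in Python purely for illustration; the function name `caret_to_control` is hypothetical, and the sketch assumes the plain "clear the bit with value 64" rule described above, so it deliberately ignores special cases such as ^? for DEL.

```python
def caret_to_control(ch: str) -> str:
    """Return the control character written as ^ch in caret notation.

    Hypothetical helper: it only covers the range '@' .. '_' (codes 64-95),
    where clearing the bit with value 64 yields a control code 0-31.
    """
    code = ord(ch.upper())
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"no caret notation for {ch!r} in this sketch")
    # Clearing bit 6 is the same as subtracting 64 in this range.
    return chr(code & ~0x40)


for letter in ("L", "M", "^"):
    print(f"^{letter} -> code {ord(caret_to_control(letter))}")
```

With this rule ^L gives 12 (form feed), ^M gives 13 (carriage return), and ^^ gives 30 (record separator), which agrees with the rows of the ASCII table quoted above.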
> Then "\f" comes from the C string literal representation. It's meant to
> be mnemonic ("f" for "form feed" -- similarly "\n" for "line feed", aka
> "new line", "\a" for "bell" (alert), "\b" for "backspace" and so on).
>
> The references below lead you to more alternative representations, like
> short hex "\x0C", short Unicode hex "\u000C", long Unicode hex "\U0000000C";
> there are also (mostly historical) octals, etc.
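That these spellings all denote one and the same character is easy to verify. The sketch below uses Python, which happens to accept the same mnemonic, octal, hex and Unicode escapes in its string literals; that is an assumption about Python only, and the exact rules for Emacs Lisp strings are the ones in the manual nodes referenced below. The dictionary name `spellings` is made up for the illustration.

```python
# Five ways of writing FORM FEED (code 12) in a Python string literal.
spellings = {
    "mnemonic": "\f",
    "octal": "\014",
    "short hex": "\x0c",
    "short Unicode hex": "\u000c",
    "long Unicode hex": "\U0000000c",
}

# Every spelling should produce the single character with code 12.
all_same = all(s == chr(12) for s in spellings.values())

for name, s in spellings.items():
    print(f"{name}: code {ord(s)}")
```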
>
> You can even put the Unicode /names/ in there, using the "\N{...}"
> notation, so your ^L can be named "\N{FORM FEED (FF)}" (yes, the (FF)
> in parentheses is part of it: the Unicode Consortium put it in there.
> Life is like that).
>
> If you want to explore those Unicode names, type C-x 8 <RET>; you
> can then autocomplete your way among them.
>
> Hope this gives some rough map for that landscape :-)
Thank you for your systematic and informative comments and explanations.
> Cheers
>
> [1] Emacs Lisp reference manual "Syntax of Regular Expressions"
> or https://www.gnu.org/software/emacs/manual/html_node/elisp/Syntax-of-Regexps.html
>
>
> [2] Emacs Lisp reference manual "String Type" and its subnodes
> or https://www.gnu.org/software/emacs/manual/html_node/elisp/String-Type.html
>
> [3] Emacs Lisp reference manual "Character Type"
> https://www.gnu.org/software/emacs/manual/html_node/elisp/Character-Type.html
>
> [4] Emacs Lisp reference manual "Control-Character Syntax"
> https://www.gnu.org/software/emacs/manual/html_node/elisp/Ctl_002dChar-Syntax.html
>
> - tomás
Best,
HY