From: Eli Zaretskii <eliz@gnu.org>
To: help-gnu-emacs@gnu.org
Subject: Re: Making re-search-forward search for \377
Date: Mon, 03 Nov 2008 21:42:14 +0200
Message-ID: <uskq8vbop.fsf@gnu.org>
In-Reply-To: <87hc6ppfxf.fsf@pcdesk.net>
> From: Tyler Spivey <tspivey@pcdesk.net>
> Date: Sun, 02 Nov 2008 20:54:52 -0800
>
> What I'm trying to do is split text up for use in a mud
> client, based on the following re:
> "\\(\377[\371\357]\\)\\|\\(\n\\)"
> the encoding of the process is raw-text-unix.
> manually running M-: (re-search-forward "\\(\377[\371\357]\\)") fails,
> but
> running M-: (re-search-forward "\377\371") works fine. However, I want
> it to match
> the longer re stated above, but running re-search on that just matches
> the newlines.
>
> This is mostly text, with telnet control characters thrown in

If it's text, Emacs is unlikely to treat what was \377 etc. in the
file as just an 8-bit byte whose integer value is \377. Depending on
your locale, Emacs will interpret such bytes as encoded characters and
convert them to its internal representation, which is exposed to you
as a large integer. (This conversion is called ``decoding''.)

To see what Emacs thinks about those characters, go to one of them and
type "C-u C-x =".

If I'm right, searching for literal \377\371 is unlikely to succeed,
since there's no such character in the buffer after decoding.
Instead, you should search for the codepoints in the internal
representation, as shown to you by "C-u C-x =". To insert such
characters, the easiest way is to use an ``input method''. You set an
input method by typing "C-u C-\" and then the name of the input method
you want. Typing "C-u C-\ TAB" will show the list of available input
methods, and "C-h C-\ METHOD" will describe the named input method.
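
From Lisp, the same idea can be sketched without typing the character
at all: take the character code straight from the buffer and search
for that. This is only a minimal sketch. The function name
my-search-forward-char-at-point is hypothetical (it is not part of
Emacs), and it assumes point is already on one of the characters you
want to find. It deliberately avoids hard-coding any integer, since
the internal codes depend on how the text was decoded.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-search-forward-char-at-point ()
  "Move past the next occurrence of the character at point.
Return the new position, or nil (leaving point alone) if the
character does not occur again."
  (interactive)
  (let ((ch (char-after))            ; integer code as stored in the buffer
        (case-fold-search nil)       ; match the exact character only
        (start (point)))
    (when ch
      (forward-char 1)               ; skip the occurrence under point
      (or (search-forward (string ch) nil t)
          (progn (goto-char start) nil)))))
```

For a regexp search, (regexp-quote (string ch)) gives a pattern that
matches the same character literally.
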
> In reading section 2.3.8.2 of the manual, we get this:
> You can represent a unibyte non-ASCII character with its character
> code, which must be in the range from 128 (0200 octal) to 255 (0377
> octal). If you write all such character codes in octal and the string
> contains no other characters forcing it to be multibyte, this produces
> a unibyte string. However, using any hex escape in a string (even for
> an ASCII character) forces the string to be multibyte.
>
> I've left enable-multibyte-characters alone, but even searching for
> "[\377]\371" fails, while "\377\371" succeeds.

I don't recommend using the unibyte facilities; they are tricky and
treacherous.