From: Tyler Spivey <tspivey@pcdesk.net>
To: help-gnu-emacs@gnu.org
Subject: Re: Making re-search-forward search for \377
Date: Sun, 02 Nov 2008 20:54:52 -0800
Message-ID: <87hc6ppfxf.fsf@pcdesk.net>
In-Reply-To: mailman.2743.1225686066.25473.help-gnu-emacs@gnu.org
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Tyler Spivey <tspivey@pcdesk.net>
>> Date: Sun, 02 Nov 2008 01:12:10 -0800
>>
>> I'm probably going to end up working with binary data in a temp
>> buffer. Doing more research, I want enable-multibyte-characters to be
>> off. Given that, if we go to *scratch*
>> and run M-x toggle-enable-multibyte-characters until that variable
>> becomes nil, doing C-q 377 RET gives 0xff, which is what I want
>> (according to C-x =, C-u C-x = and M-x describe-char). Now to
>> match it, I try:
>>
>> (re-search-forward "\xff") - no luck
>>
>> What did you use to figure out that the multibyte version of that
>> character was 0x00FF? I found it out accidentally as a lisp error, but
>> none of the previously described commands (C-x =, M-x describe-char or
>> C-u C-x =) will show that it is 0x00ff; they just show FF.
>
> Why are you trying to use re-search-forward with octal codes such as
> \377? What are you trying to do? Does the buffer you are searching
> hold human-readable text or does it hold binary data, i.e. raw bytes?
>
> In the former case, you need to use characters in the search string,
> not literal codes like \377 or \xff, and the buffer should be in the
> (default) multibyte mode. \377 is not a character code, as far as
> Emacs is concerned, it's an encoding of some character. Do _not_ make
> the mistake of turning enable-multibyte-characters off and using raw
> bytes such as \377 for searching normal text, that way lies madness.
I think this is partly a problem with Emacs, and partly a problem with
what I'm trying to do or with my understanding of regexps. I posted to
emacs-devel; maybe someone there knows more. What I'm trying to do is
split text up for use in a MUD client, based on the following regexp:
"\\(\377[\371\357]\\)\\|\\(\n\\)"
The coding system of the process is raw-text-unix.
Manually running M-: (re-search-forward "\\(\377[\371\357]\\)") fails,
but running M-: (re-search-forward "\377\371") works fine. I want it to
match the longer regexp stated above, but running re-search-forward on
that only matches the newlines.
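A minimal sketch (not from the thread) of how the combined regexp behaves when the data has been copied into a unibyte buffer (enable-multibyte-characters nil), so that the octal escapes in a unibyte pattern string are compared against raw bytes. Here \377 is telnet IAC (255), \371 is GA (249) and \357 is EOR (239). The helper name my-telnet-delimiters is hypothetical, and behavior in a multibyte process buffer decoded with raw-text-unix is not covered and may differ between Emacs versions.

```elisp
;; Hypothetical helper (not from the thread).  Assumes the bytes are
;; searched in a unibyte buffer, so "\377" etc. mean raw bytes.
(defun my-telnet-delimiters (bytes)
  "Return (POSITION . KIND) for each delimiter in unibyte string BYTES.
KIND is `telnet' for IAC GA or IAC EOR, and `newline' for a newline."
  (with-temp-buffer
    (set-buffer-multibyte nil)          ; buffer holds bytes, not characters
    (insert bytes)
    (goto-char (point-min))
    (let ((case-fold-search nil)        ; compare bytes exactly
          found)
      ;; Group 1: IAC (\377) followed by GA (\371) or EOR (\357).
      ;; Group 2: a newline.
      (while (re-search-forward "\\(\377[\371\357]\\)\\|\\(\n\\)" nil t)
        (push (cons (match-beginning 0)
                    (if (match-beginning 1) 'telnet 'newline))
              found))
      (nreverse found))))

(my-telnet-delimiters "line1\nprompt>\377\371rest")
;; => ((6 . newline) (14 . telnet))
```

Checking (match-beginning 1) tells the two alternatives apart, which is what the grouping in the regexp is for.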
This is mostly text, with telnet control characters thrown in that I
want to use as delimiters of a sort: I want to process them and delete
them from the text. Using re-search-forward would be perfect for this if
I could figure out how to do it.
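The "process them and delete them" step could be sketched with a unibyte temp buffer and replace-match. This is an assumption-laden sketch, not the thread's solution: the helper name my-telnet-split is hypothetical, the input is assumed to be a unibyte string of raw bytes, and decoding the text segments afterwards (for example with decode-coding-string) is left out.

```elisp
;; Hypothetical helper (not from the thread).  Assumes a unibyte
;; buffer; each delimiter is deleted as soon as it is found.
(defun my-telnet-split (bytes)
  "Split unibyte BYTES at IAC GA, IAC EOR and newline, deleting them.
Return a list of (SEGMENT . DELIMITER); the final element holds any
trailing text, with DELIMITER nil."
  (with-temp-buffer
    (set-buffer-multibyte nil)          ; buffer holds bytes, not characters
    (insert bytes)
    (goto-char (point-min))
    (let ((case-fold-search nil)        ; compare bytes exactly
          (seg-start (point))
          result)
      (while (re-search-forward "\377[\371\357]\\|\n" nil t)
        (let ((delim (match-string 0)))
          (push (cons (buffer-substring seg-start (match-beginning 0)) delim)
                result)
          ;; Remove the delimiter; point is left where it began.
          (replace-match "" t t)
          (setq seg-start (point))))
      ;; Whatever follows the last delimiter.
      (push (cons (buffer-substring seg-start (point-max)) nil) result)
      (nreverse result))))

(my-telnet-split "line1\nprompt>\377\371rest")
;; => (("line1" . "\n") ("prompt>" . "\377\371") ("rest"))
```

Returning the delimiter alongside each segment lets a caller treat IAC GA or IAC EOR as "a prompt ended here" and a newline as "a line ended here".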
Section 2.3.8.2 of the Emacs Lisp manual says this:
You can represent a unibyte non-ASCII character with its character
code, which must be in the range from 128 (0200 octal) to 255 (0377
octal). If you write all such character codes in octal and the string
contains no other characters forcing it to be multibyte, this produces
a unibyte string. However, using any hex escape in a string (even for
an ASCII character) forces the string to be multibyte.
I've left enable-multibyte-characters alone, but even searching for
"[\377]\371" fails, while "\377\371" succeeds.
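A small sketch for checking which kind of string a literal produces, and how a raw byte differs from the character U+00FF (the 0x00FF seen earlier in the thread). It assumes Emacs 23 or later, where raw bytes inside multibyte text are the characters #x3FFF80 to #x3FFFFF; the rules for hex escapes have changed since the manual text quoted above, so the sketch sticks to octal and \u escapes.

```elisp
;; Assumes Emacs 23 or later.  Unibyte vs multibyte string literals.
(multibyte-string-p "\377\371")     ; => nil  (octal escapes only: unibyte)
(multibyte-string-p "\u00ff")       ; => t    (\u escape: the character ÿ)
(string= "\377" "\u00ff")           ; => nil  (a raw byte is not the char ÿ)
;; Converting the unibyte string turns byte 255 into the raw-byte
;; character #x3FFFFF, which is how such a byte appears in multibyte text.
(aref (string-to-multibyte "\377") 0) ; => 4194303, i.e. #x3FFFFF
```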