From: Tyler Spivey <tspivey@pcdesk.net>
To: help-gnu-emacs@gnu.org
Subject: Re: Making re-search-forward search for \377
Date: Sun, 02 Nov 2008 20:54:52 -0800
Message-ID: <87hc6ppfxf.fsf@pcdesk.net>
In-Reply-To: mailman.2743.1225686066.25473.help-gnu-emacs@gnu.org

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Tyler Spivey <tspivey@pcdesk.net>
>> Date: Sun, 02 Nov 2008 01:12:10 -0800
>> 
>> I'm probably going to end up working with binary data in a temp
>> buffer. Doing more research, I want enable-multibyte-characters to be
>> off. Given that, if we go to *scratch*
>> and run M-X toggle-enable-multibyte-characters until that variable
>> becomes nil, doing C-Q 377 RET gives 0xff, which is what I want
>> (according to C-x =, C-u C-x = and M-x describe-char). Now to
>> match it, I try:
>> 
>> (re-search-forward "\xff") - no luck
>> 
>> What did you use to figure out that the multibyte version of that
>> character was 0x00FF? I found it out accidentally as a lisp error, but
>> none of the previously described commands (C-X =, M-X describe-char or
>> C-u C-x =) will show that it is 0x00ff, they just show FF.
>
> Why are you trying to use re-search-forward with octal codes such as
> \377?  What are you trying to do? does the buffer you are searching
> hold human-readable text or does it hold binary data, i.e. raw bytes?
>
> In the former case, you need to use characters in the search string,
> not literal codes like \377 or xff, and the buffer should be in the
> (default) multibyte mode.  \377 is not a character code, as far as
> Emacs is concerned, it's an encoding of some character.  Do _not_ make
> a mistake of turning enable-multibyte-characters off and using raw
> bytes such as \377 for searching normal text, that way lies madness.

I think this is partly a problem with Emacs and partly a problem with
what I'm trying to do, or with my understanding of regexps. I posted to
emacs-devel; maybe someone there knows more. What I'm trying to do is
split text up for use in a MUD client, based on the following regexp:
"\\(\377[\371\357]\\)\\|\\(\n\\)"
The coding system of the process is raw-text-unix.
Manually running M-: (re-search-forward "\\(\377[\371\357]\\)") fails,
but running M-: (re-search-forward "\377\371") works fine. However, I
want to match the longer regexp stated above, and running
re-search-forward on that just matches the newlines.

The data is mostly text, with telnet control sequences thrown in (\377
is IAC; \371 and \357 are GA and EOR). I want to use those sequences as
delimiters of a sort, process them, and delete them from the text. A
regexp search would be perfect for this if I could figure out how to
make it work.
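
A rough sketch of the splitting described above. It uses an alternation
of literal two-byte sequences instead of the character class, since the
literal "\377\371" search is the form reported to work. Whether the
alternation behaves the same way is an assumption and has not been
verified here. The function name my-split-on-telnet-marks is made up
for illustration.

```elisp
(defun my-split-on-telnet-marks (beg end)
  "Return the chunks of text between BEG and END as a list of strings.
Chunks are separated by IAC GA, IAC EOR, or a newline; the separators
themselves are not included in the result."
  (save-excursion
    (goto-char beg)
    ;; Octal escapes only, so the pattern is read as a unibyte string.
    ;; Assumption: alternating literal sequences matches the same way
    ;; the plain literal search does.
    (let ((re "\377\371\\|\377\357\\|\n")
          (start beg)
          chunks)
      (while (re-search-forward re end t)
        ;; Collect the text before this separator, then skip past it.
        (push (buffer-substring-no-properties start (match-beginning 0))
              chunks)
        (setq start (match-end 0)))
      ;; Whatever follows the last separator is the final chunk.
      (push (buffer-substring-no-properties start end) chunks)
      (nreverse chunks))))
```

Called as (my-split-on-telnet-marks (point-min) (point-max)), this
would return the pieces in buffer order and leave the buffer itself
unchanged; deleting the separators would be a separate step.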

Section 2.3.8.2 of the Emacs Lisp manual says this:
   You can represent a unibyte non-ASCII character with its character
code, which must be in the range from 128 (0200 octal) to 255 (0377
octal).  If you write all such character codes in octal and the string
contains no other characters forcing it to be multibyte, this produces
a unibyte string.  However, using any hex escape in a string (even for
an ASCII character) forces the string to be multibyte.
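
One way to check which representation a given string constant ends up
with is multibyte-string-p, evaluated with M-:. This is only a small
sketch; the hex-escape result is deliberately not shown, because it may
depend on the Emacs version in use.

```elisp
;; Octal escapes only, no other non-ASCII text: a unibyte string.
(multibyte-string-p "\377\371")                        ; => nil

;; An explicitly converted copy of the same string is multibyte;
;; the bytes 128..255 are kept as raw 8-bit characters.
(multibyte-string-p (string-to-multibyte "\377\371"))  ; => t

;; Hex escape: per the manual text above this should be multibyte,
;; but no result is claimed here.
(multibyte-string-p "\xff")
```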

I've left enable-multibyte-characters alone, but even searching for
"[\377]\371" fails, while "\377\371" succeeds.


