From: Tyler Spivey
To: help-gnu-emacs@gnu.org
Newsgroups: gmane.emacs.help
Subject: Re: Making re-search-forward search for \377
Date: Sun, 02 Nov 2008 20:54:52 -0800
Eli Zaretskii writes:

>> From: Tyler Spivey
>> Date: Sun, 02 Nov 2008 01:12:10 -0800
>>
>> I'm probably going to end up working with binary data in a temp
>> buffer. Doing more research, I want enable-multibyte-characters to be
>> off. Given that, if we go to *scratch* and run
>> M-x toggle-enable-multibyte-characters until that variable becomes
>> nil, typing C-q 377 RET gives 0xff, which is what I want (according
>> to C-x =, C-u C-x = and M-x describe-char). Now, to match it, I try:
>>
>> (re-search-forward "\xff") - no luck
>>
>> What did you use to figure out that the multibyte version of that
>> character was 0x00FF? I found it out accidentally from a Lisp error,
>> but none of the commands mentioned above (C-x =, M-x describe-char or
>> C-u C-x =) shows that it is 0x00FF; they just show FF.
>
> Why are you trying to use re-search-forward with octal codes such as
> \377? What are you trying to do? Does the buffer you are searching
> hold human-readable text, or does it hold binary data, i.e. raw bytes?
>
> In the former case, you need to use characters in the search string,
> not literal codes like \377 or \xff, and the buffer should be in the
> (default) multibyte mode. \377 is not a character code as far as
> Emacs is concerned; it's an encoding of some character. Do _not_ make
> the mistake of turning enable-multibyte-characters off and using raw
> bytes such as \377 to search normal text; that way lies madness.

I think this is partly a problem with Emacs, and partly a problem with what I'm trying to do or with my understanding of regexps. I've posted to emacs-devel; maybe someone there knows more.
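Eli's distinction between raw bytes and characters is not specific to Emacs. As a rough analogy only (this is Python, not Emacs Lisp, and it says nothing about how Emacs represents raw bytes internally), the sketch below shows that the escapes \377 and \xff denote the same byte, and that which character that byte stands for depends on the decoding applied. Latin-1 and UTF-8 are chosen purely for illustration, and the sample data is made up.

```python
import re

# In a bytes literal, the octal escape \377 and the hex escape \xff
# denote the same raw byte, 255.
raw = b"prompt>\377\371"
assert b"\377" == b"\xff" and 0o377 == 0xFF == 255

# To search raw bytes, the pattern must itself be bytes.
pos = re.search(rb"\xff\xf9", raw).start()
print(pos)  # 7

# Decoding turns bytes into characters. Latin-1 maps each byte
# 0x00-0xFF to code point U+0000-U+00FF, so byte 0xFF becomes U+00FF.
text = raw.decode("latin-1")
print(hex(ord(text[-2])))  # 0xff

# A search on the decoded text uses characters, not bytes.
print(re.search("\u00ff\u00f9", text).start())  # 7

# Under UTF-8, a lone 0xFF byte does not encode any character at all.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)  # invalid start byte
```

The same byte sequence is searched once as bytes and once as Latin-1 text; each search only works when the pattern is of the same kind (bytes or characters) as the data.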
What I'm trying to do is split text up for use in a MUD client, based on the following regexp:

    "\\(\377[\371\357]\\)\\|\\(\n\\)"

The process's coding system is raw-text-unix. Manually running

    M-: (re-search-forward "\\(\377[\371\357]\\)")

fails, but running

    M-: (re-search-forward "\377\371")

works fine. However, I want it to match the longer regexp above, and running re-search-forward on that just matches the newlines.

This is mostly text, with Telnet control characters thrown in that I want to use as delimiters of a sort: I want to act on them and then delete them from the text. A regexp search would be perfect for this if I could figure out how to make it work.

Section 2.3.8.2 of the manual says:

  You can represent a unibyte non-ASCII character with its character
  code, which must be in the range from 128 (0200 octal) to 255 (0377
  octal). If you write all such character codes in octal and the string
  contains no other characters forcing it to be multibyte, this produces
  a unibyte string. However, using any hex escape in a string (even for
  an ASCII character) forces the string to be multibyte.

I've left enable-multibyte-characters alone, but even searching for "[\377]\371" fails, while "\377\371" succeeds.
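For comparison, the splitting described above can be sketched outside Emacs. This is Python, not Emacs Lisp, so it does not explain why the bracket expression fails in re-search-forward; it only shows the intended behavior when the pattern is compiled from a bytes literal and therefore matches raw bytes directly. It assumes the delimiters are Telnet IAC GA (\377\371, bytes 255 249) and IAC EOR (\377\357, bytes 255 239), which is what those octal codes denote in the Telnet protocol. The function name split_mud_stream and the sample data are hypothetical.

```python
import re

# Telnet command bytes: IAC = 255 (\377), GA = 249 (\371), EOR = 239 (\357).
# Same shape as the Emacs regexp "\\(\377[\371\357]\\)\\|\\(\n\\)":
# group 1 is an IAC GA/EOR pair, group 2 is a newline.
DELIM = re.compile(rb"(\xff[\xf9\xef])|(\n)")


def split_mud_stream(data: bytes) -> list[tuple[bytes, bytes]]:
    """Split data into (text, delimiter) pairs, removing each delimiter
    from the text. A trailing chunk with no delimiter is returned with an
    empty delimiter so the caller can keep it until more data arrives."""
    pieces = []
    start = 0
    for m in DELIM.finditer(data):
        pieces.append((data[start:m.start()], m.group(0)))
        start = m.end()
    if start < len(data):
        pieces.append((data[start:], b""))
    return pieces


stream = b"You see a door.\nHP: 10>\377\371Ok\377\357partial"
for text, delim in split_mud_stream(stream):
    print(text, delim)
```

Returning the delimiter alongside each chunk lets the caller treat a prompt (ended by IAC GA) differently from an ordinary line (ended by a newline), while the text itself no longer contains the control bytes.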