From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: Making re-search-forward search for \377
Date: Mon, 03 Nov 2008 21:42:14 +0200
To: help-gnu-emacs@gnu.org
In-reply-to: <87hc6ppfxf.fsf@pcdesk.net>

> From: Tyler Spivey
> Date: Sun, 02 Nov 2008 20:54:52 -0800
>
> What I'm trying to do is split text up for use in a mud
> client, based on the following re:
> "\\(\377[\371\357]\\)\\|\\(\n\\)"
> the encoding of the process is raw-text-unix.
> manually running M-: (re-search-forward "\\(\377[\371\357]\\)") fails,
> but running M-: (re-search-forward "\377\371") works fine. However, I
> want it to match the longer re stated above, but running re-search on
> that just matches the newlines.
>
> This is mostly text, with telnet control characters thrown in

If it's text, Emacs is unlikely to treat what was \377 etc. in the
file as just an 8-bit byte whose integer value is \377.  Depending on
your locale, Emacs will interpret such bytes as encoded characters and
convert them to its internal representation, which is exposed to you
as a large integer.  (This conversion is called ``decoding''.)  To see
what Emacs thinks about those characters, go to one of them and type
"C-u C-x =".

If I'm right, searching for literal \377\371 is unlikely to succeed,
since there's no such character in the buffer after decoding.
Instead, you should search for the codepoints in the internal
representation, as shown to you by "C-u C-x =".  To insert such
characters, the easiest way is to use an ``input method''.
You set an input method by typing "C-u C-\" and then the name of the
input method you want.  Typing "C-u C-\ TAB" will show the list of
available input methods, and "C-h C-\ METHOD" will describe the named
input method.

> In reading section 2.3.8.2 of the manual, we get this:
> You can represent a unibyte non-ASCII character with its character
> code, which must be in the range from 128 (0200 octal) to 255 (0377
> octal). If you write all such character codes in octal and the string
> contains no other characters forcing it to be multibyte, this produces
> a unibyte string. However, using any hex escape in a string (even for
> an ASCII character) forces the string to be multibyte.
>
> I've left enable-multibyte-characters alone, but even searching for
> "[\377]\371" fails, while "\377\371" succeeds.

I don't recommend using unibyte facilities; they are tricky and
treacherous.
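
For what it's worth, searching by internal codepoints could be sketched
in Lisp roughly as follows.  This is only an illustration under stated
assumptions: the function names my-char-code-at-point and
my-search-char-codes are made up here, the codes you pass must be
whatever "C-u C-x =" (or char-after) reports in your own buffer, and it
has not been tried on actual MUD or telnet data.  It uses an explicit
alternation instead of a bracket expression, which is a choice of this
sketch and not something the discussion above establishes as necessary.

```elisp
;; Hypothetical helpers for illustration; neither name comes from Emacs
;; or from the message above.

(defun my-char-code-at-point ()
  "Return the internal code of the character at point, or nil at end of buffer."
  (char-after))

(defun my-search-char-codes (first alternatives)
  "Search forward for character FIRST followed by one of ALTERNATIVES.
FIRST is a character code and ALTERNATIVES is a list of character
codes, all given as Emacs represents them in the current buffer.
Return the position after the match, or nil if nothing matches."
  (re-search-forward
   ;; Build FIRST followed by \(?:ALT1\|ALT2\|...\), quoting every
   ;; character so that none of them is taken as a regexp operator.
   (concat (regexp-quote (string first))
           "\\(?:"
           (mapconcat (lambda (code) (regexp-quote (string code)))
                      alternatives
                      "\\|")
           "\\)")
   nil t))
```

To use it, put point on each character of interest, evaluate
(my-char-code-at-point) with M-: and note the numbers, then call
(my-search-char-codes CODE1 (list CODE2 CODE3)) with those numbers in
place of the CODE placeholders.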