From: rasmith@tamu.edu
To: Peter_Dyballa@Web.DE
Cc: help-gnu-emacs@gnu.org
Subject: Re: search-forward in emacs23 lisp
Date: Sun, 28 Mar 2010 19:44:12 -0500 (CDT)
Message-ID: <20100328.194412.1151864885461785121.rasmith@aristotle.tamu.edu>
In-Reply-To: <D80A206F-B9A7-4D0E-A801-3951A7549A09@Web.DE>

From: Peter Dyballa <Peter_Dyballa@Web.DE>
Subject: Re: search-forward in emacs23 lisp
Date: Sun, 28 Mar 2010 23:45:26 +0200

> 
> Am 27.03.2010 um 21:31 schrieb rasmith:
> 
>> The behavior of the search-forward function in emacs-lisp has changed
>> in emacs23 in a way that breaks some scripts I use, in particular
>> cgreek-tlg.el from Naoto Takahashi's cgreek package.
> 
> 
> Maybe the problem is simply that the buffer is in UTF-8. Then it
> makes really no sense to search for that byte because it does not
> exist, like a quark (although baryons and mesons are built from them);
> there only exists the two-byte word \xc3\xbf (standing for ÿ, LATIN
> SMALL LETTER Y WITH DIAERESIS). Clearly, you can't search for what
> does not exist – unless you're Lancelot.
> 
> Which coding is used in the buffer? Can you switch to a (raw)
> byte-based encoding and test in this state?
> 

No, the buffer's not in utf-8.  The file was read in with
insert-file-contents literally, and (set-buffer raw) 
and (set-buffer-multibyte nil) were executed just before that.
When I run the function containing the problem code, sometimes it just
reports "\377" as not found and stops, and sometimes it signals an
error saying that it's not looking at what it expects (the actual
message is "Unexpected author description introducer" followed by a
pair of bytes in hex).  In the latter case, when I switch to that
buffer, point is sitting just after the pair of bytes \231\277; this
is where (search-forward (char-to-string ?\xff)) stopped.  That is
well beyond an earlier occurrence of \377 in the buffer.  (I won't
explain the rather complicated format of the files in question, but in
them \377 is used as a string terminator.  Don't ask me to change
that, since the whole purpose of the code is to process files having
this format.)  While visiting that buffer, it's pretty obvious that
it's in raw mode: all high bytes display in octal, and
what-cursor-position identifies everything you look at as an 8-bit
byte, never a utf-8 multibyte character.

Within that buffer, an isearch for \377 finds a 255 byte
with no problem.  The problem is entirely in the search-forward
function.  I tried inserting (search-forward (unibyte-string ?\377))
in the buffer and executing it from there; when I do that, it skips
right over \377 but stops instead at \231\277 (which as I pointed out
is not the utf-8 version of \377).  This result happens with all the
possible arguments I've come up with for search-forward, such as:
(unibyte-string ?\377) 
(string-to-unibyte (unibyte-string ?\377))
"ÿ"
"\377"
"\xff" (this is even worse: it's translated to two bytes \x00ff)

I've verified that (unibyte-string ?\377) returns exactly what it
should: a string containing just the 8-bit byte \377.  However, when 
search-forward gets that argument, running from a raw buffer with
multibyte turned off, it first turns it into the two-byte string
\231\277 and then matches on that.  If there's a way to keep it from
doing that, I'd like to know.

As I said in a reply to myself, I found a workaround:

      (while (/= (char-after) ?\377)
        (forward-char 1))
      (forward-char 1)

But it would be nice to know exactly what it is that search-forward is
doing here.  My knowledge of emacs-lisp is pretty rudimentary, so if
I'm missing something obvious, please let me know.
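
For reference, here is a minimal sketch of the workaround above as a
reusable function, assuming the buffer is unibyte.  The name
my-skip-past-byte is hypothetical and not part of cgreek-tlg.el.
Unlike the bare loop, it checks for end of buffer, where (char-after)
returns nil and the /= comparison would signal an error.  It does not
explain what search-forward is doing; it only avoids calling it.

```elisp
;; Hypothetical helper (not part of cgreek-tlg.el): move point just past
;; the next occurrence of BYTE by comparing character codes directly
;; instead of calling `search-forward'.  Assumes a unibyte buffer, where
;; `char-after' returns the raw byte value (0..255).
(defun my-skip-past-byte (byte)
  "Move point just after the next BYTE at or after point.
Return the new position, or nil (leaving point at the end of the
buffer) when BYTE does not occur."
  (while (and (not (eobp))
              (/= (char-after) byte))
    (forward-char 1))
  (unless (eobp)
    (forward-char 1)
    (point)))

;; Usage in a scratch unibyte buffer containing both the \231\277 pair
;; and a real \377 terminator; point ends up just after the \377 byte.
(with-temp-buffer
  (set-buffer-multibyte nil)
  (insert (unibyte-string ?a #x99 #xbf ?b #xff ?c))
  (goto-char (point-min))
  (my-skip-past-byte #xff))
```

Returning nil on failure, rather than signalling an error, lets the
caller decide how to handle a missing terminator.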

Thanks,

Robin Smith

Thread overview: 14+ messages
2010-03-27 20:31 search-forward in emacs23 lisp rasmith
2010-03-28 16:39 ` rasmith
2010-03-28 16:50   ` Lennart Borgman
2010-03-28 17:04     ` rasmith
2010-03-28 17:10       ` Lennart Borgman
2010-03-28 17:56         ` rasmith
2010-03-28 17:59         ` rasmith
2010-03-28 18:22           ` Lennart Borgman
2010-03-28 21:45 ` Peter Dyballa
2010-03-29  0:44   ` rasmith [this message]
2010-03-28 23:00 ` Johan Bockgård
2010-03-29  6:51   ` Eli Zaretskii
2010-03-29 15:01     ` rasmith
2010-03-29 15:17       ` Eli Zaretskii