From: Thorsten Jolitz <tjolitz@gmail.com>
To: help-gnu-emacs@gnu.org
Subject: Re: Is it valid to use the zero-byte "^@" in regexps?
Date: Wed, 18 Jun 2014 12:22:35 +0200
Message-ID: <87fvj2zfdg.fsf@gmail.com>
In-Reply-To: 8761jysfxw.fsf@geodiff-mac3.ulb.ac.be

Nicolas Richard <theonewiththeevillook@yahoo.fr> writes:

> Thorsten Jolitz <tjolitz@gmail.com> writes:
>> To rule out a fundamental problem - is it valid to have the zero-byte
>> (inserted with C-q C-@) appear in a regexp like this? 
>>
>> ,--------------------------------------------------------
>> | "^#\\+begin_src[[:space:]]+emacs-lisp[^^@]*\n#\\+end_src"
>> `--------------------------------------------------------
>
> I don't see why it wouldn't be valid, but I don't know. Whether it is
> desirable is another question: it would be better to search for the
> beginning, then search for the end with another regexp.

That's what I did initially, and it is of course much easier, but it
took twice (?) as long too ...
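
For reference, a minimal sketch of that two-step approach (first search
for the beginning, then search for the end with a second regexp). The
function name my/next-elisp-src-block is made up for illustration, and
this is not the code from my real sources:

```elisp
;; Hypothetical helper: locate the next emacs-lisp source block with
;; two separate searches instead of one regexp spanning the whole block.
(defun my/next-elisp-src-block ()
  "Return (BEG . END) of the next emacs-lisp src block after point, or nil."
  (when (re-search-forward
         "^#\\+begin_src[[:space:]]+emacs-lisp" nil t)
    ;; Remember where the block starts before the second search
    ;; overwrites the match data.
    (let ((beg (match-beginning 0)))
      (when (re-search-forward "^#\\+end_src" nil t)
        (cons beg (match-end 0))))))
```

Since neither regexp has to span the block body, the question of what
the body may contain does not come up here.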

>> If so, this regexp should reliably match any 
>>
>> ,-----------------------
>> | #+begin_src emacs-lisp
>> |  [...]
>> | #+end_src
>> `-----------------------
>
> From the first occurrence of
> #+begin_src emacs-lisp
> after point to the last occurrence of
> #+end_src
> in the buffer. If there's more than one, they'll be part of the match
> too. E.g. if there's another block in the same document:
> #+begin_src sh
> echo whatever.
> #+end_src
> it'll be part of the match too. If you don't want that, make the star
> non-greedy by appending a question mark to it:
> (re-search-forward
> "^#\\+begin_src[[:space:]]+emacs-lisp[^^@]*?\n#\\+end_src")

Yes, thanks for the hint. In my real sources I do use the non-greedy *?
(otherwise it would not work), but I forgot about it when writing the
mail.
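
To make the difference concrete, here is a small sketch that compares
both variants on a made-up two-block sample. The names my/sample and
my/match-length are only for illustration, and the NUL is written as the
string escape "\0" instead of a literal C-q C-@ so that it survives
mail:

```elisp
;; Made-up sample: an emacs-lisp block followed by a sh block.
(defconst my/sample
  (concat "#+begin_src emacs-lisp\n(+ 1 2)\n#+end_src\n"
          "#+begin_src sh\necho whatever.\n#+end_src\n"))

(defun my/match-length (regexp)
  "Return the length of the first match of REGEXP in `my/sample', or nil."
  (with-temp-buffer
    (insert my/sample)
    (goto-char (point-min))
    (when (re-search-forward regexp nil t)
      (- (match-end 0) (match-beginning 0)))))

;; Greedy star: the match runs on to the last #+end_src in the buffer.
(my/match-length
 "^#\\+begin_src[[:space:]]+emacs-lisp[^\0]*\n#\\+end_src")

;; Non-greedy star: the match stops at the first #+end_src.
(my/match-length
 "^#\\+begin_src[[:space:]]+emacs-lisp[^\0]*?\n#\\+end_src")
```

The second call should return a smaller number than the first, because
only the first block is matched.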

>> no matter what's inside the block, right?
>
> Except NUL characters of course.

I.e. the zero-byte "^@"?

But Emacs can differentiate between NUL characters and the @ character,
or can't it? NUL chars are displayed in blue, and message-mode complains
when trying to send them via email, but e.g. this mail has many @ chars
that are just normal text (just like my test file) and they are
recognized as such.

Often, but not always, the source blocks that fail to match contain @
characters (but no NUL chars). The strange thing is that the matching
fails when these blocks are part of a really big test file. When I
isolate them, copy them to a temp buffer and try to match them there,
it just works.

That makes testing/bisecting a bit difficult: whenever I find the
problem and isolate it, it's gone ...

Therefore my question: is this technique with negated zero-bytes in
regexps supposed to work, or is it maybe problematic from the beginning?
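
One thing that may be worth ruling out (just an assumption on my side,
nothing above confirms it): a real NUL character and the two printable
characters ^ and @ look the same once pasted into plain text, but they
mean different things inside a bracket expression. A small sketch, with
made-up names my/nul-class, my/caret-at-class and my/matched-text:

```elisp
;; Negated set containing a real NUL (character code 0), written with
;; the string escape "\0": matches anything except NUL.
(defconst my/nul-class "[^\0]*")

;; The same text typed as the two printable characters ^ and @:
;; the first ^ negates, so this matches anything except ^ and @.
(defconst my/caret-at-class "[^^@]*")

(defun my/matched-text (regexp text)
  "Return the part of TEXT matched by REGEXP anchored at its start."
  (when (string-match (concat "\\`" regexp) text)
    (match-string 0 text)))

;; With a real NUL in the set, an at sign is matched like any other char.
(my/matched-text my/nul-class "user@example.org")

;; With caret and at sign in the set, the match stops before the @.
(my/matched-text my/caret-at-class "user@example.org")
```

If the regexp in the real sources contains the second form, blocks with
@ characters would fail to match, but that alone would not explain why
the same blocks match in a temp buffer.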

-- 
cheers,
Thorsten



