From: Thorsten Jolitz
Newsgroups: gmane.emacs.help
Subject: Re: Is it valid to use the zero-byte "^@" in regexps?
Date: Wed, 18 Jun 2014 13:16:11 +0200
Message-ID: <878uouzcw4.fsf@gmail.com>
To: help-gnu-emacs@gnu.org

Nicolas Richard writes:

> Thorsten Jolitz writes:
>>> I don't see why it wouldn't be valid, but I don't know. If it is
>>> desirable is another question : it would be better to search for the
>>> beginning, then search for the end with another regexp.
>>
>> That what I did initially, and what is of course much easier, but took
>> twice (?) as long too ...
>
> I'm surprised but I guess I'm being too naive.

Most likely not; the speed problem might be unrelated. I have to
double-check again.

>>> Except NUL characters of course.
>>
>> i.e. zero-byte "^@"?
>
> Yes, "NUL" is the name you find in most ASCII charts. "zero-byte" less
> so, afaict.
>
>> But Emacs can differentiate between NUL characters and the @ character -
>
> Of course. One has ascii code 0, the other is 64.
>
> NUL is represented by ^@ because of
> http://en.wikipedia.org/wiki/Caret_notation
>
> If you hit C-f with point before a NUL, you jump over it ; whereas if
> you C-f with point before the two characters ^@ (i.e. not a NUL), cursor
> only jumps over the ^.

Yes, that's what I would expect from a well-behaved Emacs ...

>> Often, but not always, the not matched source-blocks contain @
>> characters (but not NUL chars). The strange thing is that the failed
>> matching happens with these blocks being part of a really big
>> testfile. When I isolate and copy them to a temp buffer and try to match
>> them there, it just works.
>
> If you have a reproducible recipe (even with a big file) it would
> certainly help.

After double-checking my test file again, it seems that the bug was
sitting in front of the computer again. Although that nice library
ert-buffer.el enables me to run buffer tests on real-world files
*without* modifying them, I had some left-over dangling

,-----------
| #+begin_src
`-----------

delimiters in my test file. I probably called the commands directly (not
via ERT) by accident, a few things went wrong, and that left these
dangling delimiters in the original file.

After undoing this, the diffs of the ERT test now show mainly
indentation and whitespace differences, which is quite encouraging.

Conclusion -> NUL chars in regexps do work if the test file isn't messed
up.

Thx for your input.

-- 
cheers,
Thorsten
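The thread settles three points. NUL (code 0) and "@" (code 64) are different characters. "^@" is only the caret-notation display of NUL. A NUL can appear in a regexp. The sketch below checks these in Python's `re` rather than in Emacs, which is an assumption made purely for illustration, since the two regexp engines differ in syntax and details. It also assumes that the original regexp used a "match anything except NUL" class so that a single match could span several lines; the thread does not confirm this. The names `caret_notation`, `BLOCK_RE` and `find_block_two_step` are hypothetical. The Org `#+begin_src`/`#+end_src` delimiters serve as sample input.

```python
import re

NUL = "\x00"  # ASCII code 0, displayed by Emacs as ^@

def caret_notation(ch: str) -> str:
    """Render an ASCII control character in caret notation, e.g. NUL -> '^@'."""
    code = ord(ch)
    if code < 0x20 or code == 0x7F:
        # Flip bit 0x40 (64): 0 -> '@', 1 -> 'A', ..., 127 (DEL) -> '?'.
        return "^" + chr(code ^ 0x40)
    return ch

# Hypothetical single-regexp approach (assumption, not the poster's code):
# [^\x00] matches every character except NUL, newlines included,
# so the lazy match can run across lines up to the first end delimiter.
BLOCK_RE = re.compile(r"#\+begin_src[^\x00]*?#\+end_src")

def find_block_two_step(text: str, start: int = 0):
    """Hypothetical two-step search: find the beginning, then the end."""
    beg = text.find("#+begin_src", start)
    if beg == -1:
        return None
    end = text.find("#+end_src", beg)
    if end == -1:
        return None
    return text[beg:end + len("#+end_src")]

sample = '#+begin_src emacs-lisp\n(message "hi")\n#+end_src\n'

print(caret_notation(NUL))   # ^@
print(ord(NUL), ord("@"))    # 0 64: two distinct characters
print(len(NUL), len("^@"))   # 1 2: one character vs. two
# '.' does not match newlines, so this multi-line block is not found:
print(re.search(r"#\+begin_src.*?#\+end_src", sample))  # None
print(BLOCK_RE.search(sample).group(0) == find_block_two_step(sample))  # True
```

The NUL-excluding character class is one way to let a single regexp cross newlines. The two-step search is the alternative Nicolas suggested. The thread leaves open which of the two is faster in Emacs.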