unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
@ 2009-08-20  0:50 Christopher J. Madsen
  2010-01-26 20:38 ` Chong Yidong
  0 siblings, 1 reply; 11+ messages in thread
From: Christopher J. Madsen @ 2009-08-20  0:50 UTC (permalink / raw)
  To: bug-gnu-emacs; +Cc: cjm

I've found a regression in Emacs 23.1 (versus Emacs 22.3).  I've
narrowed it down to this test case:

;--- re-bug.el starts here
(set-buffer (get-buffer-create "*Test Buffer*"))

(insert "\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A")

(goto-char (point-min))

(message "looking-at: %s" (looking-at "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A"))

(message "re-search-forward: %s"
         (re-search-forward "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A" 100 t))
;--- re-bug.el ends here

Then at the command line:

$ emacs-22 --batch -Q -l re-bug.el
looking-at: t
re-search-forward: 9

$ emacs-23 --batch -Q -l re-bug.el
looking-at: t
re-search-forward: nil


As you can see, looking-at succeeds in both versions, but
re-search-forward fails in Emacs 23.  I don't know why.  It seems like
the functions should either both succeed or both fail.


For comparison, here's the version of Emacs 22 that I'm using:
GNU Emacs 22.3.1 (i686-pc-linux-gnu, GTK+ Version 2.12.11)
 of 2009-04-06 on byte
Windowing system distributor `The X.Org Foundation', version 11.0.10503000
configured using `configure  '--prefix=/usr' '--host=i686-pc-linux-gnu' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc' '--localstatedir=/var/lib' '--program-suffix=-emacs-22' '--infodir=/usr/share/info/emacs-22' '--without-carbon' '--with-sound' '--with-x' '--with-toolkit-scroll-bars' '--with-jpeg' '--with-tiff' '--with-gif' '--with-png' '--with-xpm' '--with-x-toolkit=gtk' '--without-hesiod' '--without-kerberos' '--without-kerberos5' '--build=i686-pc-linux-gnu' 'build_alias=i686-pc-linux-gnu' 'host_alias=i686-pc-linux-gnu' 'CFLAGS=-march=prescott -O2 -pipe' 'LDFLAGS=-Wl,-O1''


And this is the Emacs 23 information:

In GNU Emacs 23.1.1 (i686-pc-linux-gnu, GTK+ Version 2.16.5)
 of 2009-08-10 on byte
Windowing system distributor `The X.Org Foundation', version 11.0.10503000
configured using `configure  '--prefix=/usr' '--host=i686-pc-linux-gnu' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc' '--localstatedir=/var/lib' '--program-suffix=-emacs-23' '--infodir=/usr/share/info/emacs-23' '--with-sound' '--with-x' '--with-toolkit-scroll-bars' '--with-gif' '--with-jpeg' '--with-png' '--with-rsvg' '--with-tiff' '--with-xpm' '--without-xft' '--without-libotf' '--without-m17n-flt' '--with-x-toolkit=gtk' '--without-hesiod' '--without-kerberos' '--without-kerberos5' '--with-gpm' '--with-dbus' '--build=i686-pc-linux-gnu' 'build_alias=i686-pc-linux-gnu' 'host_alias=i686-pc-linux-gnu' 'CFLAGS=-march=core2 -O2 -pipe' 'LDFLAGS=-Wl,-O1''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.utf8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

Major mode: Fundamental







^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
@ 2009-12-02  0:21 Matthew Dempsky
  0 siblings, 0 replies; 11+ messages in thread
From: Matthew Dempsky @ 2009-12-02  0:21 UTC (permalink / raw)
  To: 4209

This is a stab in the dark, but the patch below corrects this issue for me:

    $ ./retest.sh
    looking-at: t
    re-search-forward: 9

I don't see any reason this should cause regressions (searching
forward 0 steps seems to me it should be the same as searching
backward 0 steps), but I've only casually looked over regex.c.

--- a/src/regex.c
+++ b/src/regex.c
@@ -4524,7 +4524,7 @@ re_search_2 (bufp, str1, size1, str2, size2,
startpos, range, regs, stop)

          d = POS_ADDR_VSTRING (startpos);

-         if (range > 0)        /* Searching forwards.  */
+         if (range >= 0)       /* Searching forwards.  */
            {
              register int lim = 0;
              int irange = range;





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
@ 2009-12-07  3:30 Christopher J. Madsen
  0 siblings, 0 replies; 11+ messages in thread
From: Christopher J. Madsen @ 2009-12-07  3:30 UTC (permalink / raw)
  To: 4209

Matthew's patch corrects the problem for me, too.  (Even though that
line did not change between 22.3 and 23.1.)

Thanks, Matthew.  This bug had been preventing me from upgrading to 23.






^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
  2009-08-20  0:50 bug#4209: 23.1; Emacs 23.1 regression in re-search-forward Christopher J. Madsen
@ 2010-01-26 20:38 ` Chong Yidong
  2010-01-27  3:43   ` Kenichi Handa
  0 siblings, 1 reply; 11+ messages in thread
From: Chong Yidong @ 2010-01-26 20:38 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: 4209

Hi Handa-san,

Could you try to investigate Bug#4209?  I took a quick look, and the
contents of the Lisp string passed to Fre_search_forward in Emacs 23 is

$2 = (struct Lisp_String *) 0x86765b8
"\\`\302L\357w\306i\214\n"

but in Emacs 22 (where this test works) it's

$2 = (struct Lisp_String *) 0x86290e8
"\\`\302L\357w\306i\236\254\n"

which seems a little strange to me.


> I've found a regression in Emacs 23.1 (versus Emacs 22.3).  I've
> narrowed it down to this test case:
>
> ;--- re-bug.el starts here
> (set-buffer (get-buffer-create "*Test Buffer*"))
>
> (insert "\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A")
>
> (goto-char (point-min))
>
> (message "looking-at: %s" (looking-at "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A"))
>
> (message "re-search-forward: %s"
>          (re-search-forward "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A" 100 t))
> ;--- re-bug.el ends here
>
> Then at the command line:
>
> $ emacs-22 --batch -Q -l re-bug.el
> looking-at: t
> re-search-forward: 9
>
> $ emacs-23 --batch -Q -l re-bug.el
> looking-at: t
> re-search-forward: nil






^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
  2010-01-26 20:38 ` Chong Yidong
@ 2010-01-27  3:43   ` Kenichi Handa
  2010-01-27  5:41     ` Kenichi Handa
  0 siblings, 1 reply; 11+ messages in thread
From: Kenichi Handa @ 2010-01-27  3:43 UTC (permalink / raw)
  To: Chong Yidong; +Cc: 4209

In article <87ljfkha9i.fsf@stupidchicken.com>, Chong Yidong <cyd@stupidchicken.com> writes:

> Hi Handa-san,
> Could you try to investigate Bug#4209?

Ok, I'll work on it.

> I took a quick look, and the
> contents of the Lisp string passed to Fre_search_forward in Emacs 23 is

> $2 = (struct Lisp_String *) 0x86765b8
> "\\`\302L\357w\306i\214\n"

> but in Emacs 22 (where this test works) it's

> $2 = (struct Lisp_String *) 0x86290e8
> "\\`\302L\357w\306i\236\254\n"

> which seems a little strange to me.

It seems that Emacs 22 provoides a multibyte string (perhaps
because the searching buffer is multibyte) and Emacs 23
provoides a unibyte string.  But, I think that difference is
not important here.

---
Kenichi Handa
handa@m17n.org






^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
  2010-01-27  3:43   ` Kenichi Handa
@ 2010-01-27  5:41     ` Kenichi Handa
  2010-01-27 14:34       ` Stefan Monnier
  0 siblings, 1 reply; 11+ messages in thread
From: Kenichi Handa @ 2010-01-27  5:41 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: 4209, cyd

In article <tl7my00w6tw.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:

> In article <87ljfkha9i.fsf@stupidchicken.com>, Chong Yidong <cyd@stupidchicken.com> writes:
> > Hi Handa-san,
> > Could you try to investigate Bug#4209?

> Ok, I'll work on it.

I fixed it as below.

=== modified file 'src/regex.c'
--- src/regex.c	2010-01-13 08:35:10 +0000
+++ src/regex.c	2010-01-27 03:57:03 +0000
@@ -4083,8 +4083,7 @@
 		     the corresponding multibyte character.  */
 		  int c = RE_CHAR_TO_MULTIBYTE (p[1]);
 
-		  if (! CHAR_BYTE8_P (c))
-		    fastmap[CHAR_LEADING_CODE (c)] = 1;
+		  fastmap[CHAR_LEADING_CODE (c)] = 1;
 		}
 	    }
 	  break;

But, first of all, I don't know (remember) why there was this check:

   if (! CHAR_BYTE8_P (c))

I may have overlooked something.  Stefan, could you please
confirm that the above change is correct?

---
Kenichi Handa
handa@m17n.org






^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
  2010-01-27  5:41     ` Kenichi Handa
@ 2010-01-27 14:34       ` Stefan Monnier
  2010-01-27 16:43         ` Chong Yidong
  2010-01-28  1:18         ` Kenichi Handa
  0 siblings, 2 replies; 11+ messages in thread
From: Stefan Monnier @ 2010-01-27 14:34 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: 4209, cyd

>> > Could you try to investigate Bug#4209?
>> Ok, I'll work on it.
> I fixed it as below.

> === modified file 'src/regex.c'
> --- src/regex.c	2010-01-13 08:35:10 +0000
> +++ src/regex.c	2010-01-27 03:57:03 +0000
> @@ -4083,8 +4083,7 @@
>  		     the corresponding multibyte character.  */
>  		  int c = RE_CHAR_TO_MULTIBYTE (p[1]);
 
> -		  if (! CHAR_BYTE8_P (c))
> -		    fastmap[CHAR_LEADING_CODE (c)] = 1;
> +		  fastmap[CHAR_LEADING_CODE (c)] = 1;
>  		}
>  	    }
>  	  break;

> But, first of all, I don't know (remember) why there was this check:

>    if (! CHAR_BYTE8_P (c))

> I may have overlooked something.  Stefan, could you please
> confirm that the above change is correct?

The preceding comment keeps me puzzled.  I thought that we only ever
matched re_patterns and buffers of the same multibyteness, i.e. if
a unibyte regexp is matched against a multibyte buffer it should first
be turned into a multibyte regexp and then re_compiled, so the case of:

		  /* For the case of matching this unibyte regex
		     against multibyte, we must set a leading code of
		     the corresponding multibyte character.  */

should never happen in analyse_first.  Yet, if your patch fixes the bug,
that indicates that apparently it *does* happen.


        Stefan






^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
  2010-01-27 14:34       ` Stefan Monnier
@ 2010-01-27 16:43         ` Chong Yidong
  2010-01-28  1:18         ` Kenichi Handa
  1 sibling, 0 replies; 11+ messages in thread
From: Chong Yidong @ 2010-01-27 16:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 4209

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> The preceding comment keeps me puzzled.  I thought that we only ever
> matched re_patterns and buffers of the same multibyteness, i.e. if
> a unibyte regexp is matched against a multibyte buffer it should first
> be turned into a multibyte regexp and then re_compiled, so the case of:
>
> 		  /* For the case of matching this unibyte regex
> 		     against multibyte, we must set a leading code of
> 		     the corresponding multibyte character.  */
>
> should never happen in analyse_first.  Yet, if your patch fixes the bug,
> that indicates that apparently it *does* happen.

I observe that in the original bug recipe:

  (set-buffer (get-buffer-create "*Test Buffer*"))
  (insert "\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A")
  (goto-char (point-min))
  (message "looking-at: %s" (looking-at "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A"))
  (message "re-search-forward: %s"
           (re-search-forward "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A" 100 t))

If we replace

  (re-search-forward "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A" 100 t))

with

  (re-search-forward (string-to-multibyte
                       "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A") 100 t))

then the regexp match takes places correctly.  I'm not sure why the
looking-at call works, tho.






^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
  2010-01-27 14:34       ` Stefan Monnier
  2010-01-27 16:43         ` Chong Yidong
@ 2010-01-28  1:18         ` Kenichi Handa
  2010-01-28 19:01           ` Stefan Monnier
  1 sibling, 1 reply; 11+ messages in thread
From: Kenichi Handa @ 2010-01-28  1:18 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 4209, cyd

In article <jwviqanliww.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> The preceding comment keeps me puzzled.  I thought that we only ever
> matched re_patterns and buffers of the same multibyteness, i.e. if
> a unibyte regexp is matched against a multibyte buffer it should first
> be turned into a multibyte regexp and then re_compiled, so the case of:

Before we changed the behavour of unibyte->multibyte
conversion, that conversion depended on the preferred
charset (thus on lang. env.).  But, Emacs 22 wrongly cached
the pattern converted at some point, and reused it without
checking the change of preferred charset.

So, in emacs-unicode branch, I fixed the regex code so that
unibyte pattern can be directry used for multibyte buffer
search by doing unibyte->multibyte conversion on the fly.
And that code was merged to trunk.

So, 

> 		  /* For the case of matching this unibyte regex
> 		     against multibyte, we must set a leading code of
> 		     the corresponding multibyte character.  */

really happens.

---
Kenichi Handa
handa@m17n.org






^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
  2010-01-28  1:18         ` Kenichi Handa
@ 2010-01-28 19:01           ` Stefan Monnier
  2010-01-29  6:15             ` Kenichi Handa
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Monnier @ 2010-01-28 19:01 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: 4209, cyd

> So, in emacs-unicode branch, I fixed the regex code so that
> unibyte pattern can be directry used for multibyte buffer
> search by doing unibyte->multibyte conversion on the fly.
> And that code was merged to trunk.

Hmm... that's too bad since the subsequent change to get rid of the
dependency on locales made this change unnecessary.

But given this, yes, the patch looks right, and no, I have no idea what
the CHAR_BYTE8_P test was trying to do.


        Stefan






^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
  2010-01-28 19:01           ` Stefan Monnier
@ 2010-01-29  6:15             ` Kenichi Handa
  0 siblings, 0 replies; 11+ messages in thread
From: Kenichi Handa @ 2010-01-29  6:15 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 4209, cyd

In article <jwvvdemoy0b.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

> > So, in emacs-unicode branch, I fixed the regex code so that
> > unibyte pattern can be directry used for multibyte buffer
> > search by doing unibyte->multibyte conversion on the fly.
> > And that code was merged to trunk.

> Hmm... that's too bad since the subsequent change to get rid of the
> dependency on locales made this change unnecessary.

Yes.  I'll put this in my todo list (but with lower priority).

   * avoid on-the-fly uni<->multi conversion in regex.c.

> But given this, yes, the patch looks right, and no, I have no idea what
> the CHAR_BYTE8_P test was trying to do.

Ok, thank you for the confirmation.

---
Kenichi Handa
handa@m17n.org






^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-01-29  6:15 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-20  0:50 bug#4209: 23.1; Emacs 23.1 regression in re-search-forward Christopher J. Madsen
2010-01-26 20:38 ` Chong Yidong
2010-01-27  3:43   ` Kenichi Handa
2010-01-27  5:41     ` Kenichi Handa
2010-01-27 14:34       ` Stefan Monnier
2010-01-27 16:43         ` Chong Yidong
2010-01-28  1:18         ` Kenichi Handa
2010-01-28 19:01           ` Stefan Monnier
2010-01-29  6:15             ` Kenichi Handa
  -- strict thread matches above, loose matches on Subject: below --
2009-12-02  0:21 Matthew Dempsky
2009-12-07  3:30 Christopher J. Madsen

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).