unofficial mirror of emacs-devel@gnu.org 
* regex.c simplification
@ 2018-06-16 15:35 Daniel Colascione
  2018-06-16 15:53 ` Eli Zaretskii
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Daniel Colascione @ 2018-06-16 15:35 UTC (permalink / raw)
  To: emacs-devel

I was doing some work on regex.c just now, and I was frustrated that the
code is unnecessarily complicated by the ifdefs necessary to support some
theoretical non-Emacs use case. Is all of this complexity really
necessary? Are we sure the !emacs case even compiles? Are there non-Emacs
users of the Emacs regex code? Can we just fork the implementation? How
about baking in switches like MATCH_MAY_ALLOCATE?
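
To make the cost of such a switch concrete, here is a minimal sketch. It is
not the regex.c code: SKETCH_MATCH_MAY_ALLOCATE, SKETCH_MAX_LEN and
sketch_count are hypothetical names invented for illustration. The point is
only that every compile-time switch of this sort forks the function body
into two paths, both of which must be kept working.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical switch modeled on the MATCH_MAY_ALLOCATE idea; set it
   to 0 to build the variant that never calls malloc.  */
#define SKETCH_MATCH_MAY_ALLOCATE 1

/* Hypothetical size limit used only by the non-allocating variant.  */
#define SKETCH_MAX_LEN 64

/* Count the bytes of TEXT equal to C, working on a scratch copy whose
   storage strategy depends on the switch.  Return -1 on failure.  */
int
sketch_count (const char *text, char c)
{
  assert (text != NULL);
  size_t len = strlen (text);
#if SKETCH_MATCH_MAY_ALLOCATE
  char *scratch = malloc (len + 1);
  if (!scratch)
    return -1;
#else
  static char scratch[SKETCH_MAX_LEN + 1];
  if (len > SKETCH_MAX_LEN)
    return -1;
#endif
  memcpy (scratch, text, len + 1);
  int n = 0;
  for (size_t i = 0; i < len; i++)
    n += scratch[i] == c;
#if SKETCH_MATCH_MAY_ALLOCATE
  free (scratch);
#endif
  return n;
}
```

Baking the switch in would mean deleting one branch of each #if and the
macro itself, leaving a single code path to read and test.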





* Re: regex.c simplification
  2018-06-16 15:35 regex.c simplification Daniel Colascione
@ 2018-06-16 15:53 ` Eli Zaretskii
  2018-06-16 16:11   ` Paul Eggert
  2018-06-16 16:12   ` Daniel Colascione
  2018-06-16 16:09 ` Noam Postavsky
  2018-06-16 16:35 ` Perry E. Metzger
  2 siblings, 2 replies; 30+ messages in thread
From: Eli Zaretskii @ 2018-06-16 15:53 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: emacs-devel

> Date: Sat, 16 Jun 2018 08:35:34 -0700
> From: "Daniel Colascione" <dancol@dancol.org>
> 
> I was doing some work on regex.c just now, and I was frustrated that the
> code is unnecessarily complicated by the ifdefs necessary to support some
> theoretical non-Emacs use case. Is all of this complexity really
> necessary? Are we sure the !emacs case even compiles? Are there non-Emacs
> users of the Emacs regex code? Can we just fork the implementation? How
> about baking in switches like MATCH_MAY_ALLOCATE?

I think we still haven't abandoned the hope of updating to the latest
glibc/gnulib versions of regex.c, although I'm not sure how practical
these hopes are at this point.




* Re: regex.c simplification
  2018-06-16 15:35 regex.c simplification Daniel Colascione
  2018-06-16 15:53 ` Eli Zaretskii
@ 2018-06-16 16:09 ` Noam Postavsky
  2018-06-16 16:35 ` Perry E. Metzger
  2 siblings, 0 replies; 30+ messages in thread
From: Noam Postavsky @ 2018-06-16 16:09 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: Emacs developers

On 16 June 2018 at 11:35, Daniel Colascione <dancol@dancol.org> wrote:
> I was doing some work on regex.c just now, and I was frustrated that the
> code is unnecessarily complicated by the ifdefs necessary to support some
> theoretical non-Emacs use case. Is all of this complexity really
> necessary? Are we sure the !emacs case even compiles? Are there non-Emacs
> users of the Emacs regex code?

In terms of #ifndef emacs, I believe lib-src/etags.c uses that.




* Re: regex.c simplification
  2018-06-16 15:53 ` Eli Zaretskii
@ 2018-06-16 16:11   ` Paul Eggert
  2018-06-16 16:17     ` Daniel Colascione
                       ` (2 more replies)
  2018-06-16 16:12   ` Daniel Colascione
  1 sibling, 3 replies; 30+ messages in thread
From: Paul Eggert @ 2018-06-16 16:11 UTC (permalink / raw)
  To: Eli Zaretskii, Daniel Colascione; +Cc: emacs-devel

Eli Zaretskii wrote:
> I think we still haven't abandoned the hope of updating to the latest
> glibc/gnulib versions of regex.c, although I'm not sure how practical
> these hopes are at this point.

That's been on my list of things to do for ages. I don't know if it'll ever get 
done, or even whether it's worth doing.

As far as I know, Emacs is the only package that still uses the "old" regex.c 
code derived from pre-2002 glibc. Everybody else has migrated to the "new" 
regex.c code that was contributed to glibc in 2002 and is in Gnulib. So, in some 
sense regex.c has already forked; we just haven't made it official.

A complication: src/regex.c is compiled twice, once within lib-src (for etags) 
and once within src (for Emacs proper), and the "#if defined emacs" stuff in 
src/regex.c matters for this.

If we wanted to make the fork more official, we could simplify src/regex.c to 
not worry about lib-src, by having etags use Glibc/Gnulib regex rather than 
Emacs regex. That would be easy for me to arrange, if you like. Once we did 
that, you could simplify src/regex.c by assuming that 'emacs' is defined. None 
of this would preclude us from eventually merging Emacs src/regex.c with 
Gnulib/glibc, a task that is so hard that the changes Daniel is thinking about 
wouldn't make it much harder.

While we're on the topic, a couple more comments about regex code.

The "old" and the "new" regex implementations both have problems. The old one 
has serious performance problems in some cases, and fails to conform to POSIX. 
The new one is typically better in both departments, but is so complicated that 
no maintainer understands it (I have attempted to contact the original 
contributor Isamu Hasegawa of Square Enix Co., Ltd., but have never heard back), 
so its (hopefully few) bugs remain unfixed.

The Perl regular expression library is popular in other free software and 
appears to be better maintained than either "old" or "new" regexp code. GNU 
Grep, for example, uses either the "new" regexp code or the Perl library, 
depending on command-line options. The Perl library tends to be more like the 
"old" regex implementation, in that it prefers functionality and flexibility to 
performance; however, it has many more features than the "old" regex code does. 
Among other things, it supports a more-readable regular expression syntax (a 
topic that came up recently on this mailing list in another context).




* Re: regex.c simplification
  2018-06-16 15:53 ` Eli Zaretskii
  2018-06-16 16:11   ` Paul Eggert
@ 2018-06-16 16:12   ` Daniel Colascione
  2018-06-16 16:43     ` Perry E. Metzger
  1 sibling, 1 reply; 30+ messages in thread
From: Daniel Colascione @ 2018-06-16 16:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Daniel Colascione, emacs-devel

>> Date: Sat, 16 Jun 2018 08:35:34 -0700
>> From: "Daniel Colascione" <dancol@dancol.org>
>>
>> I was doing some work on regex.c just now, and I was frustrated that the
>> code is unnecessarily complicated by the ifdefs necessary to support
>> some
>> theoretical non-Emacs use case. Is all of this complexity really
>> necessary? Are we sure the !emacs case even compiles? Are there
>> non-Emacs
>> users of the Emacs regex code? Can we just fork the implementation? How
>> about baking in switches like MATCH_MAY_ALLOCATE?
>
> I think we still haven't abandoned the hope of updating to the latest
> glibc/gnulib versions of regex.c, although I'm not sure how practical
> these hopes are at this point.

I checked out the latest glibc and gnulib sources. Both are so far
diverged that I think updating Emacs to that code is hopeless. (They have
a DFA mode, for example.)





* Re: regex.c simplification
  2018-06-16 16:11   ` Paul Eggert
@ 2018-06-16 16:17     ` Daniel Colascione
  2018-06-16 18:06     ` Andreas Schwab
  2018-06-18 14:08     ` Stefan Monnier
  2 siblings, 0 replies; 30+ messages in thread
From: Daniel Colascione @ 2018-06-16 16:17 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Eli Zaretskii, Daniel Colascione, emacs-devel

> Eli Zaretskii wrote:
>> I think we still haven't abandoned the hope of updating to the latest
>> glibc/gnulib versions of regex.c, although I'm not sure how practical
>> these hopes are at this point.
>
> That's been on my list of things to do for ages. I don't know if it'll
> ever get
> done, or even whether it's worth doing.
>
> As far as I know, Emacs is the only package that still uses the "old"
> regex.c
> code derived from pre-2002 glibc. Everybody else has migrated to the "new"
> regex.c code that was contributed to glibc in 2002 and is in Gnulib. So,
> in some
> sense regex.c has already forked; we just haven't made it official.
>
> A complication: src/regex.c is compiled twice, once within lib-src (for
> etags)
> and once within src (for Emacs proper), and the "#if defined emacs" stuff
> in
> src/regex.c matters for this.
>
> If we wanted to make the fork more official, we could simplify src/regex.c
> to
> not worry about lib-src, by having etags use Glibc/Gnulib regex rather
> than
> Emacs regex.

That's probably a good idea. The other approach would be to run etags
inside a real Emacs context somehow, and that seems too complicated.

> That would be easy for me to arrange, if you like.

Thanks.

> While we're on the topic, a couple more comments about regex code.

The regex API could be a lot better too. It'd be nice to expose the
pattern compilation machinery to lisp as some kind of new pattern pvec
object, then let lisp manage the cache. The nice thing about doing it this
way is that you could transparently support having multiple different
kinds of pattern --- e.g., PEGs, PCRE-syntax REs --- and use them
transparently, since you'd be able to supply a pattern object anywhere you
pass a regex string today.
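
A rough sketch of that idea in C, under loud assumptions: none of these
names exist in Emacs, and the two toy matchers (substring and prefix) merely
stand in for real engines such as the Emacs regex code, a PCRE-syntax
compiler, or a PEG compiler. What it shows is that once a pattern is an
object carrying its own match function, callers do not need to know which
kind they were handed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct sketch_pattern;
typedef bool (*sketch_match_fn) (const struct sketch_pattern *,
                                 const char *);

/* A "compiled" pattern.  The matcher is chosen once, when the pattern
   is compiled, so later callers never inspect the kind.  */
struct sketch_pattern
{
  const char *source;
  sketch_match_fn match;
};

/* Toy engine 1: succeeds if SOURCE occurs anywhere in TEXT.  */
static bool
match_substring (const struct sketch_pattern *p, const char *text)
{
  return strstr (text, p->source) != NULL;
}

/* Toy engine 2: succeeds if TEXT starts with SOURCE.  */
static bool
match_prefix (const struct sketch_pattern *p, const char *text)
{
  return strncmp (text, p->source, strlen (p->source)) == 0;
}

enum sketch_kind { SKETCH_SUBSTRING, SKETCH_PREFIX };

/* Build a pattern object of the requested (hypothetical) kind.  */
struct sketch_pattern
sketch_compile (enum sketch_kind kind, const char *source)
{
  assert (source != NULL);
  struct sketch_pattern p
    = { source, kind == SKETCH_PREFIX ? match_prefix : match_substring };
  return p;
}

/* The single entry point that every caller would use.  */
bool
sketch_matches (const struct sketch_pattern *p, const char *text)
{
  return p->match (p, text);
}
```

A cache managed from Lisp would then hold such objects keyed by pattern
source and kind; that part is not sketched here.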





* Re: regex.c simplification
  2018-06-16 15:35 regex.c simplification Daniel Colascione
  2018-06-16 15:53 ` Eli Zaretskii
  2018-06-16 16:09 ` Noam Postavsky
@ 2018-06-16 16:35 ` Perry E. Metzger
  2018-06-16 16:42   ` Daniel Colascione
  2 siblings, 1 reply; 30+ messages in thread
From: Perry E. Metzger @ 2018-06-16 16:35 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: emacs-devel

On Sat, 16 Jun 2018 08:35:34 -0700 "Daniel Colascione"
<dancol@dancol.org> wrote:
> I was doing some work on regex.c just now, and I was frustrated
> that the code is unnecessarily complicated by the ifdefs necessary
> to support some theoretical non-Emacs use case. Is all of this
> complexity really necessary? Are we sure the !emacs case even
> compiles? Are there non-Emacs users of the Emacs regex code? Can we
> just fork the implementation? How about baking in switches like
> MATCH_MAY_ALLOCATE?

The emacs regex code is hardly state of the art. I would suggest that
there are many other, better, free software implementations of
regexes.

Indeed, arguably at some point the Emacs regex code could use an
overhaul.


Perry
-- 
Perry E. Metzger		perry@piermont.com




* Re: regex.c simplification
  2018-06-16 16:35 ` Perry E. Metzger
@ 2018-06-16 16:42   ` Daniel Colascione
  2018-06-16 16:55     ` Eli Zaretskii
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Colascione @ 2018-06-16 16:42 UTC (permalink / raw)
  To: Perry E. Metzger; +Cc: Daniel Colascione, emacs-devel

> On Sat, 16 Jun 2018 08:35:34 -0700 "Daniel Colascione"
> <dancol@dancol.org> wrote:
>> I was doing some work on regex.c just now, and I was frustrated
>> that the code is unnecessarily complicated by the ifdefs necessary
>> to support some theoretical non-Emacs use case. Is all of this
>> complexity really necessary? Are we sure the !emacs case even
>> compiles? Are there non-Emacs users of the Emacs regex code? Can we
>> just fork the implementation? How about baking in switches like
>> MATCH_MAY_ALLOCATE?
>
> The emacs regex code is hardly state of the art. I would suggest that
> there are many other, better, free software implementations of
> regexes.

There are. Unfortunately, none of them understand predicates like \= and \s|.





* Re: regex.c simplification
  2018-06-16 16:12   ` Daniel Colascione
@ 2018-06-16 16:43     ` Perry E. Metzger
  0 siblings, 0 replies; 30+ messages in thread
From: Perry E. Metzger @ 2018-06-16 16:43 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: Eli Zaretskii, emacs-devel

On Sat, 16 Jun 2018 09:12:34 -0700 "Daniel Colascione"
<dancol@dancol.org> wrote:
> I checked out the latest glibc and gnulib sources. Both are so far
> diverged that I think updating Emacs to that code is hopeless.
> (They have a DFA mode, for example.)
 
That probably performs a whole lot better on large searches. :(
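
For readers wondering why a DFA would help: a DFA matcher looks at each
input byte exactly once and never backtracks, so its running time is linear
in the length of the text. The sketch below is a hand-built DFA for the toy
pattern ab*c matched against a whole string. It is an illustration only and
says nothing about how glibc or Gnulib implement their DFA mode; all names
are made up.

```c
#include <assert.h>
#include <stddef.h>

/* States of a hand-built DFA for the toy pattern "ab*c".  */
enum { SK_START, SK_SEEN_A, SK_ACCEPT, SK_DEAD };

/* Transition function: next state for STATE on input byte C.  */
static int
sketch_step (int state, unsigned char c)
{
  switch (state)
    {
    case SK_START:
      return c == 'a' ? SK_SEEN_A : SK_DEAD;
    case SK_SEEN_A:
      return c == 'b' ? SK_SEEN_A : c == 'c' ? SK_ACCEPT : SK_DEAD;
    default:
      /* Any input after acceptance, or in the dead state, fails.  */
      return SK_DEAD;
    }
}

/* Return 1 if all of TEXT matches "ab*c", else 0.  Each byte is
   examined once; there is no backtracking.  */
int
sketch_dfa_match (const char *text)
{
  assert (text != NULL);
  int state = SK_START;
  for (const unsigned char *p = (const unsigned char *) text; *p; p++)
    {
      state = sketch_step (state, *p);
      if (state == SK_DEAD)
        return 0;
    }
  return state == SK_ACCEPT;
}
```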

-- 
Perry E. Metzger		perry@piermont.com




* Re: regex.c simplification
  2018-06-16 16:42   ` Daniel Colascione
@ 2018-06-16 16:55     ` Eli Zaretskii
  2018-06-16 18:24       ` Perry E. Metzger
  0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2018-06-16 16:55 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: dancol, emacs-devel, perry

> Date: Sat, 16 Jun 2018 09:42:37 -0700
> From: "Daniel Colascione" <dancol@dancol.org>
> Cc: Daniel Colascione <dancol@dancol.org>, emacs-devel@gnu.org
> 
> > The emacs regex code is hardly state of the art. I would suggest that
> > there are many other, better, free software implementations of
> > regexes.
> 
> There are. Unfortunately, none of them understand predicates like \= and \s|.

Right.  And there are a few more features important to Emacs that
other implementations don't support.

So, while I think modernizing our regex code would be a welcome
development, we shouldn't mislead ourselves into thinking that any
other implementation could be a drop-in replacement.  Some work will
be needed to add the features we expect.




* Re: regex.c simplification
  2018-06-16 16:11   ` Paul Eggert
  2018-06-16 16:17     ` Daniel Colascione
@ 2018-06-16 18:06     ` Andreas Schwab
  2018-06-16 19:27       ` Perry E. Metzger
  2018-06-18 14:08     ` Stefan Monnier
  2 siblings, 1 reply; 30+ messages in thread
From: Andreas Schwab @ 2018-06-16 18:06 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Eli Zaretskii, Daniel Colascione, emacs-devel

The problem is that none of the other regex implementations support a
gap.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




* Re: regex.c simplification
  2018-06-16 16:55     ` Eli Zaretskii
@ 2018-06-16 18:24       ` Perry E. Metzger
  2018-06-16 18:29         ` Eli Zaretskii
  0 siblings, 1 reply; 30+ messages in thread
From: Perry E. Metzger @ 2018-06-16 18:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Daniel Colascione, emacs-devel

On Sat, 16 Jun 2018 19:55:42 +0300 Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Sat, 16 Jun 2018 09:42:37 -0700
> > From: "Daniel Colascione" <dancol@dancol.org>
> > Cc: Daniel Colascione <dancol@dancol.org>, emacs-devel@gnu.org
> >   
> > > The emacs regex code is hardly state of the art. I would
> > > suggest that there are many other, better, free software
> > > implementations of regexes.  
> > 
> > There are. Unfortunately, none of them understand predicates like
> > \= and \s|.  
> 
> Right.  And there are a few more features important to Emacs that
> other implementations don't support.
> 
> So, while I think modernizing our regex code would be a welcome
> development, we shouldn't mislead ourselves into thinking that any
> other implementation could be a drop-in replacement.  Some work will
> be needed to add the features we expect.
> 

I was arguing in the opposite direction, that there isn't much point
in thinking others will be interested in using the Emacs regex code
in the future.

-- 
Perry E. Metzger		perry@piermont.com




* Re: regex.c simplification
  2018-06-16 18:24       ` Perry E. Metzger
@ 2018-06-16 18:29         ` Eli Zaretskii
  2018-06-16 18:58           ` Perry E. Metzger
  0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2018-06-16 18:29 UTC (permalink / raw)
  To: Perry E. Metzger; +Cc: dancol, emacs-devel

> Date: Sat, 16 Jun 2018 14:24:02 -0400
> From: "Perry E. Metzger" <perry@piermont.com>
> Cc: "Daniel Colascione" <dancol@dancol.org>, emacs-devel@gnu.org
> 
> > So, while I think modernizing our regex code would be a welcome
> > development, we shouldn't mislead ourselves into thinking that any
> > other implementation could be a drop-in replacement.  Some work will
> > be needed to add the features we expect.
> > 
> 
> I was arguing in the opposite direction, that there isn't much point
> in thinking others will be interested in using the Emacs regex code
> in the future.

How's that relevant to the issue at hand?




* Re: regex.c simplification
  2018-06-16 18:29         ` Eli Zaretskii
@ 2018-06-16 18:58           ` Perry E. Metzger
  2018-06-16 19:27             ` Eli Zaretskii
  0 siblings, 1 reply; 30+ messages in thread
From: Perry E. Metzger @ 2018-06-16 18:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: dancol, emacs-devel

On Sat, 16 Jun 2018 21:29:55 +0300 Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Sat, 16 Jun 2018 14:24:02 -0400
> > From: "Perry E. Metzger" <perry@piermont.com>
> > Cc: "Daniel Colascione" <dancol@dancol.org>, emacs-devel@gnu.org
> >   
> > > So, while I think modernizing our regex code would be a welcome
> > > development, we shouldn't mislead ourselves into thinking that
> > > any other implementation could be a drop-in replacement.  Some
> > > work will be needed to add the features we expect.
> > >   
> > 
> > I was arguing in the opposite direction, that there isn't much
> > point in thinking others will be interested in using the Emacs
> > regex code in the future.  
> 
> How's that relevant to the issue at hand?
> 

The original question was "should we keep the code that isn't needed
by emacs on the premise something else might need it someday." I was
implying that, no, the odds that something else would want it someday
seem low.

Perry
-- 
Perry E. Metzger		perry@piermont.com




* Re: regex.c simplification
  2018-06-16 18:06     ` Andreas Schwab
@ 2018-06-16 19:27       ` Perry E. Metzger
  2018-06-17 16:50         ` Clément Pit-Claudel
  0 siblings, 1 reply; 30+ messages in thread
From: Perry E. Metzger @ 2018-06-16 19:27 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Eli Zaretskii, Paul Eggert, Daniel Colascione, emacs-devel

On Sat, 16 Jun 2018 20:06:43 +0200 Andreas Schwab
<schwab@linux-m68k.org> wrote:
> The problem is that none of the other regex implementations support
> a gap.

Not quite. A couple of them (say TRE) support having a mechanism to
fetch the next character rather than assuming they're present in a
flat array or what have you, which would allow for dealing with a gap
buffer.
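
As a minimal sketch of what a "fetch the next character" interface buys:
the matcher below only ever asks for the character at a logical position,
so it neither knows nor cares that the underlying storage has a hole in the
middle. This is not TRE's API and not the Emacs buffer layout; the struct
and function names are hypothetical, and the search is a naive literal
search rather than a regex engine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical gap buffer: DATA holds TOTAL bytes of storage, of
   which GAP_LEN bytes starting at GAP_START are unused.  */
struct sketch_gap_buffer
{
  const char *data;
  size_t gap_start;
  size_t gap_len;
  size_t total;
};

/* Number of text bytes, not counting the gap.  */
size_t
sketch_length (const struct sketch_gap_buffer *b)
{
  return b->total - b->gap_len;
}

/* Fetch the byte at logical position POS, skipping over the gap.
   Return -1 when POS is past the end of the text.  */
int
sketch_char_at (const struct sketch_gap_buffer *b, size_t pos)
{
  if (pos >= sketch_length (b))
    return -1;
  size_t phys = pos < b->gap_start ? pos : pos + b->gap_len;
  return (unsigned char) b->data[phys];
}

/* Naive literal search that touches the text only through
   sketch_char_at.  Return the logical index of the first match of
   NEEDLE, or -1 if there is none.  */
long
sketch_search (const struct sketch_gap_buffer *b, const char *needle)
{
  assert (b != NULL && needle != NULL);
  size_t n = strlen (needle), len = sketch_length (b);
  for (size_t start = 0; start + n <= len; start++)
    {
      size_t i = 0;
      while (i < n
             && sketch_char_at (b, start + i) == (unsigned char) needle[i])
        i++;
      if (i == n)
        return (long) start;
    }
  return -1;
}
```

A match that straddles the gap needs no special handling here, which is the
property a flat-array engine lacks.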

Perry
-- 
Perry E. Metzger		perry@piermont.com




* Re: regex.c simplification
  2018-06-16 18:58           ` Perry E. Metzger
@ 2018-06-16 19:27             ` Eli Zaretskii
  2018-06-18  9:36               ` Robert Pluim
  0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2018-06-16 19:27 UTC (permalink / raw)
  To: Perry E. Metzger; +Cc: dancol, emacs-devel

> Date: Sat, 16 Jun 2018 14:58:56 -0400
> From: "Perry E. Metzger" <perry@piermont.com>
> Cc: dancol@dancol.org, emacs-devel@gnu.org
> 
> The original question was "should we keep the code that isn't needed
> by emacs on the premise something else might need it someday." I was
> implying that, no, the odds that something else would want it someday
> seem low.

Yes, but the reason to keep the code not needed by Emacs is not
because someone outside of Emacs will want it.  It's because we
ourselves use it in etags.

If we ever import regex from gnulib, then yes, we will have to keep
non-Emacs code also for future merging with gnulib.  But not now.




* Re: regex.c simplification
  2018-06-16 19:27       ` Perry E. Metzger
@ 2018-06-17 16:50         ` Clément Pit-Claudel
  0 siblings, 0 replies; 30+ messages in thread
From: Clément Pit-Claudel @ 2018-06-17 16:50 UTC (permalink / raw)
  To: emacs-devel

On 2018-06-16 15:27, Perry E. Metzger wrote:
> On Sat, 16 Jun 2018 20:06:43 +0200 Andreas Schwab
> <schwab@linux-m68k.org> wrote:
>> The problem is that none of the other regex implementations support
>> a gap.
> 
> Not quite. A couple of them (say TRE) support having a mechanism to
> fetch the next character rather than assuming they're present in a
> flat array or what have you, which would allow for dealing with a gap
> buffer.

Yeah, but TRE is unmaintained, and has open security issues on its tracker :/
PCRE *should* support a gap, but in practice it doesn't (suspending a search and resuming it in another buffer isn't guaranteed to give the same results as it would have on a single contiguous buffer).

There's some relevant context at https://lists.gnu.org/archive/html/emacs-devel/2016-12/msg00622.html

Clément.




* Re: regex.c simplification
  2018-06-16 19:27             ` Eli Zaretskii
@ 2018-06-18  9:36               ` Robert Pluim
  0 siblings, 0 replies; 30+ messages in thread
From: Robert Pluim @ 2018-06-18  9:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: dancol, emacs-devel, Perry E. Metzger

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Sat, 16 Jun 2018 14:58:56 -0400
>> From: "Perry E. Metzger" <perry@piermont.com>
>> Cc: dancol@dancol.org, emacs-devel@gnu.org
>> 
>> The original question was "should we keep the code that isn't needed
>> by emacs on the premise something else might need it someday." I was
>> implying that, no, the odds that something else would want it someday
>> seem low.
>
> Yes, but the reason to keep the code not needed by Emacs is not
> because someone outside of Emacs will want it.  It's because we
> ourselves use it in etags.
>

We could switch to external etags, and remove our copy. Iʼm assuming
there are differences between the two implementations, but I donʼt
know exactly what they are.

Robert




* Re: regex.c simplification
  2018-06-16 16:11   ` Paul Eggert
  2018-06-16 16:17     ` Daniel Colascione
  2018-06-16 18:06     ` Andreas Schwab
@ 2018-06-18 14:08     ` Stefan Monnier
  2018-07-17 23:58       ` Paul Eggert
  2 siblings, 1 reply; 30+ messages in thread
From: Stefan Monnier @ 2018-06-18 14:08 UTC (permalink / raw)
  To: emacs-devel

> If we wanted to make the fork more official, we could simplify src/regex.c
> to not worry about lib-src, by having etags use Glibc/Gnulib regex rather
> than Emacs regex.

I would welcome such a change.


        Stefan





* Re: regex.c simplification
  2018-06-18 14:08     ` Stefan Monnier
@ 2018-07-17 23:58       ` Paul Eggert
  2018-07-20  0:33         ` Stefan Monnier
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Eggert @ 2018-07-17 23:58 UTC (permalink / raw)
  To: Stefan Monnier, emacs-devel

Stefan Monnier wrote:
>> If we wanted to make the fork more official, we could simplify src/regex.c
>> to not worry about lib-src, by having etags use Glibc/Gnulib regex rather
>> than Emacs regex.
> I would welcome such a change.

I started the ball rolling by writing a patch that changes etags to use Glibc 
regex, falling back on a Gnulib copy if Glibc is not available; see Bug#32194. 
We can follow up later by simplifying the Emacs-only regex code to assume Emacs.




* Re: regex.c simplification
  2018-07-17 23:58       ` Paul Eggert
@ 2018-07-20  0:33         ` Stefan Monnier
  2018-07-20  0:59           ` Paul Eggert
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Monnier @ 2018-07-20  0:33 UTC (permalink / raw)
  To: emacs-devel

>>> If we wanted to make the fork more official, we could simplify src/regex.c
>>> to not worry about lib-src, by having etags use Glibc/Gnulib regex rather
>>> than Emacs regex.
>> I would welcome such a change.
> I started the ball rolling by writing a patch that changes etags to use
> Glibc regex, falling back on a Gnulib copy if Glibc is not available; see
> Bug#32194. We can follow up later by simplifying the Emacs-only regex code
> to assume Emacs.

I wonder: does etags use regexps internally, or only to handle
user-provided "--regex" arguments?
More specifically, does it come with its own set of hardcoded regexps?


        Stefan





* Re: regex.c simplification
  2018-07-20  0:33         ` Stefan Monnier
@ 2018-07-20  0:59           ` Paul Eggert
  2018-07-20  1:42             ` Stefan Monnier
  2018-07-20  6:58             ` Eli Zaretskii
  0 siblings, 2 replies; 30+ messages in thread
From: Paul Eggert @ 2018-07-20  0:59 UTC (permalink / raw)
  To: Stefan Monnier, emacs-devel

Stefan Monnier wrote:
> does etags use regexps internally, or only to handle
> user-provided "--regex" arguments?
> More specifically, does it come with its own set of hardcoded regexps?

No, etags uses the regexp code only for --regex arguments. It would of course be 
simpler to disable --regex on platforms lacking the glibc regex API. However, my 
impression is that etags --regex gets some use. For example:

https://stackoverflow.com/questions/21283687/what-do-you-put-in-your-standard-etags-regex-calls
http://xahlee.info/comp/ctags_etags_gtags.html




* Re: regex.c simplification
  2018-07-20  0:59           ` Paul Eggert
@ 2018-07-20  1:42             ` Stefan Monnier
  2018-07-20  6:59               ` Eli Zaretskii
  2018-07-20  6:58             ` Eli Zaretskii
  1 sibling, 1 reply; 30+ messages in thread
From: Stefan Monnier @ 2018-07-20  1:42 UTC (permalink / raw)
  To: emacs-devel

>> does etags use regexps internally, or only to handle
>> user-provided "--regex" arguments?
>> More specifically, does it come with its own set of hardcoded regexps?
> No, etags uses the regexp code only for --regex arguments. It would of
> course be simpler to disable --regex on platforms lacking the glibc regex
> API. However, my impression is that etags --regex gets some use. For
> example:
> https://stackoverflow.com/questions/21283687/what-do-you-put-in-your-standard-etags-regex-calls
> http://xahlee.info/comp/ctags_etags_gtags.html

I was thinking of just always using the libc regexp code (whether it's
GNU libc or something else).


        Stefan





* Re: regex.c simplification
  2018-07-20  0:59           ` Paul Eggert
  2018-07-20  1:42             ` Stefan Monnier
@ 2018-07-20  6:58             ` Eli Zaretskii
  1 sibling, 0 replies; 30+ messages in thread
From: Eli Zaretskii @ 2018-07-20  6:58 UTC (permalink / raw)
  To: Paul Eggert; +Cc: monnier, emacs-devel

> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Thu, 19 Jul 2018 17:59:12 -0700
> 
> No, etags uses the regexp code only for --regex arguments. It would of course be 
> simpler to disable --regex on platforms lacking the glibc regex API. However, my 
> impression is that etags --regex gets some use. For example:
> 
> https://stackoverflow.com/questions/21283687/what-do-you-put-in-your-standard-etags-regex-calls
> http://xahlee.info/comp/ctags_etags_gtags.html

We actually use the --regex switch in our own Makefile, for the TAGS
target, see src/Makefile.in.




* Re: regex.c simplification
  2018-07-20  1:42             ` Stefan Monnier
@ 2018-07-20  6:59               ` Eli Zaretskii
  2018-07-20 21:49                 ` Paul Eggert
  0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2018-07-20  6:59 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Thu, 19 Jul 2018 21:42:48 -0400
> 
> I was thinking of just always using the libc regexp code (whether it's
> GNU libc or something else).

Yes, that'd be a possibility.  Do we have any supported platform that
does NOT have its own regexp code, whether in libc or as a separate
library?




* Re: regex.c simplification
  2018-07-20  6:59               ` Eli Zaretskii
@ 2018-07-20 21:49                 ` Paul Eggert
  2018-07-21  6:43                   ` Eli Zaretskii
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Eggert @ 2018-07-20 21:49 UTC (permalink / raw)
  To: Eli Zaretskii, Stefan Monnier; +Cc: emacs-devel

On 07/19/2018 11:59 PM, Eli Zaretskii wrote:
>> I was thinking of just always using the libc regexp code (whether it's
>> GNU libc or something else).
> Yes, that'd be a possibility.  Do we have any supported platform that
> does NOT have its own regexp code, whether in libc or as a separate
> library?
>
Every POSIX-conforming platform has regexp code somewhere, using the 
POSIX API. However, I can see some trouble using that code:

* Some libc regex implementations have been fairly buggy. Most 
GNU apps don't use these implementations any more so I'm not sure what 
their status is.

* We may need to use an option like -lregex to get the system library 
implementation, and that would have to be configured.

* Perhaps 'etags' users are using GNU extensions in their regular 
expressions, and if we switch to the libc API their usage will break.

* You're the expert, but as far as I know MS-Windows does not support 
the POSIX API so presumably we'd have to provide a substitute anyway, 
for MS-Windows.

* etags uses the GNU API so it would have to be changed to use the POSIX 
API.
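
For reference, the POSIX API mentioned in the points above is small. The
following is a minimal sketch of a match test using only the standard
regcomp, regexec and regfree calls from <regex.h>; the wrapper name
sketch_posix_match is made up, and this is not how etags is written. It
uses POSIX extended syntax, so patterns that rely on GNU extensions may
behave differently or fail to compile.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if TEXT matches the POSIX extended regexp PATTERN, 0 if
   it does not, and -1 if PATTERN fails to compile.  */
int
sketch_posix_match (const char *pattern, const char *text)
{
  assert (pattern != NULL && text != NULL);
  regex_t re;
  /* REG_NOSUB: only a yes/no answer is wanted, no submatch offsets.  */
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

On systems where the regex functions live outside libc, linking would need
the extra library option discussed above.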





* Re: regex.c simplification
  2018-07-20 21:49                 ` Paul Eggert
@ 2018-07-21  6:43                   ` Eli Zaretskii
  2018-07-21  7:17                     ` Paul Eggert
  0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2018-07-21  6:43 UTC (permalink / raw)
  To: Paul Eggert; +Cc: monnier, emacs-devel

> Cc: emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Fri, 20 Jul 2018 14:49:15 -0700
> 
> On 07/19/2018 11:59 PM, Eli Zaretskii wrote:
> >> I was thinking of just always using the libc regexp code (whether it's
> >> GNU libc or something else).
> > Yes, that'd be a possibility.  Do we have any supported platform that
> > does NOT have its own regexp code, whether in libc or as a separate
> > library?
> >
> Every POSIX-conforming platform has regexp code somewhere, using the 
> POSIX API. However, I can see some trouble using that code:
> 
> * Some libc regex implementations have been fairly buggy. Most 
> GNU apps don't use these implementations any more so I'm not sure what 
> their status is.
> 
> * We may need to use an option like -lregex to get the system library 
> implementation, and that would have to be configured.
> 
> * Perhaps 'etags' users are using GNU extensions in their regular 
> expressions, and if we switch to the libc API their usage will break.

We could recommend that such users install GNU regexp, which AFAIK
exposes the Posix API as well.

> * You're the expert, but as far as I know MS-Windows does not support 
> the POSIX API so presumably we'd have to provide a substitute anyway, 
> for MS-Windows.

GNU regexp is available as a separate library on Windows, I used it in
several ports of GNU and Unix packages.

> * etags uses the GNU API so it would have to be changed to use the POSIX 
> API.

Right.

There's still the alternative which I asked about a couple of days
ago: use the Gnulib regexp without the additional code pulled in by
mbrtowc, I hope that's a viable option.




* Re: regex.c simplification
  2018-07-21  6:43                   ` Eli Zaretskii
@ 2018-07-21  7:17                     ` Paul Eggert
  2018-08-01  0:17                       ` Paul Eggert
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Eggert @ 2018-07-21  7:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, emacs-devel

Eli Zaretskii wrote:

>> * Perhaps 'etags' users are using GNU extensions in their regular
>> expressions, and if we switch to the libc API their usage will break.
> 
> We could recommend that such users install GNU regexp, which AFAIK
> exposes the POSIX API as well.

I assume you mean GNU regex. That project is long dead, and has been superseded 
by Gnulib. I would not recommend it for Emacs usage. See:

https://www.gnu.org/software/regex/

> There's still the alternative which I asked about a couple of days
> ago: use the Gnulib regexp without the additional code pulled in by
> mbrtowc.  I hope that's a viable option.

Yes, I've built that and am testing it. I plan to report back soon.




* Re: regex.c simplification
  2018-07-21  7:17                     ` Paul Eggert
@ 2018-08-01  0:17                       ` Paul Eggert
  2018-08-01  2:38                         ` Brett Gilio
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Eggert @ 2018-08-01  0:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, emacs-devel

On 07/21/2018 12:17 AM, Paul Eggert wrote:
>> There's still the alternative which I asked about a couple of days
>> ago: use the Gnulib regexp without the additional code pulled in by
>> mbrtowc.  I hope that's a viable option.
>
> Yes, I've built that and am testing it. I plan to report back soon. 

I tested it a bit, simplified the regex code on the Emacs side, and sent 
a new set of patches here:

https://bugs.gnu.org/32194#11

This eliminates about 2500 lines of Emacs C source code, yay! More 
improvements could be made, but it is about time to merge what I've got.





* Re: regex.c simplification
  2018-08-01  0:17                       ` Paul Eggert
@ 2018-08-01  2:38                         ` Brett Gilio
  0 siblings, 0 replies; 30+ messages in thread
From: Brett Gilio @ 2018-08-01  2:38 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Eli Zaretskii, monnier, emacs-devel


Paul Eggert writes:

> On 07/21/2018 12:17 AM, Paul Eggert wrote:
>>> There's still the alternative which I asked about a couple of days
>>> ago: use the Gnulib regexp without the additional code pulled in by
>>> mbrtowc.  I hope that's a viable option.
>>
>> Yes, I've built that and am testing it. I plan to report back soon. 
>
> I tested it a bit, simplified the regex code on the Emacs side, and sent a new
> set of patches here:
>
> https://bugs.gnu.org/32194#11
>
> This eliminates about 2500 lines of Emacs C source code, yay! More improvements
> could be made, but it is about time to merge what I've got.


Thank you for your work, Paul. It is nice to see the source code getting
a little bit lighter, rather than the opposite.



-- 
Brett M. Gilio
Free Software Foundation, Member
https://parabola.nu | https://emacs.org




Thread overview: 30+ messages
2018-06-16 15:35 regex.c simplification Daniel Colascione
2018-06-16 15:53 ` Eli Zaretskii
2018-06-16 16:11   ` Paul Eggert
2018-06-16 16:17     ` Daniel Colascione
2018-06-16 18:06     ` Andreas Schwab
2018-06-16 19:27       ` Perry E. Metzger
2018-06-17 16:50         ` Clément Pit-Claudel
2018-06-18 14:08     ` Stefan Monnier
2018-07-17 23:58       ` Paul Eggert
2018-07-20  0:33         ` Stefan Monnier
2018-07-20  0:59           ` Paul Eggert
2018-07-20  1:42             ` Stefan Monnier
2018-07-20  6:59               ` Eli Zaretskii
2018-07-20 21:49                 ` Paul Eggert
2018-07-21  6:43                   ` Eli Zaretskii
2018-07-21  7:17                     ` Paul Eggert
2018-08-01  0:17                       ` Paul Eggert
2018-08-01  2:38                         ` Brett Gilio
2018-07-20  6:58             ` Eli Zaretskii
2018-06-16 16:12   ` Daniel Colascione
2018-06-16 16:43     ` Perry E. Metzger
2018-06-16 16:09 ` Noam Postavsky
2018-06-16 16:35 ` Perry E. Metzger
2018-06-16 16:42   ` Daniel Colascione
2018-06-16 16:55     ` Eli Zaretskii
2018-06-16 18:24       ` Perry E. Metzger
2018-06-16 18:29         ` Eli Zaretskii
2018-06-16 18:58           ` Perry E. Metzger
2018-06-16 19:27             ` Eli Zaretskii
2018-06-18  9:36               ` Robert Pluim
