On 4/2/19 7:15 AM, Mattias Engdegård wrote:
> where does a user go to understand extant regexps?

A user that *really* wants to know can go read the source code and get
confused, just like I did. :-)

But I think it's better if the documentation doesn't say what happens.
If you prefer that the documentation explicitly say that it doesn't say
what happens, I guess that would be OK too (what sort of wording would
you like, though?).

> (Do we have any latitude at all for changing even obscure corners of
> regexp syntax and semantics today?)

I would say so, certainly for the raw 8-bit-bytes in ranges stuff (where
nobody knows what they mean or even should mean), and possibly even for
some of the other rarely-used and questionable uses.


>
> I've attached the ones found by a modified relint/xr, in case you are interested.

Sure! Fixed in the attached patch.


>
> +A character alternative can include duplicates.  For example,
> +@samp{[XYa-yYb-zX]} is less clear than @samp{[XYa-z]}.
>
> Certainly, but does this need to be mentioned? Overlapping ranges are rarely written on purpose. Besides, duplication isn't confined to ranges.

That example does contain non-range duplicates. I think duplicates are
worth mentioning (if only so that your trawler can point to the style
advice if people complain about the trawler being too picky :-).
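
For anyone who wants to check the example rather than take it on faith,
here is a minimal sketch. It is in Python rather than Emacs Lisp, on the
assumption that both engines read these two bracket expressions the same
way (plain ASCII ranges, nothing exotic). The helper name matched_chars
is made up for illustration and comes from neither Emacs nor the patch.

```python
import re

def matched_chars(char_class, limit=128):
    """Return the set of characters below `limit` that `char_class` matches."""
    pattern = re.compile(char_class)
    return {chr(c) for c in range(limit) if pattern.fullmatch(chr(c))}

# The form with duplicates and overlapping ranges, from the quoted doc text.
verbose = matched_chars(r"[XYa-yYb-zX]")
# The form the doc text recommends instead.
concise = matched_chars(r"[XYa-z]")

# True when both bracket expressions denote the same ASCII characters.
same_set = verbose == concise
print(same_set, "".join(sorted(concise)))
```

It only compares ASCII characters, which is all the example needs.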


> More useful, I think, would be to recommend ranges to stay within natural sequences (letters, digits, etc) so that a reader needn't consult a table to see what is included. Thus [0-9.:/] good, [.-:] bad, even though they denote the same set.

Good idea. I did that in the attached patch, which I just installed into
master and I hope addresses the points you raised. I hope that the Thai
example doesn't mess things up (I considered doing Arabic, which would
have been more fun :-).
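
To illustrate why [.-:] makes a reader reach for a character table, here
is a small sketch that enumerates what each form matches. It uses Python
instead of Emacs Lisp, assuming the two agree on simple ASCII ranges like
these. The helper name class_members is hypothetical.

```python
import re

def class_members(char_class):
    """List, in code-point order, the ASCII characters `char_class` matches."""
    pattern = re.compile(char_class)
    return "".join(chr(c) for c in range(128) if pattern.fullmatch(chr(c)))

# Range stays inside a natural sequence (digits); the rest is spelled out.
explicit = class_members(r"[0-9.:/]")
# Range runs from '.' (46) to ':' (58), silently sweeping in '/' and the digits.
opaque = class_members(r"[.-:]")

print(explicit)
print(opaque)
```

Both lines print the same characters, but only the first pattern lets you
see that without knowing the code points.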