On 4/2/19 7:15 AM, Mattias Engdegård wrote:

> where does a user go to understand extant regexps?

A user that *really* wants to know can go read the source code and get confused, just like I did. :-) But I think it's better if the documentation doesn't say what happens. If you prefer that the documentation explicitly say that it doesn't say what happens, I guess that would be OK too (what sort of wording would you like, though?).

> (Do we have any latitude at all for changing even obscure corners of
> regexp syntax and semantics today?)

I would say so, certainly for the raw 8-bit-bytes in ranges stuff (where nobody knows what they mean or even should mean), and possibly even for some of the other rarely-used and questionable uses.

> I've attached the ones found by a modified relint/xr, in case you are interested.

Sure! Fixed in the attached patch.

> +A character alternative can include duplicates.  For example,
> +@samp{[XYa-yYb-zX]} is less clear than @samp{[XYa-z]}.
>
> Certainly, but does this need to be mentioned? Overlapping ranges are rarely written on purpose. Besides, duplication isn't confined to ranges.

That example does contain non-range duplicates. I think duplicates are worth mentioning (if only so that your trawler can point to the style advice if people complain about the trawler being too picky :-).

> More useful, I think, would be to recommend ranges to stay within natural sequences (letters, digits, etc) so that a reader needn't consult a table to see what is included. Thus [0-9.:/] good, [.-:] bad, even though they denote the same set.

Good idea. I did that in the attached patch, which I just installed into master and I hope addresses the points you raised. I hope that the Thai example doesn't mess things up (I considered doing Arabic, which would have been more fun :-).
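
For anyone who wants to check the two equivalences mentioned above without consulting an ASCII table, here is a minimal sketch. It assumes Python's re module rather than the Emacs regexp engine; for plain ASCII ranges like these the two should agree, but that is an assumption, not something the patch tests. The helper name char_set is made up for illustration.

```python
import re


def char_set(alternative: str) -> set:
    """Return the ASCII characters matched by a bracket expression.

    Hypothetical helper: it tries every ASCII code point against the
    bracket expression using Python's regexp engine (not Emacs's).
    """
    pattern = re.compile(alternative)
    return {chr(c) for c in range(128) if pattern.fullmatch(chr(c))}


# The range '.' through ':' spans '.', '/', the digits, and ':'.
natural = char_set("[0-9.:/]")
opaque = char_set("[.-:]")
same_set = natural == opaque

# Duplicates and overlapping ranges add nothing to the set.
with_dups = char_set("[XYa-yYb-zX]")
without_dups = char_set("[XYa-z]")
same_letters = with_dups == without_dups

print("".join(sorted(natural)))
print(same_set, same_letters)
```

Both comparisons come out True, which is the point of the style advice: the shorter or more "clever" spelling buys nothing, and the reader has to work out the contents by hand.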