Mattias Engdegård wrote:

> > don't we also need a precise description of exactly how they are interpreted by the engine?

In other parts of Emacs, we are typically OK with specs that don't completely specify behavior. This gives us more freedom to make changes in the undocumented behavior later. I think it makes sense to do that here too, for regular expressions like "[z-a-m]" that most readers would find confusing.

> I'm with Stefan here; `-' should go last. Anything else is a gritty detail.

Stefan already changed the doc in master to say that. The attached patch tightens up the wording (and still says that "-" should go last).

> Documenting differences from POSIX regexps is useful. Do you prefer having those differences being spread out, or all concentrated into one section?

I don't have a strong preference. I wrote it concentrated originally, and that form seems to work well.

> These days, a user may be more familiar with the various PCRE dialects than traditional or extended POSIX. Should that be taken into account?

It might be helpful. However, PCRE is further away from Emacs regexps than POSIX is, and a comparison of PCRE and POSIX regexps is probably best put into a different section. It's not a section I'd like to write, to be honest; PCRE is pretty hairy.

> The terminology is a bit confusing. Is 'raw 8-bit byte' included in 'unibyte'? Is \x7f ever a raw 8-bit byte?
>
> I agree that [å-\xff], say, should be invalid but I've never seen such constructs.

After looking into it I realized that I don't really know the semantics here (the text I recently added there seems to be wrong, in some cases), and I have my doubts that anyone else knows the semantics either. The attached patch simply gets rid of that section, leaving the area undocumented. User beware!

> It already does, and some bugs were found that way. As a special case, it no longer complains about z-a because that is unlikely to be an accident and occurs in some code on purpose.

OK, then we should document z-a as the preferred syntax (best to go with the flow...). Done in the attached patch.

> As an experiment, I added detection of 'chained' ranges like [a-m-z] to xr and found a handful in both Emacs and GNU ELPA, but none of them carried a freeload of bugs. Keeping that check didn't seem worthwhile; the regexps may be a bit odd-looking, but aren't wrong.

It depends on what one means by "wrong". If one wants to use the ranges in both Emacs and grep they are "wrong", so it's reasonable for the manual to recommend against them.

> a rule finding [X-Y] where Y=X+1 found one or two questionable cases in a sea of false positives (also in the attachment).

It might also help for the trawler to warn about [X-Z] where Z = X+2. [XYZ] is clearer and less error-prone than [X-Z]. I shoehorned that into the attached patch too.
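For concreteness, the checks discussed above (chained ranges like [a-m-z], and ranges [X-Y] or [X-Z] whose end is only one or two characters past the start) might look roughly like the sketch below. It is a hypothetical Python function, `lint_bracket_body`, not xr's actual code. It works on the text between the brackets and assumes a deliberately simplified model of bracket syntax: an optional leading `^`, a `-` that is literal when it is the last character, and no handling of character classes such as `[:alpha:]`. Following the discussion, it exempts `z-a` and flags other reversed ranges.

```python
def lint_bracket_body(body: str) -> list[str]:
    """Return style warnings for the text between '[' and ']'.

    Hypothetical sketch with simplified syntax rules: character
    classes like [:alpha:] are not recognized.
    """
    warnings = []
    n = len(body)
    i = 1 if body.startswith("^") else 0  # skip the negation marker
    while i < n:
        # A range needs a '-' that is followed by another character;
        # a '-' in the last position is an ordinary member.
        if i + 2 < n and body[i + 1] == "-":
            lo, hi = body[i], body[i + 2]
            span = body[i:i + 3]
            if lo > hi and span != "z-a":
                # z-a is exempt: it is used on purpose in existing code.
                warnings.append(f"reversed range {span}")
            elif ord(hi) - ord(lo) in (1, 2):
                # [X-Y] or [X-Z]: listing the characters is clearer.
                warnings.append(f"narrow range {span}: list the characters instead")
            # A '-' right after a range, not in the last position,
            # continues it into a chained range such as a-m-z.
            if i + 4 < n and body[i + 3] == "-":
                warnings.append(f"chained range {body[i:i + 5]}")
            i += 3
        else:
            i += 1
    return warnings


if __name__ == "__main__":
    for body in ["a-m-z", "z-a-m", "x-z", "0-9a-", "z-a"]:
        print(f"[{body}]", lint_bracket_body(body))
```

Given the "sea of false positives" reported above for the Y=X+1 rule, a real trawler would probably present the narrow-range warning as a style hint rather than as an error.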