all messages for Emacs-related lists mirrored at yhetil.org
From: "Mattias Engdegård" <mattiase@acm.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: Emacs developers <emacs-devel@gnu.org>
Subject: Re: master 544db1e: Faster grep pattern for identifiers
Date: Wed, 15 Sep 2021 18:29:31 +0200
Message-ID: <9D076A20-098C-463D-B879-65C3A5443D26@acm.org>
In-Reply-To: <83h7elbzo3.fsf@gnu.org>

On 15 Sep 2021, at 17:56, Eli Zaretskii <eliz@gnu.org> wrote:

> Doesn't this change the semantics of the "word"?  The Grep notion of
> the word is not necessarily identical to that of Emacs, since the
> latter depends on the major mode.  The comment in the deleted code
> says that much, AFAICT.  Or what am I missing?

Sorry, I should have written a more descriptive commit message.

First of all, there is no risk of false positives, because the grep output is filtered in post-processing for occurrences of the sought identifier. Thus, the only correctness risk is false negatives.
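To make the "no false positives" argument concrete, here is a minimal Python sketch of such a post-filter. It is not the Emacs implementation: the `grep -n` output format, the function names, and the two symbol-character sets (crude stand-ins for a major mode's syntax table) are all assumptions for illustration. The point is that grep hits are only candidates, and the final decision uses the language's own notion of a symbol.

```python
import re

# Hypothetical, simplified symbol-constituent sets (regex character-class
# contents) standing in for a major mode's syntax table.
C_SYMBOL_CHARS = r"\w"                # letters, digits, underscore
ELISP_SYMBOL_CHARS = r"\w\-*+/<>=!?"  # Lisp symbols may also contain these


def symbol_regexp(identifier, symbol_chars):
    """Match IDENTIFIER only when not adjacent to a symbol constituent."""
    cls = "[" + symbol_chars + "]"
    return re.compile("(?<!" + cls + ")" + re.escape(identifier) + "(?!" + cls + ")")


def filter_hits(grep_lines, identifier, symbol_chars):
    """Keep grep hits whose text contains IDENTIFIER as a whole symbol.

    Assumes `grep -n` output of the form FILE:LINE:TEXT, with no colon in FILE.
    """
    pattern = symbol_regexp(identifier, symbol_chars)
    kept = []
    for line in grep_lines:
        path, lineno, text = line.split(":", 2)
        if pattern.search(text):
            kept.append((path, int(lineno), text))
    return kept
```

For example, a word-boundary grep for `foo` also returns `(defun foo-bar ()`, because `-` is not a grep word character. With the Lisp-like set that hit is discarded, while with the C-like set it is kept (in C, `foo-bar` really does mention `foo`). This mode-dependent step is where Eli's concern about the meaning of "word" is handled.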

The effect of -w is to reject a match that has a word character immediately before or after it. This is exactly what the previous glued-on regexps did.
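The equivalence can be checked with a small Python sketch. `w_flag_accepts` emulates the documented `grep -w` rule for a literal string (letters, digits and underscore are word constituents; a later occurrence is tried if an earlier one fails). `glued_regexp_accepts` uses a boundary regexp of the form `(^|\W)IDENT(\W|$)`, which is an illustrative stand-in and not necessarily the exact regexp that was deleted. Using `str.isalnum()` in place of grep's locale-dependent test is an assumption.

```python
import re


def is_word_char(c):
    # grep's word constituents: alphanumerics plus underscore.  Python's
    # str.isalnum() stands in for the locale-dependent test (an assumption).
    return c.isalnum() or c == "_"


def w_flag_accepts(identifier, line):
    """Emulate `grep -w` for a literal IDENTIFIER on one LINE.

    Some occurrence must have no word character directly before or after it;
    later occurrences are tried if an earlier one fails the test.
    """
    start = line.find(identifier)
    while start != -1:
        end = start + len(identifier)
        before_ok = start == 0 or not is_word_char(line[start - 1])
        after_ok = end == len(line) or not is_word_char(line[end])
        if before_ok and after_ok:
            return True
        start = line.find(identifier, start + 1)
    return False


def glued_regexp_accepts(identifier, line):
    # Illustrative glued-on boundary regexp: (^|\W)IDENT(\W|$)
    pattern = r"(^|\W)" + re.escape(identifier) + r"(\W|$)"
    return re.search(pattern, line) is not None
```

Both functions accept `int i;` and `foobar foo` for their respective identifiers, and both reject `if (x) return;` when searching for `i`. They also both reject `x_y` when searching for `x`. That last case is the Smalltalk-80 exception discussed below: `_` is a word constituent, so neither scheme finds the `x` being assigned to.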

Both the old and new approaches are sound with respect to the programming languages they are used for, because grep's word characters are the alphanumeric characters (as determined by the locale) and underscore. Thus, a false negative would require an identifier to occur immediately before or after such a character, and the lexical rules of the supported languages don't allow that.

There could be exceptions. For example, ancient Smalltalk used _ as the assignment operator because Xerox's character set was based on the 1963 ASCII draft, where that code was a left-pointing arrow. That wouldn't work with our scheme, now or before.

One might wonder why we use -w at all, given the post-processing. It reduces the grep output so that the post-processor isn't overwhelmed by false positives: consider a search for the identifier `i`. That said, -w has a nonzero cost, so omitting it for identifiers above a certain length is likely to be advantageous, especially when the grep tool is slow. We haven't done that at this time.
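The length-based idea, which was not implemented, might look roughly like the Python sketch below. The cutoff of 8 characters, the function name, and the choice of `-n -F -e` are all hypothetical; the caller would append the files or directories to search. Short identifiers keep `-w` to keep the candidate list small, and long ones drop it, relying on the post-filter to discard the rare hits inside longer words.

```python
# Hypothetical cutoff; no value was chosen in the thread.
DEFAULT_SKIP_W_LENGTH = 8


def build_grep_args(identifier, skip_w_length=DEFAULT_SKIP_W_LENGTH):
    """Return a grep argument list for a literal IDENTIFIER.

    Short identifiers keep -w so that, e.g., a search for `i` does not flood
    the post-processor.  Long ones drop it, assuming they rarely occur inside
    longer words and that the post-filter removes any that do.
    Pass skip_w_length=None to always use -w.
    """
    args = ["grep", "-n", "-F"]  # line numbers, fixed-string pattern
    if skip_w_length is None or len(identifier) < skip_w_length:
        args.append("-w")
    args += ["-e", identifier]   # -e protects identifiers starting with '-'
    return args
```

Whether such a cutoff pays off depends on the grep implementation and on how often long identifiers appear as substrings of other words. It would need measuring rather than assuming.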

Thread overview: 11+ messages
2021-09-15 15:56 master 544db1e: Faster grep pattern for identifiers Eli Zaretskii
2021-09-15 16:25 ` Dmitry Gutov
2021-09-15 16:33   ` Eli Zaretskii
2021-09-15 18:06     ` Dmitry Gutov
2021-09-15 18:14       ` Eli Zaretskii
2021-09-15 18:39         ` Dmitry Gutov
2021-09-17 16:07           ` bug#49836: Support ripgrep in semantic-symref-tool-grep Juri Linkov
2021-09-17 16:24             ` Lars Ingebrigtsen
2021-09-18 18:37               ` Juri Linkov
2021-09-16  7:28         ` master 544db1e: Faster grep pattern for identifiers Omar Polo
2021-09-15 16:29 ` Mattias Engdegård [this message]