From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alan Mackenzie
Newsgroups: gmane.emacs.bugs
Subject: bug#22097: Ispell: lazy highlighting doesn't work properly.
Date: Wed, 9 Dec 2015 21:59:52 +0000
Message-ID: <20151209215952.GD1896@acm.fritz.box>
References: <20151205114230.GA2698@acm.fritz.box> <83egf1f2qp.fsf@gnu.org> <20151205140609.GB2698@acm.fritz.box> <83d1ulf03t.fsf@gnu.org> <20151205160429.GC2698@acm.fritz.box> <87fuzgebv4.fsf@mail.linkov.net>
In-Reply-To: <87fuzgebv4.fsf@mail.linkov.net>
To: Juri Linkov
Cc: 22097@debbugs.gnu.org

Hello, Juri.

On Sun, Dec 06, 2015 at 01:04:15AM +0200, Juri Linkov wrote:

[quoting removed for clarity]

  \(--+\|_+\|\(/\w\|\(\(\w\|[-_]\)+[.:@]\)\)\(\w\|[-_]\)*\([.:/@]+\(\w\|[-_~=?&]\)+\)+\)

  r: the outer \(--+\|_+\| ... \)
  1: \(\w\|[-_]\)      (in \(\(\w\|[-_]\)+[.:@]\))
  2: \(\w\|[-_]\)      (the one followed by *)
  3: \(\w\|[-_~=?&]\)

> This is a nice ASCII-art in itself :-), but the problem is in another place -

There are problems in that regexp, too:
(i) As noted in the comment in the source, when - or _ has word syntax,
  the regexp is ill-formed: \(\w\|[-_]\) is ambiguous, because the - or _
  can match either alternative of the fragment, leading to bad
  performance.  This happens in the fragments marked 1, 2, and 3.
(ii) The matching on --+ or _+ damages the lazy highlighting, often
  preventing it from happening.  But these clauses are there solely to
  protect against the poor performance noted in (i).

I have solved these problems by generating this regexp at runtime,
properly taking into account the syntax table.  The fragments 1, 2, and 3
are generated at runtime, and the fragments r are now redundant and have
been removed.  This regexp is no longer an element of
ispell-skip-region-alist; instead it is generated dynamically wherever
ispell-skip-region-alist is used (3 places).

I haven't yet tried the two patches you sent me.  The one binding
isearch-regexp-function to nil looks obvious and straightforward; the
other one I'll need to look at more carefully.

Here is the current version of my patch for this change.  The comments
and doc strings in it are probably of too low quality to be committed.

[ ....
]

diff --git a/lisp/textmodes/ispell.el b/lisp/textmodes/ispell.el
index 7d5bb6d..9695a6b 100644
--- a/lisp/textmodes/ispell.el
+++ b/lisp/textmodes/ispell.el
@@ -1782,6 +1782,49 @@ ispell-parsing-keyword
 a `~' followed by an extended-character mode -- such as `~.tex'.
 The last occurring definition in the buffer will be used.")
 
+(defun ispell--\\w-filter (char)
+  "Return CHAR in a string when CHAR doesn't have \"word\" syntax,
+nil otherwise. CHAR must be a character."
+  (let ((str (string char)))
+    (and
+     (not (string-match "\\w" str))
+     str)))
+
+(defun ispell--make-\\w-expression (chars)
+  "Make an expression like \"\\(\\w\\|[-_]\\)\".
+This (parenthesized) expression matches either a character of
+\"word\" syntax or one in CHARS.
+
+CHARS is a string of characters. A member of CHARS is omitted
+from the expression if it already has word syntax. (Be sensible
+about special characters such as ?\\, ?^, ?], and ?- in CHARS.)
+If after this filtering there are no chars left, or only one, a
+special form of the expression is generated."
+  (let ((filtered
+         (mapconcat #'ispell--\\w-filter chars "")))
+    (concat
+     "\\(\\w"
+     (cond
+      ((equal filtered "")
+       "\\)")
+      ((eq (length filtered) 1)
+       (concat "\\|" filtered "\\)"))
+      (t
+       (concat "\\|[" filtered "]\\)"))))))
+
+(defun ispell--make-filename-or-URL-re ()
+  "Construct a regexp to match some file names or URLs or email addresses."
+  (concat ;"\\(--+\\|_+\\|"
+   "\\(/\\w\\|\\("
+   (ispell--make-\\w-expression "-_")
+   "+[.:@]\\)\\)"
+   (ispell--make-\\w-expression "-_")
+   "*\\([.:/@]+"
+   (ispell--make-\\w-expression "-_~=?&")
+   "+\\)+"
+   ;"\\)"
+   ))
+
 ;;;###autoload
 (defvar ispell-skip-region-alist
   `((ispell-words-keyword forward-line)
@@ -1798,7 +1841,7 @@ ispell-skip-region-alist
     ;; Matches e-mail addresses, file names, http addresses, etc. The
     ;; `-+' `_+' patterns are necessary for performance reasons when
     ;; `-' or `_' part of word syntax.
-    (,(purecopy "\\(--+\\|_+\\|\\(/\\w\\|\\(\\(\\w\\|[-_]\\)+[.:@]\\)\\)\\(\\w\\|[-_]\\)*\\([.:/@]+\\(\\w\\|[-_~=?&]\\)+\\)+\\)"))
+;   (,(purecopy "\\(--+\\|_+\\|\\(/\\w\\|\\(\\(\\w\\|[-_]\\)+[.:@]\\)\\)\\(\\w\\|[-_]\\)*\\([.:/@]+\\(\\w\\|[-_~=?&]\\)+\\)+\\)"))
     ;; above checks /.\w sequences
     ;;("\\(--+\\|\\(/\\|\\(\\(\\w\\|[-_]\\)+[.:@]\\)\\)\\(\\w\\|[-_]\\)*\\([.:/@]+\\(\\w\\|[-_~=?&]\\)+\\)+\\)")
     ;; This is a pretty complex regexp. It can be simplified to the following:
@@ -3387,7 +3430,8 @@ ispell-begin-skip-region-regexp
                 (if (string= "" comment-end) "^" (regexp-quote comment-end)))
           (if (and (null ispell-check-comments) comment-start)
               (regexp-quote comment-start))
-          (ispell-begin-skip-region ispell-skip-region-alist)))
+          (ispell-begin-skip-region ispell-skip-region-alist)
+          (ispell--make-filename-or-URL-re)))
    "\\|"))
 
 
@@ -3426,6 +3470,8 @@ ispell-skip-region-list
 The list is of the form described by variable `ispell-skip-region-alist'.
 Must be called after `ispell-buffer-local-parsing' due to dependence on mode."
   (let ((skip-alist ispell-skip-region-alist))
+    (setq skip-alist (append (list (list (ispell--make-filename-or-URL-re)))
+                             skip-alist))
     ;; only additional explicit region definition is tex.
     (if (eq ispell-parser 'tex)
         (setq case-fold-search nil
@@ -4119,9 +4165,10 @@ ispell-message
                (ispell-non-empty-string vm-included-text-prefix)))
              (t default-prefix)))
          (ispell-skip-region-alist
-          (cons (list (concat "^\\(" cite-regexp "\\)")
-                      (function forward-line))
-                ispell-skip-region-alist))
+          (cons (list (ispell--make-filename-or-URL-re))
+                (cons (list (concat "^\\(" cite-regexp "\\)")
+                            (function forward-line))
+                      ispell-skip-region-alist)))
          (old-case-fold-search case-fold-search)
          (dictionary-alist ispell-message-dictionary-alist)
          (ispell-checking-message t))

-- 
Alan Mackenzie (Nuremberg, Germany).
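
The effect of the syntax-table filtering can be tried outside ispell.el with a small sketch.  This is an illustration only, not part of the patch: demo-word-or-chars and demo-fragment-for are hypothetical names, the first modelled on ispell--make-\\w-expression, the second a throwaway driver that builds a fresh syntax table in which - and _ are symbol constituents unless listed as word characters.

```elisp
;; Hypothetical helper, modelled on the patch's `ispell--make-\\w-expression'.
(defun demo-word-or-chars (chars)
  "Return a regexp group matching a word-syntax char or one of CHARS.
Members of CHARS that have word syntax in the current syntax table
are dropped, so the two alternatives can never match the same char."
  (let ((filtered
         (mapconcat (lambda (c)
                      (let ((s (string c)))
                        ;; "\\w" is interpreted via the current syntax table.
                        (if (string-match "\\w" s) "" s)))
                    chars "")))
    (concat "\\(\\w"
            (cond ((equal filtered "") "")
                  ((= (length filtered) 1) (concat "\\|" filtered))
                  (t (concat "\\|[" filtered "]")))
            "\\)")))

;; Hypothetical driver, for illustration only.
(defun demo-fragment-for (word-chars)
  "Return the fragment generated for \"-_\" when WORD-CHARS have word syntax.
WORD-CHARS is a list of characters; `-' and `_' not in it get symbol syntax."
  (let ((table (make-syntax-table)))
    (dolist (c '(?- ?_))
      (modify-syntax-entry c (if (memq c word-chars) "w" "_") table))
    (with-syntax-table table
      (demo-word-or-chars "-_"))))

(demo-fragment-for '())        ; "\\(\\w\\|[-_]\\)"
(demo-fragment-for '(?_))      ; "\\(\\w\\|-\\)"
(demo-fragment-for '(?- ?_))   ; "\\(\\w\\)"
```

When both characters have word syntax the alternation disappears entirely, which is the case where the fixed regexp was ambiguous.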