unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [PATCH 0/2] Better id: link buttonization
@ 2012-10-31  2:29 Austin Clements
  2012-10-31  2:29 ` [PATCH 1/2] test: Test buttonization of id: links Austin Clements
  2012-10-31  2:29 ` [PATCH 2/2] emacs: Improve the regexp used to match id:'s in messages Austin Clements
  0 siblings, 2 replies; 6+ messages in thread
From: Austin Clements @ 2012-10-31  2:29 UTC (permalink / raw)
  To: notmuch

Jani pointed out to me that the new id stashing with less quoting is
more likely to result in emails containing id: links that get
buttonized wrong.  Since id: links without quotes are hip, this series
fixes the buttonization regexp to match Xapian query syntax, rather
than being ad hoc.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] test: Test buttonization of id: links
  2012-10-31  2:29 [PATCH 0/2] Better id: link buttonization Austin Clements
@ 2012-10-31  2:29 ` Austin Clements
  2012-10-31  2:29 ` [PATCH 2/2] emacs: Improve the regexp used to match id:'s in messages Austin Clements
  1 sibling, 0 replies; 6+ messages in thread
From: Austin Clements @ 2012-10-31  2:29 UTC (permalink / raw)
  To: notmuch

This matches the current behavior of the buttonizer, so it passes, but
many of these cases are not what you'd want (and some of them aren't
even valid Xapian queries).  The next patch will fix the handling of
these cases and update the test.
---
 test/emacs-show  |   40 ++++++++++++++++++++++++++++++++++++++++
 test/test-lib.el |   14 ++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/test/emacs-show b/test/emacs-show
index 64c38d3..c4f3cf4 100755
--- a/test/emacs-show
+++ b/test/emacs-show
@@ -106,5 +106,45 @@ test_emacs '(notmuch-search "from:lars@seas.harvard.edu and subject:\"Maildir st
 	(test-visible-output)'
 test_expect_equal_file OUTPUT $EXPECTED/notmuch-show-indent-thread-content-off
 
+test_begin_subtest "id buttonization"
+add_message '[body]="
+id:abc
+id:abc.def. id:abc,def, id:abc;def;
+id:abc)def
+id:ab\"c def
+id:\"abc\"def
+id:\"ab\"\"c\"def
+id:\"ab c\"def
+id:\"abc\".def
+id:\"abc
+\"
+id:)
+id:
+cid:xxx"'
+test_emacs '(notmuch-show "id:'$gen_msg_id'")
+	(notmuch-test-mark-links)
+	(test-visible-output)'
+cat <<EOF >EXPECTED
+Notmuch Test Suite <test_suite@notmuchmail.org> (2001-01-05) (inbox)
+Subject: id buttonization
+To: Notmuch Test Suite <test_suite@notmuchmail.org>
+Date: Fri, 05 Jan 2001 15:43:57 +0000
+
+<<id:abc>>
+<<id:abc.def.>> <<id:abc,def,>> <<id:abc;def;>>
+<<id:abc)def>>
+<<id:ab>>"c def
+<<id:"abc">>def
+<<id:"ab">>"c"def
+id:"ab c"def
+<<id:"abc">>.def
+id:"abc
+"
+<<id:)>>
+id:
+c<<id:xxx>>
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
 
 test_done
diff --git a/test/test-lib.el b/test/test-lib.el
index fa3380c..dece811 100644
--- a/test/test-lib.el
+++ b/test/test-lib.el
@@ -107,6 +107,20 @@ nothing."
 	       (ad-set-arg 1 (char-to-string char))
 	       ad-do-it))))
 
+(defun notmuch-test-mark-links ()
+  "Enclose links in the current buffer with << and >>."
+  ;; Links are often created by jit-lock functions
+  (jit-lock-fontify-now)
+  (save-excursion
+    (let ((inhibit-read-only t))
+      (goto-char (point-min))
+      (let ((button))
+	(while (setq button (next-button (point)))
+	  (goto-char (button-start button))
+	  (insert "<<")
+	  (goto-char (button-end button))
+	  (insert ">>"))))))
+
 (defmacro notmuch-test-run (&rest body)
   "Evaluate a BODY of test expressions and output the result."
   `(with-temp-buffer
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] emacs: Improve the regexp used to match id:'s in messages
  2012-10-31  2:29 [PATCH 0/2] Better id: link buttonization Austin Clements
  2012-10-31  2:29 ` [PATCH 1/2] test: Test buttonization of id: links Austin Clements
@ 2012-10-31  2:29 ` Austin Clements
  2012-10-31 10:00   ` Jani Nikula
  2012-11-01 14:01   ` Sascha Silbe
  1 sibling, 2 replies; 6+ messages in thread
From: Austin Clements @ 2012-10-31  2:29 UTC (permalink / raw)
  To: notmuch

This regexp agrees with Xapian query syntax much more closely, though
we specifically disallow various cases that would be confusing in the
context of an email body (e.g., punctuation at the end of an id: link
is not considered part of the id: link because it's probably part of
the surrounding text).

In particular, this handles id: links that are not surrounded by
quotes much better, which stash is much more likely to generate now
that we don't quote id's that don't need to be quoted.  It also
handles quoted id: links better.

We update the buttonization test to reflect the new pattern.
---
 emacs/notmuch-show.el |   20 +++++++++++++++++++-
 test/emacs-show       |   14 +++++++-------
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index f273eb4..e96e099 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -991,6 +991,24 @@ message at DEPTH in the current thread."
   "Insert the forest of threads FOREST."
   (mapc (lambda (thread) (notmuch-show-insert-thread thread 0)) forest))
 
+(defvar notmuch-id-regexp
+  (concat
+   ;; Match the id: prefix only if it begins a word (to disallow, for
+   ;; example, matching cid:).
+   "\\<id:\\("
+   ;; If the term starts with a ", then parse Xapian's quoted boolean
+   ;; term syntax, which allows for anything as long as embedded
+   ;; double quotes escaped by doubling them.  We also disallow
+   ;; newlines (which Xapian allows) to prevent runaway terms.
+   "\"\\([^\"\n]\\|\"\"\\)*\""
+   ;; Otherwise, parse Xapian's unquoted syntax, which goes up to the
+   ;; next space or ).  We disallow [.,;] as the last character
+   ;; because these are probably part of the surrounding text, and not
+   ;; part of the id.  This doesn't match single character ids; meh.
+   "\\|[^\"[:space:])][^[:space:])]*[^[:space:]).,;]"
+   "\\)")
+  "The regexp used to match id: links in messages.")
+
 (defun notmuch-show-buttonise-links (start end)
   "Buttonise URLs and mail addresses between START and END.
 
@@ -999,7 +1017,7 @@ a corresponding notmuch search."
   (goto-address-fontify-region start end)
   (save-excursion
     (goto-char start)
-    (while (re-search-forward "id:\\(\"?\\)[^[:space:]\"]+\\1" end t)
+    (while (re-search-forward notmuch-id-regexp end t)
       ;; remove the overlay created by goto-address-mode
       (remove-overlays (match-beginning 0) (match-end 0) 'goto-address t)
       (make-text-button (match-beginning 0) (match-end 0)
diff --git a/test/emacs-show b/test/emacs-show
index c4f3cf4..e16483c 100755
--- a/test/emacs-show
+++ b/test/emacs-show
@@ -131,18 +131,18 @@ To: Notmuch Test Suite <test_suite@notmuchmail.org>
 Date: Fri, 05 Jan 2001 15:43:57 +0000
 
 <<id:abc>>
-<<id:abc.def.>> <<id:abc,def,>> <<id:abc;def;>>
-<<id:abc)def>>
-<<id:ab>>"c def
+<<id:abc.def>>. <<id:abc,def>>, <<id:abc;def>>;
+<<id:abc>>)def
+<<id:ab"c>> def
 <<id:"abc">>def
-<<id:"ab">>"c"def
-id:"ab c"def
+<<id:"ab""c">>def
+<<id:"ab c">>def
 <<id:"abc">>.def
 id:"abc
 "
-<<id:)>>
+id:)
 id:
-c<<id:xxx>>
+cid:xxx
 EOF
 test_expect_equal_file OUTPUT EXPECTED
 
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] emacs: Improve the regexp used to match id:'s in messages
  2012-10-31  2:29 ` [PATCH 2/2] emacs: Improve the regexp used to match id:'s in messages Austin Clements
@ 2012-10-31 10:00   ` Jani Nikula
  2012-11-01 14:01   ` Sascha Silbe
  1 sibling, 0 replies; 6+ messages in thread
From: Jani Nikula @ 2012-10-31 10:00 UTC (permalink / raw)
  To: Austin Clements, notmuch

On Wed, 31 Oct 2012, Austin Clements <amdragon@MIT.EDU> wrote:
> This regexp agrees with Xapian query syntax much more closely, though
> we specifically disallow various cases that would be confusing in the
> context of an email body (e.g., punctuation at the end of an id: link
> is not considered part of the id: link because it's probably part of
> the surrounding text).
>
> In particular, this handles id: links that are not surrounded by
> quotes much better, which stash is much more likely to generate now
> that we don't quote id's that don't need to be quoted.  It also
> handles quoted id: links better.
>
> We update the buttonization test to reflect the new pattern.

Hi Austin, all of this looks good as-is, but I propose the changes below
on top. (With the relevant comment changed too.)

BR,
Jani.


diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index e96e099..117eb0e 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -1005,7 +1005,7 @@ message at DEPTH in the current thread."
    ;; next space or ).  We disallow [.,;] as the last character
    ;; because these are probably part of the surrounding text, and not
    ;; part of the id.  This doesn't match single character ids; meh.
-   "\\|[^\"[:space:])][^[:space:])]*[^[:space:]).,;]"
+   "\\|[^\"[:space:])][^[:space:])]*[^])[:space:].,:;?!]"
    "\\)")
   "The regexp used to match id: links in messages.")
 
diff --git a/test/emacs-show b/test/emacs-show
index e16483c..e2d7c70 100755
--- a/test/emacs-show
+++ b/test/emacs-show
@@ -109,7 +109,12 @@ test_expect_equal_file OUTPUT $EXPECTED/notmuch-show-indent-thread-content-off
 test_begin_subtest "id buttonization"
 add_message '[body]="
 id:abc
-id:abc.def. id:abc,def, id:abc;def;
+id:abc.def. id:abc,def, id:abc;def; id:abc:def:
+id:foo@bar.?baz? id:foo@bar!.baz!
+(id:foo@bar.baz) [id:foo@bar.baz]
+id:foo@bar.baz...
+id:2+2=5
+id:=_-:/.[]@$%+
 id:abc)def
 id:ab\"c def
 id:\"abc\"def
@@ -131,7 +136,12 @@ To: Notmuch Test Suite <test_suite@notmuchmail.org>
 Date: Fri, 05 Jan 2001 15:43:57 +0000
 
 <<id:abc>>
-<<id:abc.def>>. <<id:abc,def>>, <<id:abc;def>>;
+<<id:abc.def>>. <<id:abc,def>>, <<id:abc;def>>; <<id:abc:def>>:
+<<id:foo@bar.?baz>>? <<id:foo@bar!.baz>>!
+(<<id:foo@bar.baz>>) [<<id:foo@bar.baz>>]
+<<id:foo@bar.baz>>...
+<<id:2+2=5>>
+<<id:=_-:/.[]@$%+>>
 <<id:abc>>)def
 <<id:ab"c>> def
 <<id:"abc">>def

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] emacs: Improve the regexp used to match id:'s in messages
  2012-10-31  2:29 ` [PATCH 2/2] emacs: Improve the regexp used to match id:'s in messages Austin Clements
  2012-10-31 10:00   ` Jani Nikula
@ 2012-11-01 14:01   ` Sascha Silbe
  2012-11-02 19:27     ` Jameson Graef Rollins
  1 sibling, 1 reply; 6+ messages in thread
From: Sascha Silbe @ 2012-11-01 14:01 UTC (permalink / raw)
  To: Austin Clements, notmuch

[-- Attachment #1: Type: text/plain, Size: 559 bytes --]

Austin Clements <amdragon@MIT.EDU> writes:

> +(defvar notmuch-id-regexp
> +  (concat
> +   ;; Match the id: prefix only if it begins a word (to disallow, for
> +   ;; example, matching cid:).
> +   "\\<id:\\("
[...]

Since you're already at it, please extend it to accept RFC2392 [1]
syntax for message URLs:

     mid-url       = "mid" ":" message-id [ "/" content-id ]

The id: syntax is non-standard (i.e. specific to notmuch).

Sascha

[1] http://www.ietf.org/rfc/rfc2392.txt
-- 
http://sascha.silbe.org/
http://www.infra-silbe.de/

[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] emacs: Improve the regexp used to match id:'s in messages
  2012-11-01 14:01   ` Sascha Silbe
@ 2012-11-02 19:27     ` Jameson Graef Rollins
  0 siblings, 0 replies; 6+ messages in thread
From: Jameson Graef Rollins @ 2012-11-02 19:27 UTC (permalink / raw)
  To: Sascha Silbe, Austin Clements, notmuch

[-- Attachment #1: Type: text/plain, Size: 625 bytes --]

On Thu, Nov 01 2012, Sascha Silbe <sascha-ml-reply-to-2012-4@silbe.org> wrote:
> Since you're already at it, please extend it to accept RFC2392 [1]
> syntax for message URLs:
>
>      mid-url       = "mid" ":" message-id [ "/" content-id ]
>
> The id: syntax is non-standard (i.e. specific to notmuch).
>
> Sascha
>
> [1] http://www.ietf.org/rfc/rfc2392.txt

Holy crap!  There's an RFC that specifies a URL scheme for message-ids??
From 1998??  Very cool.  We definitely need to support that.  Does
anything else support this?

I don't see any reason we couldn't support the "cid" schemes as well.

jamie.

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-11-02 19:27 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-31  2:29 [PATCH 0/2] Better id: link buttonization Austin Clements
2012-10-31  2:29 ` [PATCH 1/2] test: Test buttonization of id: links Austin Clements
2012-10-31  2:29 ` [PATCH 2/2] emacs: Improve the regexp used to match id:'s in messages Austin Clements
2012-10-31 10:00   ` Jani Nikula
2012-11-01 14:01   ` Sascha Silbe
2012-11-02 19:27     ` Jameson Graef Rollins

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).