unofficial mirror of notmuch@notmuchmail.org
* regex [X-Z] with non-ascii char returns different results from (X|Y|Z)
@ 2019-08-21 12:58 yury.t
  2019-08-21 14:38 ` David Bremner
  0 siblings, 1 reply; 6+ messages in thread
From: yury.t @ 2019-08-21 12:58 UTC
  To: notmuch


Some regular expressions return incorrect results when the pattern contains multibyte characters in square brackets.  The following bracket expression matches subjects that do not start with `[１-９]` and returns more results than the equivalent parenthesized alternation.

(Please note that the digits are full-width Unicode characters.)




    notmuch count -- 'subject:"/^[１-９]/"' # 961
    notmuch count -- 'subject:"/^(１|２|３|４|５|６|７|８|９)/"' # 32





Somehow, a non-ASCII character in brackets matches any character whose code point starts with the same hex digit.  For example:





- [１] (U+FF11) is treated as [\x{F000}-\x{FFFF}]

- ^[倀] (U+5000), ^[啕] (U+5555) and ^[忿] (U+5FFF) return the same results, since they are all "U+5xxx".

Without ^, their results vary but still contain unrelated subjects.
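
One possible reading of these observations, not confirmed anywhere in this message, is that the bracket expression is being interpreted byte by byte on UTF-8 data. In UTF-8, every code point from U+F000 to U+FFFF starts with the byte 0xEF, and every code point from U+5000 to U+5FFF starts with 0xE5, so a bracket that holds the individual bytes of one character would match the lead byte of the whole block. The sketch below uses Python's `re` module purely as a stand-in to contrast the two interpretations; the helper names are hypothetical, and nothing here shows what notmuch or its regex library actually does.

```python
import re


def lead_byte(ch: str) -> int:
    """Return the first byte of a character's UTF-8 encoding."""
    return ch.encode("utf-8")[0]


def bytewise_match(pattern: str, subject: str) -> bool:
    """Match with pattern and subject both reduced to raw UTF-8 bytes.

    Assumption for illustration only: a bracket then holds single bytes.
    """
    return re.search(pattern.encode("utf-8"), subject.encode("utf-8")) is not None


def codepoint_match(pattern: str, subject: str) -> bool:
    """Match on decoded text, where a bracket holds whole characters."""
    return re.search(pattern, subject) is not None


if __name__ == "__main__":
    # Characters named in the report, with their UTF-8 lead bytes.
    for ch in "１倀啕忿":
        print(f"U+{ord(ch):04X} lead byte 0x{lead_byte(ch):02X}")

    # The second subject starts with U+FFE5, a different "U+Fxxx" character.
    for subject in ["１人", "￥100", "hello"]:
        print(
            subject,
            "bytewise:", bytewise_match("^[１]", subject),
            "codepoint:", codepoint_match("^[１]", subject),
        )
```

Under the bytewise interpretation the anchored bracket matches any subject whose first byte is 0xEF, which lines up with the "same leading hex digit" behavior described above. Without the anchor, the continuation bytes in the bracket can also match in the middle of unrelated characters, which would fit the varying but still wrong unanchored results.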





Curly brackets for repetition also behave strangely.


Suppose there are two emails whose subjects are (A) "１人" and (B) "１２人":



- ^(１|２...|９)人 - matches A, does not match B (expected)

- ^(１|２...|９){2}人 - does not match A, matches B (expected)

- ^[１-９]人 and ^[１-９]{2}人 - match neither

- ^[１-９]{3}人, and likewise {4} and {5} - match A, do not match B

- ^[１-９]{6}人, and likewise {7} and {8} - do not match A, match B
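
As a rough cross-check of the repetition results, the sketch below lists the repetition counts n for which `^[１-９]{n}人` matches each subject, once on decoded text and once on raw UTF-8 bytes. The bytewise mode is an assumption made for illustration (each full-width digit is three bytes in UTF-8), `matching_counts` is a hypothetical helper, and Python's `re` module is only a stand-in for whatever regex engine notmuch uses.

```python
import re

SUBJECT_A = "１人"
SUBJECT_B = "１２人"


def matching_counts(subject: str, bytewise: bool, max_n: int = 8) -> list:
    """Return every n in 1..max_n for which ^[１-９]{n}人 matches subject."""
    hits = []
    for n in range(1, max_n + 1):
        pattern = f"^[１-９]{{{n}}}人"
        if bytewise:
            # Assumed model: pattern and subject are both plain byte strings.
            found = re.search(pattern.encode("utf-8"), subject.encode("utf-8"))
        else:
            found = re.search(pattern, subject)
        if found:
            hits.append(n)
    return hits


if __name__ == "__main__":
    for name, subject in [("A", SUBJECT_A), ("B", SUBJECT_B)]:
        print(name, "codepoint:", matching_counts(subject, bytewise=False))
        print(name, "bytewise: ", matching_counts(subject, bytewise=True))
```

On decoded text, A matches only for n = 1 and B only for n = 2, which is the expected behavior. In the bytewise model, A matches only for n = 3 and B only for n = 6. That agrees with the failures at {1} and {2} and with the first match in each group reported above, but it does not reproduce the matches at {4}, {5}, {7} and {8}, so the byte-level reading is at best a partial explanation.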





As noted in the notmuch-search-terms manpage, I made sure to wrap the regular expression in double quotes and the entire query in single quotes.  I also tried increasing and decreasing $XAPIAN_CJK_NGRAM and rebuilding the index, but the results did not change.


Thread overview: 6+ messages
2019-08-21 12:58 regex [X-Z] with non-ascii char returns different results from (X|Y|Z) yury.t
2019-08-21 14:38 ` David Bremner
2019-08-22 12:28   ` yury.t
2019-08-22 12:55     ` David Bremner
2019-08-22 19:53       ` Tomi Ollila
2019-08-24 14:39       ` yury.t
