* [PATCH] json_quote_str should handle non-ASCII characters @ 2010-03-04 10:40 Gregor Hoffleit 2010-03-04 13:57 ` Sebastian Spaeth 2010-04-13 15:36 ` Carl Worth 0 siblings, 2 replies; 12+ messages in thread From: Gregor Hoffleit @ 2010-03-04 10:40 UTC (permalink / raw) To: notmuch The current code in json_quote_str() only accepts strict printable ASCII code points (i.e. 32-127), all other code points are dropped from the JSON output. This patch accepts code points 32-255. json_quote_str() should handle non-ASCII characters. --- json.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/json.c b/json.c index 9614143..6dc0345 100644 --- a/json.c +++ b/json.c @@ -59,7 +59,7 @@ json_quote_str(const void *ctx, const char *str) return NULL; for (ptr = str; *ptr; len++, ptr++) { - if (*ptr < 32 || *ptr == '\"' || *ptr == '\\') + if ((unsigned char)(*ptr) < 32 || *ptr == '\"' || *ptr == '\\') len++; } @@ -70,7 +70,7 @@ json_quote_str(const void *ctx, const char *str) *ptr2++ = '\"'; while (*ptr) { - if (*ptr > 31 && *ptr != '\"' && *ptr != '\\') { + if ((unsigned char)(*ptr) > 31 && *ptr != '\"' && *ptr != '\\') { *ptr2++ = *ptr++; } else { *ptr2++ = '\\'; -- 1.7.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] json_quote_str should handle non-ASCII characters 2010-03-04 10:40 [PATCH] json_quote_str should handle non-ASCII characters Gregor Hoffleit @ 2010-03-04 13:57 ` Sebastian Spaeth 2010-03-04 14:26 ` Gregor Hoffleit 2010-04-13 15:36 ` Carl Worth 1 sibling, 1 reply; 12+ messages in thread From: Sebastian Spaeth @ 2010-03-04 13:57 UTC (permalink / raw) To: Gregor Hoffleit, notmuch On 2010-03-04, Sebastian Spaeth wrote: > The current code in json_quote_str() only accepts strict printable ASCII > code points (i.e. 32-127), all other code points are dropped from the > JSON output. That would explain why my web interface does not display any umlaut symbols. Nice finding, Sebastian ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] json_quote_str should handle non-ASCII characters 2010-03-04 13:57 ` Sebastian Spaeth @ 2010-03-04 14:26 ` Gregor Hoffleit 0 siblings, 0 replies; 12+ messages in thread From: Gregor Hoffleit @ 2010-03-04 14:26 UTC (permalink / raw) To: Sebastian Spaeth; +Cc: notmuch * Sebastian Spaeth <sebastian@sspaeth.de> [Do Mär 04 14:57:27 +0100 2010] > On 2010-03-04, Sebastian Spaeth wrote: > > The current code in json_quote_str() only accepts strict printable ASCII > > code points (i.e. 32-127), all other code points are dropped from the > > JSON output. > > That would explain why my web interface does not display any umlaut > symbols. Well, I noticed noneatall umlauts. That's how I found this problem. Regards, Gregor ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] json_quote_str should handle non-ASCII characters 2010-03-04 10:40 [PATCH] json_quote_str should handle non-ASCII characters Gregor Hoffleit 2010-03-04 13:57 ` Sebastian Spaeth @ 2010-04-13 15:36 ` Carl Worth 2010-04-13 16:37 ` [PATCH] First tests for JSON output and UTF-8 in mail body and subject Gregor Hoffleit 1 sibling, 1 reply; 12+ messages in thread From: Carl Worth @ 2010-04-13 15:36 UTC (permalink / raw) To: Gregor Hoffleit, notmuch [-- Attachment #1: Type: text/plain, Size: 563 bytes --] On Thu, 04 Mar 2010 11:40:03 +0100, Gregor Hoffleit <gregor@hoffleit.de> wrote: > The current code in json_quote_str() only accepts strict printable ASCII > code points (i.e. 32-127), all other code points are dropped from the > JSON output. > > This patch accepts code points 32-255. Thanks, Gregor! I've pushed this patch out now. And I noted in our TODO file that we really should add some tests to the test suite to exercise our --format=json code paths. It would be nice to ensure we don't have any regressions in this area later. -Carl [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] First tests for JSON output and UTF-8 in mail body and subject 2010-04-13 15:36 ` Carl Worth @ 2010-04-13 16:37 ` Gregor Hoffleit 2010-04-15 0:35 ` Carl Worth 0 siblings, 1 reply; 12+ messages in thread From: Gregor Hoffleit @ 2010-04-13 16:37 UTC (permalink / raw) To: notmuch The test suite doesn't yet cover --format=json output nor UTF-8 in subject or body. This patch starts with test cases for 'search --format=json' and 'show --format=json'. Furthermore, it has test cases for a search for a UTF-8 string in a mail body for a UTF-8 string in a mail subject. Finally, it has a test case for --format=json with UTF-8 messages, demonstrating the fix in 1267697893-sup-4538@sam.mediasupervision.de. --- test/notmuch-test | 38 ++++++++++++++++++++++++++++++++++++++ 1 files changed, 38 insertions(+), 0 deletions(-) diff --git a/test/notmuch-test b/test/notmuch-test index 1c5191b..b2a0756 100755 --- a/test/notmuch-test +++ b/test/notmuch-test @@ -146,7 +146,7 @@ add_message () } NOTMUCH_IGNORED_OUTPUT_REGEXP='^Processed [0-9]*( total)? file|Found [0-9]* total file' -NOTMUCH_THREAD_ID_SQUELCH='s/thread:................/thread:XXX/' +NOTMUCH_THREAD_ID_SQUELCH='s/thread:................/thread:XXX/;s/"thread": "................",/"thread": "XXX",/' execute_expecting () { args=$1 @@ -348,6 +348,42 @@ add_message '[subject]="body search (phrase)"' '[date]="Sat, 01 Jan 2000 12:00:0 add_message '[subject]="negative result"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[body]="This phrase should not match the body search"' execute_expecting "search '\"body search (phrase)\"'" "thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; body search (phrase) (inbox unread)" +printf " Show message: json...\t\t" +add_message '[subject]="json-show-subject"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[body]="json-show-message"' +execute_expecting "show --format=json 'json-show-message'" '[[[{"id": "'${gen_msg_id}'", "match": true, "filename": "'${gen_msg_filename}'", "date_unix": 946728000, "date_relative": "2000-01-01", "tags": ["inbox","unread"], "headers": {"Subject": "json-show-subject", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Cc": "", "Bcc": "", "Date": "Sat, 01 Jan 2000 12:00:00 -0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "json-show-message\n"}]}, []]]]' + +printf " Search message: json...\t" +add_message '[subject]="json-search-subject"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[body]="json-search-message"' +execute_expecting "search --format=json 'json-search-message'" '[{"thread": "XXX", +"timestamp": 946724400, +"matched": 1, +"total": 1, +"authors": "Notmuch Test Suite", +"subject": "json-search-subject", +"tags": ["inbox", "unread"]}]' + +printf " Search by subject (utf-8):...\t" +add_message [subject]=utf8-sübjéct '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' +execute_expecting "search subject:utf8-sübjéct" "thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; utf8-sübjéct (inbox unread)" + +printf " Search body (utf-8):...\t" +add_message '[subject]="utf8-message-body-subject"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[body]="message body utf8: bödý"' +execute_expecting "search 'bödý'" "thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; utf8-message-body-subject (inbox unread)" + +printf " Show message: json, utf-8...\t" +add_message '[subject]="json-show-utf8-body-sübjéct"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[body]="jsön-show-méssage"' +execute_expecting "show --format=json 'jsön-show-méssage'" '[[[{"id": "'${gen_msg_id}'", "match": true, "filename": "'${gen_msg_filename}'", "date_unix": 946728000, "date_relative": "2000-01-01", "tags": ["inbox","unread"], "headers": {"Subject": "json-show-utf8-body-sübjéct", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Cc": "", "Bcc": "", "Date": "Sat, 01 Jan 2000 12:00:00 -0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "jsön-show-méssage\n"}]}, []]]]' + +printf " Search message: json, utf-8...\t" +add_message '[subject]="json-search-utf8-body-sübjéct"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[body]="jsön-search-méssage"' +execute_expecting "search --format=json 'jsön-search-méssage'" '[{"thread": "XXX", +"timestamp": 946724400, +"matched": 1, +"total": 1, +"authors": "Notmuch Test Suite", +"subject": "json-search-utf8-body-sübjéct", +"tags": ["inbox", "unread"]}]' + printf " Search by from: (address)...\t" add_message '[subject]="search by from (address)"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' [from]=searchbyfrom@example.com execute_expecting "search from:searchbyfrom@example.com" "thread:XXX 2000-01-01 [1/1] searchbyfrom@example.com; search by from (address) (inbox unread)" ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] First tests for JSON output and UTF-8 in mail body and subject 2010-04-13 16:37 ` [PATCH] First tests for JSON output and UTF-8 in mail body and subject Gregor Hoffleit @ 2010-04-15 0:35 ` Carl Worth 2010-04-15 8:33 ` Michal Sojka ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Carl Worth @ 2010-04-15 0:35 UTC (permalink / raw) To: Gregor Hoffleit, notmuch [-- Attachment #1: Type: text/plain, Size: 2066 bytes --] On Tue, 13 Apr 2010 18:37:57 +0200, Gregor Hoffleit <gregor@hoffleit.de> wrote: > The test suite doesn't yet cover --format=json output nor UTF-8 in > subject or body. > > This patch starts with test cases for 'search --format=json' and > 'show --format=json'. Thanks for the tests, Gregor! I was about to push this, but first noticed that I hadn't run the test suite in the last day and that it had recently broken (oops!). I fixed that, but then also noticed that I got failures with your tests. > +execute_expecting "show --format=json 'json-show-message'" '[[[{"id": > "'${gen_msg_id}'", "match": true, "filename": "'${gen_msg_filename}'", > "date_unix": 946728000, "date_relative": "2000-01-01", "tags": ... > +printf " Search message: json...\t" > +add_message '[subject]="json-search-subject"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[body]="json-search-message"' > +execute_expecting "search --format=json 'json-search-message'" '[{"thread": "XXX", > +"timestamp": 946724400, I'm getting a timestamp value here of 946756800 which is clearly an interpretation of the above date as if it were my local time zone. That is: $ date -u +%s -d "Sat, 01 Jan 2000 12:00:00 -0800" 946756800 And the value you have appears to have been generated in your timezone: $ date -u +%s -d "Sat, 01 Jan 2000 12:00:00 +0100" 946724400 Meanwhile, the value that should be printed here[*] is the value from interpreting the original date in the timezone explicitly specified in that date: $ date -u +%s -d "Sat, 01 Jan 2000 12:00:00 -0000" 946728000 Note that the "notmuch show --format=json" test above does have the correct timestamp. So, a double thanks for this test, it seems to have uncovered another bug. -Carl [*] I say "should" because I don't believe we have any actual specification of the data coming out of the JSON output yet. One other thing that seems odd is the name of "date_unix" in the show output and "timestamp" in the search output for what is effectively the same field. [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] First tests for JSON output and UTF-8 in mail body and subject 2010-04-15 0:35 ` Carl Worth @ 2010-04-15 8:33 ` Michal Sojka 2010-04-15 19:58 ` Carl Worth 2010-04-16 11:49 ` [PATCH] First tests for JSON output and UTF-8 in mail body and subject David Edmondson 2010-04-23 0:15 ` Carl Worth 2 siblings, 1 reply; 12+ messages in thread From: Michal Sojka @ 2010-04-15 8:33 UTC (permalink / raw) To: Carl Worth, Gregor Hoffleit, notmuch On Thu, 15 Apr 2010, Carl Worth wrote: > On Tue, 13 Apr 2010 18:37:57 +0200, Gregor Hoffleit <gregor@hoffleit.de> wrote: > > The test suite doesn't yet cover --format=json output nor UTF-8 in > > subject or body. > > > > This patch starts with test cases for 'search --format=json' and > > 'show --format=json'. > > Thanks for the tests, Gregor! Carl, are you still interrested in modular test suite from git? If so, could you please look at id:87mxxg7bxo.fsf@steelpick.2x.cz and tell me your opinion. I'm still updating the modularized tests to match the state in master but every change in master takes me quite long time to convert. -Michal ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] First tests for JSON output and UTF-8 in mail body and subject 2010-04-15 8:33 ` Michal Sojka @ 2010-04-15 19:58 ` Carl Worth 2010-04-16 8:17 ` Michal Sojka 0 siblings, 1 reply; 12+ messages in thread From: Carl Worth @ 2010-04-15 19:58 UTC (permalink / raw) To: Michal Sojka, Gregor Hoffleit, notmuch [-- Attachment #1: Type: text/plain, Size: 1321 bytes --] On Thu, 15 Apr 2010 10:33:46 +0200, Michal Sojka <sojkam1@fel.cvut.cz> wrote: > are you still interrested in modular test suite from git? If so, could > you please look at id:87mxxg7bxo.fsf@steelpick.2x.cz and tell me your > opinion. I'm still updating the modularized tests to match the state in > master but every change in master takes me quite long time to convert. Hi Michal, I would still like to have a modular test suite, yes. Thanks for pointing out that other message to me, which I had missed in the general notmuch-mailing-list backlog I'm still dealing with. I've now replied to it over there. I am sorry that you keep having to re-do a bunch of work to keep your patch up-to-date. I'm just about to push another change which might further cause problems. But you might actually like that change since it's one you requested in your first version of the modular test suite. I'm dropping the annoying execute_expecting macro that both runs notmuch and tests the output. There's now a much cleaner separation such as: output=$($NOTMUCH search for-something) pass_if_equal "$output" "something was found" I still think it wouldn't be hard to just gradually implement any particular features we want in the test suite. But if the git thing ever does become available, then that will be fine too. -Carl [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] First tests for JSON output and UTF-8 in mail body and subject 2010-04-15 19:58 ` Carl Worth @ 2010-04-16 8:17 ` Michal Sojka 2010-04-22 21:14 ` Improved diff-based failure reports from the test suite Carl Worth 0 siblings, 1 reply; 12+ messages in thread From: Michal Sojka @ 2010-04-16 8:17 UTC (permalink / raw) To: Carl Worth, Gregor Hoffleit, notmuch > But you might actually like that change since it's one you requested in > your first version of the modular test suite. I'm dropping the annoying > execute_expecting macro that both runs notmuch and tests the > output. There's now a much cleaner separation such as: > > output=$($NOTMUCH search for-something) > pass_if_equal "$output" "something was found" It's definitely better than before. The current implementation of pass_if_equal has IMHO one drawback - if it compares multiline text and there is a difference, it is quite hard to see where. In my tests for maildir synchronization I use this approach: notmuch search tag:inbox | filter_output > actual && diff -u - actual <<EOF thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; test message 3 (inbox) EOF Thanks to the usee of diff, I immediately see only the differences. -Michal ^ permalink raw reply [flat|nested] 12+ messages in thread
* Improved diff-based failure reports from the test suite 2010-04-16 8:17 ` Michal Sojka @ 2010-04-22 21:14 ` Carl Worth 0 siblings, 0 replies; 12+ messages in thread From: Carl Worth @ 2010-04-22 21:14 UTC (permalink / raw) To: Michal Sojka, Gregor Hoffleit, notmuch [-- Attachment #1: Type: text/plain, Size: 899 bytes --] On Fri, 16 Apr 2010 10:17:15 +0200, Michal Sojka <sojkam1@fel.cvut.cz> wrote: > It's definitely better than before. The current implementation of > pass_if_equal has IMHO one drawback - if it compares multiline text and > there is a difference, it is quite hard to see where. > > In my tests for maildir synchronization I use this approach: > > notmuch search tag:inbox | filter_output > actual && > diff -u - actual <<EOF > thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; test message 3 (inbox) > EOF > > Thanks to the usee of diff, I immediately see only the differences. I just added a very long multi-line test to the test suite. So I took the opportunity to fix it to output the failures with diff. It also saves both the complete expected and actual output (for any failed tests) to files of the form test-XXX.expected and test-XXX.output. Enjoy! -Carl [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] First tests for JSON output and UTF-8 in mail body and subject 2010-04-15 0:35 ` Carl Worth 2010-04-15 8:33 ` Michal Sojka @ 2010-04-16 11:49 ` David Edmondson 2010-04-23 0:15 ` Carl Worth 2 siblings, 0 replies; 12+ messages in thread From: David Edmondson @ 2010-04-16 11:49 UTC (permalink / raw) To: Carl Worth, notmuch On Wed, 14 Apr 2010 17:35:44 -0700, Carl Worth <cworth@cworth.org> wrote: > [*] I say "should" because I don't believe we have any actual > specification of the data coming out of the JSON output yet. One other > thing that seems odd is the name of "date_unix" in the show output and > "timestamp" in the search output for what is effectively the same > field. The show output is updated to use `timestamp' in id:1271418469-19031-1-git-send-email-dme@dme.org (just sent). dme. -- David Edmondson, http://dme.org ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] First tests for JSON output and UTF-8 in mail body and subject 2010-04-15 0:35 ` Carl Worth 2010-04-15 8:33 ` Michal Sojka 2010-04-16 11:49 ` [PATCH] First tests for JSON output and UTF-8 in mail body and subject David Edmondson @ 2010-04-23 0:15 ` Carl Worth 2 siblings, 0 replies; 12+ messages in thread From: Carl Worth @ 2010-04-23 0:15 UTC (permalink / raw) To: Gregor Hoffleit, notmuch [-- Attachment #1: Type: text/plain, Size: 665 bytes --] On Wed, 14 Apr 2010 17:35:44 -0700, Carl Worth <cworth@cworth.org> wrote: > On Tue, 13 Apr 2010 18:37:57 +0200, Gregor Hoffleit <gregor@hoffleit.de> wrote: > > The test suite doesn't yet cover --format=json output nor UTF-8 in > > subject or body. > > > > This patch starts with test cases for 'search --format=json' and > > 'show --format=json'. ... > So, a double thanks for this test, it seems to have uncovered another > bug. I've now fixed that bug, (hurrah for deleting code!), and pushed out these tests, (which did have to be updated to apply to the current test suite, but that was straightforward). Thanks again for the tests. -Carl [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2010-04-23 0:15 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-03-04 10:40 [PATCH] json_quote_str should handle non-ASCII characters Gregor Hoffleit 2010-03-04 13:57 ` Sebastian Spaeth 2010-03-04 14:26 ` Gregor Hoffleit 2010-04-13 15:36 ` Carl Worth 2010-04-13 16:37 ` [PATCH] First tests for JSON output and UTF-8 in mail body and subject Gregor Hoffleit 2010-04-15 0:35 ` Carl Worth 2010-04-15 8:33 ` Michal Sojka 2010-04-15 19:58 ` Carl Worth 2010-04-16 8:17 ` Michal Sojka 2010-04-22 21:14 ` Improved diff-based failure reports from the test suite Carl Worth 2010-04-16 11:49 ` [PATCH] First tests for JSON output and UTF-8 in mail body and subject David Edmondson 2010-04-23 0:15 ` Carl Worth
Code repositories for project(s) associated with this public inbox https://yhetil.org/notmuch.git/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).