unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [PATCH] json_quote_str should handle non-ASCII characters
@ 2010-03-04 10:40 Gregor Hoffleit
  2010-03-04 13:57 ` Sebastian Spaeth
  2010-04-13 15:36 ` Carl Worth
  0 siblings, 2 replies; 12+ messages in thread
From: Gregor Hoffleit @ 2010-03-04 10:40 UTC (permalink / raw)
  To: notmuch

The current code in json_quote_str() only accepts strict printable ASCII
code points (i.e. 32-127), all other code points are dropped from the
JSON output.

This patch accepts code points 32-255.

json_quote_str() should handle non-ASCII characters.
---
 json.c |    4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/json.c b/json.c
index 9614143..6dc0345 100644
--- a/json.c
+++ b/json.c
@@ -59,7 +59,7 @@ json_quote_str(const void *ctx, const char *str)
 	return NULL;
 
     for (ptr = str; *ptr; len++, ptr++) {
-	if (*ptr < 32 || *ptr == '\"' || *ptr == '\\')
+	if ((unsigned char)(*ptr) < 32 || *ptr == '\"' || *ptr == '\\')
 	    len++;
     }
 
@@ -70,7 +70,7 @@ json_quote_str(const void *ctx, const char *str)
 
     *ptr2++ = '\"';
     while (*ptr) {
-	    if (*ptr > 31 && *ptr != '\"' && *ptr != '\\') {
+	    if ((unsigned char)(*ptr) > 31 && *ptr != '\"' && *ptr != '\\') {
 		*ptr2++ = *ptr++;
 	    } else {
 		*ptr2++ = '\\';
--
1.7.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] json_quote_str should handle non-ASCII characters
  2010-03-04 10:40 [PATCH] json_quote_str should handle non-ASCII characters Gregor Hoffleit
@ 2010-03-04 13:57 ` Sebastian Spaeth
  2010-03-04 14:26   ` Gregor Hoffleit
  2010-04-13 15:36 ` Carl Worth
  1 sibling, 1 reply; 12+ messages in thread
From: Sebastian Spaeth @ 2010-03-04 13:57 UTC (permalink / raw)
  To: Gregor Hoffleit, notmuch

On 2010-03-04, Sebastian Spaeth wrote:
> The current code in json_quote_str() only accepts strict printable ASCII
> code points (i.e. 32-127), all other code points are dropped from the
> JSON output.

That would explain why my web interface does not display any umlaut
symbols.

Nice finding,
Sebastian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] json_quote_str should handle non-ASCII characters
  2010-03-04 13:57 ` Sebastian Spaeth
@ 2010-03-04 14:26   ` Gregor Hoffleit
  0 siblings, 0 replies; 12+ messages in thread
From: Gregor Hoffleit @ 2010-03-04 14:26 UTC (permalink / raw)
  To: Sebastian Spaeth; +Cc: notmuch

* Sebastian Spaeth <sebastian@sspaeth.de> [Do Mär 04 14:57:27 +0100 2010]
> On 2010-03-04, Sebastian Spaeth wrote:
> > The current code in json_quote_str() only accepts strict printable ASCII
> > code points (i.e. 32-127), all other code points are dropped from the
> > JSON output.
> 
> That would explain why my web interface does not display any umlaut
> symbols.

Well, I noticed noneatall umlauts. That's how I found this problem.

Regards,
    Gregor

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] json_quote_str should handle non-ASCII characters
  2010-03-04 10:40 [PATCH] json_quote_str should handle non-ASCII characters Gregor Hoffleit
  2010-03-04 13:57 ` Sebastian Spaeth
@ 2010-04-13 15:36 ` Carl Worth
  2010-04-13 16:37   ` [PATCH] First tests for JSON output and UTF-8 in mail body and subject Gregor Hoffleit
  1 sibling, 1 reply; 12+ messages in thread
From: Carl Worth @ 2010-04-13 15:36 UTC (permalink / raw)
  To: Gregor Hoffleit, notmuch

[-- Attachment #1: Type: text/plain, Size: 563 bytes --]

On Thu, 04 Mar 2010 11:40:03 +0100, Gregor Hoffleit <gregor@hoffleit.de> wrote:
> The current code in json_quote_str() only accepts strict printable ASCII
> code points (i.e. 32-127), all other code points are dropped from the
> JSON output.
> 
> This patch accepts code points 32-255.

Thanks, Gregor!

I've pushed this patch out now.

And I noted in our TODO file that we really should add some tests to the
test suite to exercise our --format=json code paths. It would be nice to
ensure we don't have any regressions in this area later.

-Carl

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] First tests for JSON output and UTF-8 in mail body and subject
  2010-04-13 15:36 ` Carl Worth
@ 2010-04-13 16:37   ` Gregor Hoffleit
  2010-04-15  0:35     ` Carl Worth
  0 siblings, 1 reply; 12+ messages in thread
From: Gregor Hoffleit @ 2010-04-13 16:37 UTC (permalink / raw)
  To: notmuch

The test suite doesn't yet cover --format=json output nor UTF-8 in
subject or body.

This patch starts with test cases for 'search --format=json' and
'show --format=json'.

Furthermore, it has test cases for a search for a UTF-8 string in a mail
body for a UTF-8 string in a mail subject.

Finally, it has a test case for --format=json with UTF-8 messages,
demonstrating the fix in 1267697893-sup-4538@sam.mediasupervision.de.
---
 test/notmuch-test |   38 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/test/notmuch-test b/test/notmuch-test
index 1c5191b..b2a0756 100755
--- a/test/notmuch-test
+++ b/test/notmuch-test
@@ -146,7 +146,7 @@ add_message ()
 }
 
 NOTMUCH_IGNORED_OUTPUT_REGEXP='^Processed [0-9]*( total)? file|Found [0-9]* total file'
-NOTMUCH_THREAD_ID_SQUELCH='s/thread:................/thread:XXX/'
+NOTMUCH_THREAD_ID_SQUELCH='s/thread:................/thread:XXX/;s/"thread": "................",/"thread": "XXX",/'
 execute_expecting ()
 {
     args=$1
@@ -348,6 +348,42 @@ add_message '[subject]="body search (phrase)"' '[date]="Sat, 01 Jan 2000 12:00:0
 add_message '[subject]="negative result"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[body]="This phrase should not match the body search"'
 execute_expecting "search '\"body search (phrase)\"'" "thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; body search (phrase) (inbox unread)"
 
+printf " Show message: json...\t\t"
+add_message '[subject]="json-show-subject"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[body]="json-show-message"'
+execute_expecting "show --format=json 'json-show-message'" '[[[{"id": "'${gen_msg_id}'", "match": true, "filename": "'${gen_msg_filename}'", "date_unix": 946728000, "date_relative": "2000-01-01", "tags": ["inbox","unread"], "headers": {"Subject": "json-show-subject", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Cc": "", "Bcc": "", "Date": "Sat, 01 Jan 2000 12:00:00 -0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "json-show-message\n"}]}, []]]]'
+
+printf " Search message: json...\t"
+add_message '[subject]="json-search-subject"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[body]="json-search-message"'
+execute_expecting "search --format=json 'json-search-message'" '[{"thread": "XXX",
+"timestamp": 946724400,
+"matched": 1,
+"total": 1,
+"authors": "Notmuch Test Suite",
+"subject": "json-search-subject",
+"tags": ["inbox", "unread"]}]'
+
+printf " Search by subject (utf-8):...\t"
+add_message [subject]=utf8-sübjéct '[date]="Sat, 01 Jan 2000 12:00:00 -0000"'
+execute_expecting "search subject:utf8-sübjéct" "thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; utf8-sübjéct (inbox unread)"
+
+printf " Search body (utf-8):...\t"
+add_message '[subject]="utf8-message-body-subject"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[body]="message body utf8: bödý"'
+execute_expecting "search 'bödý'" "thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; utf8-message-body-subject (inbox unread)"
+
+printf " Show message: json, utf-8...\t"
+add_message '[subject]="json-show-utf8-body-sübjéct"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[body]="jsön-show-méssage"'
+execute_expecting "show --format=json 'jsön-show-méssage'" '[[[{"id": "'${gen_msg_id}'", "match": true, "filename": "'${gen_msg_filename}'", "date_unix": 946728000, "date_relative": "2000-01-01", "tags": ["inbox","unread"], "headers": {"Subject": "json-show-utf8-body-sübjéct", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Cc": "", "Bcc": "", "Date": "Sat, 01 Jan 2000 12:00:00 -0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "jsön-show-méssage\n"}]}, []]]]'
+
+printf " Search message: json, utf-8...\t"
+add_message '[subject]="json-search-utf8-body-sübjéct"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[body]="jsön-search-méssage"'
+execute_expecting "search --format=json 'jsön-search-méssage'" '[{"thread": "XXX",
+"timestamp": 946724400,
+"matched": 1,
+"total": 1,
+"authors": "Notmuch Test Suite",
+"subject": "json-search-utf8-body-sübjéct",
+"tags": ["inbox", "unread"]}]'
+
 printf " Search by from: (address)...\t"
 add_message '[subject]="search by from (address)"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' [from]=searchbyfrom@example.com
 execute_expecting "search from:searchbyfrom@example.com" "thread:XXX   2000-01-01 [1/1] searchbyfrom@example.com; search by from (address) (inbox unread)"

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] First tests for JSON output and UTF-8 in mail body and subject
  2010-04-13 16:37   ` [PATCH] First tests for JSON output and UTF-8 in mail body and subject Gregor Hoffleit
@ 2010-04-15  0:35     ` Carl Worth
  2010-04-15  8:33       ` Michal Sojka
                         ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Carl Worth @ 2010-04-15  0:35 UTC (permalink / raw)
  To: Gregor Hoffleit, notmuch

[-- Attachment #1: Type: text/plain, Size: 2066 bytes --]

On Tue, 13 Apr 2010 18:37:57 +0200, Gregor Hoffleit <gregor@hoffleit.de> wrote:
> The test suite doesn't yet cover --format=json output nor UTF-8 in
> subject or body.
> 
> This patch starts with test cases for 'search --format=json' and
> 'show --format=json'.

Thanks for the tests, Gregor!

I was about to push this, but first noticed that I hadn't run the test
suite in the last day and that it had recently broken (oops!). I fixed
that, but then also noticed that I got failures with your tests.


> +execute_expecting "show --format=json 'json-show-message'" '[[[{"id":
> "'${gen_msg_id}'", "match": true, "filename": "'${gen_msg_filename}'",
> "date_unix": 946728000, "date_relative": "2000-01-01", "tags":
...
> +printf " Search message: json...\t"
> +add_message '[subject]="json-search-subject"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[body]="json-search-message"'
> +execute_expecting "search --format=json 'json-search-message'" '[{"thread": "XXX",
> +"timestamp": 946724400,

I'm getting a timestamp value here of 946756800 which is clearly an
interpretation of the above date as if it were my local time zone. That
is:

	$ date -u +%s -d "Sat, 01 Jan 2000 12:00:00 -0800"
	946756800

And the value you have appears to have been generated in your timezone:

	$ date -u +%s -d "Sat, 01 Jan 2000 12:00:00 +0100"
	946724400

Meanwhile, the value that should be printed here[*] is the value from
interpreting the original date in the timezone explicitly specified in
that date:

	$ date -u +%s -d "Sat, 01 Jan 2000 12:00:00 -0000"
	946728000

Note that the "notmuch show --format=json" test above does have the
correct timestamp.

So, a double thanks for this test, it seems to have uncovered another
bug.

-Carl

[*] I say "should" because I don't believe we have any actual
specification of the data coming out of the JSON output yet. One other
thing that seems odd is the name of "date_unix" in the show output and
"timestamp" in the search output for what is effectively the same
field.




[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] First tests for JSON output and UTF-8 in mail body and subject
  2010-04-15  0:35     ` Carl Worth
@ 2010-04-15  8:33       ` Michal Sojka
  2010-04-15 19:58         ` Carl Worth
  2010-04-16 11:49       ` [PATCH] First tests for JSON output and UTF-8 in mail body and subject David Edmondson
  2010-04-23  0:15       ` Carl Worth
  2 siblings, 1 reply; 12+ messages in thread
From: Michal Sojka @ 2010-04-15  8:33 UTC (permalink / raw)
  To: Carl Worth, Gregor Hoffleit, notmuch

On Thu, 15 Apr 2010, Carl Worth wrote:
> On Tue, 13 Apr 2010 18:37:57 +0200, Gregor Hoffleit <gregor@hoffleit.de> wrote:
> > The test suite doesn't yet cover --format=json output nor UTF-8 in
> > subject or body.
> > 
> > This patch starts with test cases for 'search --format=json' and
> > 'show --format=json'.
> 
> Thanks for the tests, Gregor!

Carl,

are you still interrested in modular test suite from git? If so, could
you please look at id:87mxxg7bxo.fsf@steelpick.2x.cz and tell me your
opinion. I'm still updating the modularized tests to match the state in
master but every change in master takes me quite long time to convert.

-Michal

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] First tests for JSON output and UTF-8 in mail body and subject
  2010-04-15  8:33       ` Michal Sojka
@ 2010-04-15 19:58         ` Carl Worth
  2010-04-16  8:17           ` Michal Sojka
  0 siblings, 1 reply; 12+ messages in thread
From: Carl Worth @ 2010-04-15 19:58 UTC (permalink / raw)
  To: Michal Sojka, Gregor Hoffleit, notmuch

[-- Attachment #1: Type: text/plain, Size: 1321 bytes --]

On Thu, 15 Apr 2010 10:33:46 +0200, Michal Sojka <sojkam1@fel.cvut.cz> wrote:
> are you still interrested in modular test suite from git? If so, could
> you please look at id:87mxxg7bxo.fsf@steelpick.2x.cz and tell me your
> opinion. I'm still updating the modularized tests to match the state in
> master but every change in master takes me quite long time to convert.

Hi Michal,

I would still like to have a modular test suite, yes.

Thanks for pointing out that other message to me, which I had missed in
the general notmuch-mailing-list backlog I'm still dealing with. I've
now replied to it over there.

I am sorry that you keep having to re-do a bunch of work to keep your
patch up-to-date. I'm just about to push another change which might
further cause problems.

But you might actually like that change since it's one you requested in
your first version of the modular test suite. I'm dropping the annoying
execute_expecting macro that both runs notmuch and tests the
output. There's now a much cleaner separation such as:

	output=$($NOTMUCH search for-something)
	pass_if_equal "$output" "something was found"

I still think it wouldn't be hard to just gradually implement any
particular features we want in the test suite. But if the git thing ever
does become available, then that will be fine too.

-Carl

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] First tests for JSON output and UTF-8 in mail body and subject
  2010-04-15 19:58         ` Carl Worth
@ 2010-04-16  8:17           ` Michal Sojka
  2010-04-22 21:14             ` Improved diff-based failure reports from the test suite Carl Worth
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Sojka @ 2010-04-16  8:17 UTC (permalink / raw)
  To: Carl Worth, Gregor Hoffleit, notmuch

> But you might actually like that change since it's one you requested in
> your first version of the modular test suite. I'm dropping the annoying
> execute_expecting macro that both runs notmuch and tests the
> output. There's now a much cleaner separation such as:
> 
> 	output=$($NOTMUCH search for-something)
> 	pass_if_equal "$output" "something was found"

It's definitely better than before. The current implementation of
pass_if_equal has IMHO one drawback - if it compares multiline text and
there is a difference, it is quite hard to see where.

In my tests for maildir synchronization I use this approach:

  notmuch search tag:inbox | filter_output > actual &&
  diff -u - actual <<EOF
  thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; test message 3 (inbox)
  EOF

Thanks to the usee of diff, I immediately see only the differences.

-Michal

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] First tests for JSON output and UTF-8 in mail body and subject
  2010-04-15  0:35     ` Carl Worth
  2010-04-15  8:33       ` Michal Sojka
@ 2010-04-16 11:49       ` David Edmondson
  2010-04-23  0:15       ` Carl Worth
  2 siblings, 0 replies; 12+ messages in thread
From: David Edmondson @ 2010-04-16 11:49 UTC (permalink / raw)
  To: Carl Worth, notmuch

On Wed, 14 Apr 2010 17:35:44 -0700, Carl Worth <cworth@cworth.org> wrote:
> [*] I say "should" because I don't believe we have any actual
> specification of the data coming out of the JSON output yet. One other
> thing that seems odd is the name of "date_unix" in the show output and
> "timestamp" in the search output for what is effectively the same
> field.

The show output is updated to use `timestamp' in
id:1271418469-19031-1-git-send-email-dme@dme.org (just sent).

dme.
-- 
David Edmondson, http://dme.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Improved diff-based failure reports from the test suite
  2010-04-16  8:17           ` Michal Sojka
@ 2010-04-22 21:14             ` Carl Worth
  0 siblings, 0 replies; 12+ messages in thread
From: Carl Worth @ 2010-04-22 21:14 UTC (permalink / raw)
  To: Michal Sojka, Gregor Hoffleit, notmuch

[-- Attachment #1: Type: text/plain, Size: 899 bytes --]

On Fri, 16 Apr 2010 10:17:15 +0200, Michal Sojka <sojkam1@fel.cvut.cz> wrote:
> It's definitely better than before. The current implementation of
> pass_if_equal has IMHO one drawback - if it compares multiline text and
> there is a difference, it is quite hard to see where.
> 
> In my tests for maildir synchronization I use this approach:
> 
>   notmuch search tag:inbox | filter_output > actual &&
>   diff -u - actual <<EOF
>   thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; test message 3 (inbox)
>   EOF
> 
> Thanks to the usee of diff, I immediately see only the differences.

I just added a very long multi-line test to the test suite.

So I took the opportunity to fix it to output the failures with diff. It
also saves both the complete expected and actual output (for any failed
tests) to files of the form test-XXX.expected and test-XXX.output.

Enjoy!

-Carl

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] First tests for JSON output and UTF-8 in mail body and subject
  2010-04-15  0:35     ` Carl Worth
  2010-04-15  8:33       ` Michal Sojka
  2010-04-16 11:49       ` [PATCH] First tests for JSON output and UTF-8 in mail body and subject David Edmondson
@ 2010-04-23  0:15       ` Carl Worth
  2 siblings, 0 replies; 12+ messages in thread
From: Carl Worth @ 2010-04-23  0:15 UTC (permalink / raw)
  To: Gregor Hoffleit, notmuch

[-- Attachment #1: Type: text/plain, Size: 665 bytes --]

On Wed, 14 Apr 2010 17:35:44 -0700, Carl Worth <cworth@cworth.org> wrote:
> On Tue, 13 Apr 2010 18:37:57 +0200, Gregor Hoffleit <gregor@hoffleit.de> wrote:
> > The test suite doesn't yet cover --format=json output nor UTF-8 in
> > subject or body.
> > 
> > This patch starts with test cases for 'search --format=json' and
> > 'show --format=json'.
...
> So, a double thanks for this test, it seems to have uncovered another
> bug.

I've now fixed that bug, (hurrah for deleting code!), and pushed out
these tests, (which did have to be updated to apply to the current test
suite, but that was straightforward).

Thanks again for the tests.

-Carl

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2010-04-23  0:15 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-04 10:40 [PATCH] json_quote_str should handle non-ASCII characters Gregor Hoffleit
2010-03-04 13:57 ` Sebastian Spaeth
2010-03-04 14:26   ` Gregor Hoffleit
2010-04-13 15:36 ` Carl Worth
2010-04-13 16:37   ` [PATCH] First tests for JSON output and UTF-8 in mail body and subject Gregor Hoffleit
2010-04-15  0:35     ` Carl Worth
2010-04-15  8:33       ` Michal Sojka
2010-04-15 19:58         ` Carl Worth
2010-04-16  8:17           ` Michal Sojka
2010-04-22 21:14             ` Improved diff-based failure reports from the test suite Carl Worth
2010-04-16 11:49       ` [PATCH] First tests for JSON output and UTF-8 in mail body and subject David Edmondson
2010-04-23  0:15       ` Carl Worth

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).