unofficial mirror of notmuch@notmuchmail.org
* [python] querystrings as unicode literals?
@ 2011-07-07 11:37 Patrick Totzke
  2011-07-11  9:32 ` Sebastian Spaeth
  2011-07-11  9:43 ` [PATCH] python: Encode query string as a utf-8 byte array Sebastian Spaeth
  0 siblings, 2 replies; 4+ messages in thread
From: Patrick Totzke @ 2011-07-07 11:37 UTC (permalink / raw)
  To: notmuch

Hi!
Something strange happens when I use unicode literals as querystrings:
Database().create_query(u'teststring') yields different results than
Database().create_query('teststring').

It should not be a problem to encode the string into whatever encoding
notmuch/xapian uses internally, for example with
u'teststring'.encode('utf8'). But can I reliably expect all strings in
the index to be valid UTF-8?

At any rate, I think this conversion should be made inside the bindings.
A query should return the same results whether the querystring is given
as a str literal or as a unicode literal.
Any thoughts?
/p
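
On the question of whether every string coming out of the index is valid UTF-8: one defensive option is to decode with replacement instead of strict decoding. The sketch below is only an illustration under that assumption. `safe_decode` is a hypothetical helper, not part of the notmuch bindings, and nothing in it calls notmuch or xapian.

```python
def safe_decode(raw):
    """Decode bytes as UTF-8 without raising on invalid input.

    Hypothetical helper for illustration; not part of the notmuch bindings.
    """
    if isinstance(raw, bytes):
        # Invalid byte sequences become U+FFFD instead of raising
        # UnicodeDecodeError.
        return raw.decode('utf-8', 'replace')
    # Anything that is not a byte string is returned unchanged.
    return raw


valid = safe_decode(b'caf\xc3\xa9')   # well-formed UTF-8
broken = safe_decode(b'caf\xe9')      # Latin-1 byte, not valid UTF-8
```

With strict decoding the second call would raise; with replacement the caller gets text back and can decide what to do with the U+FFFD marker.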



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [python] querystrings as unicode literals?
  2011-07-07 11:37 [python] querystrings as unicode literals? Patrick Totzke
@ 2011-07-11  9:32 ` Sebastian Spaeth
  2011-07-11  9:43 ` [PATCH] python: Encode query string as a utf-8 byte array Sebastian Spaeth
  1 sibling, 0 replies; 4+ messages in thread
From: Sebastian Spaeth @ 2011-07-11  9:32 UTC (permalink / raw)
  To: Patrick Totzke, notmuch

On Thu, 7 Jul 2011 12:37:00 +0100, Patrick Totzke <patricktotzke@googlemail.com> wrote:
> Hi!
> Something strange happens when I use unicode literals as querystrings:
> Database().create_query(u'teststring') yields different results than
> Database().create_query('teststring').
> 
> It should not be a problem to encode the string into whatever encoding
> notmuch/xapian uses internally, for example with
> u'teststring'.encode('utf8'). But can I reliably expect all strings in
> the index to be valid UTF-8?
> 
> At any rate, I think this conversion should be made inside the bindings.
> A query should return the same results whether the querystring is given
> as a str literal or as a unicode literal.
> Any thoughts?

I hate encodings; they always confuse the heck out of me. I would
prefer it if everything were always UTF-8. notmuch.h doesn't actually
state which encoding the query string should use, and neither does
http://xapian.org/docs/queryparser.html. ojwb said it takes UTF-8, so
that's what we should pass.

I'll send a patch as a reply shortly. Patrick, would you care to test
whether it fixes things for you?

Sebastian

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] python: Encode query string as a utf-8 byte array
  2011-07-07 11:37 [python] querystrings as unicode literals? Patrick Totzke
  2011-07-11  9:32 ` Sebastian Spaeth
@ 2011-07-11  9:43 ` Sebastian Spaeth
  2011-07-11  9:49   ` [PATCH v2] " Sebastian Spaeth
  1 sibling, 1 reply; 4+ messages in thread
From: Sebastian Spaeth @ 2011-07-11  9:43 UTC (permalink / raw)
  To: notmuch; +Cc: Patrick Totzke

If we pass in a unicode instance as the query string, we would probably
get weird behavior (and indeed we do, see mail
id:"20110707113700.GA16347@megatron"). If a unicode instance is passed
in, make sure we encode it properly to a UTF-8 encoded byte string.

Signed-off-by: Sebastian Spaeth <Sebastian@SSpaeth.de>
---
Patrick, can you test whether this fixes the inconsistent behavior in a good way?

 bindings/python/notmuch/database.py |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/bindings/python/notmuch/database.py b/bindings/python/notmuch/database.py
index 3770b13..859fb18 100644
--- a/bindings/python/notmuch/database.py
+++ b/bindings/python/notmuch/database.py
@@ -501,7 +501,7 @@ class Query(object):
         :param db: An open database which we derive the Query from.
         :type db: :class:`Database`
         :param querystr: The query string for the message.
-        :type querystr: str
+        :type querystr: utf-8 encoded str or unicode
         """
         self._db = None
         self._query = None
@@ -517,7 +517,7 @@ class Query(object):
         :param db: Database to create the query from.
         :type db: :class:`Database`
         :param querystr: The query string
-        :type querystr: str
+        :type querystr: utf-8 encoded str or unicode
         :returns: Nothing
         :exception: :exc:`NotmuchError`
 
@@ -529,7 +529,9 @@ class Query(object):
             raise NotmuchError(STATUS.NOT_INITIALIZED)            
         # create reference to parent db to keep it alive
         self._db = db
-        
+        if is_instance(querystr, unicode):
+            # xapian takes utf-8 encoded byte arrays
+            querystr = querystr.encode('utf-8')
         # create query, return None if too little mem available
         query_p = Query._create(db.db_p, querystr)
         if query_p is None:
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v2] python: Encode query string as a utf-8 byte array
  2011-07-11  9:43 ` [PATCH] python: Encode query string as a utf-8 byte array Sebastian Spaeth
@ 2011-07-11  9:49   ` Sebastian Spaeth
  0 siblings, 0 replies; 4+ messages in thread
From: Sebastian Spaeth @ 2011-07-11  9:49 UTC (permalink / raw)
  To: notmuch; +Cc: Patrick Totzke

If we pass in a unicode instance as the query string, we would probably
get weird behavior (and indeed we do, see mail
id:"20110707113700.GA16347@megatron"). If a unicode instance is passed
in, make sure we encode it properly to a UTF-8 encoded byte string.

Signed-off-by: Sebastian Spaeth <Sebastian@SSpaeth.de>
---
DOH, I should test my patches for typos before actually sending them off. This one should be fine.

 bindings/python/notmuch/database.py |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/bindings/python/notmuch/database.py b/bindings/python/notmuch/database.py
index 3770b13..84cf79b 100644
--- a/bindings/python/notmuch/database.py
+++ b/bindings/python/notmuch/database.py
@@ -501,7 +501,7 @@ class Query(object):
         :param db: An open database which we derive the Query from.
         :type db: :class:`Database`
         :param querystr: The query string for the message.
-        :type querystr: str
+        :type querystr: utf-8 encoded str or unicode
         """
         self._db = None
         self._query = None
@@ -517,7 +517,7 @@ class Query(object):
         :param db: Database to create the query from.
         :type db: :class:`Database`
         :param querystr: The query string
-        :type querystr: str
+        :type querystr: utf-8 encoded str or unicode
         :returns: Nothing
         :exception: :exc:`NotmuchError`
 
@@ -529,7 +529,9 @@ class Query(object):
             raise NotmuchError(STATUS.NOT_INITIALIZED)            
         # create reference to parent db to keep it alive
         self._db = db
-        
+        if isinstance(querystr, unicode):
+            # xapian takes utf-8 encoded byte arrays
+            querystr = querystr.encode('utf-8')
         # create query, return None if too little mem available
         query_p = Query._create(db.db_p, querystr)
         if query_p is None:
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread
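
As a standalone illustration of the idea in the patch above, the following sketch normalizes a query string to UTF-8 bytes before it would be handed to the C library. It is a minimal sketch, not the bindings' code: `encode_query` and `text_type` are hypothetical names, and the version check is an assumption added so the same snippet runs on both Python 2 (where the text type is `unicode`) and Python 3 (where it is `str`).

```python
import sys

# Assumption for this sketch: pick the text type for the running interpreter.
# On Python 3 the name `unicode` is never evaluated.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def encode_query(querystr):
    """Return querystr as a UTF-8 encoded byte string.

    Hypothetical helper for illustration; not part of the notmuch bindings.
    """
    if isinstance(querystr, text_type):
        # xapian takes utf-8 encoded byte arrays
        return querystr.encode('utf-8')
    # Byte strings are assumed to be UTF-8 already and pass through as-is.
    return querystr


from_text = encode_query(u'tag:caf\u00e9')
from_bytes = encode_query(b'tag:caf\xc3\xa9')
```

Both calls yield the same byte string, which is the property the original report asks for: the query should not depend on whether the caller passed text or bytes.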
