* [python] querystrings as unicode literals?
@ 2011-07-07 11:37 Patrick Totzke
2011-07-11 9:32 ` Sebastian Spaeth
2011-07-11 9:43 ` [PATCH] python: Encode query string as a utf-8 byte array Sebastian Spaeth
0 siblings, 2 replies; 4+ messages in thread
From: Patrick Totzke @ 2011-07-07 11:37 UTC (permalink / raw)
To: notmuch
Hi!
Something strange goes on when I use unicode literals as querystrings:
Database().create_query(u'teststring') yields different results than
Database().create_query('teststring').
Now it should not be a problem to encode the string to whatever encoding
is used by notmuch/xapian internally, using u'teststring'.encode('utf8')
for example. But can I reliably expect all strings in the index to be valid utf8?
At any rate, I think this conversion should be made from inside the bindings.
A query should return the same results for querystrings as string- and unicode literals.
Any thoughts?
/p
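
To make the question about index contents concrete, here is a minimal sketch of how one might check whether a byte string is valid UTF-8, and decode it leniently if it is not. The helper names `is_valid_utf8` and `decode_lenient` are hypothetical and not part of the notmuch bindings; the sketch only assumes the bindings hand back plain byte strings.

```python
def is_valid_utf8(raw):
    """Return True if the byte string `raw` decodes cleanly as UTF-8."""
    try:
        raw.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


def decode_lenient(raw):
    """Decode `raw` as UTF-8, substituting U+FFFD for undecodable bytes."""
    return raw.decode('utf-8', 'replace')


# A Latin-1 encoded byte string is not valid UTF-8, so it fails the strict check
# but can still be turned into text with the lenient decoder.
latin1_bytes = b'm\xfcller'
utf8_bytes = b'm\xc3\xbcller'
print(is_valid_utf8(utf8_bytes), is_valid_utf8(latin1_bytes))
print(repr(decode_lenient(latin1_bytes)))
```

Whether the index can actually contain non-UTF-8 data is exactly the open question above; the lenient decoder is just one defensive option if it can.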
* Re: [python] querystrings as unicode literals?
2011-07-07 11:37 [python] querystrings as unicode literals? Patrick Totzke
@ 2011-07-11 9:32 ` Sebastian Spaeth
2011-07-11 9:43 ` [PATCH] python: Encode query string as a utf-8 byte array Sebastian Spaeth
1 sibling, 0 replies; 4+ messages in thread
From: Sebastian Spaeth @ 2011-07-11 9:32 UTC (permalink / raw)
To: Patrick Totzke, notmuch
On Thu, 7 Jul 2011 12:37:00 +0100, Patrick Totzke <patricktotzke@googlemail.com> wrote:
> Hi!
> Something strange goes on when I use unicode literals as querystrings:
> Database().create_query(u'teststring') yields different results than
> Database().create_query('teststring').
>
> Now it should not be a problem to encode the string to whatever encoding
> is used by notmuch/xapian internally, using u'teststring'.encode('utf8')
> for example. But can I reliably expect all strings in the index to be valid utf8?
>
> At any rate, I think this conversion should be made from inside the bindings.
> A query should return the same results for querystrings as string- and unicode literals.
> Any thoughts?
I hate encodings and they always confuse the heck out of me. I would
prefer if everything was always UTF-8. notmuch.h actually doesn't state
which encoding the query string should be in, and neither does
http://xapian.org/docs/queryparser.html. ojwb said it takes UTF-8, so
that's what we should be doing.
I'll send a patch as a reply shortly. Patrick, would you care to test whether
this fixes things for you?
Sebastian
* [PATCH] python: Encode query string as a utf-8 byte array
2011-07-07 11:37 [python] querystrings as unicode literals? Patrick Totzke
2011-07-11 9:32 ` Sebastian Spaeth
@ 2011-07-11 9:43 ` Sebastian Spaeth
2011-07-11 9:49 ` [PATCH v2] " Sebastian Spaeth
1 sibling, 1 reply; 4+ messages in thread
From: Sebastian Spaeth @ 2011-07-11 9:43 UTC (permalink / raw)
To: notmuch; +Cc: Patrick Totzke
If we pass in a unicode instance as the query string, we would probably get
weird behavior (and indeed we do, see mail
id:"20110707113700.GA16347@megatron"). If a unicode instance is passed
in, make sure we encode it properly to a UTF-8 encoded byte string.
Signed-off-by: Sebastian Spaeth <Sebastian@SSpaeth.de>
---
Patrick, can you test whether this fixes the inconsistent behavior in a good way?
bindings/python/notmuch/database.py | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/bindings/python/notmuch/database.py b/bindings/python/notmuch/database.py
index 3770b13..859fb18 100644
--- a/bindings/python/notmuch/database.py
+++ b/bindings/python/notmuch/database.py
@@ -501,7 +501,7 @@ class Query(object):
:param db: An open database which we derive the Query from.
:type db: :class:`Database`
:param querystr: The query string for the message.
- :type querystr: str
+ :type querystr: utf-8 encoded str or unicode
"""
self._db = None
self._query = None
@@ -517,7 +517,7 @@ class Query(object):
:param db: Database to create the query from.
:type db: :class:`Database`
:param querystr: The query string
- :type querystr: str
+ :type querystr: utf-8 encoded str or unicode
:returns: Nothing
:exception: :exc:`NotmuchError`
@@ -529,7 +529,9 @@ class Query(object):
raise NotmuchError(STATUS.NOT_INITIALIZED)
# create reference to parent db to keep it alive
self._db = db
-
+ if is_instance(querystr, unicode):
+ # xapian takes utf-8 encoded byte arrays
+ querystr = querystr.encode('utf-8')
# create query, return None if too little mem available
query_p = Query._create(db.db_p, querystr)
if query_p is None:
--
1.7.4.1
* [PATCH v2] python: Encode query string as a utf-8 byte array
2011-07-11 9:43 ` [PATCH] python: Encode query string as a utf-8 byte array Sebastian Spaeth
@ 2011-07-11 9:49 ` Sebastian Spaeth
0 siblings, 0 replies; 4+ messages in thread
From: Sebastian Spaeth @ 2011-07-11 9:49 UTC (permalink / raw)
To: notmuch; +Cc: Patrick Totzke
If we pass in a unicode instance as the query string, we would probably get
weird behavior (and indeed we do, see mail
id:"20110707113700.GA16347@megatron"). If a unicode instance is passed
in, make sure we encode it properly to a UTF-8 encoded byte string.
Signed-off-by: Sebastian Spaeth <Sebastian@SSpaeth.de>
---
DOH, I should test my patches for typos before actually sending them off. This one should be fine.
bindings/python/notmuch/database.py | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/bindings/python/notmuch/database.py b/bindings/python/notmuch/database.py
index 3770b13..84cf79b 100644
--- a/bindings/python/notmuch/database.py
+++ b/bindings/python/notmuch/database.py
@@ -501,7 +501,7 @@ class Query(object):
:param db: An open database which we derive the Query from.
:type db: :class:`Database`
:param querystr: The query string for the message.
- :type querystr: str
+ :type querystr: utf-8 encoded str or unicode
"""
self._db = None
self._query = None
@@ -517,7 +517,7 @@ class Query(object):
:param db: Database to create the query from.
:type db: :class:`Database`
:param querystr: The query string
- :type querystr: str
+ :type querystr: utf-8 encoded str or unicode
:returns: Nothing
:exception: :exc:`NotmuchError`
@@ -529,7 +529,9 @@ class Query(object):
raise NotmuchError(STATUS.NOT_INITIALIZED)
# create reference to parent db to keep it alive
self._db = db
-
+ if isinstance(querystr, unicode):
+ # xapian takes utf-8 encoded byte arrays
+ querystr = querystr.encode('utf-8')
# create query, return None if too little mem available
query_p = Query._create(db.db_p, querystr)
if query_p is None:
--
1.7.4.1
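
The normalization step in the v2 patch can be tried in isolation, without a notmuch database. The following is only a sketch: `normalize_querystr` and `text_type` are hypothetical names introduced for illustration, and the `try`/`except` is an addition so that the same check also runs on Python 3, where the `unicode` builtin used in the patch does not exist.

```python
try:
    text_type = unicode  # Python 2: the type the patch checks for
except NameError:
    text_type = str      # Python 3: str is already the unicode text type


def normalize_querystr(querystr):
    """Return `querystr` as a UTF-8 encoded byte string.

    Text (unicode) input is encoded; byte strings are passed through
    unchanged, on the assumption that the caller already encoded them
    as UTF-8.
    """
    if isinstance(querystr, text_type):
        return querystr.encode('utf-8')
    return querystr


# Both spellings of the same query now produce identical bytes.
print(normalize_querystr(u'teststring') == normalize_querystr(b'teststring'))
```

Note that a byte string in some other encoding is passed through as-is, so this only addresses the unicode-versus-str inconsistency reported at the start of the thread.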