unofficial mirror of notmuch@notmuchmail.org
* [PATCH] nmbug: Allow Unicode tags and IDs in Python 2
@ 2016-02-15  5:30 W. Trevor King
  2016-02-16 13:04 ` David Bremner
  0 siblings, 1 reply; 4+ messages in thread
From: W. Trevor King @ 2016-02-15  5:30 UTC (permalink / raw)
  To: notmuch

Avoid a UnicodeWarning and broken pipe on 'nmbug commit' in Python 2
when a tag or message ID contains non-ASCII characters [1].

There are a number of Python bugs associated with this behavior
[2,3,4,5,6].  There's also some useful background in [8].  [3] led to
the currently working Python 3 implementation, which encodes to UTF-8
by default and has 'encoding' and 'errors' arguments [7].  This commit
follows that approach in a way that's compatible with both Python 2
and Python 3.  Coercing to UTF-8 (regardless of locale) gives us
consistent tag IDs for sharing between users.

The 'isnumeric' check identifies Unicode instances in both Python 2
[9] and Python 3 [10].
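
As a rough illustration of the approach (not the patch itself), the
standalone sketch below shows the 'isnumeric' duck-typing check
followed by UTF-8 encoding and lowercase hex quoting.  The names
is_text and hex_quote_sketch and the regular expression are
assumptions made for this example; only the overall flow mirrors the
change to _hex_quote.

```python
import re

try:  # Python 3
    from urllib.parse import quote as _quote
except ImportError:  # Python 2
    from urllib import quote as _quote

# Assumed pattern for this sketch: matches uppercase escapes like '%3A'.
_ESCAPE_REGEX = re.compile('%[0-9A-F]{2}')


def is_text(value):
    """Return True for Unicode text (Python 2 unicode, Python 3 str).

    Byte strings (Python 2 str, Python 3 bytes) have no 'isnumeric'
    method, so its presence marks a value that still needs encoding.
    """
    return hasattr(value, 'isnumeric')


def hex_quote_sketch(string, safe='+@=:,', encoding='utf-8',
                     errors='strict'):
    """Percent-quote 'string', encoding text to bytes first.

    Hypothetical helper for illustration; it omits the patch's check
    that every safe character encodes to a single byte.
    """
    if is_text(string):
        string = string.encode(encoding, errors)
    if is_text(safe):
        safe = safe.encode(encoding, errors)
    uppercase_escapes = _quote(string, safe)
    # Lowercase the hex digits, e.g. '%C3%A9' -> '%c3%a9'.
    return _ESCAPE_REGEX.sub(
        lambda match: match.group(0).lower(),
        uppercase_escapes)
```

With this sketch, hex_quote_sketch(u'caf\u00e9') encodes the accented
character as two UTF-8 bytes before quoting, so the result does not
depend on the user's locale.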

[1]: id:87twlbv5vj.fsf@zancas.localnet
     http://thread.gmane.org/gmane.mail.notmuch.general/21855/focus=21862
     Subject: Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
     Date: Sun, 14 Feb 2016 08:22:24 -0400
[2]: http://bugs.python.org/issue2637
[3]: http://bugs.python.org/issue3300
[4]: http://bugs.python.org/issue22231
[5]: http://bugs.python.org/issue23885
[6]: http://bugs.python.org/issue1712522
[7]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote
[8]: https://mail.python.org/pipermail/python-dev/2006-July/067335.html
[9]: https://docs.python.org/2/library/stdtypes.html#unicode.isnumeric
[10]: https://docs.python.org/3/library/stdtypes.html#str.isnumeric
---
I haven't checked the other commands for issues with Unicode IDs or
tags.  It's possible that in addition to this explicit encoding to
UTF-8, we'll also want explicit decoding from UTF-8 when reading from
Git trees (for 'nmbug checkout' and 'nmbug status').
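
If explicit decoding does turn out to be needed, it might look
something like the sketch below.  This is untested against nmbug;
hex_unquote_sketch is a hypothetical name, and the assumption is that
the quoted path component arrives as a native string containing only
ASCII characters and percent escapes.

```python
try:  # Python 3
    from urllib.parse import unquote_to_bytes as _unquote_to_bytes
except ImportError:  # Python 2: unquote on a byte string returns bytes
    from urllib import unquote as _unquote_to_bytes


def hex_unquote_sketch(quoted, encoding='utf-8', errors='strict'):
    """Reverse percent-quoting and decode the raw bytes to Unicode.

    Hypothetical helper, not part of this patch.  Decoding with a
    fixed encoding (instead of the locale's) keeps the result
    consistent between users.
    """
    raw = _unquote_to_bytes(quoted)
    return raw.decode(encoding, errors)
```

Decoding with errors='strict' would raise UnicodeDecodeError for
names that are not valid UTF-8, which seems preferable to silently
mangling tags.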

Cheers,
Trevor

 devel/nmbug/nmbug | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/devel/nmbug/nmbug b/devel/nmbug/nmbug
index 81f582c..284d374 100755
--- a/devel/nmbug/nmbug
+++ b/devel/nmbug/nmbug
@@ -1,6 +1,6 @@
 #!/usr/bin/env python
 #
-# Copyright (c) 2011-2014 David Bremner <david@tethera.net>
+# Copyright (c) 2011-2016 David Bremner <david@tethera.net>
 #                         W. Trevor King <wking@tremily.us>
 #
 # This program is free software: you can redistribute it and/or modify
@@ -95,7 +95,7 @@ except AttributeError:  # Python < 3.2
     _tempfile.TemporaryDirectory = _TemporaryDirectory
 
 
-def _hex_quote(string, safe='+@=:,'):
+def _hex_quote(string, safe='+@=:,', encoding='utf-8', errors='strict'):
     """
     quote('abc def') -> 'abc%20def'.
 
@@ -103,6 +103,15 @@ def _hex_quote(string, safe='+@=:,'):
     addition to letters, digits, and '_.-') and lowercase hex digits
     (e.g. '%3a' instead of '%3A').
     """
+    if hasattr(string, 'isnumeric'):
+        string = string.encode(encoding, errors)
+    if hasattr(safe, 'isnumeric'):
+        safe_bytes = safe.encode(encoding, errors)
+        if len(safe_bytes) != len(safe):
+            raise ValueError(
+                'some safe characters are encoded as multiple bytes '
+                '({!r} -> {!r})'.format(safe, safe_bytes))
+        safe = safe_bytes
     uppercase_escapes = _quote(string, safe)
     return _HEX_ESCAPE_REGEX.sub(
         lambda match: match.group(0).lower(),
-- 
2.1.0.60.g85f0837
