unofficial mirror of notmuch@notmuchmail.org
From: "W. Trevor King" <wking@tremily.us>
To: notmuch@notmuchmail.org
Subject: [PATCH] nmbug: Allow Unicode tags and IDs in Python 2
Date: Sun, 14 Feb 2016 21:30:11 -0800
Message-ID: <e287050a10ce1d2120db996d2d200f610370a44e.1455513965.git.wking@tremily.us>

Avoid a UnicodeWarning and broken pipe on 'nmbug commit' in Python 2
when a tag or message ID contains non-ASCII characters [1].

There are a number of Python bugs associated with this behavior
[2,3,4,5,6].  There's also some useful background in [8].  [3] led to
the currently working Python 3 implementation, which encodes to UTF-8
by default and has 'encoding' and 'errors' arguments [7].  This commit
follows that approach in a way that's compatible with both Python 2
and Python 3.  Coercing to UTF-8 (regardless of locale) gives us
consistent tag IDs for sharing between users.

The 'isnumeric' check identifies Unicode instances in both Python 2
[9] and Python 3 [10].
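
As a rough illustration of why the 'isnumeric' check works (the
is_text helper name is hypothetical and not part of nmbug): Unicode
text has an 'isnumeric' method in both Python 2 (unicode) and Python 3
(str), while byte strings have it in neither.

```python
def is_text(value):
    """Return True for Unicode text (Python 2 unicode, Python 3 str).

    Byte strings (Python 2 str, Python 3 bytes) have no 'isnumeric'
    method, so the attribute check separates the two families without
    naming either type.
    """
    return hasattr(value, 'isnumeric')


print(is_text(u'caf\xe9'))       # Unicode text
print(is_text(b'caf\xc3\xa9'))   # UTF-8 encoded bytes
```

This avoids referencing the 'unicode' name, which does not exist in
Python 3.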

[1]: id:87twlbv5vj.fsf@zancas.localnet
     http://thread.gmane.org/gmane.mail.notmuch.general/21855/focus=21862
     Subject: Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
     Date: Sun, 14 Feb 2016 08:22:24 -0400
[2]: http://bugs.python.org/issue2637
[3]: http://bugs.python.org/issue3300
[4]: http://bugs.python.org/issue22231
[5]: http://bugs.python.org/issue23885
[6]: http://bugs.python.org/issue1712522
[7]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote
[8]: https://mail.python.org/pipermail/python-dev/2006-July/067335.html
[9]: https://docs.python.org/2/library/stdtypes.html#unicode.isnumeric
[10]: https://docs.python.org/3/library/stdtypes.html#str.isnumeric
---
I haven't checked the other commands for issues with Unicode IDs or
tags.  It's possible that in addition to this explicit encoding to
UTF-8, we'll also want explicit decoding from UTF-8 when reading from
Git trees (for 'nmbug checkout' and 'nmbug status').
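
A minimal sketch of what that explicit decoding might look like.  The
decode_quoted helper is hypothetical (nothing like it is in this
patch), and it assumes the quoted name arrives as a native str:

```python
try:  # Python 3
    from urllib.parse import unquote_to_bytes as _unquote_to_bytes
except ImportError:  # Python 2: unquote on a byte string returns bytes
    from urllib import unquote as _unquote_to_bytes


def decode_quoted(name, encoding='utf-8', errors='strict'):
    """Undo percent-quoting, then decode the raw bytes explicitly.

    Hypothetical helper: decoding with a fixed encoding (rather than
    the locale's) mirrors the explicit UTF-8 encoding done on commit.
    """
    return _unquote_to_bytes(name).decode(encoding, errors)


print(decode_quoted('caf%c3%a9') == u'caf\xe9')
```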

Cheers,
Trevor

 devel/nmbug/nmbug | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/devel/nmbug/nmbug b/devel/nmbug/nmbug
index 81f582c..284d374 100755
--- a/devel/nmbug/nmbug
+++ b/devel/nmbug/nmbug
@@ -1,6 +1,6 @@
 #!/usr/bin/env python
 #
-# Copyright (c) 2011-2014 David Bremner <david@tethera.net>
+# Copyright (c) 2011-2016 David Bremner <david@tethera.net>
 #                         W. Trevor King <wking@tremily.us>
 #
 # This program is free software: you can redistribute it and/or modify
@@ -95,7 +95,7 @@ except AttributeError:  # Python < 3.2
     _tempfile.TemporaryDirectory = _TemporaryDirectory
 
 
-def _hex_quote(string, safe='+@=:,'):
+def _hex_quote(string, safe='+@=:,', encoding='utf-8', errors='strict'):
     """
     quote('abc def') -> 'abc%20def'.
 
@@ -103,6 +103,15 @@ def _hex_quote(string, safe='+@=:,'):
     addition to letters, digits, and '_.-') and lowercase hex digits
     (e.g. '%3a' instead of '%3A').
     """
+    if hasattr(string, 'isnumeric'):
+        string = string.encode(encoding, errors)
+    if hasattr(safe, 'isnumeric'):
+        safe_bytes = safe.encode(encoding, errors)
+        if len(safe_bytes) != len(safe):
+            raise ValueError(
+                'some safe characters are encoded as multiple bytes '
+                '({!r} -> {!r})'.format(safe, safe_bytes))
+        safe = safe_bytes
     uppercase_escapes = _quote(string, safe)
     return _HEX_ESCAPE_REGEX.sub(
         lambda match: match.group(0).lower(),
-- 
2.1.0.60.g85f0837
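
For trying the patched behavior outside of nmbug, here is a
standalone sketch.  The function body follows the diff above, but the
import block and the _HEX_ESCAPE_REGEX pattern are assumptions, since
the patch context only shows the names _quote and _HEX_ESCAPE_REGEX.
The hex_quote name is used here to keep it distinct from nmbug's own
_hex_quote.

```python
import re

try:  # Python 3
    from urllib.parse import quote as _quote
except ImportError:  # Python 2
    from urllib import quote as _quote

# Assumed pattern: matches the uppercase escapes emitted by quote().
_HEX_ESCAPE_REGEX = re.compile('%[0-9A-F]{2}')


def hex_quote(string, safe='+@=:,', encoding='utf-8', errors='strict'):
    """Percent-quote 'string' with lowercase hex digits."""
    if hasattr(string, 'isnumeric'):  # Unicode text: encode to bytes
        string = string.encode(encoding, errors)
    if hasattr(safe, 'isnumeric'):
        safe_bytes = safe.encode(encoding, errors)
        # A multi-byte safe character cannot be matched byte-by-byte.
        if len(safe_bytes) != len(safe):
            raise ValueError(
                'some safe characters are encoded as multiple bytes '
                '({!r} -> {!r})'.format(safe, safe_bytes))
        safe = safe_bytes
    uppercase_escapes = _quote(string, safe)
    return _HEX_ESCAPE_REGEX.sub(
        lambda match: match.group(0).lower(),
        uppercase_escapes)


print(hex_quote(u'caf\xe9 tag'))
```

Because the text is always encoded to UTF-8 before quoting, the same
tag should produce the same quoted name regardless of the user's
locale.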
