From: "W. Trevor King" <wking@tremily.us>
To: notmuch@notmuchmail.org
Subject: [PATCH] nmbug: Allow Unicode tags and IDs in Python 2
Date: Sun, 14 Feb 2016 21:30:11 -0800
Message-ID: <e287050a10ce1d2120db996d2d200f610370a44e.1455513965.git.wking@tremily.us>

Avoid a UnicodeWarning and broken pipe on 'nmbug commit' in Python 2
when a tag or message ID contains non-ASCII characters [1].
There are a number of Python bugs associated with this behavior
[2,3,4,5,6]. There's also some useful background in [8]. [3] led to
the currently working Python 3 implementation, which encodes to UTF-8
by default and has 'encoding' and 'errors' arguments [7]. This commit
follows that approach in a way that's compatible with both Python 2
and Python 3. Coercing to UTF-8 (regardless of locale) gives us
consistent tag IDs for sharing between users.
The 'isnumeric' check identifies Unicode instances in both Python 2
[9] and Python 3 [10].
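
As a minimal sketch of that check (not part of the patch itself): text
instances ('unicode' in Python 2, 'str' in Python 3) have an
'isnumeric' method, while byte strings do not, so 'hasattr' can pick
out the values that still need encoding. The helper name
'to_utf8_bytes' is hypothetical and only used for illustration.

```python
# -*- coding: utf-8 -*-
# Illustrative only; 'to_utf8_bytes' is a hypothetical helper, not nmbug code.


def to_utf8_bytes(value, encoding='utf-8', errors='strict'):
    """Return value as bytes, encoding only text (Unicode) instances."""
    # Text types (unicode in Python 2, str in Python 3) have .isnumeric();
    # byte strings (str in Python 2, bytes in Python 3) do not.
    if hasattr(value, 'isnumeric'):
        return value.encode(encoding, errors)
    return value


encoded = to_utf8_bytes(u'caf\xe9')     # text input gets encoded
already_bytes = to_utf8_bytes(b'plain')  # bytes input passes through
```

The result is the same byte sequence regardless of the user's locale,
which is the property the commit message relies on.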
[1]: id:87twlbv5vj.fsf@zancas.localnet
http://thread.gmane.org/gmane.mail.notmuch.general/21855/focus=21862
Subject: Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
Date: Sun, 14 Feb 2016 08:22:24 -0400
[2]: http://bugs.python.org/issue2637
[3]: http://bugs.python.org/issue3300
[4]: http://bugs.python.org/issue22231
[5]: http://bugs.python.org/issue23885
[6]: http://bugs.python.org/issue1712522
[7]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote
[8]: https://mail.python.org/pipermail/python-dev/2006-July/067335.html
[9]: https://docs.python.org/2/library/stdtypes.html#unicode.isnumeric
[10]: https://docs.python.org/3/library/stdtypes.html#str.isnumeric
---
I haven't checked the other commands for issues with Unicode IDs or
tags. It's possible that in addition to this explicit encoding to
UTF-8, we'll also want explicit decoding from UTF-8 when reading from
Git trees (for 'nmbug checkout' and 'nmbug status').
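
To make the encode/decode symmetry concrete, here is a rough sketch of
a round trip, assuming the reading side would unquote to bytes and then
decode as UTF-8. The names 'hex_quote_utf8' and 'hex_unquote_utf8' are
hypothetical, and this is not how 'nmbug checkout' or 'nmbug status'
currently read Git trees; it only illustrates the idea.

```python
# -*- coding: utf-8 -*-
# Illustrative round trip; function names here are hypothetical.
import re

try:  # Python 3
    from urllib.parse import quote, unquote_to_bytes
except ImportError:  # Python 2
    from urllib import quote, unquote as unquote_to_bytes

_HEX_ESCAPE = re.compile('%[0-9A-F]{2}')


def hex_quote_utf8(text, safe='+@=:,'):
    """Encode text to UTF-8, percent-quote it, and lowercase the escapes."""
    quoted = quote(text.encode('utf-8'), safe)
    return _HEX_ESCAPE.sub(lambda match: match.group(0).lower(), quoted)


def hex_unquote_utf8(quoted):
    """Reverse hex_quote_utf8: unquote to bytes, then decode as UTF-8."""
    return unquote_to_bytes(quoted).decode('utf-8')


quoted_tag = hex_quote_utf8(u'caf\xe9')
restored_tag = hex_unquote_utf8(quoted_tag)
```

Decoding explicitly as UTF-8 on the way back would keep the round trip
independent of the reader's locale, mirroring the encoding side.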
Cheers,
Trevor
devel/nmbug/nmbug | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/devel/nmbug/nmbug b/devel/nmbug/nmbug
index 81f582c..284d374 100755
--- a/devel/nmbug/nmbug
+++ b/devel/nmbug/nmbug
@@ -1,6 +1,6 @@
 #!/usr/bin/env python
 #
-# Copyright (c) 2011-2014 David Bremner <david@tethera.net>
+# Copyright (c) 2011-2016 David Bremner <david@tethera.net>
 #                         W. Trevor King <wking@tremily.us>
 #
 # This program is free software: you can redistribute it and/or modify
@@ -95,7 +95,7 @@ except AttributeError: # Python < 3.2
     _tempfile.TemporaryDirectory = _TemporaryDirectory
 
 
-def _hex_quote(string, safe='+@=:,'):
+def _hex_quote(string, safe='+@=:,', encoding='utf-8', errors='strict'):
     """
     quote('abc def') -> 'abc%20def'.
 
@@ -103,6 +103,15 @@ def _hex_quote(string, safe='+@=:,'):
     addition to letters, digits, and '_.-') and lowercase hex digits
     (e.g. '%3a' instead of '%3A').
     """
+    if hasattr(string, 'isnumeric'):
+        string = string.encode(encoding, errors)
+    if hasattr(safe, 'isnumeric'):
+        safe_bytes = safe.encode(encoding, errors)
+        if len(safe_bytes) != len(safe):
+            raise ValueError(
+                'some safe characters are encoded as multiple bytes '
+                '({!r} -> {!r})'.format(safe, safe_bytes))
+        safe = safe_bytes
     uppercase_escapes = _quote(string, safe)
     return _HEX_ESCAPE_REGEX.sub(
         lambda match: match.group(0).lower(),
--
2.1.0.60.g85f0837
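
For readers skimming the diff, the 'safe' handling can be sketched in
isolation. The length comparison rejects any safe character that
encodes to more than one byte, presumably because the quoting function
treats 'safe' as a set of single bytes once the string itself has been
encoded. The helper name 'encode_safe' is hypothetical; the body
mirrors the lines added by the patch.

```python
# -*- coding: utf-8 -*-
# Illustrative only; 'encode_safe' is a hypothetical standalone helper.


def encode_safe(safe, encoding='utf-8', errors='strict'):
    """Encode text 'safe' characters, rejecting multi-byte encodings."""
    if hasattr(safe, 'isnumeric'):  # text instance, needs encoding
        safe_bytes = safe.encode(encoding, errors)
        if len(safe_bytes) != len(safe):
            raise ValueError(
                'some safe characters are encoded as multiple bytes '
                '({!r} -> {!r})'.format(safe, safe_bytes))
        return safe_bytes
    return safe  # already bytes


ascii_safe = encode_safe(u'+@=:,')  # every character is a single byte
try:
    encode_safe(u'\xe9')  # U+00E9 is two bytes in UTF-8
    rejected = False
except ValueError:
    rejected = True
```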