From: David Bremner <david@tethera.net>
To: "W. Trevor King" <wking@tremily.us>, notmuch@notmuchmail.org
Subject: Re: [PATCH] nmbug: Allow Unicode tags and IDs in Python 2
Date: Tue, 16 Feb 2016 09:04:07 -0400
Message-ID: <87lh6kvmbc.fsf@zancas.localnet>
In-Reply-To: <e287050a10ce1d2120db996d2d200f610370a44e.1455513965.git.wking@tremily.us>
"W. Trevor King" <wking@tremily.us> writes:
> Avoid a UnicodeWarning and broken pipe on 'nmbug commit' in Python 2
> when a tag or message ID contains non-ASCII characters [1].
>
> There are a number of Python bugs associated with this behavior
> [2,3,4,5,6]. There's also some useful background in [8]. [3] led to
> the currently working Python 3 implementation, which encodes to UTF-8
> by default and has 'encoding' and 'errors' arguments [7]. This commit
> follows that approach in a way that's compatible with both Python 2
> and Python 3. Coercing to UTF-8 (regardless of locale) gives us
> consistent tag IDs for sharing between users.
I'm not sure what "tag IDs" are. Do you mean message-ids here, or "tags
and IDs"?
At first I thought there might be problems with non-UTF-8 message-ids,
but that turns out not to be the case [1]. It seems like it would take
a fairly heroic effort to get non-UTF-8 tags into the database (perhaps
by calling the library interface with bad strings?) so we can probably
ignore this case. It might be good to document the limitation though,
since AFAIK, dump and restore can roundtrip any old crap.
>
> The 'isnumeric' check identifies Unicode instances in both Python 2
> [9] and Python 3 [10].
>
I still haven't really tried to understand this part, but probably it
deserves inline documentation.
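For reference, here is a minimal sketch of what such a check could look like with an inline comment. It is an illustration, not the code from the patch: the helper name to_bytes and its defaults are made up here. It relies only on the fact that Unicode text types have an isnumeric() method while byte strings do not.

```python
def to_bytes(value, encoding='utf-8', errors='strict'):
    """Return value as bytes, encoding it first if it is Unicode text.

    Hypothetical helper for illustration; not taken from nmbug.
    """
    # Unicode text (unicode in Python 2, str in Python 3) has an
    # isnumeric() method.  Byte strings (str in Python 2, bytes in
    # Python 3) do not, so they are passed through unchanged.
    if hasattr(value, 'isnumeric'):
        return value.encode(encoding, errors)
    return value


tag = u'caf\xe9'              # Unicode text containing a non-ASCII character
print(repr(to_bytes(tag)))    # encoded to UTF-8 regardless of locale
print(repr(to_bytes(b'unread')))  # already bytes, returned as-is
```

Always encoding with an explicit 'utf-8' rather than the locale's preferred encoding is what would keep the resulting bytes identical between users.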
> ---
> I haven't checked the other commands for issues with Unicode IDs or
> tags. It's possible that in addition to this explicit encoding to
> UTF-8, we'll also want explicit decoding from UTF-8 when reading from
> Git trees (for 'nmbug checkout' and 'nmbug status').
Yes, this seems to be a problem. With the patch applied I can commit,
but the same UTF-8 message-id causes problems for status:
bremner@zancas:~/software/upstream/notmuch$ ./devel/nmbug/nmbug status
U D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@ÃÂÃ¥ðãÃ¥é-ÃÂàunread
A D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@Ãåðãåé-Ãà unread
bremner@zancas:~/software/upstream/notmuch$ delve -a -1 ~/Maildir/.notmuch/xapian | grep D1B4DEBCAFFC4A05A4D4349A6EC5C9D8
QD1B4DEBCAFFC4A05A4D4349A6EC5C9D8@Ñåðãåé-ÏÊ
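The doubled-up characters in the 'U' line look like what you get when UTF-8 bytes are decoded with a single-byte codec and then encoded again. The sketch below shows that effect in isolation, with a made-up string and Latin-1 as the assumed wrong codec; it does not claim to reproduce what nmbug actually does when reading from Git trees.

```python
# -*- coding: utf-8 -*-
# Illustration only: the string and the choice of Latin-1 are assumptions.
original = u'caf\xe9'                 # Unicode text with one non-ASCII character
raw = original.encode('utf-8')        # the bytes as they would be stored

# Decoding UTF-8 bytes with the wrong codec turns each byte of a
# multi-byte sequence into its own character.
mangled = raw.decode('latin-1')

# Encoding that mangled text again doubles up the non-ASCII bytes.
double_encoded = mangled.encode('utf-8')

# Decoding explicitly as UTF-8 recovers the original text.
decoded = raw.decode('utf-8')

print(repr(raw))
print(repr(double_encoded))
print(decoded == original)
```

If something like this is the cause, explicit decoding from UTF-8 on the read side would be the counterpart to the explicit encoding this patch adds on the write side.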
[1]: id:87si0svnim.fsf@zancas.localnet