* problems with nmbug and empty prefix
@ 2016-02-13 19:10 David Bremner
  2016-02-13 22:33 ` problems with nmbug and empty prefix (UnicodeWarning and broken pipe) W. Trevor King
  0 siblings, 1 reply; 6+ messages in thread
From: David Bremner @ 2016-02-13 19:10 UTC (permalink / raw)
  To: notmuch

Currently nmbug doesn't seem to work with an empty prefix for me.

╭─ zancas:~
╰─% bash
bremner@zancas:~$ export NMBGIT=/tmp/nmbug
bremner@zancas:~$ export NMBPREFIX=""
bremner@zancas:~$ nmbug commit
/usr/lib/python2.7/urllib.py:1303: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  return ''.join(map(quoter, s))
Error flushing output: <fd:8>: Broken pipe
[u'notmuch', u'dump', u'--format=batch-tag', u'--', u'<censored>'] exited with 254

where <censored> is the complete list of tags in my database.
/tmp/nmbug is a previously cloned repo.

I'm not sure if this is because of some problem with a null prefix
specifically, or just the number of messages involved.

* Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
  2016-02-13 19:10 problems with nmbug and empty prefix David Bremner
@ 2016-02-13 22:33 ` W. Trevor King
  2016-02-14 2:41 ` David Bremner
  0 siblings, 1 reply; 6+ messages in thread
From: W. Trevor King @ 2016-02-13 22:33 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 1669 bytes --]

On Sat, Feb 13, 2016 at 03:10:16PM -0400, David Bremner wrote:
> bremner@zancas:~$ export NMBGIT=/tmp/nmbug
> bremner@zancas:~$ export NMBPREFIX=""
> bremner@zancas:~$ nmbug commit
> /usr/lib/python2.7/urllib.py:1303: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
>   return ''.join(map(quoter, s))
> Error flushing output: <fd:8>: Broken pipe
> [u'notmuch', u'dump', u'--format=batch-tag', u'--', u'<censored>'] exited with 254

I couldn't reproduce this in either Python 3.4.3 or 2.7.10. It might
be your number-of-tags hypothesis, but the UnicodeWarning suggests an
encoding issue involving the dump output, which might mean that you
just have a strange tag. Can you try again with:

  $ nmbug --log-level debug commit

which will give us the full traceback.

We only call ‘notmuch dump …’ from _index_tags, where dump's stdout is
tweaked and fed into ‘git update-index …’. Your urllib UnicodeWarning
suggests the issue lies in:

  tags = [
      _unquote(tag[len(prefix):])
      for tag in tags_string.split()
      if tag.startswith(prefix)]

in which case it would be useful to try something like:

  tags = []
  for tag in tags_string.split():
      try:
          if tag.startswith(prefix):
              tags.append(_unquote(tag[len(prefix):]))
      except UnicodeWarning as error:
          raise ValueError('{!r} ({!r}, {})'.format(tag, prefix, error))

Cheers,
Trevor

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

* Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
  2016-02-13 22:33 ` problems with nmbug and empty prefix (UnicodeWarning and broken pipe) W. Trevor King
@ 2016-02-14 2:41 ` David Bremner
  2016-02-14 6:31 ` W. Trevor King
  0 siblings, 1 reply; 6+ messages in thread
From: David Bremner @ 2016-02-14 2:41 UTC (permalink / raw)
  To: W. Trevor King; +Cc: notmuch

"W. Trevor King" <wking@tremily.us> writes:

>
>   $ nmbug --log-level debug commit
>
> which will give us the full traceback.
>

Traceback (most recent call last):
  File "/home/bremner/.config/scripts/nmbug.real", line 834, in <module>
    args.func(**kwargs)
  File "/home/bremner/.config/scripts/nmbug.real", line 324, in commit
    status = get_status()
  File "/home/bremner/.config/scripts/nmbug.real", line 581, in get_status
    index = _index_tags()
  File "/home/bremner/.config/scripts/nmbug.real", line 621, in _index_tags
    git.stdin.write(line)

> We only call ‘notmuch dump …’ from _index_tags, where dump's stdout is
> tweaked and fed into ‘git update-index …’. Your urllib UnicodeWarning
> suggests the issue lies in:
>
>   tags = [
>       _unquote(tag[len(prefix):])
>       for tag in tags_string.split()
>       if tag.startswith(prefix)]

Looking at the source for urllib, that line is actually in quote,
which is called only from _hex_quote

It turns out I only have a problem with python2.7; python3.4 completes
the commit. At a guess, that suggests a unicode problem of some kind.

Unfortunately despite my best efforts with filterwarnings, I couldn't
figure out how to get a stack trace for that UnicodeWarning.

d

* Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
  2016-02-14 2:41 ` David Bremner
@ 2016-02-14 6:31 ` W. Trevor King
  2016-02-14 12:22 ` David Bremner
  0 siblings, 1 reply; 6+ messages in thread
From: W. Trevor King @ 2016-02-14 6:31 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

On Sat, Feb 13, 2016 at 10:41:40PM -0400, David Bremner wrote:
> Traceback (most recent call last):
>   File "/home/bremner/.config/scripts/nmbug.real", line 834, in <module>
>     args.func(**kwargs)
>   File "/home/bremner/.config/scripts/nmbug.real", line 324, in commit
>     status = get_status()
>   File "/home/bremner/.config/scripts/nmbug.real", line 581, in get_status
>     index = _index_tags()
>   File "/home/bremner/.config/scripts/nmbug.real", line 621, in _index_tags
>     git.stdin.write(line)

This traceback is pointing at what should be a stream write, so I
don't see how urllib is involved there at all. I guess this traceback
ends up in the “Broken pipe” message from your original post?
Dropping some debugging prints into the:

  for line in notmuch.stdout:

block will likely get us close enough to figure out which line in the
‘notmuch dump …’ output is causing the problem.

> > We only call ‘notmuch dump …’ from _index_tags, where dump's stdout is
> > tweaked and fed into ‘git update-index …’. Your urllib UnicodeWarning
> > suggests the issue lies in:
> >
> >   tags = [
> >       _unquote(tag[len(prefix):])
> >       for tag in tags_string.split()
> >       if tag.startswith(prefix)]
>
> Looking at the source for urllib, that line is actually in quote,
> which is called only from _hex_quote

And we call _hex_quote from _index_tags_for_message, which is right
before the git.stdin.write line from your traceback. So it's
certainly possible that we're feeding _hex_quote something it can't
handle in Python 2. If I could reproduce this locally, I'd probably
drop a debugging print in there as well:

  for tag in tags:
      _LOG.debug('building a quoted path for {!r} / {!r}'.format(id, tag))
      path = 'tags/{id}/{tag}'.format(
          id=_hex_quote(string=id), tag=_hex_quote(string=tag))
      yield '{mode} {hash}\t{path}\n'.format(mode=mode, hash=hash, path=path)

> Unfortunately despite my best efforts with filterwarnings, I
> couldn't figure out how to get a stack trace for that
> UnicodeWarning.

I haven't spent much time with filterwarnings. My guess is that:

  $ python -W error ./nmbug --log-level debug commit

will turn it into a raised exception [1]. But you may have tried
that, and it may not have worked for some reason :p.

If dropping debugging prints into the relevant code sections doesn't
turn up the problem, ‘strace -o /tmp/trace -f nmbug --log-level debug
commit’ will likely capture enough of the data moving between
processes for us to figure out what nmbug is choking on.

Another alternative would be to check your list of censored tags for
anything that looks like it might contain Unicode-issue-triggering
characters. What is your locale? Do you have any tags with non-ASCII
characters? You should be able to isolate this problem by iterating
through all your tags:

  $ for TAG in <censored>
  > do
  >   echo "${TAG}"
  >   NMBPREFIX="${TAG%?}" nmbug commit
  > done

and see which one acts up.

Cheers,
Trevor

[1]: https://docs.python.org/2/library/warnings.html#warning-filter

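The `-W error` suggestion above also has an in-process counterpart in the standard `warnings` module, which may be handy when the interpreter command line is not easy to change. The following is only a minimal sketch: the helper name `call_with_unicode_errors` and the stand-in function `noisy` are hypothetical, and the warning is simulated with `warnings.warn` rather than produced by urllib.

```python
import traceback
import warnings


def call_with_unicode_errors(func, *args, **kwargs):
    """Call func with UnicodeWarning promoted to a raised exception.

    Hypothetical helper: the filter change is scoped to this call by
    warnings.catch_warnings(), so the global filters are restored after.
    """
    with warnings.catch_warnings():
        warnings.simplefilter('error', UnicodeWarning)
        return func(*args, **kwargs)


def noisy():
    # Stand-in for the urllib code path; this just simulates the warning.
    warnings.warn('simulated comparison failure', UnicodeWarning)


try:
    call_with_unicode_errors(noisy)
    captured = None
except UnicodeWarning:
    # Once the warning is an exception, it carries a normal stack trace.
    captured = traceback.format_exc()

print(captured)
```

Wrapping only the suspect call keeps unrelated warnings from aborting the rest of the program.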
* Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
  2016-02-14 6:31 ` W. Trevor King
@ 2016-02-14 12:22 ` David Bremner
  2016-02-14 22:33 ` W. Trevor King
  0 siblings, 1 reply; 6+ messages in thread
From: David Bremner @ 2016-02-14 12:22 UTC (permalink / raw)
  To: W. Trevor King; +Cc: notmuch

"W. Trevor King" <wking@tremily.us> writes:

> On Sat, Feb 13, 2016 at 10:41:40PM -0400, David Bremner wrote:
>> Traceback (most recent call last):
>>   File "/home/bremner/.config/scripts/nmbug.real", line 834, in <module>
>>     args.func(**kwargs)
>>   File "/home/bremner/.config/scripts/nmbug.real", line 324, in commit
>>     status = get_status()
>>   File "/home/bremner/.config/scripts/nmbug.real", line 581, in get_status
>>     index = _index_tags()
>>   File "/home/bremner/.config/scripts/nmbug.real", line 621, in _index_tags
>>     git.stdin.write(line)
>
> This traceback is pointing at what should be a stream write, so I
> don't see how urllib is involved there at all. I guess this traceback
> ends up in the “Broken pipe” message from your original post?

yes

>
>   for tag in tags:
>       _LOG.debug('building a quoted path for {!r} / {!r}'.format(id, tag))
>       path = 'tags/{id}/{tag}'.format(
>           id=_hex_quote(string=id), tag=_hex_quote(string=tag))
>       yield '{mode} {hash}\t{path}\n'.format(mode=mode, hash=hash, path=path)
>

I think the problem is not a bad tag, but a bad message-id. The last
line of output before the UnicodeWarning and the broken pipe is

building a quoted path for u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca' / u'unread'

this corresponds to

id:D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@Ñåðãåé-ÏÊ

The original header looks like

Message-ID: <D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@Ñåðãåé-ÏÊ>

Visually I can't see what the encoding error might be, but I guess
that doesn't matter; we should be able to handle message-ids as opaque
byte strings.

d

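One possible reading of the Message-ID above, offered as a guess rather than anything established in the thread: the part after the `@` decodes cleanly as Windows-1251 Cyrillic text. This is a minimal sketch that assumes the code points in the debug output map one-to-one onto the header's raw bytes (that is, they were read as Latin-1). The variable names are hypothetical.

```python
# -*- coding: utf-8 -*-

# Host part of the Message-ID, exactly as shown in the debug output.
raw_id_tail = u'\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca'

# Assumption: each code point below U+0100 stands for one raw header
# byte, so encoding as Latin-1 recovers the original byte string.
original_bytes = raw_id_tail.encode('latin-1')

# Assumption: the sending machine used Windows-1251 (Cyrillic).
guess = original_bytes.decode('cp1251')

print(guess)
```

Under those assumptions the result is "Сергей-ПК", which looks like a Windows host name. As noted above, the quoting code should not need to care; treating the ID as opaque bytes is enough.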
* Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
  2016-02-14 12:22 ` David Bremner
@ 2016-02-14 22:33 ` W. Trevor King
  0 siblings, 0 replies; 6+ messages in thread
From: W. Trevor King @ 2016-02-14 22:33 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

On Sun, Feb 14, 2016 at 08:22:24AM -0400, David Bremner wrote:
> W. Trevor King writes:
> >   for tag in tags:
> >       _LOG.debug('building a quoted path for {!r} / {!r}'.format(id, tag))
> >       path = 'tags/{id}/{tag}'.format(
> >           id=_hex_quote(string=id), tag=_hex_quote(string=tag))
> >       yield '{mode} {hash}\t{path}\n'.format(mode=mode, hash=hash, path=path)
> >
>
> I think the problem is not a bad tag, but a bad message-id. The last
> line of output before the UnicodeWarning and the broken pipe is
>
> building a quoted path for u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca' / u'unread'

  $ ln -s nmbug nmbug.py
  $ python2 -W error -c "import nmbug; nmbug._hex_quote(u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca')"
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "nmbug.py", line 106, in _hex_quote
      uppercase_escapes = _quote(string, safe)
    File "/usr/lib64/python2.7/urllib.py", line 1303, in quote
      return ''.join(map(quoter, s))
  UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

The problem seems to be having Unicode characters in either quote
argument:

  $ python2 -W error -c "import urllib; urllib.quote(u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca')"
  …
  UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  $ python2 -W error -c "import urllib; urllib.quote(u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca', u'+@=:,')"
  …
  UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  $ python2 -W error -c "import urllib; urllib.quote(u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca'.encode('utf-8'), u'+@=:,')"
  …
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 33: ordinal not in range(128)
  $ python2 -W error -c "import urllib; print(urllib.quote(u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca'.encode('utf-8'), u'+@=:,'.encode('utf-8')))"
  D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@%C3%91%C3%A5%C3%B0%C3%A3%C3%A5%C3%A9-%C3%8F%C3%8A

Related Python issues [1,2,3,4,5]. [2] led to the currently working
Python 3 implementation, which encodes to UTF-8 by default and has an
‘encoding’ option [6]. There's some useful background in [7].

For compatibility with Python 3, I suggest patching _hex_quote to take
an encoding option, defaulting to UTF-8, and encoding both strings
that are passed to _quote. We should probably raise a ValueError if
the length of the encoded safe characters doesn't match the length of
the Unicode safe characters, because the caller will probably not
expect the byte-level quoting that would cause. Python 3 covers that
by restricting the safe characters to ASCII [6], although passing
non-ASCII characters with safe doesn't seem to raise an exception:

  $ python3 -c "from urllib.parse import quote; print(quote('\u0091', '\u0091'))"
  %C2%91
  $ python3 -c "from urllib.parse import quote; print(quote('\u203b', '\u203b'))"
  %E2%80%BB

Anyhow, I'll file a patch adding UTF-8 encoding so Python 2 works like
Python 3.

Cheers,
Trevor

[1]: http://bugs.python.org/issue2637
[2]: http://bugs.python.org/issue3300
[3]: http://bugs.python.org/issue22231
[4]: http://bugs.python.org/issue23885
[5]: http://bugs.python.org/issue1712522
[6]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote
[7]: https://mail.python.org/pipermail/python-dev/2006-July/067335.html

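The approach described in the last message (encode both arguments before quoting, and reject safe characters that become multi-byte) might look roughly like the sketch below. This is not the patch that was filed: `quote_encoded_sketch` is a hypothetical name, and the sketch leaves out anything else the real `_hex_quote` does around its `_quote` call.

```python
try:
    from urllib.parse import quote as _quote  # Python 3
except ImportError:
    from urllib import quote as _quote  # Python 2


def quote_encoded_sketch(string, safe=u'+@=:,', encoding='utf-8'):
    """Percent-quote text after encoding both arguments to bytes.

    Hypothetical illustration of the idea in the thread, not nmbug code.
    """
    string_bytes = string.encode(encoding)
    safe_bytes = safe.encode(encoding)
    # A safe character that encodes to several bytes would be matched
    # byte by byte, which the caller is unlikely to expect.
    if len(safe_bytes) != len(safe):
        raise ValueError(
            'safe characters {!r} are not single bytes in {}'.format(
                safe, encoding))
    return _quote(string_bytes, safe_bytes)


message_id = (
    u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca')
quoted = quote_encoded_sketch(message_id)
print(quoted)
```

With the Message-ID from the thread, this should reproduce the quoted string printed by the final `python2` command above, on either Python version.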