* problems with nmbug and empty prefix
@ 2016-02-13 19:10 David Bremner
  2016-02-13 22:33 ` problems with nmbug and empty prefix (UnicodeWarning and broken pipe) W. Trevor King
  0 siblings, 1 reply; 6+ messages in thread
From: David Bremner @ 2016-02-13 19:10 UTC (permalink / raw)
  To: notmuch


Currently nmbug doesn't seem to work with an empty prefix for me.

╭─ zancas:~ 
╰─% bash
bremner@zancas:~$ export NMBGIT=/tmp/nmbug
bremner@zancas:~$ export NMBPREFIX=""
bremner@zancas:~$ nmbug commit
/usr/lib/python2.7/urllib.py:1303: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  return ''.join(map(quoter, s))
Error flushing output: <fd:8>: Broken pipe
[u'notmuch', u'dump', u'--format=batch-tag', u'--', u'<censored>'] exited with 254

where <censored> is the complete list of tags in my database.

/tmp/nmbug is a previously cloned repo.

I'm not sure if this is because of some problem with a null prefix
specifically, or just the number of messages involved.

* Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
  2016-02-13 19:10 problems with nmbug and empty prefix David Bremner
@ 2016-02-13 22:33 ` W. Trevor King
  2016-02-14  2:41   ` David Bremner
  0 siblings, 1 reply; 6+ messages in thread
From: W. Trevor King @ 2016-02-13 22:33 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

On Sat, Feb 13, 2016 at 03:10:16PM -0400, David Bremner wrote:
> bremner@zancas:~$ export NMBGIT=/tmp/nmbug
> bremner@zancas:~$ export NMBPREFIX=""
> bremner@zancas:~$ nmbug commit
> /usr/lib/python2.7/urllib.py:1303: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
>   return ''.join(map(quoter, s))
> Error flushing output: <fd:8>: Broken pipe
> [u'notmuch', u'dump', u'--format=batch-tag', u'--', u'<censored>'] exited with 254

I couldn't reproduce this in either Python 3.4.3 or 2.7.10.  It might
be your number-of-tags hypothesis, but the UnicodeWarning suggests an
encoding issue involving the dump output, which might mean that you
just have a strange tag.  Can you try again with:

  $ nmbug --log-level debug commit

which will give us the full traceback.

We only call ‘notmuch dump …’ from _index_tags, where dump's stdout is
tweaked and fed into ‘git update-index …’.  Your urllib UnicodeWarning
suggests the issue lies in:

  tags = [
      _unquote(tag[len(prefix):])
      for tag in tags_string.split()
      if tag.startswith(prefix)]

in which case it would be useful to try something like:

  tags = []
  for tag in tags_string.split():
      try:
          if tag.startswith(prefix):
              tags.append(_unquote(tag[len(prefix):]))
      except UnicodeWarning as error:
          raise ValueError('{!r} ({!r}, {})'.format(tag, prefix, error))

Cheers,
Trevor

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

* Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
  2016-02-13 22:33 ` problems with nmbug and empty prefix (UnicodeWarning and broken pipe) W. Trevor King
@ 2016-02-14  2:41   ` David Bremner
  2016-02-14  6:31     ` W. Trevor King
  0 siblings, 1 reply; 6+ messages in thread
From: David Bremner @ 2016-02-14  2:41 UTC (permalink / raw)
  To: W. Trevor King; +Cc: notmuch

"W. Trevor King" <wking@tremily.us> writes:

>
>   $ nmbug --log-level debug commit
>
> which will give us the full traceback.
>

Traceback (most recent call last):
  File "/home/bremner/.config/scripts/nmbug.real", line 834, in <module>
    args.func(**kwargs)
  File "/home/bremner/.config/scripts/nmbug.real", line 324, in commit
    status = get_status()
  File "/home/bremner/.config/scripts/nmbug.real", line 581, in get_status
    index = _index_tags()
  File "/home/bremner/.config/scripts/nmbug.real", line 621, in _index_tags
    git.stdin.write(line)

> We only call ‘notmuch dump …’ from _index_tags, where dump's stdout is
> tweaked and fed into ‘git update-index …’.  Your urllib UnicodeWarning
> suggests the issue lies in:
>
>   tags = [
>       _unquote(tag[len(prefix):])
>       for tag in tags_string.split()
>       if tag.startswith(prefix)]

Looking at the source for urllib, that line is actually in quote, which
is called only from _hex_quote.

It turns out I only have a problem with python2.7; python3.4 completes
the commit. At a guess, that suggests a unicode problem of some kind.
Unfortunately despite my best efforts with filterwarnings, I couldn't
figure out how to get a stack trace for that UnicodeWarning.

d

* Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
  2016-02-14  2:41   ` David Bremner
@ 2016-02-14  6:31     ` W. Trevor King
  2016-02-14 12:22       ` David Bremner
  0 siblings, 1 reply; 6+ messages in thread
From: W. Trevor King @ 2016-02-14  6:31 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

On Sat, Feb 13, 2016 at 10:41:40PM -0400, David Bremner wrote:
> Traceback (most recent call last):
>   File "/home/bremner/.config/scripts/nmbug.real", line 834, in <module>
>     args.func(**kwargs)
>   File "/home/bremner/.config/scripts/nmbug.real", line 324, in commit
>     status = get_status()
>   File "/home/bremner/.config/scripts/nmbug.real", line 581, in get_status
>     index = _index_tags()
>   File "/home/bremner/.config/scripts/nmbug.real", line 621, in _index_tags
>     git.stdin.write(line)

This traceback is pointing at what should be a stream write, so I
don't see how urllib is involved there at all.  I guess this traceback
ends up in the “Broken pipe” message from your original post?

Dropping some debugging prints into the:

  for line in notmuch.stdout:

block will likely get us close enough to figure out which line in the
‘notmuch dump …’ output is causing the problem.

> > We only call ‘notmuch dump …’ from _index_tags, where dump's stdout is
> > tweaked and fed into ‘git update-index …’.  Your urllib UnicodeWarning
> > suggests the issue lies in:
> >
> >   tags = [
> >       _unquote(tag[len(prefix):])
> >       for tag in tags_string.split()
> >       if tag.startswith(prefix)]
> 
> Looking at the source for urllib, that line is actually in quote,
> which is called only from _hex_quote

And we call _hex_quote from _index_tags_for_message, which is right
before the git.stdin.write line from your traceback.  So it's certainly
possible that we're feeding _hex_quote something it can't handle in
Python 2.  If I could reproduce this locally, I'd probably drop a
debugging print in there as well:

  for tag in tags:
      _LOG.debug('building a quoted path for {!r} / {!r}'.format(id, tag))
      path = 'tags/{id}/{tag}'.format(
          id=_hex_quote(string=id), tag=_hex_quote(string=tag))
      yield '{mode} {hash}\t{path}\n'.format(mode=mode, hash=hash, path=path)

> Unfortunately despite my best efforts with filterwarnings, I
> couldn't figure out how to get a stack trace for that
> UnicodeWarning.

I haven't spent much time with filterwarnings.  My guess is that:

  $ python -W error ./nmbug --log-level debug commit

will turn it into a raised exception [1].  But you may have tried
that, and it may not have worked for some reason :p.
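
For reference, here is a minimal sketch of what the ‘error’ filter does,
using only the standard warnings module.  The noisy() helper is
hypothetical: it stands in for whatever code emits the UnicodeWarning
and is not part of nmbug.

```python
import warnings


def noisy():
    # Hypothetical stand-in for the code that emits the warning.
    warnings.warn('comparison failed', UnicodeWarning)
    return 'finished'


def run_with_error_filter():
    # Similar in spirit to 'python -W error', but scoped to this block
    # and limited to UnicodeWarning.
    with warnings.catch_warnings():
        warnings.simplefilter('error', UnicodeWarning)
        try:
            return noisy()
        except UnicodeWarning as error:
            return 'raised: {}'.format(error)


def run_with_recording():
    # Without the 'error' filter the warning is only reported, and the
    # call completes normally.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        result = noisy()
    return result, len(caught)
```

Once the warning is raised as an exception, the usual traceback shows
the call chain that triggered it, which is the piece missing from the
plain warning message.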

If dropping debugging prints into the relevant code sections doesn't
turn up the problem, ‘strace -o /tmp/trace -f nmbug --log-level debug
commit’ will likely capture enough of the data moving between
processes for us to figure out what nmbug is choking on.

Another alternative would be to check your list of censored tags for
anything that looks like it might contain Unicode-issue-triggering
characters.  What is your locale?  Do you have any tags with non-ASCII
characters?  You should be able to isolate this problem by iterating
through all your tags:

  $ for TAG in <censored>
  > do
  >   echo "${TAG}"
  >   NMBPREFIX="${TAG%?}" nmbug commit
  > done

and see which one acts up.

Cheers,
Trevor

[1]: https://docs.python.org/2/library/warnings.html#warning-filter

* Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
  2016-02-14  6:31     ` W. Trevor King
@ 2016-02-14 12:22       ` David Bremner
  2016-02-14 22:33         ` W. Trevor King
  0 siblings, 1 reply; 6+ messages in thread
From: David Bremner @ 2016-02-14 12:22 UTC (permalink / raw)
  To: W. Trevor King; +Cc: notmuch

"W. Trevor King" <wking@tremily.us> writes:


> On Sat, Feb 13, 2016 at 10:41:40PM -0400, David Bremner wrote:
>> Traceback (most recent call last):
>>   File "/home/bremner/.config/scripts/nmbug.real", line 834, in <module>
>>     args.func(**kwargs)
>>   File "/home/bremner/.config/scripts/nmbug.real", line 324, in commit
>>     status = get_status()
>>   File "/home/bremner/.config/scripts/nmbug.real", line 581, in get_status
>>     index = _index_tags()
>>   File "/home/bremner/.config/scripts/nmbug.real", line 621, in _index_tags
>>     git.stdin.write(line)

>
> This traceback is pointing at what should be a stream write, so I
> don't see how urllib is involved there at all.  I guess this traceback
> ends up in the “Broken pipe” message from your original post?

yes

>
>   for tag in tags:
>       _LOG.debug('building a quoted path for {!r} / {!r}'.format(id, tag))
>       path = 'tags/{id}/{tag}'.format(
>           id=_hex_quote(string=id), tag=_hex_quote(string=tag))
>       yield '{mode} {hash}\t{path}\n'.format(mode=mode, hash=hash, path=path)
>

I think the problem is not a bad tag, but a bad message-id. The last
line of output before the UnicodeWarning and the broken pipe is

building a quoted path for u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca' / u'unread'

this corresponds to

      id:D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@Ñåðãåé-ÏÊ

The original header looks like

    Message-ID: <D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@Ñåðãåé-ÏÊ>

Visually I can't see what the encoding error might be, but I guess that
doesn't matter; we should be able to handle message-ids as opaque byte
strings.
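
A small sketch of the opaque-bytes idea, written for Python 3.  It
assumes that the logged code points map one-to-one onto the original
header bytes (that is, the header was read as Latin-1), and it guesses
that the host part was typed in Windows-1251.  Both are assumptions,
and none of the variable names come from nmbug.

```python
from urllib.parse import quote_from_bytes

# The Message-ID as it appears in the debug log above.
logged_id = u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca'

# Assumption: each code point below 256 corresponds to one original byte.
raw_id = logged_id.encode('latin-1')

# Assumption: the sending host's name was encoded as Windows-1251.
host_guess = raw_id.split(b'@', 1)[1].decode('cp1251')

# Quoting the bytes directly never needs to know the original encoding.
quoted_id = quote_from_bytes(raw_id, safe='+@=:,')

print(host_guess)
print(quoted_id)
```

Under those assumptions the host part reads as a Cyrillic machine name,
which would explain why it looks odd when rendered as Latin-1.  Quoting
at the byte level sidesteps the question entirely.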

d

* Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
  2016-02-14 12:22       ` David Bremner
@ 2016-02-14 22:33         ` W. Trevor King
  0 siblings, 0 replies; 6+ messages in thread
From: W. Trevor King @ 2016-02-14 22:33 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

On Sun, Feb 14, 2016 at 08:22:24AM -0400, David Bremner wrote:
> W. Trevor King writes:
> >   for tag in tags:
> >       _LOG.debug('building a quoted path for {!r} / {!r}'.format(id, tag))
> >       path = 'tags/{id}/{tag}'.format(
> >           id=_hex_quote(string=id), tag=_hex_quote(string=tag))
> >       yield '{mode} {hash}\t{path}\n'.format(mode=mode, hash=hash, path=path)
> >
> 
> I think the problem is not a bad tag, but a bad message-id. The last
> line of output before the UnicodeWarning and the broken pipe is
> 
> building a quoted path for u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca' / u'unread'

  $ ln -s nmbug nmbug.py
  $ python2 -W error -c "import nmbug; nmbug._hex_quote(u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca')"
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "nmbug.py", line 106, in _hex_quote
      uppercase_escapes = _quote(string, safe)
    File "/usr/lib64/python2.7/urllib.py", line 1303, in quote
      return ''.join(map(quoter, s))
  UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

The problem seems to be having Unicode characters in either quote argument:

  $ python2 -W error -c "import urllib; urllib.quote(u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca')"
  …
  UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  $ python2 -W error -c "import urllib; urllib.quote(u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca', u'+@=:,')"
  …
  UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  $ python2 -W error -c "import urllib; urllib.quote(u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca'.encode('utf-8'), u'+@=:,')"
  …
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 33: ordinal not in range(128)
  $ python2 -W error -c "import urllib; print(urllib.quote(u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca'.encode('utf-8'), u'+@=:,'.encode('utf-8')))"
  D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@%C3%91%C3%A5%C3%B0%C3%A3%C3%A5%C3%A9-%C3%8F%C3%8A

Related Python issues [1,2,3,4,5].  [2] led to the currently working
Python 3 implementation, which encodes to UTF-8 by default and has an
‘encoding’ option [6].  There's some useful background in [7].  For
compatibility with Python 3, I suggest patching _hex_quote to take an
encoding option, defaulting to UTF-8, and encoding both strings that
are passed to _quote.  We should probably raise a ValueError if the
length of the encoded safe characters doesn't match the length of the
Unicode safe characters, because the caller will probably not expect
the byte-level quoting that would cause.  Python 3 covers that by
restricting the safe characters to ASCII [6], although passing
non-ASCII characters with safe doesn't seem to raise an exception:

  $ python3 -c "from urllib.parse import quote; print(quote('\u0091', '\u0091'))"
  %C2%91
  $ python3 -c "from urllib.parse import quote; print(quote('\u203b', '\u203b'))"
  %E2%80%BB

Anyhow, I'll file a patch adding UTF-8 encoding so Python 2 works like
Python 3.
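
A rough sketch of the approach described above, written against Python
3's urllib.parse (under Python 2 the import would be urllib.quote).
The name hex_quote_sketch and its signature are hypothetical and are
not the actual patch.  Any further post-processing that the real
_hex_quote applies to the quoted result is left out.

```python
from urllib.parse import quote


def hex_quote_sketch(string, safe='+@=:,', encoding='utf-8'):
    """Percent-quote ``string`` after encoding it to bytes."""
    encoded_string = string.encode(encoding)
    encoded_safe = safe.encode(encoding)
    # A safe character that expands to several bytes would be matched
    # byte by byte, which the caller probably does not expect.
    if len(encoded_safe) != len(safe):
        raise ValueError(
            'safe characters must encode to one byte each: {!r}'.format(safe))
    return quote(encoded_string, encoded_safe)


message_id = u'D1B4DEBCAFFC4A05A4D4349A6EC5C9D8@\xd1\xe5\xf0\xe3\xe5\xe9-\xcf\xca'
print(hex_quote_sketch(message_id))
```

With the default UTF-8 encoding this should give the same quoted string
as the last python2 command above, and a multi-byte safe character such
as u'\u203b' is rejected with a ValueError instead of being silently
split into bytes.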

Cheers,
Trevor

[1]: http://bugs.python.org/issue2637
[2]: http://bugs.python.org/issue3300
[3]: http://bugs.python.org/issue22231
[4]: http://bugs.python.org/issue23885
[5]: http://bugs.python.org/issue1712522
[6]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote
[7]: https://mail.python.org/pipermail/python-dev/2006-July/067335.html
