From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from localhost (localhost [127.0.0.1])
 by arlo.cworth.org (Postfix) with ESMTP id 60AEA6DE0FC5
 for ; Tue, 16 Feb 2016 09:56:49 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at cworth.org
X-Spam-Flag: NO
X-Spam-Score: 0.006
X-Spam-Level:
X-Spam-Status: No, score=0.006 tagged_above=-999 required=5
 tests=[AWL=0.107, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1,
 RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001] autolearn=disabled
Received: from arlo.cworth.org ([127.0.0.1])
 by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id yVnHj204P_0S for ; Tue, 16 Feb 2016 09:56:46 -0800 (PST)
Received: from resqmta-po-07v.sys.comcast.net
 (resqmta-po-07v.sys.comcast.net [96.114.154.166])
 by arlo.cworth.org (Postfix) with ESMTPS id 493176DE0274
 for ; Tue, 16 Feb 2016 09:56:45 -0800 (PST)
Received: from resomta-po-02v.sys.comcast.net ([96.114.154.226])
 by resqmta-po-07v.sys.comcast.net with comcast
 id K5w71s0034tLnxL015wk9g; Tue, 16 Feb 2016 17:56:44 +0000
Received: from mail.tremily.us ([73.221.72.168])
 by resomta-po-02v.sys.comcast.net with comcast
 id K5wj1s0013dr3C9015wj9s; Tue, 16 Feb 2016 17:56:43 +0000
Received: by mail.tremily.us (Postfix, from userid 1000)
 id 584F31BB68B5; Tue, 16 Feb 2016 09:56:42 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tremily.us; s=odin;
 t=1455645402; bh=TEP8Sw9j07KEJgNbC1J165nSutgHUfgYtmUFJc8IFxU=;
 h=Date:From:To:Cc:Subject:References:In-Reply-To;
 b=rj/Iby0ky+nNCncfscnlHBxaOFRhavBB80gpk2yzPCMl9rkPWU0bd4dkxmjk9SjZ+
 saRZ5mkwMrcGpUQxpDdXFWQPwrlAE4+B44OyNT+GX8+/VnCQCNsTR0ZRF+D0Td9dE1
 nidhiz4wb5I4JyUsWRi7bhIjGTUKK8WamHGRsacM=
Date: Tue, 16 Feb 2016 09:56:42 -0800
From: "W.
 Trevor King"
To: David Bremner
Cc: notmuch@notmuchmail.org
Subject: Re: [PATCH] nmbug: Allow Unicode tags and IDs in Python 2
Message-ID: <20160216175641.GN4265@odin.tremily.us>
References: <87lh6kvmbc.fsf@zancas.localnet>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="b1CVx77D595wdcW8"
Content-Disposition: inline
In-Reply-To: <87lh6kvmbc.fsf@zancas.localnet>
OpenPGP: id=39A2F3FA2AB17E5D8764F388FC29BDCDF15F5BE8;
 url=http://tremily.us/pubkey.txt
User-Agent: Mutt/1.5.23 (2014-03-12)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net;
 s=q20140121; t=1455645404;
 bh=GNHISTXPs64gA/u2kwNoMXs4UYForBVSixUbFcETY60=;
 h=Received:Received:Received:Date:From:To:Subject:Message-ID:
 MIME-Version:Content-Type;
 b=GNWDE72k34WFrwBeh149DZ/SYogshv7ydzZ0DM0gC9XO72mBtbwWl0QQK7vLtKLvx
 GUiVzdGDCYuMgEkOuKqab6Qz1r2T0psLO8hDZqpAJNr1qeLr3Xpkrr7hZRbeJp03+O
 2eI28dzwpWdXmJqBI1dppjUD5lUAiH0PAHwgPjBOHgROTuhxYf6CKFRN+jPjZ1g82F
 6yMgtfOkeaBuJMJdiCkCYYev1WKu3dTzyXL4vsrXY4vB5jVnMVDlToGdv+S/rCEwZt
 BKrXw1N+gtwQaJibVmj+5/4bkY4tReXnAbzX/KK7SnAzWdpxoqC4WWO1AxjizmBx0B
 yeiYueV7krl4g==
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: "Use and development of the notmuch mail system."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 16 Feb 2016 17:56:49 -0000

--b1CVx77D595wdcW8
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Feb 16, 2016 at 09:04:07AM -0400, David Bremner wrote:
> W. Trevor King writes:
> > Coercing to UTF-8 (regardless of locale) gives us consistent tag
> > IDs for sharing between users.
>
> I'm not sure what "tag IDs" are. Do you mean message-ids here? or "tags
> and IDs"?

Yeah. I'll fix that in v2.

> At first I thought there might be problems with non-utf8 message-ids,
> but that turns out not to be the case [1].
> It seems like it would take a fairly heroic effort to get non-UTF8
> tags into the database (perhaps by calling the library interface with
> bad strings?) so we can probably ignore this case. It might be good to
> document the limitation though, since AFAIK, dump and restore can
> roundtrip any old crap.

How about in a NEWS entry in v2 of this series, and then echoing that
NEWS entry in the notmuch-dtags (or whatever) man page once I work up
that series?

> > The 'isnumeric' check identifies Unicode instances in both Python
> > 2 [9] and Python 3 [10].
>
> I still haven't really tried to understand this part, but probably
> it deserves inline documentation.

It's just “if you have a Unicode instance (str in Python 3, unicode in
Python 2), convert it to bytes (bytes in Python 3, str in Python 2)”.
Only Unicode instances will have an ‘isnumeric’ method, so it's a
convenient marker for switching that logic. I'll add a “convert from
Unicode if necessary” comment to v2.

> > I haven't checked the other commands for issues with Unicode IDs
> > or tags. It's possible that in addition to this explicit encoding
> > to UTF-8, we'll also want explicit decoding from UTF-8 when
> > reading from Git trees (for 'nmbug checkout' and 'nmbug status').
>
> Yes, this seems to be a problem, with the patch applied I can
> commit, but the same utf-8 message-id causes problems.

Ugh. Thanks for checking. I'll try to fix all the places where this
would have an impact in v2 of this series.

Cheers,
Trevor

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

--b1CVx77D595wdcW8
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBAgAGBQJWw2LXAAoJEAPqygegUbGsRC8P/2Ao8dkv4CzkN95Fr6k5t/yR
3Y231DolowEHfS11uHE7JwD0XuQrM7K/Z8orm692Q8QaoNb6iAuf3KvtIoJKUWZB
wJc4Fhv8MDNzeyNmCtLMGZw3u/Cm5OTL513c154qUtcrkO6WoQWknKQqzNLViamr
5zWzD/w0tW8dZHJWWntOHyz73mbC0E7Ib4UoEUJ06BWJKCel1qv8TumtsDr0sh+e
F7mNMyZwnE85LHOePBPkwNedtjOq4fQ9xfma5moN3rZU3owYXkJUXcG8TqCzm8X4
kJaSUZv8B9ig9uyS7yBN9Gp7B4EMuKrI8k2PM/sDv9p/IncHsGZPCqf/skeoBHUg
bnoget1HjpOiTjEau4gB+DPGrxEOCHbzt50USM12/6vIRVhQmWKZcV8kLA5jtA+V
91dEagZjMWavxXUr07E3YP4dzo3PH/vsPCLA5aaJVpbiIEIq/xm/J7QHkVTro66E
sCpag5SjFDk3lkN4cvmDBWRF/VzT58qbQ+NM1nMg4Ydfiu+mmXZSe907EnBweRQd
ffzoQpN7rubP4QLpVrVAr/kHB6sXYNJOMSn7SS4Dul5bLQOwk7LeqZZEQjA6B8Cb
MeSXwIfJuoY+rnSdeaTFjvtF5c6Ri85ptQMpPnShA3u66ym9k8gDOINQOXsr/Max
Sdg9K0czco6hXUfIcZN5
=bRRC
-----END PGP SIGNATURE-----

--b1CVx77D595wdcW8--
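
The 'isnumeric' check discussed in the message above could be sketched
roughly as follows. This is only an illustration under stated
assumptions: the helper name to_utf8_bytes is hypothetical and is not
taken from the nmbug source, and the actual patch may structure the
check differently. It relies on the fact that Unicode strings (str in
Python 3, unicode in Python 2) have an 'isnumeric' method while byte
strings do not.

```python
def to_utf8_bytes(value):
    """Convert from Unicode if necessary; leave byte strings untouched.

    Hypothetical helper for illustration only (not nmbug's own code).
    """
    # Only Unicode strings (str in Python 3, unicode in Python 2) have
    # an 'isnumeric' method, so its presence marks values that still
    # need encoding.  Byte strings lack it in both Python versions.
    if hasattr(value, 'isnumeric'):
        # Always encode as UTF-8, regardless of the user's locale, so
        # the resulting bytes are the same for every user.
        return value.encode('utf-8')
    return value


# A tag containing a non-ASCII character, given as a Unicode string.
encoded = to_utf8_bytes(u'caf\u00e9')
# A value that is already bytes passes through unchanged.
unchanged = to_utf8_bytes(b'inbox')
```

Using hasattr instead of an isinstance check avoids naming the
'unicode' type, which does not exist in Python 3, so the same code runs
on both interpreters.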