From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Bremner
To: Daniel Kahn Gillmor, notmuch@notmuchmail.org
Subject: Re: locales and notmuch
In-Reply-To: <87a7edzppt.fsf@fifthhorseman.net>
References: <8736ohard7.fsf@tethera.net> <87a7edzppt.fsf@fifthhorseman.net>
Date: Wed, 19 Jun 2019 21:52:40 +0200
Message-ID: <87y31xs67r.fsf@tethera.net>
MIME-Version: 1.0
Content-Type: text/plain
List-Id: "Use and development of the notmuch mail system."

Daniel Kahn Gillmor writes:

> (sorry for the late reply to this thread)
>
> On Thu 2019-02-21 15:11:48 -0400, David Bremner wrote:
>> to be unique case-insensitively, so I decided to convert them to lower
>> case on input. This turns out to be "fun", if we try to handle things
>> other than ASCII. So one option is to just insist prefixes are ASCII.
>>
> I have no problem with asserting that all character strings in the
> notmuch database are UTF-8. That's just the only sane thing to do in
> 2019. But if we build any feature into notmuch that makes assumptions
> or requirements about upper-casing, lower-casing, or collating strings,
> and that feature interacts between the currently-running locale and
> whatever locale was used to store data in the database in the past,
> and those locales can differ, we may be inflicting some subtle pain on
> users.

I eventually settled on 4b9c03efc, which will probably do strange
things to people who define non-ASCII prefix names in non-UTF-8
locales. I'm OK atm with just saying that is unsupported.

d
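
To make the "just insist prefixes are ASCII" option from the quoted
text concrete, here is a minimal sketch. It is an illustration only and
is not a description of what 4b9c03efc does; canonical_prefix,
add_prefix, and the table layout are hypothetical names, not notmuch
API. The idea is that ASCII lower-casing gives the same result in every
locale, so a name stored under one locale still matches under another.

```python
def canonical_prefix(name: str) -> str:
    """Return the lower-cased form of an ASCII-only prefix name.

    Hypothetical helper: rejecting non-ASCII input sidesteps
    locale-dependent case mapping entirely.
    """
    if any(ord(ch) > 127 for ch in name):
        raise ValueError(f"non-ASCII prefix name not supported: {name!r}")
    return name.lower()


def add_prefix(table: dict, name: str, field: str) -> None:
    """Register a prefix, enforcing case-insensitive uniqueness."""
    key = canonical_prefix(name)
    if key in table:
        raise KeyError(f"prefix already defined: {key!r}")
    table[key] = field


prefixes = {}
add_prefix(prefixes, "List", "List-Id")

try:
    add_prefix(prefixes, "LIST", "X-Other")  # same name, different case
except KeyError as err:
    print("rejected:", err)

try:
    add_prefix(prefixes, "Größe", "X-Size")  # non-ASCII name
except ValueError as err:
    print("rejected:", err)
```

Once non-ASCII names are allowed, the lower-cased form depends on which
case-mapping rules are applied (Turkish dotted and dotless i being the
usual example), which is the locale mismatch described above.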