From: David Bremner
To: notmuch@notmuchmail.org
Subject: locales and notmuch
Date: Thu, 21 Feb 2019 15:11:48 -0400
Message-ID: <8736ohard7.fsf@tethera.net>

So I've been revisiting the "user defined headers" [1] patches. I need the prefix in

    $ notmuch config set index.header.<prefix> "blah"

to be unique case-insensitively, so I decided to convert prefixes to lower case on input. This turns out to be "fun" if we try to handle anything other than ASCII.

So one option is to just insist that prefixes are ASCII. Otherwise we could insist that they are UTF-8, ignoring the locale.
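For the ASCII-only option, the check and the case folding can both be done byte by byte without consulting the locale at all. A minimal sketch, assuming a hypothetical helper named prefix_ascii_lower (not an existing notmuch function) that returns a lower-cased copy of the prefix, or NULL if the prefix contains anything outside 7-bit ASCII:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for the "prefixes must be ASCII" option.
 * Returns a newly allocated lower-case copy of PREFIX, or NULL if
 * PREFIX contains any byte outside 7-bit ASCII (or malloc fails).
 * Case folding is done by hand rather than with tolower(), because
 * tolower() consults LC_CTYPE and this option is meant to be
 * independent of the locale. */
char *
prefix_ascii_lower (const char *prefix)
{
    size_t len = strlen (prefix);
    char *out = malloc (len + 1);

    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char) prefix[i];

        if (c > 0x7f) {         /* non-ASCII byte: reject the whole prefix */
            free (out);
            return NULL;
        }
        /* Map only A-Z to a-z; every other ASCII byte is copied unchanged. */
        out[i] = (c >= 'A' && c <= 'Z') ? (char) (c - 'A' + 'a') : (char) c;
    }
    out[len] = '\0';
    return out;
}
```

Since only A-Z is folded, "MyHeader" and "MYHEADER" both come back as "myheader", which is exactly the collision a case-insensitive uniqueness check needs to detect; anything non-ASCII is refused up front instead of being folded in some locale-dependent way.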
The fullest generality (I think) is to first convert from the user's locale to UTF-8, as in the attached sample program. The gotcha is that the call to setlocale is necessary, and can't really be local to a string utility function, since it changes process-wide state. So we'd have to add it to notmuch startup. We mostly ignore locales, so I guess there shouldn't be too many side effects; on the other hand, I don't have much experience with locales.

So what do people think? ASCII? UTF-8? Locale sensitivity?

[1] id:20181117140901.1870-1-david@tethera.net

Attachment: test.c

#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <glib.h>

int
main (int argc, char **argv)
{
    gchar *utf8_str, *lc_str;
    GError *err = NULL;

    /* Adopt the locale from the environment (LANG, LC_ALL, ...);
     * without this the process stays in the "C" locale. */
    setlocale (LC_ALL, "");

    /* Convert the test string from the locale's charset to UTF-8. */
    utf8_str = g_locale_to_utf8 ("Sn☃man", -1, NULL, NULL, &err);
    if (! utf8_str) {
        fprintf (stderr, "%s\n", err->message);
        abort ();
    }

    /* Unicode-aware lower-casing; returns a newly allocated string. */
    lc_str = g_utf8_strdown (utf8_str, -1);
    printf ("%s\n", lc_str);

    g_free (lc_str);
    g_free (utf8_str);
    return 0;
}
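Why setlocale can't be avoided is easier to see with plain POSIX calls. A locale-to-UTF-8 conversion of this kind has to ask which character set the current locale uses, and nl_langinfo (CODESET) only reflects the user's environment after setlocale (LC_ALL, "") has run; before that every program is in the "C" locale, whose charset is plain ASCII on glibc, so bytes such as the snowman's are typically rejected. A minimal sketch using iconv in place of GLib, assuming a hypothetical helper named locale_to_utf8_iconv (not claimed to match how g_locale_to_utf8 is implemented):

```c
#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of notmuch or GLib): convert IN from the
 * character set of the current LC_CTYPE locale to UTF-8 with POSIX iconv.
 * Returns a newly allocated string, or NULL if the input is not valid in
 * that character set or memory runs out.  Assumes the caller already ran
 * setlocale (LC_ALL, "") at startup; otherwise CODESET is that of "C". */
char *
locale_to_utf8_iconv (const char *in)
{
    iconv_t cd = iconv_open ("UTF-8", nl_langinfo (CODESET));
    if (cd == (iconv_t) -1)
        return NULL;

    char *in_ptr = (char *) in;        /* iconv takes char **, not const */
    size_t in_left = strlen (in);
    size_t out_size = in_left + 16;    /* initial guess; grown on E2BIG */
    char *out = malloc (out_size);
    char *out_ptr = out;
    size_t out_left = out_size - 1;    /* reserve a byte for the NUL */

    while (out != NULL && in_left > 0) {
        if (iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left) != (size_t) -1)
            break;                     /* all input converted */
        if (errno != E2BIG) {          /* EILSEQ/EINVAL: bad input bytes */
            free (out);
            out = NULL;
            break;
        }
        /* Output buffer full: double it and resume where iconv stopped. */
        size_t used = (size_t) (out_ptr - out);
        char *bigger = realloc (out, out_size * 2);
        if (bigger == NULL) {
            free (out);
            out = NULL;
            break;
        }
        out = bigger;
        out_size *= 2;
        out_ptr = out + used;
        out_left = out_size - used - 1;
    }

    if (out != NULL)
        *out_ptr = '\0';
    iconv_close (cd);
    return out;
}
```

In notmuch terms this would mean calling setlocale (LC_ALL, "") once near the top of main, before any prefix is read, and then lower-casing the UTF-8 result with a Unicode-aware function such as g_utf8_strdown.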