From: David Bremner
To: "yury.t", notmuch@notmuchmail.org
Subject: Re: regex [X-Z] with non-ascii char returns different results from (X|Y|Z)
Date: Wed, 21 Aug 2019 11:38:07 -0300
Message-ID: <878srmk2i8.fsf@tethera.net>

"yury.t" writes:

> Some regular expression returns incorrect results if the pattern
> contains multibyte characters in square brackets. The following
> bracket expression matches subjects not starting with `[１-９]` and
> returns more results than the parenthesis expression.

We rely on the POSIX.2 regex functions (regcomp, regexec).
I would be interested to know whether the searches you care about work in a standalone C program using regcomp and regexec.

d