From: Dima Kogan
Newsgroups: gmane.emacs.devel
Subject: Embedded modifiers in the regex engine
Date: Wed, 24 Feb 2016 17:32:01 -0800
Message-ID: <87ziupinhq.fsf@secretsauce.net>
To: emacs-devel@gnu.org

Hi.

I've been thinking of ways to make some fancier aspects of isearch and hi-lock work better, specifically, the way we handle the different modes: case-fold, char-fold, lax-whitespace, etc.

The relevant bugs I filed recently:

  http://debbugs.gnu.org/22541
  http://debbugs.gnu.org/22520
  http://debbugs.gnu.org/22479

In short, different parts of Emacs (isearch, isearch history, hi-lock, etc.) treat these modes inconsistently, which results in unexpected behavior. The best solution I can think of to clean this up is also the most intrusive: adding support for PCRE-style embedded modifiers to activate/deactivate the modes. So for instance "\\(?i\\)asdf" would be interpreted as a case-folding regex regardless of the value of case-fold-search.

I think this would be a great thing to have in general, but for the specific issues in the bugs above, it'd make things simpler and more correct. As an example, hi-lock currently generates a complicated-looking regex to emulate char-folding and case-folding. If we supported the modifiers, this change would simply be a prepend of "\\(?i\\)" or whatever other modes we want. This is simple and expected to be bug-free on the hi-lock level. Bugs such as hi-lock not supporting char-fold and case-fold at the same time would not happen.

Clearly this is a big change to a core component, so I want to talk about it first. I looked at our regex implementation, and it looks possible to add this. But I've seen talk of merging our regex implementation with the glibc one, so the merge should clearly happen first. Also, I don't want to touch this without a test suite for our regex engine, so that would need to happen beforehand as well.

Again, I think this feature would be useful even beyond the context of these bugs.

Thanks for the input.

dima
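
For readers unfamiliar with embedded modifiers, here is a minimal sketch of the behavior the proposal describes. It uses Python's `re` module purely as a stand-in, since the Emacs regex engine does not accept "\\(?i\\)" today. The helper name `with_case_fold` is hypothetical, and Python spells the modifier "(?i)" rather than "\\(?i\\)".

```python
import re


def with_case_fold(regex: str) -> str:
    """Prepend an embedded case-fold modifier (hypothetical helper).

    This mirrors the "simple prepend" that hi-lock could do if the
    Emacs engine supported such modifiers.
    """
    return "(?i)" + regex


# The modifier travels with the pattern, so no external flag is needed.
folded = with_case_fold("asdf")
print(re.search(folded, "xx ASDF xx") is not None)

# Without the modifier, the result depends on a flag supplied by the caller,
# much as an Emacs search depends on the value of case-fold-search.
print(re.search("asdf", "ASDF") is not None)
print(re.search("asdf", "ASDF", re.IGNORECASE) is not None)

# A scoped modifier can also switch folding off, overriding the caller's flag.
print(re.search("(?-i:asdf)", "ASDF", re.IGNORECASE) is not None)
```

The point of the sketch is that the pattern string alone determines how it matches, so every consumer of the string (search, search history, highlighting) would see the same behavior.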