From mboxrd@z Thu Jan 1 00:00:00 1970
From: Helmut Eller
Newsgroups: gmane.emacs.devel
Subject: Re: prior work on non-backtracking regex engine?
Date: Mon, 08 Apr 2024 14:19:13 +0200
Message-ID: <87edbgt5by.fsf@gmail.com>
In-Reply-To: <8eHWNAHASGpm2Qhs7mldB0VTdFaRW0NZJWi9ppNiCPZQ-QO8E7YLUXzJfT7LXYe1mTnwFGACouy7YroB8qsCFDbRuWYHr2gvOTagQY3TW7w=@hypnicjerk.ai> (Danny McClanahan's message of "Sun, 07 Apr 2024 04:42:13 +0000")
To: Danny McClanahan
Cc: Ihor Radchenko, "emacs-devel@gnu.org"

On Sun, Apr 07 2024, Danny McClanahan wrote:

> And I was also *super*
> pleased to see that regex-emacs.h itself doesn't expose any dependency
> on the gap buffer or other internal emacs representations (except
> regarding multibyte encoding).  So in my amateur evaluation, emacs
> actually seems very well-placed to take advantage of high-performance
> regex engine techniques without any big structural changes.

What's the history of regex-emacs.h?  It seems like in the past it was
regex.h from Gnulib.  That would explain why the regex engine is
relatively well decoupled from the rest.  But it also leads to the
question: why does Emacs no longer use Gnulib's regex engine?

My guess is that it has something to do with the way Emacs performs
case-insensitive matches.  Another complication may be that Gnulib
doesn't support the non-greedy variants of some operators.

It seems that these days Gnulib uses a DFA-based algorithm when possible
and falls back to backtracking for backrefs (and presumably for
POSIX-compatible sub-match rules).  So one could argue that Gnulib
already has a lot of what we want and would be the natural place to add
a clean API for the features that Emacs needs.

So does somebody know the details of why Gnulib and Emacs went separate
ways in the past?

Helmut