From: Lars Ingebrigtsen
Newsgroups: gmane.emacs.devel
Subject: Re: Make regexp handling more regular
Date: Thu, 03 Dec 2020 09:31:56 +0100
Message-ID: <87ft4nz3wj.fsf@gnus.org>
References: <87lfeg60iy.fsf@gnus.org>
To: Stefan Kangas
Cc: emacs-devel@gnu.org

Stefan Kangas writes:

> I like the idea of adding an entirely new built-in API based on the
> current state of the art.  I would begin such a project by looking into
> what other Lisps are doing, such as CL, Clojure, Guile and Racket.  Why
> shouldn't Emacs Lisp be best-in-class?

Sure.  Common Lisp doesn't have regexps, but (some) implementations do,
and there's a bunch of libraries, like

http://edicl.github.io/cl-ppcre/

I'm not much in favour:

* (scan "(a)*b" "xaaabd")
1
5
#(3)
#(4)

* (let ((s (create-scanner "(([a-c])+)x")))
    (scan s "abcxy"))
0
4
#(0 2)
#(3 3)

And since it's Common Lisp, of course you have special forms for
destructuring:

* (register-groups-bind (first second third fourth)
      ("((a)|(b)|(c))+" "abababc" :sharedp t)
    (list first second third fourth))
("c" "a" "b" "c")

Guile:

https://www.gnu.org/software/guile/manual/html_node/Regexp-Functions.html

(string-match "[0-9][0-9][0-9][0-9]" "blah2002")
⇒ #("blah2002" (4 . 8))

(map match:substring (list-matches "[a-z]+" "abc 42 def 78"))
⇒ ("abc" "def")

Clojure:

https://purelyfunctional.tv/mini-guide/regexes-in-clojure/

(re-matches #"abc(.*)" "abcxyz")
["abcxyz" "xyz"]

I.e., if the regexp has no groups, we return the match substring,
otherwise a vector of the whole match and the groups.  It's nice in one
way, but the cleverness leads to errors when (re-)writing code.  You
start with

(subs (re-find #"[a-z]+" "fooo baar") 3)

but then you add some more and you have to rewrite to something like:

(let [[_ s1 s2] (re-matches #"([a-z]+) ([a-z]+)" full-name)]
  (subs s1 3))

I hate that.
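
For comparison, here is a minimal sketch of the same rewrite using the
existing Emacs Lisp primitives `string-match' and `match-string'.  The
function names and the sample input are made up for illustration.  The
point is only that the call shape stays the same when groups are added;
just the group number changes.

```elisp
;; Sketch only: `my-first-word-tail' and `my-first-name-tail' are
;; hypothetical helpers, not existing Emacs functions.

(defun my-first-word-tail (name)
  "Return the first lowercase word in NAME minus its first three characters.
The regexp has no groups, so group 0 (the whole match) is used."
  (when (string-match "[a-z]+" name)
    (substring (match-string 0 name) 3)))

(defun my-first-name-tail (name)
  "Like `my-first-word-tail', but NAME must be two space-separated words.
The regexp now has two groups; only the group number passed to
`match-string' changes, not the type of the value we get back."
  (when (string-match "\\([a-z]+\\) \\([a-z]+\\)" name)
    (substring (match-string 1 name) 3)))

(my-first-word-tail "fooo baar")   ; => "o"
(my-first-name-tail "fooo baar")   ; => "o"
```
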

The thing that makes looking at other languages here slightly less
useful is that Emacs has buffers.  We're often not interested in the
(sub-)matches themselves at all, but instead their buffer positions
(i.e., match-beginning/end).

> As for naming, how about just using a short prefix such as "re-"?
> AFAICT, we currently have only five functions using that prefix.

Sure.

> Tangentially, I have always been wondering if it's feasible to add a new
> regular expression type to `read' where you don't have to incessantly
> double quote all special characters.  (One could take inspiration from
> Python, for example, which adds an "r" character to strings to turn them
> into regexps: r"regexp".)

I'm all for adding a regexp object type (and a new read syntax), but I
think it's somewhat orthogonal.  Not totally, though: I've long wished
for match/searching functions to be generic, and work differently on
strings and regexps.

That is, if fed a string, then do the comparison with `string-equal',
and when fed a regexp, do the comparison with `string-match'.  So you
could say

(search-forward "foo")

and

(search-forward #r"fo+")

or

(search-forward (re-make "fo+"))

-- no reason for there to be separate functions if we have regexp
objects.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
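
A minimal sketch of the generic-search idea described above, under
loud assumptions: Emacs has no regexp object type, so `my-regexp',
`my-re-make' and `my-search-forward' are hypothetical names, and the
regexp "object" is just a struct wrapping the source string.  The
dispatch is done with `cl-defgeneric', which is one possible way to do
it, not necessarily how a built-in version would work.

```elisp
(require 'cl-lib)

;; Hypothetical stand-in for a real regexp object type.
(cl-defstruct (my-regexp (:constructor my-re-make (source)))
  source)

(cl-defgeneric my-search-forward (pattern &optional bound noerror)
  "Search forward from point for PATTERN and return the new point.")

(cl-defmethod my-search-forward ((pattern string) &optional bound noerror)
  ;; A plain string is searched for literally.
  (search-forward pattern bound noerror))

(cl-defmethod my-search-forward ((pattern my-regexp) &optional bound noerror)
  ;; A regexp object is searched for as a regexp.
  (re-search-forward (my-regexp-source pattern) bound noerror))

;; Usage: the same function, two behaviours depending on the argument type.
(with-temp-buffer
  (insert "xx fo+ fooo")
  (goto-char (point-min))
  (my-search-forward "fo+")                 ; literal "fo+", point ends at 7
  (goto-char (point-min))
  (my-search-forward (my-re-make "fo+")))   ; regexp matches "fo", point ends at 6
```

Since both methods go through the ordinary search primitives, the match
data (`match-beginning', `match-end') is set as usual, so callers that
only care about buffer positions work the same either way.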