From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Neidhardt
Newsgroups: gmane.emacs.devel
Subject: Re: rx.el sexp regexp syntax (WAS: Off Topic)
Date: Fri, 25 May 2018 22:35:06 +0200
Message-ID: <87h8mvh3t1.fsf@gmail.com>
References: <87h8mw3yoc.fsf@gmail.com> <20180525155126.GA4096@ACM> <87lgc7hebk.fsf@gmail.com> <20180525181710.GC4096@ACM>
In-reply-to: <20180525181710.GC4096@ACM>
Mime-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature"
User-Agent: mu4e 1.0; emacs 26.1
To: Alan Mackenzie
Cc: van@scratch.space, eliz@gnu.org, emacs-devel@gnu.org, rms@gnu.org, Noam Postavsky
Xref: news.gmane.org gmane.emacs.devel:225728

--=-=-=
Content-Type: text/plain

Alan Mackenzie writes:

> It may be part of the explanation. But more salient, I think, is that
> hackers prefer powerful means of expression. A single character in a
> string regexp has the power of a sexp in the corresponding rx regexp.
> Paul Graham (at http://www.paulgraham.com) has had quite a bit to say
> about this in the (distant) past. Conciseness of expression is where
> it's at.

I think you are referring to this article:

http://paulgraham.com/ineq.html

> Another easy test is the number of characters in a program, but this
> is not very good either; some languages (Perl, for example) just use
> shorter identifiers than others.
>
> I think a better measure of the size of a program would be the number
> of elements, where an element is anything that would be a distinct
> node if you drew a tree representing the source code. The name of a
> variable or function is an element; an integer or a floating-point
> number is an element; a segment of literal text is an element; an
> element of a pattern, or a format directive, is an element; a new
> block is an element. There are borderline cases (is -5 two elements or
> one?)
> but I think most of them are the same for every language, so
> they don't affect comparisons much.

With this definition, rx and regexp have the same length (except for
`eval'). "Conciseness in characters" is not what Paul Graham was
referring to.

Others might think differently, for instance those who prefer Perl to
Lisp.

In the end, this is what it all boils down to: the "Unix" hacker culture
vs. the Lisp one. The Unix tradition has long spread the use of
acronyms and shortcuts. Lisp, on the other hand (especially Scheme),
puts a lot of emphasis on explicit full names.

My opinion is that acronyms and shortcuts were mostly useful in the era
of teletypes and limited terminals and shells. Now we have completion
and fuzzy-search, for which explicit full names not only make sense but
are necessary. (It's much more intuitive to search for "string compare"
in Emacs Lisp than "str cmp" in C.)

In the end, rx vs. regexp reflects the same mindset difference.

>> Have you used rx?
>
> No. Neither have I used Cobol (much).

Cobol is not very relevant; let's focus on the discussion here.
Try using rx on some mildly complex regular expressions; it could be
insightful for this discussion.

> You seem to want to increase the readability for beginners, for people
> who have laboriously to slog through an expression trying to make sense
> of each bit of it. I don't think experienced regexp users have
> difficulty with the syntax. I don't, for one.
>
> There was a time when people thought that
>
> ADD 1 TO A GIVING B
>
> was more readable than
>
> b = a + 1;

This is not what rx is about though. Your example does not show any
change in structure. rx does.

> Hexadecimal CPU codes aren't and aren't intended to be human-readable.
> String regular expressions are.

Well, "readable" is not black and white. If we can have "more
readable", then even better.

> rx MUST be written over several lines and indented. A string regexp, by
> contrast, usually fits onto a single line.

No, it does not have to be written over several lines. I don't know
where you got that from. That said, is "fitting onto a single line"
necessarily good?

>> - rx does not require escaping any character with backslashes. This
>> is always a great source of confusion when switching from BRE to ERE,
>> between different interpreters and when storing regexp in Lisp strings
>> where backslashes must be escaped themselves for instance.
>
>> - Symbols with non-trivial meanings in regexp (e.g. \<, :, ^, etc.) have
>> a trivial _English_ counterpart in rx: (respectively "word-start",
>> nothing, "line-start" _and_ "not").
>
> The "English" counterpart used in rx is bulky and difficult to learn.
> Somehow, you've got to learn that it's "word-start" and not
> "word-beginning",

One could argue the same about "*" vs. "%". But words that have a
meaning in a natural language are easier to remember than arbitrary
symbols.

> that it's "not" and not "non", and so on. This is more
> difficult than just learning \< and ^. If your native language isn't
> English, it might be much more difficult.

All programmers learn some basic English, say, "if then else". I don't
think that symbolic languages are easier for human beings to learn than
natural languages.

> Well, so far, on this list, two or three people have said they "like"
> rx.el. Nobody has said "I'm going to be using rx.el in my programs from
> now on".

Which is precisely why we are talking about it: to let people know,
pique their curiosity, let them try it and report feedback. "Not
famous" does not equal bad quality. That's why we need to communicate,
to give good products a better chance.

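To make the comparison concrete, here is a minimal sketch of one small
pattern written both ways. The pattern itself and the names
`my-number-at-eol-string' and `my-number-at-eol-rx' are made up for
illustration; they do not come from rx.el or from this thread. By my
count both forms have four elements in Paul Graham's sense, the rx form
fits on one line, and it contains no backslash.

```elisp
(require 'rx)

;; String regexp: word start, one or more digits, end of line.
;; Elements: \<  [0-9]  +  $
;; Inside a Lisp string, the backslash of \< must itself be escaped.
(defconst my-number-at-eol-string "\\<[0-9]+$"
  "Hypothetical example pattern, written as a string regexp.")

;; The same idea in rx, on a single line and without backslashes.
;; Elements: word-start  one-or-more  digit  line-end
;; `rx' is a macro: it is translated into a string regexp when the
;; form is expanded, so the matcher sees an ordinary regexp.
(defconst my-number-at-eol-rx
  (rx word-start (one-or-more digit) line-end)
  "Hypothetical example pattern, written with `rx'.")

;; Both constants can be passed to any function that takes a regexp.
(string-match-p my-number-at-eol-string "abc 123")
(string-match-p my-number-at-eol-rx "abc 123")
```

Evaluating `my-number-at-eol-rx' shows the string that rx generated,
which is a quick way to compare the two notations on your own regexps.
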
-- Peter Neidhardt --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAlsIc3oACgkQm9z0l6S7 zH+nhgf6ArFkTnAQu8tF+ifu5fgZAMQAlpRZLCQTBSfc9qi6zohxo6Zc5i6X1WSn xffVNl4nFQJlkauu9r8dnQVe6/SxHU4qDJBuEIq4DSEGqwOZFKRJmAd83fJmqE9P OGTZzJz63I9dGh8EXmtYuii6SOiq4SjceBu5tsI5OQ0r0srI/0OOfUDvd/l0WPFT dPVg8lfB7VkP2hSQP6Db+4lf9R9r+6Sp7b77f6Fj8Y39kBgpwAxAUJcC/wfAh8qd CWNvFPvpC63d03sD+ou7Gv0nHcVzScxMiUm273AQxLzch25aJdf3/tUETZd8Mhop FnKgcTSi61UYijb6B7//S2aHheOvYA== =y518 -----END PGP SIGNATURE----- --=-=-=--