From: Noam Postavsky
Newsgroups: gmane.emacs.devel
Subject: Re: New rx implementation with extension constructs
Date: Thu, 5 Sep 2019 11:38:23 -0400
References: <1C71289F-C5D5-4F9C-947C-374110C1D572@acm.org>
In-Reply-To: <1C71289F-C5D5-4F9C-947C-374110C1D572@acm.org>
To: Mattias Engdegård
Cc: emacs-devel

> works just as expected.
> &rest arguments are permitted, and expand to
> implicit (seq ...) forms. No provision was made for macros able to
> execute arbitrary Lisp code; I just couldn't find a use for them, and
> decided to wait until someone would tell me otherwise. Thus, all
> parametrised forms work by plain substitution.

Do you mean that macros don't support (literal LISP-FORM) and (regexp
LISP-FORM)? Or something else?

> +;; The `rx--translate...' functions below return (REGEXP . PRECEDENCE),
> +;; where REGEXP is a list of string expressions that will be
> +;; concatenated into a regexp, and PRECEDENCE is one of
> +;;
> +;;  t    -- can be used as argument to postfix operators
> +;;  seq  -- can be concatenated in sequence with other seq or higher
> +;;  lseq -- can be concatenated to the left of rseq or higher
> +;;  rseq -- can be concatenated to the right of lseq or higher
> +;;  nil  -- can only be used in alternatives
> +;;
> +;; They form a lattice:
> +;;
> +;;           t          highest precedence
> +;;           |
> +;;          seq
> +;;         /   \
> +;;      lseq   rseq
> +;;         \   /
> +;;          nil         lowest precedence

It would help to add some concrete examples (i.e., of things that
would count as `t', `seq', etc) to this abstract explanation.

> +(defun rx--translate-symbol (sym)
> +  "Translate an rx symbol.  Return (REGEXP . PRECEDENCE)."
> +  (pcase sym
> +    ((or 'nonl 'not-newline 'any) (cons (list ".") t))

Is there a reason not to use '((".") . t) here (and similar for the
rest of the alternatives)? If yes, then it's probably worth
mentioning in a comment.

> +(defun rx--string-to-intervals (str)
> +  "Decode STR as intervals: A-Z becomes (?A . ?Z), and the single
> +character X becomes (?X . ?X).  Return the intervals in a list."
> +  ;; We could just do string-to-multibyte on the string and work with
> +  ;; that instead of this `decode-char' workaround.
>    (let ((decode-char
> -          ;; Make sure raw bytes are decoded as such, to avoid confusion with
> -          ;; U+0080..U+00FF.
>           (if (multibyte-string-p str)
>               #'identity
>             (lambda (c) (if (<= #x80 c #xff)
> @@ -483,477 +280,657 @@ rx-check-any-string
>                           c))))

If not using string-to-multibyte, I think this lambda can be replaced
with #'unibyte-char-to-multibyte.
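
To make that last suggestion concrete, here is a minimal sketch that
compares a hand-written byte decoder with `unibyte-char-to-multibyte'
over every unibyte value. The names `my-decode-byte' and
`my-decoders-agree-p' are hypothetical, and the `(+ c #x3fff00)'
branch is my assumption about what the elided part of the lambda
does; it is not copied from the patch.

```elisp
;; Hypothetical stand-in for the lambda in the patch.  The raw-byte
;; branch (adding #x3fff00) is an assumption, since the hunk elides it.
(defun my-decode-byte (c)
  "Map byte C in #x80..#xff to its raw-byte character; else return C."
  (if (<= #x80 c #xff)
      (+ c #x3fff00)
    c))

(defun my-decoders-agree-p ()
  "Return non-nil if both decoders agree on every value 0..255."
  (let ((ok t))
    (dotimes (c 256)
      ;; Compare the hand-written decoder with the built-in one.
      (unless (= (my-decode-byte c) (unibyte-char-to-multibyte c))
        (setq ok nil)))
    ok))
```

One caveat: `unibyte-char-to-multibyte' signals an error for
arguments above 255, so the swap only makes sense where the input is
known to be unibyte, as in the non-multibyte branch here.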