From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ihor Radchenko
To: Nicolas Goaziou
Cc: emacs-orgmode@gnu.org
Subject: Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
Date: Wed, 24 Mar 2021 22:13:01 +0800
Message-ID: <87v99g4p3m.fsf@localhost>
In-Reply-To: <871rc466k6.fsf@localhost>
References: <87pn03g3rr.fsf@localhost> <87wntw4svz.fsf@nicolasgoaziou.fr> <871rc466k6.fsf@localhost>
List-Id: "General discussions about Org-mode."
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

Updated version of the patch.

Ihor Radchenko writes:

> Nicolas Goaziou writes:
>> Actually, this is (was?) intentional. By forbidding parenthesis in
>> a plain URL, you allow one to type, e.g., (https://orgmode.org), which
>> is, IMO, a more frequent need than having to deal with parenthesis in
>> the URL.
>
> The patch correctly recognises situations like this.
> https://orgmode.org is recognised correctly without parenthesis.
>
> I guess this example may be another addition to the tests.
>
> Best,
> Ihor

--=-=-=
Content-Type: text/x-diff; charset=utf-8
Content-Disposition: inline; filename=0001-Improve-org-link-plain-re.patch
Content-Transfer-Encoding: quoted-printable

>From 08efc990a578c925d42315c45e0b9b76536b92af Mon Sep 17 00:00:00 2001
From: Ihor Radchenko
Date: Wed, 24 Mar 2021 21:27:24 +0800
Subject: [PATCH] Improve org-link-plain-re

* lisp/ol.el (org-link-plain-re): Update docstring.  The docstring now
states explicitly that the regexp must contain groups for the link
type and the path.
(org-link-make-regexps): Allow URLs with up to two levels of nested
parentheses, so that URLs like [1] can be matched.  The new regexp is
based on [2].
* testing/lisp/test-ol.el (test-ol-parse-link-in-text): New macro.
(test-ol/plain-link-re): Add tests for the plain link regexp.

[1] https://doi.org/10.1016/0160-791x(79)90023-x
[2] https://daringfireball.net/2010/07/improved_regex_for_matching_urls
---
 lisp/ol.el              |  41 +++++++++++----
 testing/lisp/test-ol.el | 110 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+), 10 deletions(-)

diff --git a/lisp/ol.el b/lisp/ol.el
index b8bd7d234..550c0cff6 100644
--- a/lisp/ol.el
+++ b/lisp/ol.el
@@ -519,7 +519,10 @@ links more efficient."
   "Matches link with angular brackets, spaces are allowed.")
 
 (defvar org-link-plain-re nil
-  "Matches plain link, without spaces.")
+  "Matches plain link, without spaces.
+Group 1 must contain the link type (e.g. https).
+Group 2 must contain the link path (e.g. //example.com).
+Used by `org-element-link-parser'.")
 
 (defvar org-link-bracket-re nil
   "Matches a link in double brackets.")
@@ -807,15 +810,33 @@ This should be called after the variable `org-link-parameters' has changed."
           (format "<%s:\\([^>\n]*\\(?:\n[ \t]*[^> \t\n][^>\n]*\\)*\\)>"
                   types-re)
           org-link-plain-re
-          (concat
-           "\\<" types-re ":"
-           "\\([^][ \t\n()<>]+\\(?:([[:word:]0-9_]+)\\|\\([^[:punct:] \t\n]\\|/\\)\\)\\)")
-          ;; "\\([^]\t\n\r<>() ]+[^]\t\n\r<>,.;() ]\\)")
-          org-link-bracket-re
-          (rx (seq "[["
-                   ;; URI part: match group 1.
-                   (group
-                    (one-or-more
+          (let* ((non-space-bracket "[^][ \t\n()<>]")
+                 (parenthesis
+                  `(seq "("
+                        (0+ (or (regex ,non-space-bracket)
+                                (seq "("
+                                     (0+ (regex ,non-space-bracket))
+                                     ")")))
+                        ")")))
+            ;; Heuristics for a URL link inspired by
+            ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
+            (rx-to-string
+             `(seq word-start
+                   ;; Link type: match group 1.
+                   (regexp ,types-re)
+                   ":"
+                   ;; Link path: match group 2.
+                   (group
+                    (1+ (or (regex ,non-space-bracket)
+                            ,parenthesis))
+                    (or (regexp "[^[:punct:] \t\n]")
+                        ?/
+                        ,parenthesis)))))
+          org-link-bracket-re
+          (rx (seq "[["
+                   ;; URI part: match group 1.
+                   (group
+                    (one-or-more
                      (or (not (any "[]\\"))
                          (and "\\" (zero-or-more "\\\\") (any "[]"))
                          (and (one-or-more "\\") (not (any "[]"))))))
diff --git a/testing/lisp/test-ol.el b/testing/lisp/test-ol.el
index 5b7dc513b..ddcc570b3 100644
--- a/testing/lisp/test-ol.el
+++ b/testing/lisp/test-ol.el
@@ -491,5 +491,115 @@
         (org-previous-link))
       (buffer-substring (point) (line-end-position))))))
 
+
+;;; Link regexps
+
+
+(defmacro test-ol-parse-link-in-text (text)
+  "Return list of :type and :path of link parsed in TEXT.
+\"\" string must be at the beginning of the link to be parsed."
+  (declare (indent 1))
+  `(org-test-with-temp-text ,text
+     (list (org-element-property :type (org-element-link-parser))
+           (org-element-property :path (org-element-link-parser)))))
+
+(ert-deftest test-ol/plain-link-re ()
+  "Test `org-link-plain-re'."
+  (should
+   (equal
+    '("https" "//example.com")
+    (test-ol-parse-link-in-text
+        "(https://example.com)")))
+  (should
+   (equal
+    '("https" "//example.com/qwe()")
+    (test-ol-parse-link-in-text
+        "(Some text https://example.com/qwe())")))
+  (should
+   (equal
+    '("https" "//doi.org/10.1016/0160-791x(79)90023-x")
+    (test-ol-parse-link-in-text
+        "https://doi.org/10.1016/0160-791x(79)90023-x")))
+  (should
+   (equal
+    '("file" "aa")
+    (test-ol-parse-link-in-text
+        "The file:aa link")))
+  (should
+   (equal
+    '("file" "a(b)c")
+    (test-ol-parse-link-in-text
+        "The file:a(b)c link")))
+  (should
+   (equal
+    '("file" "a()")
+    (test-ol-parse-link-in-text
+        "The file:a() link")))
+  (should
+   (equal
+    '("file" "aa((a))")
+    (test-ol-parse-link-in-text
+        "The file:aa((a)) link")))
+  (should
+   (equal
+    '("file" "aa(())")
+    (test-ol-parse-link-in-text
+        "The file:aa(()) link")))
+  (should
+   (equal
+    '("file" "/a")
+    (test-ol-parse-link-in-text
+        "The file:/a link")))
+  (should
+   (equal
+    '("file" "/a/")
+    (test-ol-parse-link-in-text
+        "The file:/a/ link")))
+  (should
+   (equal
+    '("http" "//")
+    (test-ol-parse-link-in-text
+        "The http:// link")))
+  (should
+   (equal
+    '("file" "ab")
+    (test-ol-parse-link-in-text
+        "The (some file:ab) link")))
+  (should
+   (equal
+    '("file" "aa")
+    (test-ol-parse-link-in-text
+        "The file:aa) link")))
+  (should
+   (equal
+    '("file" "aa")
+    (test-ol-parse-link-in-text
+        "The file:aa( link")))
+  (should
+   (equal
+    '("http" "//foo.com/more_(than)_one_(parens)")
+    (test-ol-parse-link-in-text
+        "The http://foo.com/more_(than)_one_(parens) link")))
+  (should
+   (equal
+    '("http" "//foo.com/blah_(wikipedia)#cite-1")
+    (test-ol-parse-link-in-text
+        "The http://foo.com/blah_(wikipedia)#cite-1 link")))
+  (should
+   (equal
+    '("http" "//foo.com/blah_(wikipedia)_blah#cite-1")
+    (test-ol-parse-link-in-text
+        "The http://foo.com/blah_(wikipedia)_blah#cite-1 link")))
+  (should
+   (equal
+    '("http" "//foo.com/unicode_(✪)_in_parens")
+    (test-ol-parse-link-in-text
+        "The http://foo.com/unicode_(✪)_in_parens link")))
+  (should
+   (equal
+    '("http" "//foo.com/(something)?after=parens")
+    (test-ol-parse-link-in-text
+        "The http://foo.com/(something)?after=parens link"))))
+
 (provide 'test-ol)
 ;;; test-ol.el ends here
-- 
2.26.2

--=-=-=--
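The heuristic that the patch puts into `org-link-make-regexps` can be tried outside Org with a small standalone sketch. It assumes Emacs 27 or later. `my-plain-link-re` and `my-parse-plain-link` are hypothetical helpers, not Org functions, and the hard-coded type regexp `\\(https?\\|file\\)` stands in for the one Org builds from its registered link types. The rx construction is the one from the patch, and the type and path are read from groups 1 and 2, as the updated docstring of `org-link-plain-re` describes.

```elisp
;; Standalone sketch, assuming Emacs 27 or later.  `my-plain-link-re'
;; and `my-parse-plain-link' are hypothetical helpers, not Org functions.
(require 'rx)

(defun my-plain-link-re (types-re)
  "Build a plain-link regexp from TYPES-RE, following the patch.
TYPES-RE must contain exactly one capturing group (the link type), so
the link path ends up in group 2."
  (let* ((non-space-bracket "[^][ \t\n()<>]")
         ;; A balanced "(...)" that may contain one nested "(...)".
         (parenthesis
          `(seq "("
                (0+ (or (regexp ,non-space-bracket)
                        (seq "(" (0+ (regexp ,non-space-bracket)) ")")))
                ")")))
    (rx-to-string
     `(seq word-start
           (regexp ,types-re)           ; group 1: link type
           ":"
           (group                       ; group 2: link path
            (1+ (or (regexp ,non-space-bracket) ,parenthesis))
            ;; The path must end with a non-punctuation character, a "/",
            ;; or a balanced parenthesized group.
            (or (regexp "[^[:punct:] \t\n]") ?/ ,parenthesis))))))

(defun my-parse-plain-link (text)
  "Return (TYPE PATH) for the first plain link in TEXT, or nil.
Only http, https and file links are recognized, for illustration."
  (let ((re (my-plain-link-re "\\(https?\\|file\\)")))
    (when (string-match re text)
      (list (match-string 1 text) (match-string 2 text)))))

;; Usage:
;; (my-parse-plain-link "(https://example.com)")
;;   => ("https" "//example.com")
;; (my-parse-plain-link "See https://doi.org/10.1016/0160-791x(79)90023-x.")
;;   => ("https" "//doi.org/10.1016/0160-791x(79)90023-x")
```

With this construction the balanced `(79)` inside the DOI from [1] stays in the path, while a trailing full stop or an unbalanced `)` is left out, because the final element of the path may not be punctuation unless it is a `/` or a complete parenthesized group.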