To: emacs-orgmode@gnu.org
From: Max Nikulin
Subject: Re: [BUG] Trailing dash is not included in link [9.7.3 (9.7.3-2f1844 @ /home/mwillcock/.emacs.d/elpa/org-9.7.3/)]
Date: Sun, 16 Jun 2024 22:43:39 +0700
References: <87sexh9ddv.fsf@ice9.digital> <87le37k4c8.fsf@localhost>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Mozilla Thunderbird
Content-Language: en-US, ru-RU
In-Reply-To: <87le37k4c8.fsf@localhost>
List-Id: "General discussions about Org-mode."

On 14/06/2024 21:04, Ihor Radchenko wrote:
> Morgan Willcock writes:
>
>> i.e. Inserting "https://domain/test-" into the buffer will create a
>> clickable link for "https://domain/test".
>>
> I improved the heuristics we use to detect plain links.
> Fixed, on main.
> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=73da6beb5

> +++ b/etc/ORG-NEWS
[...]
> +*** Trailing =-= is now allowed in plain links

After a look into

7dcb1afb6 2021-03-24 21:27:24 +0800 Ihor Radchenko: Improve org-link-plain-re

I suspect it worked prior to v9.5. Without a unit test it may be
accidentally broken again.

> +: https://domain/test-

example.org, example.net, and example.com are domains reserved for use
in examples.

> (or (regexp "[^[:punct:] \t\n]")

I have realized that some Org regexps use the [:punct:] *regexp class*
and others use the *syntax class*, see the latex math regexp.
I am not sure whether the discrepancy is intentional.

I have also noticed that the following change

09ced6d2c 2024-02-03 15:15:46 +0100 Ihor Radchenko: org-link-plain-re: Improve regexp heuristics

causes the closing parenthesis in

(link http://example.org/a<b)

to end up inside the exported link:

(link http://example.org/a%3Cb)

I expect that ")" should not be parsed as a part of the link. Balanced
brackets are tricky with regexps (and it is not possible to match
arbitrarily nested ones).

Perhaps "[^[:punct:] \t\n]" is too strict with respect to spaces. It
does not allow the recommended workaround with a zero width space:

(org-export-string-as "http://example.org\N{ZERO WIDTH SPACE}[fn::footnote]" 'html 'body)
"

http://example.org​[fn::footnote]

"

Actually, some kind of non-breaking space would be better in such cases:

(org-export-string-as "http://example.org\N{NO-BREAK SPACE}[fn::footnote]" 'html 'body)
"

http://example.org [fn::footnote]

"

I would consider [:space:] or \s-.

As to the original bug report, while reading it I noticed that
Thunderbird includes the dash in the recognized link for
"https://domain/test-". I decided to look into its implementation and,
to my surprise, found:
``punctation chars and "-" at the end are stipped off.''

I realized that double quotes, along with angle brackets, are treated
as a recommended way to mark URLs in plain text. Thunderbird does not
consider the dash a part of the link for e.g.

http://example.org/t-

It might be an attempt to reserve the possibility of reassembling URLs
wrapped across several lines with added hyphenation marks, but that has
not been implemented (RFC 2396 appendix E warns about accidentally
added hyphens).

https://www.bucksch.org/1/projects/mozilla/16507/
https://searchfox.org/mozilla-central/source/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp#line-243
mozTXTToHTMLConv::FindURLEnd

The implementation is tricky, and I have not noticed anything that
could be reused to improve the heuristics for Org. Nowadays it is
likely better to inspect the autolinking code of GitHub/GitLab or of
widely used Python packages.
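
P.S. A minimal sketch of why a trailing "-" is easy to lose, and of how
the [:punct:] regexp class differs from the punctuation syntax class.
This is not the code of Org or Thunderbird. The helper name
my/strip-trailing-punct is hypothetical, and the syntax class check
assumes the standard syntax table; Org buffers use their own table, so
results there may differ.

```elisp
;;; Illustration only: neither Org's nor Thunderbird's implementation.
;; `my/strip-trailing-punct' is a hypothetical helper name.

(defun my/strip-trailing-punct (url)
  "Return URL without trailing ASCII punctuation, including \"-\"."
  (if (string-match "[[:punct:]]+\\'" url)
      (substring url 0 (match-beginning 0))
    url))

;; For ASCII characters the [:punct:] regexp class does not consult the
;; syntax table, so "-" always counts as punctuation:
(string-match-p "[[:punct:]]" "-")              ; => 0

;; The punctuation *syntax class* depends on the syntax table in effect.
;; Assumption: the standard syntax table, where "-" is a symbol
;; constituent rather than punctuation.
(with-syntax-table (standard-syntax-table)
  (list (string-match-p "\\s." "-")
        (string-match-p "\\s_" "-")))           ; => (nil 0)

;; A rule based on [:punct:] therefore drops the trailing dash:
(my/strip-trailing-punct "http://example.org/t-")
;; => "http://example.org/t"
```

A regexp written with \s. instead of [[:punct:]] would keep the dash
under the standard syntax table, which is one reason why mixing the two
notations across Org regexps may lead to surprises.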