unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#74132: 31.0.50; thing-at-pt, ffap and Github markdown
@ 2024-10-31 10:36 Madhu
       [not found] ` <handler.74132.B.173037135112690.ack@debbugs.gnu.org>
  2024-11-09 10:29 ` bug#74132: 31.0.50; thing-at-pt, ffap and Github markdown Eli Zaretskii
  0 siblings, 2 replies; 8+ messages in thread
From: Madhu @ 2024-10-31 10:36 UTC (permalink / raw)
  To: 74132

[-- Attachment #1: Type: Text/Plain, Size: 877 bytes --]

Consider the following text as is typically found on README.md

```
[![GitHub Releases Downloads](https://img.shields.io/github/downloads/raysan5/raylib/total)](https://github.com/raysan5/raylib/releases)
```

If point is, say, at the "r" of "raylib/releases", invoking
(ffap-url-at-point) fails.  This eventually calls
thing-at-point-bounds-of-url-at-point, which has hardcoded behaviour:
it skips backwards over "allowed characters" to find the beginning of
the bound.  Here it stops at the space character (in "Releases
Downloads") and the whole thing fails.

This particular failure can be addressed by supplying the LIM
parameter to skip-chars-backward, as shown in the attached
patch.

Does this look like a problem which ought to be solved?  And is this
fix appropriate? (I was going to post on emacs-devel but decided to post
to the bug list instead) -- Best Regards, Madhu
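
The backward skip described above can be reproduced outside
thingatpt.el.  The following is only a minimal sketch, not the code in
thingatpt.el: the `my-demo-' names are hypothetical, the allowed-chars
string is copied from the patch context, and the regexp "https?://" is
an assumed stand-in for `thing-at-point-beginning-of-url-regexp'.

```elisp
;; Hypothetical demo helpers; none of these names exist in Emacs.
(defconst my-demo-line
  "[![GitHub Releases Downloads](https://img.shields.io/github/downloads/raysan5/raylib/total)](https://github.com/raysan5/raylib/releases)")

;; Copied from the context lines of the patch.
(defconst my-demo-allowed-chars "--:=&?$+@-Z_[:alpha:]~#,%;*()!'[]")

(defun my-demo-skip-back (&optional use-limit)
  "Return the text covered by the backward skip from \"raylib/releases\".
With USE-LIMIT non-nil, stop at the nearest preceding \"https?://\"
\(an assumed stand-in regexp)."
  (with-temp-buffer
    (insert my-demo-line)
    (goto-char (point-min))
    (search-forward "raylib/releases")
    (goto-char (match-beginning 0))
    (let* ((pt (point))
           (lim (and use-limit
                     (save-excursion
                       (and (re-search-backward "https?://" nil t)
                            (point))))))
      ;; LIM is nil in the unlimited case, i.e. the skip is unbounded.
      (skip-chars-backward my-demo-allowed-chars lim)
      (buffer-substring-no-properties (point) pt))))
```

Here (my-demo-skip-back) returns text beginning with "Downloads](",
because the skip only stops at the space, whereas (my-demo-skip-back t)
returns "https://github.com/raysan5/".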


[-- Attachment #2: 0001-lisp-thingatpt.el-recognize-urls-better-in-markdown-.patch --]
[-- Type: Text/X-Patch, Size: 1752 bytes --]

From 5971b7c10d7c38d540fdf278a0cd559c96b10ed2 Mon Sep 17 00:00:00 2001
From: Madhu <enometh@net.meer>
Date: Thu, 31 Oct 2024 15:40:42 +0530
Subject: [PATCH] lisp/thingatpt.el: recognize urls better in markdown text

* lisp/thingatpt.el: (thing-at-point-bounds-of-url-at-point): supply a
LIM when calling (skip-chars-backward allowed-chars), which is the
position where `thing-at-point-beginning-of-url-regexp' matches
backwards

problematic url e.g.
```
[![GitHub Releases Downloads](https://img.shields.io/github/downloads/raysan5/raylib/total)](https://github.com/raysan5/raylib/releases)
```
If point is in the second url, skip-chars-backward goes to the
space (between s and D) and `ffap-url-at-point' eventually fails.
But if we supply a limit with a left anchor, we work around it.
---
 lisp/thingatpt.el | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lisp/thingatpt.el b/lisp/thingatpt.el
index 3cfd3905701..0b8e28af5b9 100644
--- a/lisp/thingatpt.el
+++ b/lisp/thingatpt.el
@@ -502,9 +502,14 @@ thing-at-point-bounds-of-url-at-point
       (let* ((allowed-chars "--:=&?$+@-Z_[:alpha:]~#,%;*()!'[]")
 	     (skip-before "^[0-9a-zA-Z]")
 	     (skip-after  ":;.,!?'")
+             (hard-beg (and thing-at-point-beginning-of-url-regexp
+                            (save-excursion
+                              (re-search-backward
+                               thing-at-point-beginning-of-url-regexp nil t)
+                              (point))))
 	     (pt (point))
 	     (beg (save-excursion
-		    (skip-chars-backward allowed-chars)
+		    (skip-chars-backward allowed-chars hard-beg)
 		    (skip-chars-forward skip-before pt)
 		    (point)))
 	     (end (save-excursion
-- 
2.46.0.27.gfa3b914457



* bug#74132: Acknowledgement (31.0.50; thing-at-pt, ffap and Github markdown)
       [not found] ` <handler.74132.B.173037135112690.ack@debbugs.gnu.org>
@ 2024-11-01  5:38   ` Madhu
  2024-11-02 22:04     ` Stefan Kangas
  0 siblings, 1 reply; 8+ messages in thread
From: Madhu @ 2024-11-01  5:38 UTC (permalink / raw)
  To: 74132

There was a typo in the patch I posted. It should instead look like this
```
diff --git a/lisp/thingatpt.el b/lisp/thingatpt.el
--- a/lisp/thingatpt.el
+++ b/lisp/thingatpt.el
@@ -502,9 +502,15 @@ thing-at-point-bounds-of-url-at-point
       (let* ((allowed-chars "--:=&?$+@-Z_[:alpha:]~#,%;*()!'[]")
 	     (skip-before "^[0-9a-zA-Z]")
 	     (skip-after  ":;.,!?'")
+             (hard-beg (and thing-at-point-beginning-of-url-regexp
+                            (save-excursion
+                              (and
+                               (re-search-backward
+                                thing-at-point-beginning-of-url-regexp nil t)
+                               (point)))))
 	     (pt (point))
 	     (beg (save-excursion
-		    (skip-chars-backward allowed-chars)
+		    (skip-chars-backward allowed-chars hard-beg)
 		    (skip-chars-forward skip-before pt)
 		    (point)))
 	     (end (save-excursion
```
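
Comparing the two diffs, the visible difference is the extra `and'
around the search.  A minimal sketch of why that matters, assuming the
hypothetical `my-demo-' helpers below and "https?://" as a stand-in
regexp: with NOERROR set to t, `re-search-backward' returns nil and
leaves point alone when nothing matches, so the unguarded form yields
the current position as the limit.

```elisp
;; Hypothetical helpers; "https?://" is an assumed stand-in regexp.
(defun my-demo-limit-unguarded (regexp)
  "Shape of the first patch: always return a position."
  (save-excursion
    (re-search-backward regexp nil t)
    (point)))

(defun my-demo-limit-guarded (regexp)
  "Shape of the amended patch: return nil when REGEXP is not found."
  (save-excursion
    (and (re-search-backward regexp nil t)
         (point))))

(defun my-demo-compare-limits (text)
  "Insert TEXT in a temp buffer and return (UNGUARDED GUARDED POINT).
Point is at the end of TEXT when the limits are computed."
  (with-temp-buffer
    (insert text)
    (list (my-demo-limit-unguarded "https?://")
          (my-demo-limit-guarded "https?://")
          (point))))
```

For a text without any scheme, the first element equals the third
(the limit is point itself, so `skip-chars-backward' could not move at
all), while the second element is nil, which leaves the skip unbounded.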






* bug#74132: Acknowledgement (31.0.50; thing-at-pt, ffap and Github markdown)
  2024-11-01  5:38   ` bug#74132: Acknowledgement (31.0.50; thing-at-pt, ffap and Github markdown) Madhu
@ 2024-11-02 22:04     ` Stefan Kangas
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Kangas @ 2024-11-02 22:04 UTC (permalink / raw)
  To: Madhu, 74132

Madhu <enometh@meer.net> writes:

> There was a typo in the patch I posted. It should instead look like this

Could you please resend the amended patch as an attachment?

Also, how about having some tests for this?
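
A possible starting point for such tests, as a sketch only: the helper
and test names are hypothetical, and the expected value in the Markdown
case encodes the intent of the report, not verified behaviour.

```elisp
(require 'ert)
(require 'thingatpt)

;; Hypothetical helper, not part of thingatpt.el or its test suite.
(defun my-demo-url-at (text anchor)
  "Return the URL at the start of ANCHOR within TEXT, or nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (search-forward anchor)
    (goto-char (match-beginning 0))
    (thing-at-point 'url t)))

(ert-deftest my-demo-url-plain-text ()
  "Baseline: a URL delimited by spaces should keep working."
  (should (equal (my-demo-url-at "see https://example.org/path now"
                                 "example")
                 "https://example.org/path")))

(ert-deftest my-demo-url-markdown-badge ()
  "The case from this report.  The expected value is an assumption."
  (should (equal (my-demo-url-at
                  "[![GitHub Releases Downloads](https://img.shields.io/github/downloads/raysan5/raylib/total)](https://github.com/raysan5/raylib/releases)"
                  "raylib/releases")
                 "https://github.com/raysan5/raylib/releases")))
```

Keeping the baseline next to the new case makes it easier to notice
when a change to the heuristic fixes one at the expense of the other.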

> ```
> diff --git a/lisp/thingatpt.el b/lisp/thingatpt.el
> --- a/lisp/thingatpt.el
> +++ b/lisp/thingatpt.el
> @@ -502,9 +502,15 @@ thing-at-point-bounds-of-url-at-point
>        (let* ((allowed-chars "--:=&?$+@-Z_[:alpha:]~#,%;*()!'[]")
>  	     (skip-before "^[0-9a-zA-Z]")
>  	     (skip-after  ":;.,!?'")
> +             (hard-beg (and thing-at-point-beginning-of-url-regexp
> +                            (save-excursion
> +                              (and
> +                               (re-search-backward
> +                                thing-at-point-beginning-of-url-regexp nil t)
> +                               (point)))))
>  	     (pt (point))
>  	     (beg (save-excursion
> -		    (skip-chars-backward allowed-chars)
> +		    (skip-chars-backward allowed-chars hard-beg)
>  		    (skip-chars-forward skip-before pt)
>  		    (point)))
>  	     (end (save-excursion
> ```






* bug#74132: 31.0.50; thing-at-pt, ffap and Github markdown
  2024-10-31 10:36 bug#74132: 31.0.50; thing-at-pt, ffap and Github markdown Madhu
       [not found] ` <handler.74132.B.173037135112690.ack@debbugs.gnu.org>
@ 2024-11-09 10:29 ` Eli Zaretskii
  2024-11-09 15:52   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2024-11-09 10:29 UTC (permalink / raw)
  To: Madhu, Stefan Monnier; +Cc: 74132

> Date: Thu, 31 Oct 2024 16:06:49 +0530 (IST)
> From: Madhu <enometh@meer.net>
> 
> Consider the following text as is typically found on README.md
> 
> ```
> [![GitHub Releases Downloads](https://img.shields.io/github/downloads/raysan5/raylib/total)](https://github.com/raysan5/raylib/releases)
> ```
> 
> If point is, say, at the "r" of "raylib/releases", invoking
> (ffap-url-at-point) fails.  This eventually calls
> thing-at-point-bounds-of-url-at-point, which has hardcoded behaviour:
> it skips backwards over "allowed characters" to find the beginning of
> the bound.  Here it stops at the space character (in "Releases
> Downloads") and the whole thing fails.
> 
> This particular failure can be addressed by supplying the LIM
> parameter to skip-chars-backward, as shown in the attached
> patch.
> 
> Does this look like a problem which ought to be solved?  And is this
> fix appropriate? (I was going to post on emacs-devel but decided to post
> to the bug list instead) -- Best Regards, Madhu
> 
> From 5971b7c10d7c38d540fdf278a0cd559c96b10ed2 Mon Sep 17 00:00:00 2001
> From: Madhu <enometh@net.meer>
> Date: Thu, 31 Oct 2024 15:40:42 +0530
> Subject: [PATCH] lisp/thingatpt.el: recognize urls better in markdown text
> 
> * lisp/thingatpt.el: (thing-at-point-bounds-of-url-at-point): supply a
> LIM when calling (skip-chars-backward allowed-chars), which is the
> position where `thing-at-point-beginning-of-url-regexp' matches
> backwards
> 
> problematic url e.g.
> ```
> [![GitHub Releases Downloads](https://img.shields.io/github/downloads/raysan5/raylib/total)](https://github.com/raysan5/raylib/releases)
> ```
> If point is in the second url, skip-chars-backward goes to the
> space (between s and D) and `ffap-url-at-point' eventually fails.
> But if we supply a limit with a left anchor, we work around it.

What will this do to URLs such as

  http://web.archive.org/web/20240221082647/https://www.imdb.com/

?  More generally, to any URL that has another URL embedded in it?

I'm not sure I see how to resolve this dilemma.  Stefan, any ideas?






* bug#74132: 31.0.50; thing-at-pt, ffap and Github markdown
  2024-11-09 10:29 ` bug#74132: 31.0.50; thing-at-pt, ffap and Github markdown Eli Zaretskii
@ 2024-11-09 15:52   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-11-09 16:33     ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-11-09 15:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Madhu, 74132

> What will this do to URLs such as
>
>   http://web.archive.org/web/20240221082647/https://www.imdb.com/

Depends where point is: if it's after the `https`, then you get the
"sub-URL" and if it's on or before the `https` then you get the whole URL.
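
The dependence on point can be sketched in isolation.  This is not the
thingatpt.el code: `my-demo-url-start' is a hypothetical helper, and
"https?://" is an assumed stand-in for the scheme regexp.  It relies on
the documented property that `re-search-backward' only finds matches
that end at or before the starting point.

```elisp
;; Hypothetical helper; the allowed-chars string is copied from the patch.
(defun my-demo-url-start (text anchor)
  "Return TEXT from the limited backward skip to its end.
Point is placed at the start of ANCHOR before skipping."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (search-forward anchor)
    (goto-char (match-beginning 0))
    (let ((lim (save-excursion
                 (and (re-search-backward "https?://" nil t)
                      (point)))))
      (skip-chars-backward "--:=&?$+@-Z_[:alpha:]~#,%;*()!'[]" lim)
      (buffer-substring-no-properties (point) (point-max)))))
```

With the web.archive.org URL above as TEXT, an ANCHOR of "imdb" gives
"https://www.imdb.com/", while an ANCHOR of "20240221082647" or of
"https://www" gives the whole URL.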

> I'm not sure I see how to resolve this dilemma.  Stefan, any ideas?

"url at point" is inherently heuristic, so I'm not too worried.
But I do very much agree with Stefan that we need tests, because it's
all too easy to run around in circles otherwise, fixing the heuristic to
handle case A but breaking case B along the way.


        Stefan







* bug#74132: 31.0.50; thing-at-pt, ffap and Github markdown
  2024-11-09 15:52   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-11-09 16:33     ` Eli Zaretskii
  2024-11-09 16:59       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2024-11-09 16:33 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: enometh, 74132

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Madhu <enometh@meer.net>,  74132@debbugs.gnu.org
> Date: Sat, 09 Nov 2024 10:52:06 -0500
> 
> > What will this do to URLs such as
> >
> >   http://web.archive.org/web/20240221082647/https://www.imdb.com/
> 
> Depends where point is: if it's after the `https`, then you get the
> "sub-URL" and if it's on or before the `https` then you get the whole URL.

Exactly.  This could be considered a bug, because the actual URL is
the entire thing.

> > I'm not sure I see how to resolve this dilemma.  Stefan, any ideas?
> 
> "url at point" is inherently heuristic, so I'm not too worried.
> But I do very much agree with Stefan that we need tests, because it's
> all too easy to run around in circles otherwise, fixing the heuristic to
> handle case A but breaking case B along the way.

I'm okay with adding tests, of course, but I'm not sure which of the
two behaviors leaves you "not too worried": the current or the new one
after the proposed change.  And why.






* bug#74132: 31.0.50; thing-at-pt, ffap and Github markdown
  2024-11-09 16:33     ` Eli Zaretskii
@ 2024-11-09 16:59       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-11-09 17:59         ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-11-09 16:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: enometh, 74132

>> > What will this do to URLs such as
>> >   http://web.archive.org/web/20240221082647/https://www.imdb.com/
>> Depends where point is: if it's after the `https`, then you get the
>> "sub-URL" and if it's on or before the `https` then you get the whole URL.
> Exactly.  This could be considered a bug, because the actual URL is
> the entire thing.

We could refine our heuristic so as to keep looking backward when the
apparent beginning of the URL is immediately preceded by a /, of course,
but I'm not sure it's worth the trouble.

AFAICT any behavior we come up with will have such cases.

>> > I'm not sure I see how to resolve this dilemma.  Stefan, any ideas?
>> "url at point" is inherently heuristic, so I'm not too worried.
>> But I do very much agree with Stefan that we need tests, because it's
>> all too easy to run around in circles otherwise, fixing the heuristic to
>> handle case A but breaking case B along the way.
> I'm okay with adding tests, of course, but I'm not sure which of the
> two behaviors leaves you "not too worried": the current or the new one
> after the proposed change.  And why.

The [...](...) case mentioned by Madhu is a fairly common one IME, so
I'm in favor of fixing it.  As for the behavior in your example, to the
extent that the users can control which URL they get (depending on where
they place point (or click)), I'm OK with either behavior.


        Stefan







* bug#74132: 31.0.50; thing-at-pt, ffap and Github markdown
  2024-11-09 16:59       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-11-09 17:59         ` Eli Zaretskii
  0 siblings, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2024-11-09 17:59 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: enometh, 74132

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: enometh@meer.net,  74132@debbugs.gnu.org
> Date: Sat, 09 Nov 2024 11:59:44 -0500
> 
> >> > What will this do to URLs such as
> >> >   http://web.archive.org/web/20240221082647/https://www.imdb.com/
> >> Depends where point is: if it's after the `https`, then you get the
> >> "sub-URL" and if it's on or before the `https` then you get the whole URL.
> > Exactly.  This could be considered a bug, because the actual URL is
> > the entire thing.
> 
> We could refine our heuristic so as to keep looking backward when the
> apparent beginning of the URL is immediately preceded by a /, of course,
> but I'm not sure it's worth the trouble.

The problem is that AFAIU '/' is not the only such character.  It
could also be '=', I think (as in query URLs), and perhaps some
others.
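
To make the trade-off concrete, here is a minimal sketch of the "keep
looking backward" idea with the set of joining characters left as a
parameter.  Everything in it is an assumption for illustration: the
`my-demo-' names, the JOINERS argument and the "https?://" stand-in
regexp are hypothetical, and none of this is in thingatpt.el.

```elisp
;; Hypothetical helpers sketching the "keep looking backward" heuristic.
(defun my-demo-outer-scheme-start (joiners)
  "Return the start of the outermost URL scheme before point, or nil.
Keep searching backward while the scheme just found is immediately
preceded by one of the characters in JOINERS.  The search never
leaves the run of allowed characters around point."
  (save-excursion
    (let ((bound (save-excursion
                   (skip-chars-backward "--:=&?$+@-Z_[:alpha:]~#,%;*()!'[]")
                   (point)))
          (found nil))
      ;; Each successful search moves point to the start of the match.
      (while (and (re-search-backward "https?://" bound t)
                  (setq found (point))
                  (memq (char-before) joiners)))
      found)))

(defun my-demo-outer-prefix (text anchor joiners)
  "Return TEXT between the computed start and ANCHOR, or nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (search-forward anchor)
    (goto-char (match-beginning 0))
    (let ((start (my-demo-outer-scheme-start joiners)))
      (and start (buffer-substring-no-properties start (point))))))
```

With JOINERS set to '(?/), point on "imdb" in the web.archive.org URL
yields a prefix starting at the outer "http://".  A made-up query URL
such as "https://example.org/go?url=https://www.imdb.com/" only does so
once ?= is added to JOINERS, and whichever set is chosen will still
misfire on some input.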

> AFAICT any behavior we come up with will have such cases.

Yes, which is why I said I didn't know how to solve this.

> The [...](...) case mentioned by Madhu is a fairly common one IME, so
> I'm in favor of fixing it.

Not universally so, IME.  It is common in Markdown files and perhaps
also in Org.  So maybe this should be fine-tuned by major modes?




