unofficial mirror of emacs-orgmode@gnu.org
 help / color / mirror / Atom feed
* greedy substitution in org-open-file
@ 2021-01-20 16:08 Maxim Nikulin
  2021-02-12  7:16 ` Kyle Meyer
  0 siblings, 1 reply; 5+ messages in thread
From: Maxim Nikulin @ 2021-01-20 16:08 UTC (permalink / raw)
  To: emacs-orgmode

Looking into the code related to 'pty problem with 
start-process-shell-command, I have realized that the following case is 
not handled correctly:

#+begin_src elisp
   (setq org-file-apps '(("\\.pdf::\\([0-9]+\\)\\'" . "xpdf %s %1")))
#+end_src

I hope, I adapted an example from [[help:org-file-apps]] correctly. When 
I try to open the following link using C-c C-o

[[file:test-%1f.pdf::42]]

I get

Running xpdf /home/ubuntu/examples/org/test-\42f.pdf 42...done

I believe, it should be

Running xpdf /home/ubuntu/examples/org/test-%1f.pdf 42...done

Or

Running xpdf /home/ubuntu/examples/org/test-\%1f.pdf 42...done

Org mode version 9.4.4 (release_9.4.4-164-g7a9a8a

The source of the problem is that %s substitution is expanded at first 
and regexp groups are replaced later. Ideally, it should be performed in 
a single pass. I have found format-spec function but I am unsure 
concerning it, maybe it is avoided intentionally.



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: greedy substitution in org-open-file
  2021-01-20 16:08 greedy substitution in org-open-file Maxim Nikulin
@ 2021-02-12  7:16 ` Kyle Meyer
  2021-02-12 16:46   ` Maxim Nikulin
  0 siblings, 1 reply; 5+ messages in thread
From: Kyle Meyer @ 2021-02-12  7:16 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

Maxim Nikulin writes:

> Looking into the code related to 'pty problem with 
> start-process-shell-command, I have realized that the following case is 
> not handled correctly:
>
> #+begin_src elisp
>    (setq org-file-apps '(("\\.pdf::\\([0-9]+\\)\\'" . "xpdf %s %1")))
> #+end_src

Not relevant for the underlying issue, but doesn't xpdf require a colon
before the page number (i.e. ":%1")?

> I hope, I adapted an example from [[help:org-file-apps]] correctly. When 
> I try to open the following link using C-c C-o
>
> [[file:test-%1f.pdf::42]]
>
> I get
>
> Running xpdf /home/ubuntu/examples/org/test-\42f.pdf 42...done
>
> I believe, it should be
>
> Running xpdf /home/ubuntu/examples/org/test-%1f.pdf 42...done
>
> Or
>
> Running xpdf /home/ubuntu/examples/org/test-\%1f.pdf 42...done

Yep, it should show up as "Running xpdf /tmp/test-\%1f.pdf :42...done",
with the backslash coming from shell-quote-argument.

> The source of the problem is that %s substitution is expanded at first 
> and regexp groups are replaced later. Ideally, it should be performed in 
> a single pass. I have found format-spec function but I am unsure 
> concerning it, maybe it is avoided intentionally.

I believe format-spec requires the placeholder to be A-z:

  (format-spec "xpdf %s" '((?s . "a")))  => "xpdf a"
  (format-spec "xpdf %s %1" '((?s . "a") (?1 . "b")))  ;; Invalid format string

What about flipping the processing, handling the %N placeholders first
and then formatting the file name?  Seems to work on my end, though I
haven't tested it thoroughly.

diff --git a/lisp/org.el b/lisp/org.el
index c61b8fb56..31dbe352a 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8714,12 +8714,6 @@ (defun org-open-file (path &optional in-emacs line search)
       ;; Remove quotes around the file name - we'll use shell-quote-argument.
       (while (string-match "['\"]%s['\"]" cmd)
 	(setq cmd (replace-match "%s" t t cmd)))
-      (setq cmd (replace-regexp-in-string
-		 "%s"
-		 (shell-quote-argument (convert-standard-filename file))
-		 cmd
-		 nil t))
-
       ;; Replace "%1", "%2" etc. in command with group matches from regex
       (save-match-data
 	(let ((match-index 1)
@@ -8731,7 +8725,11 @@ (defun org-open-file (path &optional in-emacs line search)
 	      (while (string-match regex cmd)
 		(setq cmd (replace-match replace-with t t cmd))))
 	    (setq match-index (+ match-index 1)))))
-
+      (setq cmd (replace-regexp-in-string
+		 "%s"
+		 (shell-quote-argument (convert-standard-filename file))
+		 cmd
+		 nil t))
       (save-window-excursion
 	(message "Running %s...done" cmd)
 	(start-process-shell-command cmd nil cmd)



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: greedy substitution in org-open-file
  2021-02-12  7:16 ` Kyle Meyer
@ 2021-02-12 16:46   ` Maxim Nikulin
  2021-02-13  4:38     ` Kyle Meyer
  0 siblings, 1 reply; 5+ messages in thread
From: Maxim Nikulin @ 2021-02-12 16:46 UTC (permalink / raw)
  To: emacs-orgmode

On 12/02/2021 14:16, Kyle Meyer wrote:
>> #+begin_src elisp
>>     (setq org-file-apps '(("\\.pdf::\\([0-9]+\\)\\'" . "xpdf %s %1")))
>> #+end_src
> 
> Not relevant for the underlying issue, but doesn't xpdf require a colon
> before the page number (i.e. ":%1")?

At least for the application in debian & ubuntu xpdf package, page 
number should be specified without a colon. It is Xt interface to 
poppler PDF library, recently its maintainer decided to switch to 
xpopple project as upstream. UI is derived from old version of xpdf. 
Latest original xpdf version is based on Qt and might have different 
convention in respect to page numbers.

> I believe format-spec requires the placeholder to be A-z:
> 
>    (format-spec "xpdf %s" '((?s . "a")))  => "xpdf a"
>    (format-spec "xpdf %s %1" '((?s . "a") (?1 . "b")))  ;; Invalid format string

You are right. I missed that format-spec allows to specify field width, 
so digits could not be used.

> What about flipping the processing, handling the %N placeholders first
> and then formatting the file name?  Seems to work on my end, though I
> haven't tested it thoroughly.

I could anticipate similar problems if named destinations are involved. 
I have not checked but I expect that internal links might have "%s" in 
their names at least for some file types. That is why I would strongly 
prefer substitutions performed in a single pass. I do not like it, but 
it seems that simplified variant of format-spec is better. It should 
allows substitutions with digit. I hope, single digit should be enough.



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: greedy substitution in org-open-file
  2021-02-12 16:46   ` Maxim Nikulin
@ 2021-02-13  4:38     ` Kyle Meyer
  2021-02-15 17:04       ` Maxim Nikulin
  0 siblings, 1 reply; 5+ messages in thread
From: Kyle Meyer @ 2021-02-13  4:38 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

Maxim Nikulin writes:

> On 12/02/2021 14:16, Kyle Meyer wrote:

>> Not relevant for the underlying issue, but doesn't xpdf require a colon
>> before the page number (i.e. ":%1")?
>
> At least for the application in debian & ubuntu xpdf package, page 
> number should be specified without a colon. It is Xt interface to 
> poppler PDF library, recently its maintainer decided to switch to 
> xpopple project as upstream. UI is derived from old version of xpdf. 
> Latest original xpdf version is based on Qt and might have different 
> convention in respect to page numbers.

Okay.  Fwiw the xpdf version I have that requires ":" before the page is
4.03.

>> What about flipping the processing, handling the %N placeholders first
>> and then formatting the file name?  Seems to work on my end, though I
>> haven't tested it thoroughly.
>
> I could anticipate similar problems if named destinations are involved. 
> I have not checked but I expect that internal links might have "%s" in 
> their names at least for some file types.

Indeed, flipping the order unsurprisingly just flips which placeholders
can be problematic.  A very contrived example:

  (setq org-file-apps
        '(("\\.pdf::\\([A-z%]+\\)\\'" . "doesntmatter %s %1")))

  ;; file:/tmp/test.pdf::a%sb =>
  ;; Running doesntmatter /tmp/test.pdf a/tmp/test.pdfb...done

> That is why I would strongly prefer substitutions performed in a
> single pass. I do not like it, but it seems that simplified variant of
> format-spec is better. It should allows substitutions with digit. I
> hope, single digit should be enough.

True, realistically I don't think anyone has a command in org-file-apps
that relies on more than a couple of capture groups.  All right, here's
a format-spec-inspired fix.  At the very least it needs doc updates and
a comment or two.

diff --git a/lisp/org.el b/lisp/org.el
index 5b1443c4e..e8f60fd83 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8644,6 +8644,23 @@ (defun org--file-apps-regexp-alist (list &optional add-auto-mode)
    (when add-auto-mode
      (mapcar (lambda (x) (cons (car x) 'emacs)) auto-mode-alist))))
 
+(defun org--open-file-format-spec (format specification)
+  (with-temp-buffer
+    (insert format)
+    (goto-char (point-min))
+    (while (search-forward "%" nil t)
+      (cond ((eq (char-after) ?%)
+             (delete-char 1))
+            ((looking-at "[s0-9]")
+             (replace-match
+              (or (cdr (assoc (match-string 0) specification))
+                  (error "Invalid format string"))
+              'fixed-case 'literal)
+             (delete-region (1- (match-beginning 0)) (match-beginning 0)))
+            (t
+             (error "Invalid format string"))))
+    (buffer-string)))
+
 ;;;###autoload
 (defun org-open-file (path &optional in-emacs line search)
   "Open the file at PATH.
@@ -8745,24 +8762,20 @@ (defun org-open-file (path &optional in-emacs line search)
       ;; Remove quotes around the file name - we'll use shell-quote-argument.
       (while (string-match "['\"]%s['\"]" cmd)
 	(setq cmd (replace-match "%s" t t cmd)))
-      (setq cmd (replace-regexp-in-string
-		 "%s"
-		 (shell-quote-argument (convert-standard-filename file))
-		 cmd
-		 nil t))
-
-      ;; Replace "%1", "%2" etc. in command with group matches from regex
-      (save-match-data
-	(let ((match-index 1)
-	      (number-of-groups (- (/ (length link-match-data) 2) 1)))
-	  (set-match-data link-match-data)
-	  (while (<= match-index number-of-groups)
-	    (let ((regex (concat "%" (number-to-string match-index)))
-		  (replace-with (match-string match-index dlink)))
-	      (while (string-match regex cmd)
-		(setq cmd (replace-match replace-with t t cmd))))
-	    (setq match-index (+ match-index 1)))))
-
+      (setq cmd
+            (org--open-file-format-spec
+             cmd
+             (cons
+              (cons "s" (shell-quote-argument
+                         (convert-standard-filename file)))
+              (let ((ngroups (- (/ (length link-match-data) 2) 1)))
+                (and (> ngroups 0)
+                     (progn
+                       (set-match-data link-match-data)
+                       (mapcar (lambda (n)
+                                 (cons (number-to-string n)
+                                       (match-string-no-properties n dlink)))
+                               (number-sequence 1 ngroups))))))))
       (save-window-excursion
 	(message "Running %s...done" cmd)
 	(start-process-shell-command cmd nil cmd)


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: greedy substitution in org-open-file
  2021-02-13  4:38     ` Kyle Meyer
@ 2021-02-15 17:04       ` Maxim Nikulin
  0 siblings, 0 replies; 5+ messages in thread
From: Maxim Nikulin @ 2021-02-15 17:04 UTC (permalink / raw)
  To: emacs-orgmode

On 13/02/2021 11:38, Kyle Meyer wrote:
>
> All right, here's
> a format-spec-inspired fix.  At the very least it needs doc updates and
> a comment or two.

Thank you. I am hardly familiar with elisp so it would be difficult for 
me to express the same. My comments are mostly a matter of taste.

Sorry, I have not tried to run the code yet.

> diff --git a/lisp/org.el b/lisp/org.el
> index 5b1443c4e..e8f60fd83 100644
> --- a/lisp/org.el
> +++ b/lisp/org.el
> @@ -8644,6 +8644,23 @@ (defun org--file-apps-regexp-alist (list &optional add-auto-mode)
>      (when add-auto-mode
>        (mapcar (lambda (x) (cons (car x) 'emacs)) auto-mode-alist))))
>   
> +(defun org--open-file-format-spec (format specification)
> +  (with-temp-buffer
> +    (insert format)
> +    (goto-char (point-min))
> +    (while (search-forward "%" nil t)
> +      (cond ((eq (char-after) ?%)
> +             (delete-char 1))
> +            ((looking-at "[s0-9]")

Personally I do not see a reason to limit substitutions just to just 
"%s". I would consider "[a-zA-Z0-9]".

> +             (replace-match
> +              (or (cdr (assoc (match-string 0) specification))
> +                  (error "Invalid format string"))

I think, substitution character in the error message could be useful 
during debugging, something like (format "Invalid format character %%%s" 
(match-string 0)).

> +              'fixed-case 'literal)
> +             (delete-region (1- (match-beginning 0)) (match-beginning 0)))
> +            (t
> +             (error "Invalid format string"))))
> +    (buffer-string)))
> +
>   ;;;###autoload
>   (defun org-open-file (path &optional in-emacs line search)
>     "Open the file at PATH.
> @@ -8745,24 +8762,20 @@ (defun org-open-file (path &optional in-emacs line search)
>         ;; Remove quotes around the file name - we'll use shell-quote-argument.
>         (while (string-match "['\"]%s['\"]" cmd)
>   	(setq cmd (replace-match "%s" t t cmd)))
> -      (setq cmd (replace-regexp-in-string
> -		 "%s"
> -		 (shell-quote-argument (convert-standard-filename file))
> -		 cmd
> -		 nil t))
> -
> -      ;; Replace "%1", "%2" etc. in command with group matches from regex
> -      (save-match-data
> -	(let ((match-index 1)
> -	      (number-of-groups (- (/ (length link-match-data) 2) 1)))
> -	  (set-match-data link-match-data)
> -	  (while (<= match-index number-of-groups)
> -	    (let ((regex (concat "%" (number-to-string match-index)))
> -		  (replace-with (match-string match-index dlink)))
> -	      (while (string-match regex cmd)
> -		(setq cmd (replace-match replace-with t t cmd))))
> -	    (setq match-index (+ match-index 1)))))
> -
> +      (setq cmd
> +            (org--open-file-format-spec
> +             cmd
> +             (cons
> +              (cons "s" (shell-quote-argument
> +                         (convert-standard-filename file)))
> +              (let ((ngroups (- (/ (length link-match-data) 2) 1)))
> +                (and (> ngroups 0)
> +                     (progn
> +                       (set-match-data link-match-data)
> +                       (mapcar (lambda (n)
> +                                 (cons (number-to-string n)
> +                                       (match-string-no-properties n dlink)))

Should not be numbered substitutions passed through shell-quote-argument 
as well? Besides just page numbers the link could be named internal 
anchor where more characters are allowed. It is the reason why in 
general I prefer bare exec to shell commands.

I am unsure concerning optional matches as

     "\\(?:\\.pdf\\)\\(?:::\\([0-9]+\\)\\)?\\'"

Maybe they should not be used at all in favor of separate entries with 
and without page number. Maybe nil should coalesce to empty string "". 
Certainly I am against "nil" string for a missed group. By the way, in 
the original format-spec, cdr is applied after the check whether assoc 
value is nil.

> +                               (number-sequence 1 ngroups))))))))
>         (save-window-excursion
>   	(message "Running %s...done" cmd)
>   	(start-process-shell-command cmd nil cmd)





^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-02-15 17:10 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-20 16:08 greedy substitution in org-open-file Maxim Nikulin
2021-02-12  7:16 ` Kyle Meyer
2021-02-12 16:46   ` Maxim Nikulin
2021-02-13  4:38     ` Kyle Meyer
2021-02-15 17:04       ` Maxim Nikulin

unofficial mirror of emacs-orgmode@gnu.org

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://yhetil.org/orgmode/0 orgmode/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 orgmode orgmode/ https://yhetil.org/orgmode \
		emacs-orgmode@gnu.org
	public-inbox-index orgmode

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.yhetil.org/yhetil.emacs.orgmode
	nntp://news.gmane.io/gmane.emacs.orgmode


AGPL code for this site: git clone http://ou63pmih66umazou.onion/public-inbox.git