From: "Ludovic Courtès" <ludo@gnu.org>
To: 74542@debbugs.gnu.org
Cc: "Ludovic Courtès" <ludo@gnu.org>,
	"Christopher Baines" <guix@cbaines.net>,
	"Josselin Poiret" <dev@jpoiret.xyz>,
	"Ludovic Courtès" <ludo@gnu.org>,
	"Mathieu Othacehe" <othacehe@gnu.org>,
	"Simon Tournier" <zimon.toutoune@gmail.com>,
	"Tobias Geerinckx-Rice" <me@tobias.gr>
Subject: [bug#74542] [PATCH v2 12/16] gnu-maintenance: ‘generic-html’ update honors <base href="…">.
Date: Fri, 29 Nov 2024 10:40:15 +0100
Message-ID: <112b57b3d8cf1208f3390602dfab6932fac7c505.1732872499.git.ludo@gnu.org>
In-Reply-To: <cover.1732615193.git.ludo@gnu.org>

This fixes updates of ‘curl’: <https://curl.se/download/> includes
<base href="…"> in its head and ignoring it would lead to incorrect
download URLs.

* guix/gnu-maintenance.scm (html-links): Keep track of <base href="…">
in ‘loop’.  Rewrite relative links at the end.

Change-Id: I989da78df3431034c9a584f8e10cad87ae6dc920
---
 guix/gnu-maintenance.scm | 41 +++++++++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 13 deletions(-)
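
Illustration only, not part of the patch: the rewriting step at the end of
‘html-links’ can be tried in isolation with the sketch below.  It assumes
plain Guile with the (web uri) module; ‘resolve-link’ and the example.org
URLs are hypothetical names chosen for the example.  Links that parse as
absolute URIs or that start with "/" are left untouched; anything else is
appended to the base with ‘in-vicinity’, which is plain string
concatenation and not full RFC 3986 reference resolution.

```scheme
(use-modules (web uri))

;; Hypothetical helper mirroring the lambda added to 'html-links':
;; keep absolute URIs and "/"-prefixed paths, prefix the rest with BASE.
(define (resolve-link base link)
  (if (or (string->uri link)             ;absolute URI, e.g. "https://…"
          (string-prefix? "/" link))     ;host-relative path, kept as is
      link
      (in-vicinity base link)))          ;relative path: prepend BASE

;; Made-up links, as they might be collected from a page whose <head>
;; contains <base href="https://example.org/dl/">.
(define example-links
  '("foo-1.0.tar.gz"
    "/pub/foo-1.0.tar.gz"
    "https://mirror.example.org/foo-1.0.tar.gz"))

(for-each (lambda (link)
            (display (resolve-link "https://example.org/dl/" link))
            (newline))
          example-links)
```

Only the first link should come out rewritten, as
"https://example.org/dl/foo-1.0.tar.gz"; the other two are returned
unchanged.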

diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index b612b11c00..ee4882326f 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -39,6 +39,7 @@ (define-module (guix gnu-maintenance)
   #:use-module (guix utils)
   #:use-module (guix diagnostics)
   #:use-module (guix i18n)
+  #:autoload   (guix combinators) (fold2)
   #:use-module (guix memoization)
   #:use-module (guix records)
   #:use-module (guix upstream)
@@ -483,19 +484,33 @@ (define* (import-release* package #:key (version #f))
 
 (define (html-links sxml)
   "Return the list of links found in SXML, the SXML tree of an HTML page."
-  (let loop ((sxml sxml)
-             (links '()))
-    (match sxml
-      (('a ('@ attributes ...) body ...)
-       (match (assq 'href attributes)
-         (#f          (fold loop links body))
-         (('href url) (fold loop (cons url links) body))))
-      ((tag ('@ _ ...) body ...)
-       (fold loop links body))
-      ((tag body ...)
-       (fold loop links body))
-      (_
-       links))))
+  (define-values (links base)
+    (let loop ((sxml sxml)
+               (links '())
+               (base #f))
+      (match sxml
+        (('a ('@ attributes ...) body ...)
+         (match (assq 'href attributes)
+           (#f          (fold2 loop links base body))
+           (('href url) (fold2 loop (cons url links) base body))))
+        (('base ('@ ('href new-base)))
+         ;; The base against which relative URL paths must be resolved.
+         (values links new-base))
+        ((tag ('@ _ ...) body ...)
+         (fold2 loop links base body))
+        ((tag body ...)
+         (fold2 loop links base body))
+        (_
+         (values links base)))))
+
+  (if base
+      (map (lambda (link)
+             (let ((uri (string->uri link)))
+               (if (or uri (string-prefix? "/" link))
+                   link
+                   (in-vicinity base link))))
+           links)
+      links))
 
 (define (url->links url)
   "Return the unique links on the HTML page accessible at URL."
-- 
2.46.0




