From: "Ludovic Courtès" <ludo@gnu.org>
To: 74542@debbugs.gnu.org
Cc: "Ludovic Courtès" <ludo@gnu.org>,
"Christopher Baines" <guix@cbaines.net>,
"Josselin Poiret" <dev@jpoiret.xyz>,
"Ludovic Courtès" <ludo@gnu.org>,
"Mathieu Othacehe" <othacehe@gnu.org>,
"Simon Tournier" <zimon.toutoune@gmail.com>,
"Tobias Geerinckx-Rice" <me@tobias.gr>
Subject: [bug#74542] [PATCH v2 12/16] gnu-maintenance: ‘generic-html’ update honors <base href="…">.
Date: Fri, 29 Nov 2024 10:40:15 +0100 [thread overview]
Message-ID: <112b57b3d8cf1208f3390602dfab6932fac7c505.1732872499.git.ludo@gnu.org> (raw)
In-Reply-To: <cover.1732615193.git.ludo@gnu.org>
This fixes updates of ‘curl’: <https://curl.se/download/> includes
<base href="…"> in its head and ignoring it would lead to incorrect
download URLs.
* guix/gnu-maintenance.scm (html-links): Keep track of <base href="…">
in ‘loop’. Rewrite relative links at the end.
Change-Id: I989da78df3431034c9a584f8e10cad87ae6dc920
---
guix/gnu-maintenance.scm | 41 +++++++++++++++++++++++++++-------------
1 file changed, 28 insertions(+), 13 deletions(-)
diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index b612b11c00..ee4882326f 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -39,6 +39,7 @@ (define-module (guix gnu-maintenance)
#:use-module (guix utils)
#:use-module (guix diagnostics)
#:use-module (guix i18n)
+ #:autoload (guix combinators) (fold2)
#:use-module (guix memoization)
#:use-module (guix records)
#:use-module (guix upstream)
@@ -483,19 +484,33 @@ (define* (import-release* package #:key (version #f))
(define (html-links sxml)
"Return the list of links found in SXML, the SXML tree of an HTML page."
- (let loop ((sxml sxml)
- (links '()))
- (match sxml
- (('a ('@ attributes ...) body ...)
- (match (assq 'href attributes)
- (#f (fold loop links body))
- (('href url) (fold loop (cons url links) body))))
- ((tag ('@ _ ...) body ...)
- (fold loop links body))
- ((tag body ...)
- (fold loop links body))
- (_
- links))))
+ (define-values (links base)
+ (let loop ((sxml sxml)
+ (links '())
+ (base #f))
+ (match sxml
+ (('a ('@ attributes ...) body ...)
+ (match (assq 'href attributes)
+ (#f (fold2 loop links base body))
+ (('href url) (fold2 loop (cons url links) base body))))
+ (('base ('@ ('href new-base)))
+ ;; The base against which relative URL paths must be resolved.
+ (values links new-base))
+ ((tag ('@ _ ...) body ...)
+ (fold2 loop links base body))
+ ((tag body ...)
+ (fold2 loop links base body))
+ (_
+ (values links base)))))
+
+ (if base
+ (map (lambda (link)
+ (let ((uri (string->uri link)))
+ (if (or uri (string-prefix? "/" link))
+ link
+ (in-vicinity base link))))
+ links)
+ links))
(define (url->links url)
"Return the unique links on the HTML page accessible at URL."
--
2.46.0
next prev parent reply other threads:[~2024-11-29 9:43 UTC|newest]
Thread overview: 62+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-26 10:32 [bug#74542] [PATCH 00/11] Improved tooling for package updates Ludovic Courtès
2024-11-26 10:33 ` [bug#74542] [PATCH 01/11] transformations: Export ‘package-with-upstream-version’ Ludovic Courtès
2024-11-26 15:00 ` Simon Tournier
2024-11-26 10:33 ` [bug#74542] [PATCH 02/11] gnu-maintenance: ‘import-html-release’ doesn’t abort upon HTTP 404 Ludovic Courtès
2024-11-26 15:09 ` Simon Tournier
2024-11-26 17:16 ` Ludovic Courtès
2024-11-27 17:05 ` Simon Tournier
2024-11-26 10:33 ` [bug#74542] [PATCH 03/11] gnu-maintenance: Savannah/Xorg updaters no longer abort on network errors Ludovic Courtès
2024-11-26 15:12 ` Simon Tournier
2024-11-26 10:33 ` [bug#74542] [PATCH 04/11] build: Add ‘--development’ option Ludovic Courtès
2024-11-26 15:26 ` Simon Tournier
2024-11-28 10:49 ` Ludovic Courtès
2024-11-26 10:33 ` [bug#74542] [PATCH 05/11] packages: Factorize ‘all-packages’ Ludovic Courtès
2024-11-27 18:45 ` Simon Tournier
2024-11-26 10:33 ` [bug#74542] [PATCH 06/11] guix build: Add ‘--dependents’ Ludovic Courtès
2024-11-27 19:12 ` Simon Tournier
2024-11-28 10:57 ` Ludovic Courtès
2024-11-26 10:33 ` [bug#74542] [PATCH 07/11] import: gnome: Keep going upon HTTP errors Ludovic Courtès
2024-11-26 15:26 ` Simon Tournier
2024-11-26 10:33 ` [bug#74542] [PATCH 08/11] gnu-maintenance: ‘gnu-ftp’ updater excludes GnuPG-hosted packages Ludovic Courtès
2024-11-26 15:28 ` Simon Tournier
2024-11-26 10:33 ` [bug#74542] [PATCH 09/11] gnu: Update updater properties for GnuPG-related packages Ludovic Courtès
2024-11-26 15:28 ` Simon Tournier
2024-11-26 10:33 ` [bug#74542] [PATCH 10/11] guix build: Validate that the file passed to ‘-m’ returns a manifest Ludovic Courtès
2024-11-26 15:36 ` Simon Tournier
2024-11-26 10:33 ` [bug#74542] [PATCH 11/11] etc: Add upgrade manifest Ludovic Courtès
2024-11-26 15:49 ` Simon Tournier
2024-11-26 17:18 ` Ludovic Courtès
2024-11-27 19:23 ` Simon Tournier
2024-11-26 14:42 ` [bug#74542] [PATCH 00/11] Improved tooling for package updates Ludovic Courtès
2024-11-26 16:04 ` Simon Tournier
2024-11-26 14:59 ` Simon Tournier
2024-11-26 17:21 ` Ludovic Courtès
2024-11-27 19:26 ` Simon Tournier
2024-11-26 16:32 ` Suhail Singh
2024-11-26 17:23 ` Ludovic Courtès
2024-11-29 9:40 ` [bug#74542] [PATCH v2 00/16] " Ludovic Courtès
2024-11-29 14:46 ` Maxim Cournoyer
2024-12-01 16:30 ` Ludovic Courtès
2024-11-29 15:17 ` Suhail Singh
2024-12-01 16:34 ` Ludovic Courtès
2024-11-29 15:23 ` Simon Tournier
2024-11-29 9:40 ` [bug#74542] [PATCH v2 01/16] transformations: Export ‘package-with-upstream-version’ Ludovic Courtès
2024-11-29 9:40 ` [bug#74542] [PATCH v2 02/16] gnu-maintenance: ‘import-html-release’ doesn’t abort upon HTTP 404 Ludovic Courtès
2024-11-29 14:42 ` Maxim Cournoyer
2024-11-29 9:40 ` [bug#74542] [PATCH v2 03/16] gnu-maintenance: Savannah/Xorg updaters no longer abort on network errors Ludovic Courtès
2024-11-29 9:40 ` [bug#74542] [PATCH v2 04/16] guix build: Add ‘--development’ option Ludovic Courtès
2024-11-29 14:49 ` Maxim Cournoyer
2024-11-29 9:40 ` [bug#74542] [PATCH v2 05/16] packages: Factorize ‘all-packages’ Ludovic Courtès
2024-11-29 14:53 ` Maxim Cournoyer
2024-12-01 16:37 ` Ludovic Courtès
2024-11-29 9:40 ` [bug#74542] [PATCH v2 06/16] guix build: Add ‘--dependents’ Ludovic Courtès
2024-11-29 9:40 ` [bug#74542] [PATCH v2 07/16] import: gnome: Keep going upon HTTP errors Ludovic Courtès
2024-11-29 9:40 ` [bug#74542] [PATCH v2 08/16] gnu-maintenance: ‘gnu-ftp’ updater excludes GnuPG-hosted packages Ludovic Courtès
2024-11-29 9:40 ` [bug#74542] [PATCH v2 09/16] gnu: Update updater properties for GnuPG-related packages Ludovic Courtès
2024-11-29 9:40 ` [bug#74542] [PATCH v2 10/16] gnu: gnutls: Change release monitoring URL Ludovic Courtès
2024-11-29 9:40 ` [bug#74542] [PATCH v2 11/16] gnu: git-minimal: Add ‘upstream-name’ property Ludovic Courtès
2024-11-29 9:40 ` Ludovic Courtès [this message]
2024-11-29 9:40 ` [bug#74542] [PATCH v2 13/16] guix build: Validate that the file passed to ‘-m’ returns a manifest Ludovic Courtès
2024-11-29 9:40 ` [bug#74542] [PATCH v2 14/16] transformations: ‘package-with-upstream-version’ can preserve patches Ludovic Courtès
2024-11-29 9:40 ` [bug#74542] [PATCH v2 15/16] transformations: Add #:authenticate? to ‘package-with-upstream-version’ Ludovic Courtès
2024-11-29 9:40 ` [bug#74542] [PATCH v2 16/16] etc: Add upgrade manifest Ludovic Courtès
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=112b57b3d8cf1208f3390602dfab6932fac7c505.1732872499.git.ludo@gnu.org \
--to=ludo@gnu.org \
--cc=74542@debbugs.gnu.org \
--cc=dev@jpoiret.xyz \
--cc=guix@cbaines.net \
--cc=me@tobias.gr \
--cc=othacehe@gnu.org \
--cc=zimon.toutoune@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/guix.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.