From: Eli Zaretskii <eliz@gnu.org>
To: Ruijie Yu <ruijie@netyu.xyz>
Cc: 63125@debbugs.gnu.org
Subject: bug#63125: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect?
Date: Thu, 27 Apr 2023 20:08:14 +0300
Message-ID: <83h6t1s16p.fsf@gnu.org>
In-Reply-To: <sdv5y9hnv4c.fsf@netyu.xyz> (bug-gnu-emacs@gnu.org)
> Date: Fri, 28 Apr 2023 00:19:22 +0800
> From: Ruijie Yu via "Bug reports for GNU Emacs,
> the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
>
> I'm trying out the function `libxml-parse-html-region' as recommended
> by a thread in help-gnu-emacs. However, I discovered that the last
> argument of this function does not help me normalize a relative url.
>
> Reproducer:
>
> Visit the attached toy html file. I imagine that it is hosted at
> "https://example.com/good/day".
>
> Run this snippet:
>
> (pp (libxml-parse-html-region
>      (point-min) (point-max)
>      "https://example.com/good/day"))
>
> Compare it with this snippet:
>
> (pp (libxml-parse-html-region
>      (point-min) (point-max)))
>
> What I get is this result for both snippets (which is shown twice, once
> "pretty-printed", and once returned as a string):
>
> --8<---------------cut here---------------start------------->8---
> (html nil
>       (body nil "\n "
>             (a
>              ((href . "/hello"))
>              "1")
>             "\n "
>             (a
>              ((href . "../world"))
>              "2")
>             "\n "
>             (a
>              ((href . "good"))
>              "3")
>             "\n "
>             (a
>              ((href . "morning/or/night"))
>              "4")
>             "\n "))
> --8<---------------cut here---------------end--------------->8---
>
> Notice, that the href values are not normalized: they are copied
> verbatim from the original html file.
>
> If I understand the docstring correctly, the last argument of
> `libxml-parse-html-region', when specified as a url string, should be
> used as the "base point" of resolving relative paths found within the
> html document. But the <a href=xxx> paths are not resolved at the
> moment.

If you look at xml.c, you will see that we just call a libxml function,
passing it this URL. So if anything isn't as expected, the answer is
in libxml, not in Emacs.
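
If the goal is absolute links, one possible workaround is to expand the
href values on the Lisp side after parsing. The following is only a
minimal sketch, assuming (as the output above suggests) that libxml
leaves attribute values as written. `my/absolutize-hrefs' is a
hypothetical helper name, not something Emacs provides;
`url-expand-file-name' comes from url-expand.el.

```elisp
(require 'url-expand)

;; Hypothetical helper, not part of Emacs: walks a tree of the shape
;; returned by `libxml-parse-html-region' and expands each href.
(defun my/absolutize-hrefs (dom base-url)
  "Return a copy of DOM with every `href' attribute expanded against BASE-URL.
DOM is a node of the form (TAG ATTRIBUTES CHILDREN...) or a string."
  (if (not (consp dom))
      dom                               ; text node: return unchanged
    (let ((tag (car dom))
          (attrs (cadr dom))
          (children (cddr dom)))
      (append
       (list tag
             ;; Rewrite only the href attribute; keep all others as-is.
             (mapcar (lambda (attr)
                       (if (eq (car attr) 'href)
                           (cons 'href
                                 (url-expand-file-name (cdr attr) base-url))
                         attr))
                     attrs))
       ;; Recurse into child nodes.
       (mapcar (lambda (child) (my/absolutize-hrefs child base-url))
               children)))))
```

With the toy file from the report, this could be called as

  (pp (my/absolutize-hrefs
       (libxml-parse-html-region (point-min) (point-max))
       "https://example.com/good/day"))

Only `href' is handled here; other URL-valued attributes such as `src'
would need the same treatment.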