From: Ruijie Yu via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: 63125@debbugs.gnu.org
Subject: bug#63125: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect?
Date: Fri, 28 Apr 2023 00:19:22 +0800
Message-ID: <sdv5y9hnv4c.fsf@netyu.xyz>
[I know I'm running a month-old master. I will try to reproduce this
issue again within a day on an up-to-date master, and with -Q as well,
unless someone else does it first.]
I'm trying out the function `libxml-parse-html-region' as recommended
by a thread in help-gnu-emacs. However, I discovered that the last
argument of this function does not help me normalize a relative URL.
Reproducer:
Visit the attached toy HTML file. Imagine that it is hosted at
"https://example.com/good/day".
Run this snippet:
(pp (libxml-parse-html-region
     (point-min) (point-max)
     "https://example.com/good/day"))
Compare it with this snippet:
(pp (libxml-parse-html-region
     (point-min) (point-max)))
What I get is this result for both snippets (which is shown twice, once
"pretty-printed", and once returned as a string):
--8<---------------cut here---------------start------------->8---
(html nil
      (body nil "\n "
            (a
             ((href . "/hello"))
             "1")
            "\n "
            (a
             ((href . "../world"))
             "2")
            "\n "
            (a
             ((href . "good"))
             "3")
            "\n "
            (a
             ((href . "morning/or/night"))
             "4")
            "\n "))
--8<---------------cut here---------------end--------------->8---
Notice that the href values are not normalized: they are copied
verbatim from the original HTML file.
If I understand the docstring correctly, the last argument of
`libxml-parse-html-region', when specified as a URL string, should be
used as the base for resolving relative paths found within the HTML
document. But the <a href=xxx> paths are currently not resolved.
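To make the expectation concrete, here is a minimal sketch of the
resolution I had in mind, done by hand on the parsed tree. It assumes
that `url-expand-file-name' (from url-expand.el) is an acceptable way
to resolve a relative reference against a base URL. The helper name
`my/dom-expand-hrefs' is made up for illustration and is not part of
Emacs.

```elisp
(require 'dom)
(require 'url-expand)

(defun my/dom-expand-hrefs (dom base-url)
  "Expand the href of every <a> node in DOM against BASE-URL.
Hypothetical helper, not part of Emacs.  It modifies DOM in place
and returns it."
  (dolist (node (dom-by-tag dom 'a))
    (let ((href (dom-attr node 'href)))
      ;; Skip anchors that have no href attribute at all.
      (when href
        (dom-set-attribute node 'href
                           (url-expand-file-name href base-url)))))
  dom)
```

With the toy file visited, evaluating (pp (my/dom-expand-hrefs
(libxml-parse-html-region (point-min) (point-max))
"https://example.com/good/day")) should print the same tree with the
href values made absolute. This is only a workaround sketch, not a
claim about what the last argument is meant to do.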
---
In GNU Emacs 30.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version
3.24.37, cairo version 1.17.8) of 2023-03-25 built on ruijie
Repository revision: db7e95531ac36ae842787b6c5f2859d0642c78cc
Repository branch: makepkg
System Description: Arch Linux
Configured using:
'configure --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib
--localstatedir=/var --mandir=/usr/share/man --with-gameuser=:games
--with-modules --without-libotf --without-m17n-flt --without-gconf
--enable-link-time-optimization --with-native-compilation=yes
--with-xinput2 --with-pgtk --without-xaw3d --with-sound=alsa
--with-tree-sitter '--program-transform-name=s/\([ec]tags\)/\1.emacs/'
'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions
-Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security
-fstack-clash-protection -fcf-protection'
LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now'
Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBSYSTEMD LIBXML2 MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER
PGTK PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS
TREE_SITTER WEBP XIM GTK3 ZLIB
Important settings:
value of $LANG: en_US.UTF-8
value of $XMODIFIERS: @im=fcitx
locale-coding-system: utf-8-unix
--
Best,
RY
[Please note that this mail might go to spam due to some
misconfiguration in my mail server -- still investigating.]