* bug#63125: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect?
@ 2023-04-27 16:19 Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
From: Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-04-27 16:19 UTC
  To: 63125

[I know I'm running a one-month-old master.  I will try to reproduce
this issue within a day on an up-to-date master, and with -Q as well,
unless someone else does it first.]

I'm trying out the function `libxml-parse-html-region' as recommended
by a thread in help-gnu-emacs.  However, I discovered that the last
argument of this function does not help me normalize a relative URL.

Reproducer:

Visit the attached toy HTML file.  Assume that it is hosted at
"https://example.com/good/day".

Run this snippet:

    (pp (libxml-parse-html-region
         (point-min) (point-max)
         "https://example.com/good/day"))

Compare it with this snippet:

    (pp (libxml-parse-html-region
         (point-min) (point-max)))

What I get is this result for both snippets (which is shown twice, once
"pretty-printed", and once returned as a string):

--8<---------------cut here---------------start------------->8---
(html nil
      (body nil "\n    "
            (a
             ((href . "/hello"))
             "1")
            "\n    "
            (a
             ((href . "../world"))
             "2")
            "\n    "
            (a
             ((href . "good"))
             "3")
            "\n    "
            (a
             ((href . "morning/or/night"))
             "4")
            "\n  "))
--8<---------------cut here---------------end--------------->8---

Note that the href values are not normalized: they are copied
verbatim from the original HTML file.
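
For a check that does not depend on the attachment, the comparison
can be sketched as below.  The HTML string is a hypothetical stand-in
reconstructed from the parse result shown above (the real attachment
may differ), and the `my/' names are made up for illustration.

```elisp
;; Hypothetical stand-in for the attached toy file, reconstructed from
;; the parse result quoted in this report.
(defvar my/toy-html
  (concat "<html>\n"
          "  <body>\n"
          "    <a href=\"/hello\">1</a>\n"
          "    <a href=\"../world\">2</a>\n"
          "    <a href=\"good\">3</a>\n"
          "    <a href=\"morning/or/night\">4</a>\n"
          "  </body>\n"
          "</html>\n"))

(defun my/parse-toy-html (&optional base-url)
  "Parse `my/toy-html' in a temporary buffer, passing BASE-URL through.
Return nil when this Emacs was built without libxml2 support."
  (when (and (fboundp 'libxml-available-p) (libxml-available-p))
    (with-temp-buffer
      (insert my/toy-html)
      (libxml-parse-html-region (point-min) (point-max) base-url))))

;; Non-nil here means BASE-URL made no difference to the parse result,
;; which is the behavior described in this report.
(equal (my/parse-toy-html "https://example.com/good/day")
       (my/parse-toy-html))
```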

If I understand the docstring correctly, the last argument of
`libxml-parse-html-region', when specified as a URL string, should be
used as the base for resolving relative paths found within the
HTML document.  But the <a href=xxx> paths are currently not resolved.
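
Until this is sorted out, the hrefs can be resolved by hand after
parsing.  The following is only a sketch of one possible workaround,
not something the parser does itself: it walks the parsed tree with
dom.el and expands each href with `url-expand-file-name'.  The
function name `my/absolutize-hrefs' is made up for illustration.

```elisp
(require 'dom)
(require 'url-expand)

(defun my/absolutize-hrefs (dom base-url)
  "Return a copy of DOM with each <a href> expanded against BASE-URL.
DOM is a tree as returned by `libxml-parse-html-region'."
  ;; Work on a copy so that the caller's tree is left untouched.
  (let ((copy (copy-tree dom)))
    (dolist (a (dom-by-tag copy 'a))
      (let ((href (dom-attr a 'href)))
        (when href
          (dom-set-attribute
           a 'href (url-expand-file-name href base-url)))))
    copy))

;; Usage with a fragment of the parse result from this report.
(my/absolutize-hrefs
 '(html nil
        (body nil
              (a ((href . "/hello")) "1")
              (a ((href . "../world")) "2")))
 "https://example.com/good/day")
```

Only <a> elements are handled here; other attributes that carry URLs
(src, action, and so on) would need the same treatment.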

---

In GNU Emacs 30.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version
 3.24.37, cairo version 1.17.8) of 2023-03-25 built on ruijie
Repository revision: db7e95531ac36ae842787b6c5f2859d0642c78cc
Repository branch: makepkg
System Description: Arch Linux

Configured using:
 'configure --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib
 --localstatedir=/var --mandir=/usr/share/man --with-gameuser=:games
 --with-modules --without-libotf --without-m17n-flt --without-gconf
 --enable-link-time-optimization --with-native-compilation=yes
 --with-xinput2 --with-pgtk --without-xaw3d --with-sound=alsa
 --with-tree-sitter '--program-transform-name=s/\([ec]tags\)/\1.emacs/'
 'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions
 -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security
 -fstack-clash-protection -fcf-protection'
 LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBSYSTEMD LIBXML2 MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER
PGTK PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS
TREE_SITTER WEBP XIM GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: @im=fcitx
  locale-coding-system: utf-8-unix

-- 
Best,


RY

[Please note that this mail might go to spam due to some
misconfiguration in my mail server -- still investigating.]






Thread overview: 8+ messages
2023-04-27 16:19 bug#63125: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect? Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-27 17:08 ` Eli Zaretskii
2023-04-28  1:30   ` Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-28 10:18     ` Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-28 10:40       ` bug#63125: 30.0.50; [BUG] last argument of libxml-parse-html-region " Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-28 11:31         ` Eli Zaretskii
2023-04-29  0:58           ` Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-29  6:40             ` Eli Zaretskii
