all messages for Emacs-related lists mirrored at yhetil.org
From: Ruijie Yu via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 63125@debbugs.gnu.org
Subject: bug#63125: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect?
Date: Fri, 28 Apr 2023 09:30:30 +0800
Message-ID: <sdv1qk4okoh.fsf@netyu.xyz>
In-Reply-To: <83h6t1s16p.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 2482 bytes --]


Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Fri, 28 Apr 2023 00:19:22 +0800
>> From:  Ruijie Yu via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
>> 
>> I'm trying out the function `libxml-parse-html-region' as recommended
>> by a thread in help-gnu-emacs.  However, I discovered that the last
>> argument of this function does not help me normalize a relative url.
>> 
>> Reproducer:
>> 
>> Visit the attached toy html file.  I imagine that it is hosted at
>> "https://example.com/good/day".
>> 
>> Run this snippet:
>> 
>>     (pp (libxml-parse-html-region
>>          (point-min) (point-max)
>>          "https://example.com/good/day"))
>> 
>> Compare it with this snippet:
>> 
>>     (pp (libxml-parse-html-region
>>          (point-min) (point-max)))
>> 
>> What I get is this result for both snippets (which is shown twice, once
>> "pretty-printed", and once returned as a string):
>> 
>> --8<---------------cut here---------------start------------->8---
>> (html nil
>>       (body nil "\n    "
>>             (a
>>              ((href . "/hello"))
>>              "1")
>>             "\n    "
>>             (a
>>              ((href . "../world"))
>>              "2")
>>             "\n    "
>>             (a
>>              ((href . "good"))
>>              "3")
>>             "\n    "
>>             (a
>>              ((href . "morning/or/night"))
>>              "4")
>>             "\n  "))
>> --8<---------------cut here---------------end--------------->8---
>> 
>> Notice that the href values are not normalized: they are copied
>> verbatim from the original html file.
>> 
>> If I understand the docstring correctly, the last argument of
>> `libxml-parse-html-region', when specified as a URL string, should be
>> used as the "base point" of resolving relative paths found within the
>> html document.  But the <a href=xxx> paths are not resolved at the
>> moment.
>
> If you look at xml.c, you will see that we just call a libxml function
> passing it this URL.  So if anything isn't as expected, the answer is
> in libxml, not in Emacs.

Thank you for pointing that out.  I will take a look at the libxml2
source in a day or two.  I am also upgrading libxml2 from 2.10.3-2 to
2.10.4-2 and will see if that changes anything.

If I end up deciding that it is a libxml2 bug, I'll file a bug there and
link to this bug.
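
In the meantime, a possible workaround is to resolve the hrefs in Lisp
after parsing.  What follows is a minimal, untested sketch: it assumes
the `dom' and `url-expand' libraries that ship with Emacs, it only
rewrites the href of <a> elements, and the helper name
`my/dom-resolve-hrefs' is hypothetical.

```elisp
;; Hypothetical helper (not part of Emacs): resolve each <a href> in a
;; DOM returned by `libxml-parse-html-region' against BASE.
(require 'dom)         ; dom-by-tag, dom-attr, dom-set-attribute
(require 'url-expand)  ; url-expand-file-name

(defun my/dom-resolve-hrefs (dom base)
  "Resolve the href of every <a> element in DOM against BASE.
DOM is modified in place and also returned."
  (dolist (node (dom-by-tag dom 'a))
    (let ((href (dom-attr node 'href)))
      (when href
        ;; url-expand-file-name returns fully qualified URLs unchanged.
        (dom-set-attribute node 'href (url-expand-file-name href base)))))
  dom)

;; Usage, in the buffer visiting hello.html:
;; (pp (my/dom-resolve-hrefs
;;      (libxml-parse-html-region (point-min) (point-max))
;;      "https://example.com/good/day"))
```

Under RFC 3986 resolution against "https://example.com/good/day",
"../world" should become "https://example.com/world" and "good" should
become "https://example.com/good/good".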

For completeness, attached is the toy HTML file that I forgot to
include in my initial report.


[-- Attachment #2: hello.html --]
[-- Type: text/html, Size: 152 bytes --]

[-- Attachment #3: Type: text/plain, Size: 134 bytes --]


-- 
Best,


RY

[Please note that this mail might go to spam due to some
misconfiguration in my mail server -- still investigating.]


Thread overview: 8+ messages
2023-04-27 16:19 bug#63125: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect? Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-27 17:08 ` Eli Zaretskii
2023-04-28  1:30   ` Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors [this message]
2023-04-28 10:18     ` Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-28 10:40       ` bug#63125: 30.0.50; [BUG] last argument of libxml-parse-html-region " Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-28 11:31         ` Eli Zaretskii
2023-04-29  0:58           ` Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-29  6:40             ` Eli Zaretskii
