From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ruijie Yu via "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Newsgroups: gmane.emacs.bugs
Subject: bug#63125: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect?
Date: Fri, 28 Apr 2023 09:30:30 +0800
References: <83h6t1s16p.fsf@gnu.org>
In-reply-to: <83h6t1s16p.fsf@gnu.org>
Reply-To: Ruijie Yu
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
User-Agent: mu4e 1.9.22; emacs 30.0.50
Cc: 63125@debbugs.gnu.org
To: Eli Zaretskii
X-GNU-PR-Message: followup 63125
X-GNU-PR-Package: emacs
Xref: news.gmane.io gmane.emacs.bugs:260744

--=-=-=
Content-Type: text/plain

Eli Zaretskii writes:

>> Date: Fri, 28 Apr 2023 00:19:22 +0800
>> From: Ruijie Yu via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors"
>>
>> I'm trying out the function `libxml2-parse-html-region' as recommended
>> by a thread in help-gnu-emacs.  However, I discovered that the last
>> argument of this function does not help me normalize a relative url.
>>
>> Reproducer:
>>
>> Visit the attached toy html file.  I imagine that it is hosted at
>> "https://example.com/good/day".
>>
>> Run this snippet:
>>
>> (pp (libxml-parse-html-region
>>      (point-min) (point-max)
>>      "https://example.com/good/day"))
>>
>> Compare it with this snippet:
>>
>> (pp (libxml-parse-html-region
>>      (point-min) (point-max)))
>>
>> What I get is this result for both snippets (which is shown twice, once
>> "pretty-printed", and once returned as a string):
>>
>> --8<---------------cut here---------------start------------->8---
>> (html nil
>>       (body nil "\n "
>>             (a
>>              ((href . "/hello"))
>>              "1")
>>             "\n "
>>             (a
>>              ((href . "../world"))
>>              "2")
>>             "\n "
>>             (a
>>              ((href . "good"))
>>              "3")
>>             "\n "
>>             (a
>>              ((href . "morning/or/night"))
>>              "4")
>>             "\n "))
>> --8<---------------cut here---------------end--------------->8---
>>
>> Notice, that the href values are not normalized: they are copied
>> verbatim from the original html file.
>>
>> If I understand the docstring correctly, the last argument of
>> `libxml2-parse-html-region', when specified as a url string, should be
>> used as the "base point" of resolving relative paths found within the
>> html document.  But the paths are not resolved at the
>> moment.
>
> If you look at xml.c, you will see that we just call a libxml function
> passing it this URL.  So if anything isn't as expected, the answer is
> in libxml, not in Emacs.

Thank you for pointing that out.  I will take a look at the libxml2
source in a day or two.  I am also upgrading libxml2 from 2.10.3-2 to
2.10.4-2, and will see whether that changes anything.  If I conclude
that this is a libxml2 bug, I'll file a report there and link it to
this bug.

For completeness, attached is the toy HTML file that I forgot to
attach to my initial report.

--=-=-=
Content-Type: text/html
Content-Disposition: attachment; filename=hello.html

<a href="/hello">1</a>
<a href="../world">2</a>
<a href="good">3</a>
<a href="morning/or/night">4</a>

--=-=-=
Content-Type: text/plain

-- 
Best,

RY

[Please note that this mail might go to spam due to some
misconfiguration in my mail server -- still investigating.]

--=-=-=--