From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#63125: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect?
Date: Thu, 27 Apr 2023 20:08:14 +0300
Message-ID: <83h6t1s16p.fsf@gnu.org>
To: Ruijie Yu
Cc: 63125@debbugs.gnu.org

> Date: Fri, 28 Apr 2023 00:19:22 +0800
> From: Ruijie Yu via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors"
>
> I'm trying out the function `libxml2-parse-html-region' as recommended
> by a thread in help-gnu-emacs. However, I discovered that the last
> argument of this function does not help me normalize a relative url.
>
> Reproducer:
>
> Visit the attached toy html file. I imagine that it is hosted at
> "https://example.com/good/day".
>
> Run this snippet:
>
> (pp (libxml-parse-html-region
>      (point-min) (point-max)
>      "https://example.com/good/day"))
>
> Compare it with this snippet:
>
> (pp (libxml-parse-html-region
>      (point-min) (point-max)))
>
> What I get is this result for both snippets (which is shown twice, once
> "pretty-printed", and once returned as a string):
>
> --8<---------------cut here---------------start------------->8---
> (html nil
>       (body nil "\n "
>             (a
>              ((href . "/hello"))
>              "1")
>             "\n "
>             (a
>              ((href . "../world"))
>              "2")
>             "\n "
>             (a
>              ((href . "good"))
>              "3")
>             "\n "
>             (a
>              ((href . "morning/or/night"))
>              "4")
>             "\n "))
> --8<---------------cut here---------------end--------------->8---
>
> Notice, that the href values are not normalized: they are copied
> verbatim from the original html file.
>
> If I understand the docstring correctly, the last argument of
> `libxml2-parse-html-region', when specified as a url string, should be
> used as the "base point" of resolving relative paths found within the
> html document. But the paths are not resolved at the
> moment.

If you look at xml.c, you will see that we just call a libxml function
passing it this URL. So if anything isn't as expected, the answer is
in libxml, not in Emacs.
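
Since the URL is only handed through to libxml, one possible workaround
is to resolve the links on the Lisp side after parsing. The following
is a minimal sketch, not anything taken from xml.c or libxml: it
assumes the dom.el and url-expand.el libraries bundled with Emacs, and
the helper name `my-resolve-hrefs' is hypothetical. It only touches the
href attribute of <a> elements and modifies the parsed tree in place.

```elisp
(require 'dom)
(require 'url-expand)

(defun my-resolve-hrefs (dom base-url)
  "Expand the `href' of every <a> element in DOM against BASE-URL.
Hypothetical helper, not part of Emacs.  DOM is modified in place
and returned."
  (dolist (a (dom-by-tag dom 'a))
    (let ((href (dom-attr a 'href)))
      ;; Skip anchors that have no href attribute at all.
      (when href
        (dom-set-attribute a 'href
                           (url-expand-file-name href base-url)))))
  dom)

;; Usage, evaluated in the buffer visiting the HTML file:
;;
;;   (pp (my-resolve-hrefs
;;        (libxml-parse-html-region (point-min) (point-max))
;;        "https://example.com/good/day"))
```

Whether `url-expand-file-name' resolves every form of relative
reference the way a browser would is not something this sketch
verifies, so the results for paths such as "../world" are worth
checking by hand.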