From: Manuel Giraud via "Emacs development discussions."
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] shr.el: correct SVG attribute case
Date: Mon, 29 Jan 2024 22:20:31 +0100
Message-ID: <87le87u99c.fsf@ledu-giraud.fr>
References: <878r4cxjt8.fsf@sachachua.com> <87y1cawv8x.fsf@ledu-giraud.fr>
In-Reply-To: (Sacha Chua's message of "Sat, 27 Jan 2024 12:29:37 -0500")
To: Sacha Chua
Cc: emacs-devel@gnu.org

Sacha Chua writes:

> Hello, Manuel, all!
>
> On Sat, Jan 27, 2024, 12:18 Manuel Giraud wrote:
>
>> FWIW, it seems that 'libxml-parse-xml-region' seems to do the right
>> thing.
>
> Yes, libxml-parse-xml-region preserves case.
> I think shr uses libxml-parse-html-region to try to be more forgiving of
> real-world HTML, though, as documented in
> https://www.gnu.org/software/emacs/manual/html_node/elisp/Parsing-HTML_002fXML.html

Hi Sacha,

I have looked a bit at libxml2, and it seems that the XML and HTML parsers
are really two different parsers.  The HTML one unconditionally lowercases
tags ('htmlParseHTMLName' in HTMLparser.c).  My idea was to see whether an
option could disable this, but the HTML parser does not have one.  So I
think your patch with a correspondence table is the correct one here.
-- 
Manuel Giraud
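
For illustration only, here is a minimal sketch of the correspondence-table idea discussed above. It is not the shr.el patch. It uses Python's standard `html.parser` as a stand-in for libxml2's HTML parser, since it also reports tag and attribute names in lower case. The table contents and the names `SVG_ATTRIBUTE_CASE`, `AttributeCollector`, `restore_svg_case`, and `parse_and_restore` are assumptions made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical and deliberately incomplete table:
# lowercased attribute name -> spelling expected by SVG.
SVG_ATTRIBUTE_CASE = {
    "viewbox": "viewBox",
    "preserveaspectratio": "preserveAspectRatio",
    "gradientunits": "gradientUnits",
}


class AttributeCollector(HTMLParser):
    """Record (tag, attrs) pairs exactly as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # html.parser has already lowercased the tag and attribute names here.
        self.seen.append((tag, dict(attrs)))


def restore_svg_case(attrs):
    """Return a copy of attrs with known SVG names mapped back via the table."""
    return {SVG_ATTRIBUTE_CASE.get(name, name): value
            for name, value in attrs.items()}


def parse_and_restore(markup):
    """Parse markup leniently, then repair the attribute case afterwards."""
    collector = AttributeCollector()
    collector.feed(markup)
    collector.close()
    return [(tag, restore_svg_case(attrs)) for tag, attrs in collector.seen]
```

Calling `parse_and_restore('<svg viewBox="0 0 10 10" width="10"></svg>')` yields the `svg` tag with `viewBox` restored and `width` left untouched. Names missing from the table pass through in lower case, so the table must list every mixed-case attribute that matters.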