From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniele Nicolodi
Newsgroups: gmane.emacs.devel
Subject: Re: Bug in url-retrieve-synchronously from url.el on redirect
Date: Mon, 13 Jul 2020 12:48:03 -0600
Message-ID: <05cca25d-4e5e-4979-fca8-a5d4bfb9a22e@grinta.net>
References: <08fad79e-9b6b-6ff4-66fd-c32fdf5b7189@grinta.net> <8d25cbf6-5cc8-25d5-89bc-5a7a74d477c4@grinta.net>
To: chad
Cc: wmperry@gnu.org, Stefan Monnier, EMACS development team

On 13/07/2020 12:15, chad wrote:
> Very likely the stripping code dates from a time period when code
> would recognize strings inside angle-brackets as potential URLs/URIs,
> and passed the entire string to the url library out of simplicity. If
> memory serves, Bill Perry's original url code dates from the wild and
> wooly early days of loose url encoding. I would expect that it can be
> changed safely.
>
> Hope that helps,
> ~Chad

Thanks Chad.

This does not quite explain the presence of the stripping in the handling
of HTTP protocol headers. But it may be that, with subsequent refactoring,
this code ended up where it is now.

It would be nice if someone with commit rights could find a couple of
spare cycles to comment on these patches and hopefully apply them.

PS: Emails to Bill Perry's address are bouncing for me.

Cheers,
Dan

>
> On Fri, Jul 10, 2020 at 5:55 PM Daniele Nicolodi wrote:
>
>     On 10-07-2020 14:32, Daniele Nicolodi wrote:
>     > On 10/07/2020 14:25, Yuri Khan wrote:
>     >> On Sat, 11 Jul 2020 at 02:43, Daniele Nicolodi wrote:
>     >>
>     >>> As far as I understand, the RFCs (and having been wrong before, I
>     >>> may be wrong again) do not allow for < > quoting either. Why does
>     >>> url-http.el strip them? Why does it break the URI at the first
>     >>> space if spaces are not allowed?
>     >>
>     >> I cannot answer that, maybe someone who is knowledgeable about
>     >> url-http.el chimes in.
>     >>
>     >> RFC 7231 allows clients to attempt to DTRT with invalid Location URIs
>     >> in any way they deem appropriate; you could argue for a different
>     >> recovery heuristic. Me, I’d rather have things break loudly on each
>     >> violation, so that it does not go unnoticed for too long. Postel’s
>     >> Razor is how we got HTML in its current shape.
>     >
>     > I tend to agree with you, but, in this specific case, being compatible
>     > with other HTTP implementations is a worthwhile goal.
>     >
>     > Unfortunately, re-defining url-http-parse-headers is the only
>     > work-around I found to make Emacs do the less bad thing when dealing
>     > with these malformed URIs.
>
>     Bill, you seem to be the author of this code, although Stefan is the one
>     that introduced it into Emacs according to git blame. Do any of you
>     know why the redirect Location is handled like that?
>
>     I would like to suggest the two attached patches. The first fixes actual
>     issues I encountered, the second simply adjusts a comment.
>
>     Thank you.
>
>     Cheers,
>     Dan
>