From: chad
Newsgroups: gmane.emacs.devel
Subject: Re: Bug in url-retrieve-synchronously from url.el on redirect
Date: Mon, 13 Jul 2020 11:15:01 -0700
References: <08fad79e-9b6b-6ff4-66fd-c32fdf5b7189@grinta.net> <8d25cbf6-5cc8-25d5-89bc-5a7a74d477c4@grinta.net>
To: Daniele Nicolodi
Cc: wmperry@gnu.org, Stefan Monnier, EMACS development team

Very likely the </> stripping code dates from a time when code would
recognize strings inside angle brackets as potential URLs/URIs and pass
the entire string to the url library for simplicity. If memory serves,
Bill Perry's original url code dates from the wild and wooly early days
of loose url encoding. I would expect that it can be changed safely.

Hope that helps,
~Chad

On Fri, Jul 10, 2020 at 5:55 PM Daniele Nicolodi wrote:

> On 10-07-2020 14:32, Daniele Nicolodi wrote:
> > On 10/07/2020 14:25, Yuri Khan wrote:
> >> On Sat, 11 Jul 2020 at 02:43, Daniele Nicolodi wrote:
> >>
> >>> As far as I understand, the RFCs (and having been wrong before, I
> >>> may be wrong again) do not allow for < > quoting either. Why does
> >>> url-http.el strip them? Why does it break the URI at the first
> >>> space if spaces are not allowed?
> >>
> >> I cannot answer that; maybe someone who is knowledgeable about
> >> url-http.el will chime in.
> >>
> >> RFC 7231 allows clients to attempt to DTRT with invalid Location URIs
> >> in any way they deem appropriate; you could argue for a different
> >> recovery heuristic. Me, I’d rather have things break loudly on each
> >> violation, so that it does not go unnoticed for too long. Postel’s
> >> Razor is how we got HTML in its current shape.
> >
> > I tend to agree with you, but, in this specific case, being compatible
> > with other HTTP implementations is a worthwhile goal.
> >
> > Unfortunately, re-defining url-http-parse-headers is the only
> > work-around I found to make Emacs do the less bad thing when dealing
> > with these malformed URIs.
>
> Bill, you seem to be the author of this code, although Stefan is the one
> who introduced it into Emacs according to git blame. Does either of you
> know why the redirect Location is handled like that?
>
> I would like to suggest the two attached patches. The first fixes actual
> issues I encountered, the second simply adjusts a comment.
>
> Thank you.
>
> Cheers,
> Dan
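
To make the two recovery heuristics discussed in this thread concrete, here is a minimal Python sketch. It is not the url-http.el code and does not claim to match it line for line; it only mirrors the behaviour as described in the thread (strip a surrounding < > pair, cut the value at the first space) and contrasts it with one possible alternative that percent-encodes spaces instead. The function names and the example Location values are hypothetical.

```python
def strip_and_truncate(location: str) -> str:
    """Heuristic as described in the thread: drop a surrounding < > pair,
    then keep only the text before the first space."""
    value = location.strip()
    if value.startswith("<") and value.endswith(">"):
        value = value[1:-1]
    return value.split(" ", 1)[0]


def encode_spaces(location: str) -> str:
    """Alternative heuristic (an assumption, not from the patches):
    keep the whole value and percent-encode literal spaces."""
    return location.strip().replace(" ", "%20")


if __name__ == "__main__":
    # Hypothetical malformed Location header value containing a space.
    header = "http://example.com/some file.txt"
    print(strip_and_truncate(header))  # http://example.com/some
    print(encode_spaces(header))       # http://example.com/some%20file.txt
```

With the first heuristic the redirect target is silently shortened, so the client requests a different resource; with the second the full path survives. Which recovery is preferable is exactly the judgment call that RFC 7231 leaves to the client.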