From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-2.7 required=3.0 tests=AWL,BAYES_00,
 DATE_IN_PAST_12_24,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,
 SPF_HELO_NONE,SPF_PASS shortcircuit=no autolearn=no autolearn_force=no
 version=3.4.6
Received: from todd.t-8ch.de (todd.t-8ch.de [IPv6:2a01:4f8:c010:41de::1])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
 (No client certificate requested)
 by dcvr.yhbt.net (Postfix) with ESMTPS id 564C31F59D
 for ; Mon, 9 Jan 2023 14:14:11 +0000 (UTC)
Authentication-Results: dcvr.yhbt.net;
 dkim=pass (1024-bit key; unprotected) header.d=t-8ch.de header.i=@t-8ch.de
 header.a=rsa-sha256 header.s=mail header.b=HS+wm9Yq;
 dkim-atps=neutral
Date: Sun, 8 Jan 2023 21:54:19 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=t-8ch.de; s=mail;
 t=1673273649; bh=o1AHNCjUm3UUd84EHclZH6RaxIsYFmIAwbQM1tgBJto=;
 h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
 b=HS+wm9YqLvzJe1hXRz5R2YO0LO6agS0Ovpf3bw3oF49V7oTjkmfQsZEoXsvbvRz7D
 Q9Y2uksZfxxnRS4/ZxRP/V+jagVYCtavhDvYorok16U0EZQqzRR45bL9ywlji4Og4R
 gGVfYi25ujP82PtHPZSycoMSoIU83d+JNoZd82GY=
From: Thomas =?utf-8?Q?Wei=C3=9Fschuh?=
To: Eric Wong
Cc: meta@public-inbox.org
Subject: Re: Add "generator" information to HTML pages
Message-ID: <20230108215419.gcnbpk7er7f7davy@snowball.t-8ch.de>
References: <20230108190404.nghzrip46oh4wl3p@snowball.t-8ch.de>
 <20230108194738.M225235@dcvr>
 <20230108200233.y2zqecm3ob47gsdd@snowball.t-8ch.de>
 <20230108205804.M144044@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20230108205804.M144044@dcvr>
List-Id:

On Sun, Jan 08, 2023 at 08:58:04PM +0000, Eric Wong wrote:
> Thomas Weißschuh wrote:
> > On Sun, Jan 08, 2023 at 07:47:38PM +0000, Eric Wong
wrote:
> > > Thomas Weißschuh wrote:
> > > > it would be nice if public-inbox could extend the HTML pages it
> > > > generates with the "generator" meta tag [0].
> > > > Especially the version would be useful.
> > > >
> > > > This would help users during debugging to see the specific version of
> > > > public-inbox they are looking at.
> > >
> > > What would users be debugging?
> > > Admins would be the only ones who care, I think...
> >
> > Since recently my mails to linux-kernel@vger.kernel.org that should end
> > up on public-inbox on https://lore.kernel.org/lkml/ don't do so.
> > They are accepted by the mail server on vger.kernel.org but never end up
> > in the archives.
> > I suspect some interactions between b4 which is used to generate the
> > mails, the unicode characters in my name and public-inbox to be the
> > culprit.
>
> Your mail seems fine to my server, but coming from an IPv6
> address has caused problems with some other servers in the past.
> Another potential thing might be your use of utf-8 in the From:
> header, while your Content-Type: is iso-8859-1 for the body.

I think I found the culprit. It is indeed the b4 tool, or rather the
Python email library it uses.
I am posting it here because you might know whether this behaviour is
standards-conformant, or whether it would be reasonable to carry a
workaround inside public-inbox.

When b4 passes the message to Python's email.message.EmailMessage, the
'To' header is just one long, unencoded string containing all
recipients and their Unicode names.
EmailMessage then makes sure that this string is a legal email header
value: it performs line wrapping and the RFC 2047 encoded-word escaping
of non-ASCII characters.
However, if a header line contains a non-ASCII character and the first
character of a wrapped line is a comma (,), then that comma is escaped
as a UTF-8 encoded word as well.
Example input:

01234567890123456789012345678901234567890123456789012345678901234567890123, ä

Example output:

01234567890123456789012345678901234567890123456789012345678901234567890123
 =?utf-8?q?=2C?= =?utf-8?q?=C3=A4?=

I expect this to be a bug in the Python library, but maybe it is correct.

> > This is what I wanted to reproduce locally, for which exact versions
> > would have been nice.
>
> I remember Konstantin has cherry-picked some commits from
> public-inbox.git in the past, and I suspect he already
> has https://public-inbox.org/meta/20221124213155.M736847@dcvr/
> ("eml: header_raw converts octets to Perl UTF-8") for SMTPUTF8
>
> One thing I wouldn't be opposed to doing is adding a way to
> download all loaded files in a tarball as a means for AGPL
> enforcement. The tricky thing is those files may change on disk
> after loading (and often does in my case :x), so they'd need to
> be copied into stable storage at startup (and updated if there's
> lazy-loading). Same security caveats apply, though.
>
> > > I also don't like wasting memory+bandwidth on things most users
> > > won't see or care about. This is especially true for stuff at
> > > the beginning of the output since that's most likely to succeed
> > > in being transferred.
> >
> > Fair enough.
> > The loading speed of public-inbox is really great, let's keep it that
> > way.
>
> Good to know it's great for you. It's still too slow for me,
> but I'm anti-consumerist and refuse to follow Moore's law :x
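
For anyone who wants to check the comma escaping described above on
their own installation, here is a minimal sketch. It is not what b4
does internally. The helper names (fold_to_header, has_encoded_comma)
and the example.org recipients are made up for illustration, and
whether an escaped comma actually shows up may depend on the Python
version and on where the line happens to wrap.

```python
import re
from email.message import EmailMessage

# Matches an RFC 2047 UTF-8 "Q" encoded word whose payload contains an
# escaped comma (=2C), e.g. "=?utf-8?q?=2C?=".
ENCODED_COMMA = re.compile(r"=\?utf-8\?q\?[^?]*=2C[^?]*\?=", re.IGNORECASE)


def fold_to_header(recipients: str) -> str:
    """Return the 'To' header as EmailMessage would fold and encode it."""
    msg = EmailMessage()
    # The whole recipient list is assigned as one unencoded string,
    # mirroring the situation described in the mail above.
    msg["To"] = recipients
    # Header objects can fold themselves according to a policy.
    return msg["To"].fold(policy=msg.policy)


def has_encoded_comma(folded: str) -> bool:
    """Report whether a folded header contains a comma inside an encoded word."""
    return ENCODED_COMMA.search(folded) is not None


if __name__ == "__main__":
    # Hypothetical recipients; the last display name is non-ASCII.
    recipients = ", ".join([
        "Alice Example <alice@example.org>",
        "Bob Example <bob@example.org>",
        "Carol Example <carol@example.org>",
        "Zoë Example <zoe@example.org>",
    ])
    folded = fold_to_header(recipients)
    print(folded)
    print("comma escaped:", has_encoded_comma(folded))
```

Varying the number and length of the recipients moves the wrap point,
which is what decides whether a comma ends up at the start of a
continuation line.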