* [maintenance] Compressed JSON files and served file extension?
From: Simon Tournier @ 2023-10-12 13:15 UTC
To: Guix Devel

Hi,

I have just noticed that:

    1123fd8 hydra: build-package-metadata: Compress JSON files.

Cool! However, now I get:

--8<---------------cut here---------------start------------->8---
$ wget https://guix.gnu.org/sources.json
$ cat sources.json | jq | head
parse error: Invalid numeric literal at line 1, column 16
cat: write error: Broken pipe
--8<---------------cut here---------------end--------------->8---

and it is not obvious to me what is wrong here.

Could we have https://guix.gnu.org/sources.json.gz instead? The name would be self-consistent.

Well, it will break some consumers of packages.json and sources.json. To my knowledge, for sources.json at least, the only consumer is SWH, and it appears easy to me to keep in touch with them. :-)

Cheers,
simon

* Re: [maintenance] Compressed JSON files and served file extension?
From: Ludovic Courtès @ 2023-11-16 14:40 UTC
To: Simon Tournier; +Cc: Guix Devel

Simon Tournier <zimon.toutoune@gmail.com> skribis:

> However, now I get:
>
> $ wget https://guix.gnu.org/sources.json

If you open it in a browser though, it’s fine, because browsers, and in fact many HTTP clients other than wget, honor ‘Content-Encoding’:

--8<---------------cut here---------------start------------->8---
$ wget -O/dev/null --debug https://guix.gnu.org/sources.json
[...]
---response begin---
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 16 Nov 2023 14:38:04 GMT
Content-Type: application/json
Content-Length: 2670848
Connection: keep-alive
Content-Encoding: gzip
Expires: Thu, 16 Nov 2023 17:38:04 GMT
Cache-Control: max-age=10800
Content-Security-Policy: frame-ancestors 'none'
--8<---------------cut here---------------end--------------->8---

[...]

> Well, it will break some consumers of packages.json and sources.json.
> To my knowledge, for sources.json at least, the only consumer is SWH and
> it appears to me easy to keep them in touch. :-)

It shouldn’t break consumers: their client will just transparently decompress the stream (I checked with #swh-devel back then just in case).

That said, if you become aware of actual breakage, we can revisit this!

Thanks,
Ludo’.
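
The header check shown in the message above can be scripted. What follows is only a minimal sketch: the function name `is_gzip_encoded` is hypothetical (it does not come from the thread), and it assumes nothing beyond a POSIX shell and grep. It reads HTTP response headers on standard input and succeeds when they announce a gzip content coding.

```shell
# Hypothetical helper, not part of wget, curl, or Guix.
# Reads HTTP response headers on stdin; exit status 0 means that a
# "Content-Encoding" header mentioning gzip was found.
is_gzip_encoded() {
    # -i: header names are case-insensitive; -q: only the exit status matters.
    grep -qi '^[[:space:]]*content-encoding:.*gzip'
}
```

Fed the response dump quoted above, the function would succeed because of the `Content-Encoding: gzip` line; on a response without that header it fails, which makes it usable in an `if` before deciding whether to gunzip a saved file.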

* Re: [maintenance] Compressed JSON files and served file extension?
From: Simon Tournier @ 2023-11-20 9:49 UTC
To: Ludovic Courtès; +Cc: Guix Devel

Hi,

On Thu, 16 Nov 2023 at 15:40, Ludovic Courtès <ludo@gnu.org> wrote:

>> However, now I get:
>>
>> $ wget https://guix.gnu.org/sources.json
>
> If you open it in a browser though, it’s fine, because browsers and in
> fact many HTTP clients other than wget, honor ‘Content-Encoding’:

[...]

> That said, if you become aware of actual breakage, we can revisit this!

Could we have a self-consistent name? The expected extension for a gzip-compressed file is .gz. Why not add it?

 1. It costs us nothing.
 2. How to read the file is then clearer.

Cheers,
simon

* Re: [maintenance] Compressed JSON files and served file extension?
From: Simon Tournier @ 2023-11-28 13:48 UTC
To: Ludovic Courtès; +Cc: Guix Devel

Hi Ludo,

On Thu, 16 Nov 2023 at 15:40, Ludovic Courtès <ludo@gnu.org> wrote:

> That said, if you become aware of actual breakage, we can revisit this!

The actual breakage is my own interaction with this file. :-)

Again, it happened to me yesterday. Out of habit, I do:

    $ wget https://guix.gnu.org/sources.json
    $ cat sources.json | jq | head

Then,

--8<---------------cut here---------------start------------->8---
parse error: Invalid numeric literal at line 1, column 16
cat: write error: Broken pipe
--8<---------------cut here---------------end--------------->8---

Well, we are 6 days after my last message, 12 days after your message and more than one month after my report; and again the same mistake. It is a mistake because it does not jump out at me that the file is compressed. Yeah, I could do many things on my side, such as change my habits, use curl, have a smarter cat, write a note, have a better memory, …

However, the simplest still appears to me to have the extension reflect the format of the file, just as it is ’sources.json’ and not just ’sources’.

Cheers,
simon
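
The "smarter cat" mentioned in the message above could look something like the following. This is a sketch under stated assumptions, not something proposed in the thread: the name `smart_cat` is hypothetical, and it assumes a `gzip` that supports `-t` (test integrity) and `-dc` (decompress to standard output), as GNU gzip does.

```shell
# Hypothetical helper, not part of Guix or wget.
# Prints a file, transparently gunzipping it when it is gzip data,
# whatever its file name says.
smart_cat() {
    # "gzip -t" succeeds only if the input is a valid gzip stream.
    if gzip -t < "$1" 2>/dev/null; then
        gzip -dc < "$1"
    else
        cat "$1"
    fi
}
```

With this in place, `smart_cat sources.json | jq . | head` would work whether wget saved the compressed or the decoded bytes. The trade-off is that the file is read twice when it is compressed, which is probably acceptable for a file of a few megabytes.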

* Re: [maintenance] Compressed JSON files and served file extension?
From: Tomas Volf @ 2023-11-28 15:57 UTC
To: Simon Tournier; +Cc: Ludovic Courtès, Guix Devel

On 2023-11-28 14:48:05 +0100, Simon Tournier wrote:
> Hi Ludo,
>
> On Thu, 16 Nov 2023 at 15:40, Ludovic Courtès <ludo@gnu.org> wrote:
>
> > That said, if you become aware of actual breakage, we can revisit this!
>
> The actual breakage is my own interaction with this file. :-)
>
> Again, it happened to me yesterday. By habits, I do:
>
>     $ wget https://guix.gnu.org/sources.json
>     $ cat sources.json | jq | head
>
> Then,
>
> --8<---------------cut here---------------start------------->8---
> parse error: Invalid numeric literal at line 1, column 16
> cat: write error: Broken pipe
> --8<---------------cut here---------------end--------------->8---
>
> Well, we are 6 days later my last message, 12 days after your message
> and more than one month after my report; and again the same mistake.
> That’s mistake because it does not jump to my eyes that the file is
> compressed. Yeah, I could do many on my side as change my habits, as
> use curl, as have a smarter cat, as write a note, as have a better
> memory, as …
>
> However, the simplest still appears to me to have the extension
> reflecting the format of the file. Similarly as it is ’sources.json’
> and not just ’sources’.

But the point is that the extension does reflect the format of the file. The Content-Encoding is just for transmission; it does not describe the file itself. It should be thought of as an implementation detail, the same way you do not really care whether you get the file over HTTP 1.1 or HTTP 2.0.

This just sounds like a bug in wget. I wonder if there is a bug report and/or a good reason to behave like this...

Would it be possible to provide both files? sources.json.gz would *not* set the content-encoding (and I guess content-type would be gzip?), so that one could pick between https://guix.gnu.org/sources.json and https://guix.gnu.org/sources.json.gz depending on their preference?

Have a nice day,
Tomas

--
There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors.

* Re: [maintenance] Compressed JSON files and served file extension?
From: Attila Lendvai @ 2023-11-20 14:03 UTC
To: Simon Tournier; +Cc: Guix Devel

TL;DR the filename shouldn't contain the .gz extension, and the HTTP standard is crap ("If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding.").

use curl --compressed

the details:

the Content-Encoding response header instructs the client on how to decode the __transfer payload__ that the server is serving.

i.e. proper HTTP clients should automatically decode the content as instructed by the Content-Encoding response header, or at the very least warn that they do not understand the response encoding... but that should not happen, because the HTTP request can contain an Accept-Encoding header that tells the server what the client understands, and it defaults to unprocessed raw data ('identity')... except that the standard allows the server to ignore the Accept-Encoding request header.

well, this is the theory, but both wget and curl don't automatically decode the content.

curl at least can be instructed to do so, which arguably should be its default:

    curl --compressed https://guix.gnu.org/sources.json | less

--verbose can be used to inspect the request/response headers (printed to stderr):

    curl --verbose https://guix.gnu.org/sources.json >/dev/null
    curl --verbose --compressed https://guix.gnu.org/sources.json >/dev/null

here's a detailed discussion of this very question:

https://stackoverflow.com/questions/8364640/how-to-properly-handle-a-gzipped-page-when-using-curl

so, in an ideal world wget and curl should transparently decode the content according to the Content-Encoding response header, and nginx should not respond with a compressed content when the client is not sending an Accept-Encoding request header.

the pragmatic solution is to use curl --compressed in scripts, and/or add it to your ~/.curlrc:

    # to automatically decode responses with some of
    # the supported Content-Encoding
    compressed

HTH,

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“We need people in our lives with whom we can be as open as possible. To have real conversations with people may seem like such a simple, obvious suggestion, but it involves courage and risk.” (Thomas Moore, 1940–)
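
For scripts, the "use curl --compressed" advice above can be wrapped in a small function. This is only a sketch: `fetch_json` is a hypothetical name, and the only thing taken from the thread is the `--compressed` option. `--silent`, `--show-error` and `--fail` are standard curl options added here so that a failed download yields a non-zero exit status rather than an error page piped into jq.

```shell
# Hypothetical wrapper, not part of curl or Guix.
# Downloads the URL given as $1 to stdout and lets curl decode any
# Content-Encoding it supports (gzip among them).
fetch_json() {
    curl --silent --show-error --fail --compressed "$1"
}
```

Assuming the server behaves as described in this thread, `fetch_json https://guix.gnu.org/sources.json | jq . | head` should then print decoded JSON without any manual gunzip step.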

* Re: [maintenance] Compressed JSON files and served file extension?
From: Simon Tournier @ 2023-11-28 13:37 UTC
To: Attila Lendvai; +Cc: Guix Devel

Hi,

On Mon, 20 Nov 2023 at 14:03, Attila Lendvai <attila@lendvai.name> wrote:

> TL;DR the filename shouldn't contain the .gz extension, and the HTTP
> standard is crap ("If no Accept-Encoding field is present in a
> request, the server MAY assume that the client will accept any content
> coding.").
>
> use curl --compressed

And what if I do not want to use curl but another tool, such as wget? :-)

Cheers,
simon

* Re: [maintenance] Compressed JSON files and served file extension?
From: Tomas Volf @ 2023-11-28 15:50 UTC
To: Simon Tournier; +Cc: Attila Lendvai, Guix Devel

On 2023-11-28 14:37:55 +0100, Simon Tournier wrote:
> Hi,
>
> On Mon, 20 Nov 2023 at 14:03, Attila Lendvai <attila@lendvai.name> wrote:
>
> > TL;DR the filename shouldn't contain the .gz extension, and the HTTP
> > standard is crap ("If no Accept-Encoding field is present in a
> > request, the server MAY assume that the client will accept any content
> > coding.").
> >
> > use curl --compressed
>
> And if I do not want to use curl but instead another tool as wget? :-)

You could invoke wget with the -E flag, I guess.

>
> Cheers,
> simon

--
There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors.

* Re: [maintenance] Compressed JSON files and served file extension?
From: Attila Lendvai @ 2023-11-28 21:05 UTC
To: Simon Tournier; +Cc: Guix Devel

> And if I do not want to use curl but instead another tool as wget? :-)

then maybe complain to the authors that it doesn't comply with the standard? :)

here's the bug report BTW:

Wget not honouring Content-Encoding: gzip
https://savannah.gnu.org/bugs/?61649

or use wget2 instead.

i guess they didn't fix it in wget because they didn't want to break "backwards compatibility". (remember: if it's not backwards, it's not compatible! :)

happy hacking,

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“We cannot train our babies not to need us. Whether it's the middle of the day or the middle of the night, their needs are real and valid, including the need for a simple human touch. A 'trained' baby may give up on his needs being met, but the need is still there, just not the trust.” (L. R. Knost)

* Re: [maintenance] Compressed JSON files and served file extension?
From: Simon Tournier @ 2023-11-29 9:08 UTC
To: Attila Lendvai, Tomas Volf; +Cc: Guix Devel

Hi,

On mar., 28 nov. 2023 at 16:50, Tomas Volf <~@wolfsden.cz> wrote:

> You could invoke the wget with -E flag I guess.

Thanks. I will try to apply that consistently. :-)

On mar., 28 nov. 2023 at 21:05, Attila Lendvai <attila@lendvai.name> wrote:

> (remember: if it's not backwards, it's not
> compatible! :)

Thanks for explaining. :-)

Cheers,
simon

end of thread, other threads:[~2023-11-29 19:14 UTC | newest]

Thread overview: 10+ messages
2023-10-12 13:15 [maintenance] Compressed JSON files and served file extension? Simon Tournier
2023-11-16 14:40 ` Ludovic Courtès
2023-11-20  9:49   ` Simon Tournier
2023-11-28 13:48   ` Simon Tournier
2023-11-28 15:57     ` Tomas Volf
2023-11-20 14:03 ` Attila Lendvai
2023-11-28 13:37   ` Simon Tournier
2023-11-28 15:50     ` Tomas Volf
2023-11-28 21:05     ` Attila Lendvai
2023-11-29  9:08       ` Simon Tournier