unofficial mirror of guix-devel@gnu.org 
* [maintenance] Compressed JSON files and served file extension?
From: Simon Tournier @ 2023-10-12 13:15 UTC (permalink / raw)
  To: Guix Devel

Hi,

I have just noticed that:

    1123fd8 hydra: build-package-metadata: Compress JSON files.

Cool!

However, now I get:

--8<---------------cut here---------------start------------->8---
$ wget https://guix.gnu.org/sources.json
$ cat sources.json | jq | head
parse error: Invalid numeric literal at line 1, column 16
cat: write error: Broken pipe
--8<---------------cut here---------------end--------------->8---

and it is not obvious to me what is wrong here.
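
[Editorial note] One way to see what is wrong with such a download is to look at its first two bytes: gzip data always starts with 0x1f 0x8b. The following is only a minimal sketch, assuming a POSIX shell with `od` and `tr`; the helper name `is_gzip` is made up for illustration and is not part of any tool mentioned in this thread.

```shell
# is_gzip FILE: succeed if FILE starts with the gzip magic bytes 1f 8b.
# (Hypothetical helper name, introduced only for this sketch.)
is_gzip() {
    # Dump the first two bytes as hex and strip the whitespace od adds.
    magic=$(od -An -tx1 -N2 -- "$1" | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

It could be used as `is_gzip sources.json && gzip -dc sources.json | jq | head`, which decompresses only when the file really is gzip data.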

Could we have https://guix.gnu.org/sources.json.gz instead?  That
name would be self-consistent.

Well, it will break some consumers of packages.json and sources.json.
To my knowledge, for sources.json at least, the only consumer is SWH and
it seems easy to me to keep in touch with them. :-)


Cheers,
simon



* Re: [maintenance] Compressed JSON files and served file extension?
From: Ludovic Courtès @ 2023-11-16 14:40 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Guix Devel

Simon Tournier <zimon.toutoune@gmail.com> skribis:

> However, now I get:
>
> $ wget https://guix.gnu.org/sources.json

If you open it in a browser though, it’s fine, because browsers, and in
fact many HTTP clients other than wget, honor ‘Content-Encoding’:

--8<---------------cut here---------------start------------->8---
$ wget -O/dev/null --debug https://guix.gnu.org/sources.json

[...]

---response begin---
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 16 Nov 2023 14:38:04 GMT
Content-Type: application/json
Content-Length: 2670848
Connection: keep-alive
Content-Encoding: gzip
Expires: Thu, 16 Nov 2023 17:38:04 GMT
Cache-Control: max-age=10800
Content-Security-Policy: frame-ancestors 'none'

--8<---------------cut here---------------end--------------->8---

[...]

> Well, it will break some consumers of packages.json and sources.json.
> To my knowledge, for sources.json at least, the only consumer is SWH and
> it seems easy to me to keep in touch with them. :-)

It shouldn’t break consumers, their client will just transparently
decompress the stream (I checked with #swh-devel back then just in
case).

That said, if you become aware of actual breakage, we can revisit this!

Thanks,
Ludo’.



* Re: [maintenance] Compressed JSON files and served file extension?
From: Simon Tournier @ 2023-11-20  9:49 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Guix Devel

Hi,

On Thu, 16 Nov 2023 at 15:40, Ludovic Courtès <ludo@gnu.org> wrote:

>> However, now I get:
>>
>> $ wget https://guix.gnu.org/sources.json
>
> If you open it in a browser though, it’s fine, because browsers, and in
> fact many HTTP clients other than wget, honor ‘Content-Encoding’:

[...]

> That said, if you become aware of actual breakage, we can revisit this!

Could we have a self-consistent name?  The expected extension for a
gzip-compressed file is .gz.  Why not use it?

1. It costs us nothing.
2. How to read the file is then clearer.


Cheers,
simon



* Re: [maintenance] Compressed JSON files and served file extension?
From: Attila Lendvai @ 2023-11-20 14:03 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Guix Devel

TL;DR the filename shouldn't contain the .gz extension, and the HTTP standard is crap ("If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding.").

use curl --compressed

the details:

the Content-Encoding response header instructs the client on how to decode the __transfer payload__ that the server is serving.

i.e. proper HTTP clients should automatically decode the content as instructed by the Content-Encoding response header, or at the very least warn that they do not understand the response encoding...

but that should not happen, because the HTTP request can contain an Accept-Encoding header that tells the server what the client understands, and it defaults to unprocessed raw data ('identity')...

except that the standard allows the server to ignore the Accept-Encoding request header.

well, this is the theory, but neither wget nor curl automatically decodes the content. curl at least can be instructed to do so, which arguably should be its default:

curl --compressed https://guix.gnu.org/sources.json | less

--verbose can be used to inspect the request/response headers (printed to stderr):

curl --verbose https://guix.gnu.org/sources.json >/dev/null
curl --verbose --compressed https://guix.gnu.org/sources.json >/dev/null
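
[editorial note] when scripting, the same check can be done on a saved header dump, e.g. one written by curl's --dump-header (-D) option. a rough sketch, assuming a POSIX awk; the function name `content_encoding` is hypothetical:

```shell
# content_encoding HEADERFILE: print the value of the Content-Encoding
# response header found in HEADERFILE (prints nothing if it is absent).
# (Hypothetical helper name, introduced only for this sketch.)
content_encoding() {
    # Header names are case-insensitive; header lines end in CR LF,
    # so the trailing CR is stripped from the value before printing.
    awk -F': *' 'tolower($1) == "content-encoding" { sub(/\r$/, "", $2); print $2 }' "$1"
}
```

for example, after `curl -sS -D headers.txt -o body https://guix.gnu.org/sources.json`, running `content_encoding headers.txt` would be expected to print the coding the server applied, if any.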

here's a detailed discussion of this very question:

https://stackoverflow.com/questions/8364640/how-to-properly-handle-a-gzipped-page-when-using-curl

so, in an ideal world wget and curl should transparently decode the content according to the Content-Encoding response header, and nginx should not respond with compressed content when the client is not sending an Accept-Encoding request header.
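
[editorial note] the client behaviour described above can be sketched as a small dispatch on the Content-Encoding value. this is only an illustration under stated assumptions: the function name `decode_by_encoding` is made up, it handles just gzip and identity, and it reports every other coding as unsupported instead of guessing.

```shell
# decode_by_encoding ENCODING FILE: write the decoded body of FILE to
# stdout, given the value of the Content-Encoding response header.
# (Hypothetical helper name, introduced only for this sketch.)
decode_by_encoding() {
    # Content-coding names are case-insensitive, so normalise first.
    enc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$enc" in
        gzip|x-gzip) gzip -dc -- "$2" ;;      # compressed in transit
        ''|identity) cat -- "$2" ;;           # sent as-is
        *) echo "unsupported Content-Encoding: $1" >&2; return 1 ;;
    esac
}
```

failing loudly on an unknown coding mirrors the "at the very least warn" behaviour argued for above.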

the pragmatic solution is to use curl --compressed in scripts, and/or add it to your ~/.curlrc:

# to automatically decode responses with some of
# the supported Content-Encoding
compressed

HTH,

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“We need people in our lives with whom we can be as open as possible. To have real conversations with people may seem like such a simple, obvious suggestion, but it involves courage and risk.”
	— Thomas Moore (1940–)




* Re: [maintenance] Compressed JSON files and served file extension?
From: Simon Tournier @ 2023-11-28 13:37 UTC (permalink / raw)
  To: Attila Lendvai; +Cc: Guix Devel

Hi,

On Mon, 20 Nov 2023 at 14:03, Attila Lendvai <attila@lendvai.name> wrote:

> TL;DR the filename shouldn't contain the .gz extension, and the HTTP
> standard is crap ("If no Accept-Encoding field is present in a
> request, the server MAY assume that the client will accept any content
> coding.").
>
> use curl --compressed

And if I do not want to use curl but another tool such as wget instead? :-)

Cheers,
simon





* Re: [maintenance] Compressed JSON files and served file extension?
From: Simon Tournier @ 2023-11-28 13:48 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Guix Devel

Hi Ludo,

On Thu, 16 Nov 2023 at 15:40, Ludovic Courtès <ludo@gnu.org> wrote:

> That said, if you become aware of actual breakage, we can revisit this!

The actual breakage is my own interaction with this file. :-)

It happened to me again yesterday.  Out of habit, I do:

    $ wget https://guix.gnu.org/sources.json
    $ cat sources.json | jq | head

Then,

--8<---------------cut here---------------start------------->8---
parse error: Invalid numeric literal at line 1, column 16
cat: write error: Broken pipe
--8<---------------cut here---------------end--------------->8---

Well, we are 6 days after my last message, 12 days after your message
and more than one month after my report; and again the same mistake.
It is a mistake because it does not jump out at me that the file is
compressed.  Yeah, I could do many things on my side, such as change my
habits, use curl, have a smarter cat, write a note, have a better
memory, …

However, the simplest fix still seems to me to be an extension that
reflects the format of the file, just as it is ‘sources.json’ and not
just ‘sources’.
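
[Editorial note] The "smarter cat" option can be approximated with gzip itself. A minimal sketch, assuming a gzip whose `-f` option, combined with `-c -d`, copies input it does not recognise unchanged to standard output (GNU gzip documents this behaviour; other implementations may differ). The name `jcat` is made up for illustration.

```shell
# jcat FILE...: print each file, transparently decompressing gzip data
# and passing anything else through unchanged.
# (Hypothetical helper name; relies on the gzip -f behaviour noted above.)
jcat() {
    gzip -cdf -- "$@"
}
```

With this in place, `jcat sources.json | jq | head` should behave the same whether the saved file is compressed or not.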


Cheers,
simon



* Re: [maintenance] Compressed JSON files and served file extension?
From: Tomas Volf @ 2023-11-28 15:50 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Attila Lendvai, Guix Devel


On 2023-11-28 14:37:55 +0100, Simon Tournier wrote:
> Hi,
> 
> On Mon, 20 Nov 2023 at 14:03, Attila Lendvai <attila@lendvai.name> wrote:
> 
> > TL;DR the filename shouldn't contain the .gz extension, and the HTTP
> > standard is crap ("If no Accept-Encoding field is present in a
> > request, the server MAY assume that the client will accept any content
> > coding.").
> >
> > use curl --compressed
> 
> And if I do not want to use curl but another tool such as wget instead? :-)

You could invoke wget with the -E flag, I guess.

> 
> Cheers,
> simon
> 
> 
> 

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.


* Re: [maintenance] Compressed JSON files and served file extension?
From: Tomas Volf @ 2023-11-28 15:57 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Ludovic Courtès, Guix Devel


On 2023-11-28 14:48:05 +0100, Simon Tournier wrote:
> Hi Ludo,
> 
> On Thu, 16 Nov 2023 at 15:40, Ludovic Courtès <ludo@gnu.org> wrote:
> 
> > That said, if you become aware of actual breakage, we can revisit this!
> 
> The actual breakage is my own interaction with this file. :-)
> 
> It happened to me again yesterday.  Out of habit, I do:
> 
>     $ wget https://guix.gnu.org/sources.json
>     $ cat sources.json | jq | head
> 
> Then,
> 
> --8<---------------cut here---------------start------------->8---
> parse error: Invalid numeric literal at line 1, column 16
> cat: write error: Broken pipe
> --8<---------------cut here---------------end--------------->8---
> 
> Well, we are 6 days after my last message, 12 days after your message
> and more than one month after my report; and again the same mistake.
> It is a mistake because it does not jump out at me that the file is
> compressed.  Yeah, I could do many things on my side, such as change my
> habits, use curl, have a smarter cat, write a note, have a better
> memory, …
> 
> However, the simplest fix still seems to me to be an extension that
> reflects the format of the file, just as it is ‘sources.json’ and not
> just ‘sources’.

But the problem is that the extension does reflect the format of the file.  The
Content-Encoding is just for transmission, it does not describe the file itself.
It should be thought of as an implementation detail.  Same way you do not really
care if you get the file over HTTP/1.1 or HTTP/2.

This just sounds like a bug in wget, I wonder if there is a bug report and/or
good reason to behave like this...

Would it be possible to provide both files?  sources.json.gz would *not* set the
content-encoding (and I guess content-type would be gzip?), so that one could
pick between https://guix.gnu.org/sources.json and
https://guix.gnu.org/sources.json.gz depending on their preference?

Have a nice day,
Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.


* Re: [maintenance] Compressed JSON files and served file extension?
From: Attila Lendvai @ 2023-11-28 21:05 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Guix Devel

> And if I do not want to use curl but another tool such as wget instead? :-)

then maybe complain to the authors that it doesn't comply with the standard? :)

here's the bug report BTW:

Wget not honouring Content-Encoding: gzip
https://savannah.gnu.org/bugs/?61649

or use wget2 instead.

i guess they didn't fix it in wget because they didn't want to break "backwards compatibility". (remember: if it's not backwards, it's not compatible! :)

happy hacking,

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“We cannot train our babies not to need us. Whether it's the middle of the day or the middle of the night, their needs are real and valid, including the need for a simple human touch. A 'trained' baby may give up on his needs being met, but the need is still there, just not the trust.”
	— L. R. Knost




* Re: [maintenance] Compressed JSON files and served file extension?
From: Simon Tournier @ 2023-11-29  9:08 UTC (permalink / raw)
  To: Attila Lendvai, Tomas Volf; +Cc: Guix Devel

Hi,

On mar., 28 nov. 2023 at 16:50, Tomas Volf <~@wolfsden.cz> wrote:

> You could invoke wget with the -E flag, I guess.

Thanks.  I will try to apply that consistently. :-)


On mar., 28 nov. 2023 at 21:05, Attila Lendvai <attila@lendvai.name> wrote:

>                            (remember: if it's not backwards, it's not
> compatible! :) 

Thanks for explaining. :-)


Cheers,
simon


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-11-29 19:14 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-12 13:15 [maintenance] Compressed JSON files and served file extension? Simon Tournier
2023-11-16 14:40 ` Ludovic Courtès
2023-11-20  9:49   ` Simon Tournier
2023-11-28 13:48   ` Simon Tournier
2023-11-28 15:57     ` Tomas Volf
2023-11-20 14:03 ` Attila Lendvai
2023-11-28 13:37   ` Simon Tournier
2023-11-28 15:50     ` Tomas Volf
2023-11-28 21:05     ` Attila Lendvai
2023-11-29  9:08       ` Simon Tournier

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).