unofficial mirror of bug-guile@gnu.org 
* bug#30076: [PATCH] web: Recognize JSON content type as text.
From: Arun Isaac @ 2018-01-11  5:31 UTC
  To: 30076

* module/web/response.scm (text-content-type?): Recognize JSON content
  type as text.
---
 module/web/response.scm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/module/web/response.scm b/module/web/response.scm
index 06e1c6dc1..679304c4d 100644
--- a/module/web/response.scm
+++ b/module/web/response.scm
@@ -184,6 +184,7 @@ reason phrase for the response's code."
 represents a textual type such as `text/plain'."
   (let ((type (symbol->string type)))
     (or (string-prefix? "text/" type)
+        (string-suffix? "/json" type)
         (string-suffix? "/xml" type)
         (string-suffix? "+xml" type))))
 
-- 
2.15.1

* bug#30076: [PATCH] web: Recognize JSON content type as text.
From: Mark H Weaver @ 2018-01-31  3:31 UTC
  To: Arun Isaac; +Cc: 30076

Hi Arun,

Arun Isaac <arunisaac@systemreboot.net> writes:
> * module/web/response.scm (text-content-type?): Recognize JSON content
>   type as text.

While this seems reasonable at first glance, I think it will cause JSON
texts containing non-ASCII characters to be mishandled in many cases.

Within Guile, 'text-content-type?' is currently used in two places:

* 'decode-response-body' in (web client), and
* 'response-body-port' in (web response).

In both places, if 'text-content-type?' returns true, the encoding of
the response is assumed to be "ISO-8859-1" if not otherwise specified by
an explicit 'charset' parameter.  This is what RFC 2616 specifies for
text/plain, although RFC 6657 would change the default to US-ASCII, as
it was in RFC 2046, and maybe we should look into that.

However, things are quite different for the application/json MIME type,
as specified in RFCs 4627 and 7159.  Those RFCs specify that JSON text
"SHALL" (i.e. MUST) be encoded in Unicode (UTF-8, UTF-16 or UTF-32),
that the default encoding is UTF-8, and furthermore that no charset
parameter is defined for application/json.

So, we can expect at least some conforming implementations to omit the
'charset' parameter, and yet in that case we must assume that the
encoding is Unicode, and most definitely not ISO-8859-1.
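
To make the failure mode concrete, here is a minimal sketch that decodes
the same UTF-8 octets both ways.  The payload and the variable names are
invented for illustration; only 'string->utf8' from (rnrs bytevectors)
and 'bytevector->string' from (ice-9 iconv) are existing Guile
procedures.

```scheme
(use-modules (ice-9 iconv)
             (rnrs bytevectors))

;; Illustrative payload: a JSON text with one non-ASCII character,
;; encoded as UTF-8 the way a conforming sender would encode it.
(define json-bytes (string->utf8 "{\"name\": \"Zoë\"}"))

;; Decoding with the ISO-8859-1 default maps each octet to one
;; character, so the two octets of "ë" become two wrong characters.
(define as-latin-1 (bytevector->string json-bytes "ISO-8859-1"))

;; Decoding as UTF-8 recovers the original text.
(define as-utf-8 (bytevector->string json-bytes "UTF-8"))

(display as-latin-1) (newline)
(display as-utf-8) (newline)
```

The ISO-8859-1 result is one character longer than the UTF-8 result,
because the two-octet sequence for "ë" is read as two characters.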

RFC 4627 makes the additional interesting observation (in section 3,
"encoding") that since the first two characters of JSON text will always
be ASCII, and since UTF-8/UTF-16/UTF-32 are the only valid encodings for
JSON text, we can reliably determine the encoding by looking at the
pattern of nul bytes in the first four octets:

           00 00 00 xx  UTF-32BE
           00 xx 00 xx  UTF-16BE
           xx 00 00 00  UTF-32LE
           xx 00 xx 00  UTF-16LE
           xx xx xx xx  UTF-8

Given that any of the encodings above is possible, and that no 'charset'
parameter is defined for "application/json", it seems to me that we have
no choice but to be prepared to auto-detect the encoding, as described
in RFC 4627 section 3, when the 'charset' parameter is missing.
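
A minimal sketch of the table above, for illustration only.
'detect-json-encoding' is a hypothetical name, not an existing Guile
procedure.  It assumes the text starts with two ASCII characters, as
RFC 4627 guarantees, and it falls back to UTF-8 when fewer than four
octets are available, which is a choice of this sketch rather than
something the RFC specifies.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: guess the Unicode encoding of a JSON text from
;; the pattern of nul octets in its first four octets, following the
;; table in RFC 4627 section 3.
(define (detect-json-encoding bv)
  (define (nul? i) (zero? (bytevector-u8-ref bv i)))
  (if (< (bytevector-length bv) 4)
      "UTF-8"   ; too short for the table; assume the default encoding
      (let ((pattern (list (nul? 0) (nul? 1) (nul? 2) (nul? 3))))
        (cond ((equal? pattern '(#t #t #t #f)) "UTF-32BE")
              ((equal? pattern '(#t #f #t #f)) "UTF-16BE")
              ((equal? pattern '(#f #t #t #t)) "UTF-32LE")
              ((equal? pattern '(#f #t #f #t)) "UTF-16LE")
              (else "UTF-8")))))

;; Example: the same JSON array in two encodings.
(display (detect-json-encoding (string->utf8 "[1, 2]")))
(newline)
(display (detect-json-encoding
          (string->utf16 "[1, 2]" (endianness little))))
(newline)
```

The returned string could then be passed to 'bytevector->string' from
(ice-9 iconv) or to 'set-port-encoding!'.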

What do you think?

      Mark

* bug#30076: [PATCH] web: Recognize JSON content type as text.
From: Mark H Weaver @ 2018-01-31  6:04 UTC
  To: Arun Isaac; +Cc: 30076

Mark H Weaver <mhw@netris.org> writes:
> RFC 4627 makes the additional interesting observation (in section 3,
> "encoding") that since the first two characters of JSON text will always
> be ASCII,

Sorry, it turns out that's no longer the case.  RFC 4627 specified that
a JSON text must be either an object or array, but in RFC 7159 a JSON
text can be any JSON value.  So only the first character is guaranteed
to be ASCII.
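
For illustration, a nul-octet test can still be written when only the
first character is known to be ASCII.  This is a sketch under an added
assumption that is not taken from either RFC's detection text: a valid
JSON text never contains a literal U+0000, so an all-zero second code
unit cannot occur in UTF-16.  'detect-json-encoding/any-value' is a
hypothetical name, and the sketch ignores byte order marks.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: guess the encoding of a JSON text when only its
;; first character is guaranteed to be ASCII (the RFC 7159 situation).
(define (detect-json-encoding/any-value bv)
  (let ((len (bytevector-length bv)))
    ;; True only when octet I exists and is zero.
    (define (nul? i)
      (and (< i len) (zero? (bytevector-u8-ref bv i))))
    (cond ((and (nul? 0) (nul? 1)) "UTF-32BE")            ; 00 00 .. ..
          ((nul? 0) "UTF-16BE")                           ; 00 xx .. ..
          ((and (nul? 1) (nul? 2) (nul? 3)) "UTF-32LE")   ; xx 00 00 00
          ((nul? 1) "UTF-16LE")                           ; xx 00 .. ..
          (else "UTF-8"))))

;; Example: the one-character JSON text "1" in UTF-16LE is 31 00.
(display (detect-json-encoding/any-value
          (string->utf16 "1" (endianness little))))
(newline)
```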

Having looked into this a bit more, I wonder if Guile should even try to
set the port encoding itself.  As far as I can tell, there's no way to
know the encoding of the response payload in the general case, without
knowledge of the specific MIME media type.  We could teach Guile about
"application/json", but if we follow that path, it would lead to us
teaching Guile's web library about more media types over time, but we
cannot hope to know about all of them.

The 'charset' parameter is not universal.  Whether it is a valid
parameter, and how its value is to be interpreted, depends on the media
type.  For "application/json", technically there is no 'charset'
parameter at all.

Since it's not feasible for Guile to reliably choose the right encoding
for arbitrary media types, perhaps it would be better for Guile to
explicitly say that it's the application programmer's job to set the
encoding of the port, if it contains textual data.
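
As a rough sketch of what that would look like on the application side,
assuming Guile 2.2 or later: the program, which knows it asked for JSON,
sets the port encoding itself before reading.  The bytevector port below
is only a stand-in for a real response body port, and the payload is
invented for illustration.

```scheme
(use-modules (ice-9 binary-ports)
             (ice-9 textual-ports)
             (rnrs bytevectors))

;; Stand-in for a response body port; a real program would get its port
;; from the web client rather than from a bytevector.
(define body-port
  (open-bytevector-input-port (string->utf8 "{\"name\": \"Zoë\"}")))

;; The application knows the media type, so it chooses the encoding.
(set-port-encoding! body-port "UTF-8")

;; Read the whole body as text using the encoding chosen above.
(define body-text (get-string-all body-port))

(display body-text)
(newline)
```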

What do you think?

      Mark

* bug#30076: [PATCH] web: Recognize JSON content type as text.
From: Arun Isaac @ 2018-02-02  7:31 UTC
  To: Mark H Weaver; +Cc: 30076


Mark H Weaver <mhw@netris.org> writes:
> Having looked into this a bit more, I wonder if Guile should even try to
> set the port encoding itself.  As far as I can tell, there's no way to
> know the encoding of the response payload in the general case, without
> knowledge of the specific MIME media type.  We could teach Guile about
> "application/json", but if we follow that path, it would lead to us
> teaching Guile's web library about more media types over time, but we
> cannot hope to know about all of them.

> Since it's not feasible for Guile to reliably choose the right encoding
> for arbitrary media types, perhaps it would be better for Guile to
> explicitly say that it's the application programmer's job to set the
> encoding of the port, if it contains textual data.

"application/json" is common enough that it would be convenient for the
application programmer if Guile knew about it. But you are a Guile
maintainer, so this is your call. I don't have a strong opinion one way
or the other.




