From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark H Weaver
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#30076: [PATCH] web: Recognize JSON content type as text.
Date: Tue, 30 Jan 2018 22:31:04 -0500
Message-ID: <87y3kevh53.fsf@netris.org>
References: <20180111053117.4597-1-arunisaac@systemreboot.net>
In-Reply-To: <20180111053117.4597-1-arunisaac@systemreboot.net> (Arun Isaac's message of "Thu, 11 Jan 2018 11:01:17 +0530")
To: Arun Isaac
Cc: 30076@debbugs.gnu.org
Mime-Version: 1.0
Content-Type: text/plain
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)

Hi Arun,

Arun Isaac writes:

> * module/web/response.scm (text-content-type?): Recognize JSON content
> type as text.

While this would seem reasonable at first glance, it seems to me that
this will result in JSON texts with non-ASCII characters being
mishandled in many cases.

Within Guile, 'text-content-type?' is currently used in two places:

* 'decode-response-body' in (web client), and
* 'response-body-port' in (web response).

In both places, if 'text-content-type?' returns true, the encoding of
the response is assumed to be "ISO-8859-1" if not otherwise specified by
an explicit 'charset' parameter.  This is what RFC 2616 specifies for
text/plain, although RFC 6657 would change the default to US-ASCII, as
it was in RFC 2046, and maybe we should look into that.

However, things are quite different for the application/json MIME type,
as specified in RFCs 4627 and 7159.  Those RFCs specify that JSON text
"SHALL" (i.e. MUST) be encoded in Unicode (UTF-8, UTF-16 or UTF-32),
that the default encoding is UTF-8, and furthermore that no charset
parameter is defined for application/json.

So, we can expect at least some conforming implementations to omit the
'charset' parameter, and yet in that case we must assume that the
encoding is Unicode, and most definitely not ISO-8859-1.

RFC 4627 makes the additional interesting observation (in section 3,
"encoding") that since the first two characters of JSON text will always
be ASCII, and since UTF-8/UTF-16/UTF-32 are the only valid encodings for
JSON text, we can reliably determine the encoding by looking at the
pattern of nul bytes in the first four octets:

    00 00 00 xx  UTF-32BE
    00 xx 00 xx  UTF-16BE
    xx 00 00 00  UTF-32LE
    xx 00 xx 00  UTF-16LE
    xx xx xx xx  UTF-8

Given that any of these encodings above are possible, and that there is
no 'charset' parameter defined for "application/json", it seems to me
that we have no choice but to be prepared to auto-detect the encoding,
as described in RFC 4627 section 3 if the 'charset' parameter is
missing.

What do you think?

      Mark
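
For illustration, the nul-byte detection from RFC 4627 section 3 can be
sketched as below.  This is a minimal sketch in Python rather than Guile
code, and it is not part of the patch or of any Guile module.  The names
'detect_json_encoding' and 'decode_json_body' are hypothetical, and the
fallback to UTF-8 for bodies shorter than four octets is an assumption
based on UTF-8 being the stated default.

```python
from typing import Optional


def detect_json_encoding(first_octets: bytes) -> str:
    """Guess the Unicode encoding of a JSON text from its first four octets.

    Hypothetical helper: it relies on the first two characters of the JSON
    text being ASCII, as RFC 4627 section 3 observes.
    """
    if len(first_octets) < 4:
        # Assumption: too short for the four-octet pattern, so use the
        # default encoding for JSON, which is UTF-8.
        return "utf-8"

    # True where the octet is nul, False otherwise.
    nul = tuple(octet == 0 for octet in first_octets[:4])

    if nul == (True, True, True, False):    # 00 00 00 xx
        return "utf-32-be"
    if nul == (True, False, True, False):   # 00 xx 00 xx
        return "utf-16-be"
    if nul == (False, True, True, True):    # xx 00 00 00
        return "utf-32-le"
    if nul == (False, True, False, True):   # xx 00 xx 00
        return "utf-16-le"
    return "utf-8"                          # xx xx xx xx


def decode_json_body(body: bytes, charset: Optional[str] = None) -> str:
    """Decode a JSON body, auto-detecting only when no charset was given."""
    return body.decode(charset or detect_json_encoding(body))
```

An explicit 'charset' still takes precedence in this sketch, so detection
only runs in the case discussed above, where the parameter is missing.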