From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
Date: Tue, 02 Aug 2016 18:25:37 +0300
Message-ID: <83mvkvmbv2.fsf@gnu.org>
References: <83d1ltq3p6.fsf@gnu.org> <83popsocg8.fsf@gnu.org> <7fb3540a-7b74-68cf-2c63-66474de26640@yandex.ru>
To: Dmitry Gutov
Cc: stakemorii@gmail.com, 24117@debbugs.gnu.org
In-reply-to: <7fb3540a-7b74-68cf-2c63-66474de26640@yandex.ru> (message from Dmitry Gutov on Tue, 2 Aug 2016 03:52:25 +0300)

> Cc: 24117@debbugs.gnu.org
> From: Dmitry Gutov
> Date: Tue, 2 Aug 2016 03:52:25 +0300
>
> (length (concat (encode-coding-string "фыва" 'utf-8)
>                 (string-as-multibyte "abc")))
>
> => 11
>
> (string-bytes (concat (encode-coding-string "фыва" 'utf-8)
>                       (string-as-multibyte "abc")))
>
> => 19
>
> And
>
> (multibyte-string-p (url-host (url-generic-parse-url "http://127.0.0.1")))
>
> => t
>
> Apparently, url-generic-parse-url creates a multibyte string for the
> host name because it performs its parsing in a buffer. And
> url-http-create-request uses the return value of (url-host
> url-http-target-url) to set the Location header. And all of that gets
> concatenated in the request.

Thanks for spelling this out.

> Some possible solutions:
>
> - Perform the "string-bytes = length" verification only for
>   url-http-data, not the whole request string. This strikes me as
>   ugly, but apparently we've been living with using a multibyte string
>   here for a while.
>
> - Call url-encode-url on the return value of (url-host
>   url-http-target-url), and hope that no similar problem pops up with
>   any of the related variables. This does solve the immediate problem
>   with anaconda-mode, I've checked.
>
> - Something else?

How about making the temporary buffer in which url-generic-parse-url
does its parsing a unibyte buffer? Does that fix the problem? AFAIU,
RFC 3986 doesn't allow non-ASCII characters in URIs, so we should be
okay parsing them in a unibyte buffer, right?
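The following sketch makes the failure mode concrete. Once any piece of the request is a multibyte string, `concat` returns a multibyte string. This holds even for a pure-ASCII host name such as the one url-host returns. In that representation, each raw byte of the already-encoded body takes two bytes, so `string-bytes` no longer equals `length`. The helper `my-request-ok-p` and the toy request strings are hypothetical. They only mimic a "string-bytes = length" comparison and are not the actual url-http.el code.

```elisp
;; Hypothetical helper: a stand-in for the "string-bytes = length"
;; sanity check discussed above (not the actual url-http.el code).
(defun my-request-ok-p (request)
  "Return non-nil if REQUEST contains no multibyte text."
  (= (string-bytes request) (length request)))

(let ((body (encode-coding-string "фыва" 'utf-8)))   ; unibyte, 8 bytes
  (list
   ;; Unibyte host: the whole toy request stays unibyte, so the check passes.
   (my-request-ok-p (concat "Host: " "127.0.0.1" "\r\n\r\n" body))
   ;; Multibyte (but pure ASCII) host, like the one url-host returns:
   ;; concat now yields a multibyte string in which each of BODY's
   ;; 8 raw bytes occupies 2 bytes, so string-bytes exceeds length.
   (my-request-ok-p (concat "Host: " (string-to-multibyte "127.0.0.1")
                            "\r\n\r\n" body))))
;; => (t nil)
```

This is the same effect as in the 11 vs. 19 example quoted above, only with an ASCII host in place of "abc".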
I mean something like this, in url-generic-parse-url:

  (with-temp-buffer
    ;; Don't let those temp-buffer modifications accidentally
    ;; deactivate the mark of the current-buffer.
    (let ((deactivate-mark nil))
      (set-syntax-table url-parse-syntax-table)
      (erase-buffer)
      (set-buffer-multibyte nil) ;; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      (insert url)
      (goto-char (point-min))
      ...

As for other possible problems of this kind, can we already anticipate
any? If so, we could try fixing them now. Alternatively, we could just
wait for them to come up; after all, catching those was the main
rationale for introducing the length test, right?

Thanks.
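The following self-contained sketch shows the effect of the proposed `set-buffer-multibyte` call. `buffer-substring` returns a multibyte string whenever the buffer is multibyte, even for pure-ASCII text. It returns a unibyte string when the buffer is unibyte. The function `my-url-host-demo` is hypothetical. It extracts only the host of a simple "scheme://host/..." URL and is not url-generic-parse-url itself.

```elisp
(defun my-url-host-demo (url unibyte)
  "Hypothetical: return the host part of a simple URL like \"http://host/\".
Parse URL in a temporary buffer, made unibyte first if UNIBYTE is non-nil."
  (with-temp-buffer
    (when unibyte
      ;; Same idea as the proposed change: switch the still-empty
      ;; temp buffer to unibyte before inserting URL.
      (set-buffer-multibyte nil))
    (insert url)
    (goto-char (point-min))
    (search-forward "://")
    (let ((start (point)))
      ;; The host ends at the first "/" or ":" (or at end of buffer).
      (skip-chars-forward "^/:")
      (buffer-substring start (point)))))

(multibyte-string-p (my-url-host-demo "http://127.0.0.1/" nil)) ; => t
(multibyte-string-p (my-url-host-demo "http://127.0.0.1/" t))   ; => nil
```

A unibyte host string like the second one can be concatenated with an already-encoded (unibyte) request body, and the result stays unibyte.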