From: Eli Zaretskii
To: Alain Schneble
Cc: toke@toke.dk, emacs-devel@gnu.org, monnier@iro.umontreal.ca, dgutov@yandex.ru
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
Date: Fri, 09 Sep 2016 18:04:19 +0300
Message-ID: <8360q5868c.fsf@gnu.org>
In-reply-to: <86sht9qfyh.fsf@realize.ch> (message from Alain Schneble on Fri, 9 Sep 2016 16:56:54 +0200)

> From: Alain Schneble
> CC: Dmitry Gutov, Eli Zaretskii
> Date: Fri, 9 Sep 2016 16:56:54 +0200
>
> > (url-retrieve-synchronously "http://google.se") ; sets a cookie
> > (let* ((url-request-data (encode-coding-string "æøå" 'utf-8)))
> >   (url-retrieve-synchronously "http://google.se")) ; crashes
>
> I was able to reproduce it but am a bit confused, since it doesn't
> signal an error when message-body "æøå" is replaced by "abc", while
> reusing the same cookie.
>
> I tried to track it down with the following example. `cookie-val' is the
> value of the cookie-string:
>
> (string-bytes cookie-val)
> => 131
> (string-bytes (encode-coding-string "æøå" 'utf-8))
> => 6
> (string-bytes (concat (encode-coding-string "æøå" 'utf-8) cookie-val))
> => 143 ' why?
> (string-bytes (concat (string-as-unibyte "abc") ans-cookie-val))
> => 134

Because a multibyte string with ASCII-only text has the same number of
bytes as it has characters.
By contrast, a multibyte string with non-ASCII text has more bytes than
characters, because of the way Emacs represents characters internally
(which is actually a superset of UTF-8).

> Why does concat behave that strangely? What am I missing here? Is the
> behavior of concatenating a unibyte and a multibyte string simply
> undefined?

No, it isn't undefined. When some of the arguments are multibyte
strings, concat returns a multibyte string. Nothing else would make
sense.
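
The byte counts can be reproduced with a small sketch. It is only an illustration: `demo-byte-counts', `demo-body' and `demo-cookie' are hypothetical names, and a 4-character ASCII string stands in for the real 131-byte cookie value. When a unibyte string is concatenated with a multibyte one, each non-ASCII byte of the unibyte string becomes a raw-byte character that occupies 2 bytes in the multibyte result, which would account for 131 + 2*6 = 143 in the report.

```elisp
;; Sketch only: all `demo-' names are made up for this example, and the
;; short cookie string is a stand-in for the real cookie value.
(defun demo-byte-counts ()
  "Return a plist of sizes illustrating unibyte/multibyte `concat'."
  (let* (;; Unibyte string: the 6 UTF-8 bytes of three non-ASCII characters.
         (demo-body (encode-coding-string "\u00e6\u00f8\u00e5" 'utf-8))
         ;; Multibyte string holding 4 ASCII characters (4 bytes).
         (demo-cookie (string-to-multibyte "id=1"))
         ;; One argument is multibyte, so the result is multibyte too.
         (joined (concat demo-body demo-cookie)))
    (list :body-bytes (string-bytes demo-body)
          :cookie-bytes (string-bytes demo-cookie)
          :joined-multibyte (multibyte-string-p joined)
          :joined-chars (length joined)
          :joined-bytes (string-bytes joined))))

(demo-byte-counts)
;; => (:body-bytes 6 :cookie-bytes 4 :joined-multibyte t
;;     :joined-chars 10 :joined-bytes 16)
```

If both arguments are unibyte, for example after passing the cookie through `encode-coding-string' as well, concat returns a unibyte string and the byte count is simply the sum of the two.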