From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text
Date: Sat, 19 May 2018 11:50:37 +0300
Message-ID: <83po1sghb6.fsf@gnu.org>
References: <83vacu47sm.fsf@gnu.org> <83zi1sv5j5.fsf@gnu.org> <83h8nmsasr.fsf@gnu.org> <83vabuo8iy.fsf@gnu.org>
In-reply-to: <83vabuo8iy.fsf@gnu.org> (message from Eli Zaretskii on Fri, 11 May 2018 12:18:13 +0300)
Reply-To: Eli Zaretskii
To: Kenichi Handa
Cc: larsi@gnus.org, 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA

Ping! Ping! Ping! Ping!

> Date: Fri, 11 May 2018 12:18:13 +0300
> From: Eli Zaretskii
> Cc: larsi@gnus.org, 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA
>
> Ping! Ping! Ping!
>
> > Date: Sat, 05 May 2018 12:37:24 +0300
> > From: Eli Zaretskii
> > Cc: larsi@gnus.org, 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA
> >
> > Ping! Ping!
> >
> > > Date: Tue, 24 Apr 2018 21:11:10 +0300
> > > From: Eli Zaretskii
> > > Cc: larsi@gnus.org, 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA
> > >
> > > Ping!
> > >
> > > > Date: Sat, 14 Apr 2018 09:32:41 +0300
> > > > From: Eli Zaretskii
> > > > Cc: larsi@gnus.org, 31149@debbugs.gnu.org
> > > >
> > > > > From: Stefan Monnier
> > > > > Date: Fri, 13 Apr 2018 16:55:26 -0400
> > > > > Cc: Lars Ingebrigtsen
> > > > >
> > > > > (gui-get-selection nil 'text/html)
> > > > >
> > > > > returns utf-16 text when the primary selection is owned by Mozilla, but
> > > > > we decode it as latin-1 instead, so it looks like garbage.
> > > > >
> > > > > I don't know why we're getting utf-16. Is that what standards say it
> > > > > should do? If so, we should adjust our code (which currently knows
> > > > > nothing about the `text/html` target-type).
> > > > >
> > > > > As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be
> > > > > using something else because he's getting something with a `charset`
> > > > > property which I don't get here) because:
> > > > > - selection_data_to_lisp_data (in xselect.c) makes a unibyte string with
> > > > >   the property `foreign-selection` set to `STRING` when the actual
> > > > >   string type is not known (as opposed to COMPOUND-TEXT and
> > > > >   UTF8-STRING, basically).
> > > > > - in gui-get-selection we then have a mapping from `STRING` to
> > > > >   `iso-8859-1` (which is apparently the right thing for the official
> > > > >   `STRING` target-type in X11).
> > > > >
> > > > > I can't figure out if/where these kinds of things about the X11
> > > > > selection protocol are described, but at least in `xclip` they have
> > > > > a hack specifically for this case:
> > > > >
> > > > >     [...]
> > > > >     if (html != None && sel_type == html) {
> > > > >         /* if the buffer contains UCS-2 (UTF-16), convert to
> > > > >          * UTF-8. Mozilla-based browsers do this for the
> > > > >          * text/html target.
> > > > >          */
> > > > >     [...]
> > > > >
> > > > > and according to the subsequent code it's not even always the
> > > > > same endianness.
> > > > >
> > > > > I don't know what the difference is between the `target-type` passed to
> > > > > x-get-selection-internal and the `foreign-selection` property we get on
> > > > > the returned string (they seem to be the same in my tests, except when
> > > > > the type is not one of the known ones, in which case we force
> > > > > `foreign-selection` to be `STRING`).
> > > >
> > > > I hope Handa-san (CC'ed) can comment on this.
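Per the quoted report, a Mozilla selection owner may hand over `text/html` as UTF-16. When those bytes are labeled `STRING` and decoded as iso-8859-1, mostly-ASCII HTML comes out as its characters interleaved with NUL characters, which is the "garbage" described above. Below is a minimal C sketch of the kind of conversion the xclip comment describes. It is not xclip's or Emacs's actual code. It assumes the data starts with a UTF-16 byte-order mark (FF FE for little-endian, FE FF for big-endian) and uses that mark to choose the endianness. The function name `utf16_selection_to_utf8` is hypothetical, and where such a step would plug into `selection_data_to_lisp_data` or `gui-get-selection` is left open.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Emacs or xclip): if BUF, of LEN bytes,
   starts with a UTF-16 byte-order mark, convert the rest to a newly
   malloc'd, NUL-terminated UTF-8 string, store its length in *OUT_LEN and
   return it.  Return NULL when there is no BOM (or on allocation failure),
   so the caller can fall back to its current decoding.  Unpaired
   surrogates become U+FFFD; a trailing odd byte is ignored.  */
static char *
utf16_selection_to_utf8 (const unsigned char *buf, size_t len,
                         size_t *out_len)
{
  int big_endian;

  if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
    big_endian = 0;             /* BOM FF FE: little-endian.  */
  else if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
    big_endian = 1;             /* BOM FE FF: big-endian.  */
  else
    return NULL;                /* Assumption: no BOM means "not UTF-16".  */

  /* Each 2-byte code unit yields at most 3 UTF-8 bytes (a 4-byte
     surrogate pair yields 4), so this bound is always enough.  */
  unsigned char *out = malloc (len / 2 * 3 + 1);
  if (!out)
    return NULL;

  size_t o = 0;
  for (size_t i = 2; i + 1 < len; i += 2)
    {
      uint32_t cp = big_endian ? (uint32_t) (buf[i] << 8 | buf[i + 1])
                               : (uint32_t) (buf[i + 1] << 8 | buf[i]);

      if (cp >= 0xD800 && cp <= 0xDBFF && i + 3 < len)
        {
          /* High surrogate: combine it with a following low surrogate.  */
          uint32_t lo = big_endian ? (uint32_t) (buf[i + 2] << 8 | buf[i + 3])
                                   : (uint32_t) (buf[i + 3] << 8 | buf[i + 2]);
          if (lo >= 0xDC00 && lo <= 0xDFFF)
            {
              cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
              i += 2;           /* Consume the low surrogate too.  */
            }
          else
            cp = 0xFFFD;
        }
      else if (cp >= 0xD800 && cp <= 0xDFFF)
        cp = 0xFFFD;            /* Lone low or truncated high surrogate.  */

      /* Encode CP as 1 to 4 UTF-8 bytes.  */
      if (cp < 0x80)
        out[o++] = (unsigned char) cp;
      else if (cp < 0x800)
        {
          out[o++] = (unsigned char) (0xC0 | (cp >> 6));
          out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
      else if (cp < 0x10000)
        {
          out[o++] = (unsigned char) (0xE0 | (cp >> 12));
          out[o++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
      else
        {
          out[o++] = (unsigned char) (0xF0 | (cp >> 18));
          out[o++] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
          out[o++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
    }

  out[o] = '\0';
  *out_len = o;
  return (char *) out;
}
```

Returning NULL when there is no BOM keeps the existing latin-1 path for every other owner. Since the quoted report says the endianness is not always the same, and it is not established here that every owner sends a BOM, a real fix might need a heuristic for BOM-less data as well.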