From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lars Ingebrigtsen
Newsgroups: gmane.emacs.bugs
Subject: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text
Date: Sun, 29 Sep 2019 10:44:48 +0200
Message-ID: <87h84vqynz.fsf@gnus.org>
In-Reply-To: (Stefan Monnier's message of "Fri, 13 Apr 2018 16:55:26 -0400")
To: Stefan Monnier
Cc: 31149@debbugs.gnu.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"

Stefan Monnier writes:

> (gui-get-selection nil 'text/html)
>
> returns utf-16 text when the primary selection is
> owned by Mozilla, but
> we decode it as latin-1 instead, so it looks like garbage.

This is still the case on the trunk:

#("ÿþM^@e^@r^@g^@e^@d^@" 0 14 (foreign-selection STRING charset iso-8859-1))

[...]

> I can't figure out if/where these kinds of things about the X11
> selection protocol is described, but at least in `xclip` they have
> a hack specifically for this case:
>
> [...]
>     if (html != None && sel_type == html) {
>         /* if the buffer contains UCS-2 (UTF-16), convert to
>          * UTF-8. Mozilla-based browsers do this for the
>          * text/html target.
>          */
> [...]
>
> and according to the subsequent code it's not even always the
> same endianness.

I think it would make sense for us to do the same here. It should be
easy enough for us to detect that the string is utf-16, I think? The
data has a BOM and everything...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
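
For illustration only, here is a rough sketch of the BOM check suggested above, written in Python rather than Emacs Lisp to keep it self-contained. It is not the code in Emacs or in `xclip`. The function name `decode_html_selection` is made up for this example, and falling back to UTF-8 when there is no BOM is an assumption of the sketch, not something the X11 selection protocol is known to promise.

```python
import codecs


def decode_html_selection(raw: bytes) -> str:
    """Decode text/html selection bytes, honouring a UTF-16 BOM if present.

    Hypothetical helper for illustration; not an Emacs or xclip function.
    """
    # FF FE marks little-endian UTF-16, FE FF marks big-endian.
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No BOM: assume UTF-8 (an assumption of this sketch).
    return raw.decode("utf-8", errors="replace")


# The string from the report, as it looks after being mis-decoded as latin-1.
garbled = "\u00ff\u00feM\x00e\x00r\x00g\x00e\x00d\x00"

# latin-1 maps each character back to one byte, so this recovers the raw data.
raw = garbled.encode("latin-1")

print(decode_html_selection(raw))
```

Because latin-1 maps bytes to characters one-to-one, re-encoding the garbled string recovers the original bytes, so the check could in principle run either before or after the wrong decoding step. Checking both byte orders covers the endianness difference mentioned in the quoted `xclip` comment.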