From: Stefan Monnier
Newsgroups: gmane.emacs.bugs
Subject: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text
Date: Fri, 13 Apr 2018 16:55:26 -0400
To: 31149@debbugs.gnu.org
Cc: Lars Ingebrigtsen

Package: Emacs
Version: 27.0.50


(gui-get-selection nil 'text/html) returns UTF-16 text when the primary
selection is owned by Mozilla, but we decode it as latin-1 instead, so it
looks like garbage.

I don't know why we're getting UTF-16.  Is that what the standards say it
should do?  If so, we should adjust our code (which currently knows nothing
about the `text/html` target-type).
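To make the mis-decoding concrete: if `<b>` arrives as UTF-16LE with a byte-order mark, the bytes are FF FE 3C 00 62 00 3E 00, and decoding them as latin-1 turns each byte into its own character, giving `ÿþ<^@b^@>^@`.  If UTF-16 does turn out to be the expected encoding for `text/html`, a converter along the following lines would be needed.  This is only a minimal C sketch, not Emacs code: the names `utf16_to_utf8`, `put_utf8` and `unit_at` are hypothetical, and it assumes that a leading BOM (if any) gives the byte order and that BOM-less data is little-endian, which is a guess rather than anything established in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one 16-bit code unit from P in the given byte order.  */
static uint32_t
unit_at (const unsigned char *p, int big_endian)
{
  return big_endian ? ((uint32_t) p[0] << 8) | p[1]
                    : ((uint32_t) p[1] << 8) | p[0];
}

/* Encode code point CP as UTF-8 into OUT, which has ROOM bytes free.
   Returns the number of bytes written, or 0 if they would not fit.  */
static size_t
put_utf8 (uint32_t cp, char *out, size_t room)
{
  size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  if (need > room)
    return 0;
  switch (need)
    {
    case 1:
      out[0] = (char) cp;
      break;
    case 2:
      out[0] = (char) (0xC0 | (cp >> 6));
      out[1] = (char) (0x80 | (cp & 0x3F));
      break;
    case 3:
      out[0] = (char) (0xE0 | (cp >> 12));
      out[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (char) (0x80 | (cp & 0x3F));
      break;
    default:
      out[0] = (char) (0xF0 | (cp >> 18));
      out[1] = (char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (char) (0x80 | (cp & 0x3F));
      break;
    }
  return need;
}

/* Hypothetical helper: convert UTF-16 selection data IN (LEN bytes) to a
   NUL-terminated UTF-8 string in OUT (OUTSIZE bytes).  A leading BOM picks
   the byte order and is dropped; without one we ASSUME little-endian.
   Unpaired surrogates become U+FFFD and a trailing odd byte is ignored.
   Returns the UTF-8 length, or (size_t) -1 if OUT is too small.  */
size_t
utf16_to_utf8 (const unsigned char *in, size_t len, char *out, size_t outsize)
{
  int big_endian = 0;
  size_t i = 0, o = 0;

  assert (out != NULL && outsize > 0);
  if (len >= 2 && in[0] == 0xFE && in[1] == 0xFF)
    {
      big_endian = 1;           /* BOM FE FF: big-endian */
      i = 2;
    }
  else if (len >= 2 && in[0] == 0xFF && in[1] == 0xFE)
    i = 2;                      /* BOM FF FE: little-endian */

  while (i + 1 < len)
    {
      uint32_t cp = unit_at (in + i, big_endian);
      i += 2;
      if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len)
        {
          /* High surrogate: combine it with a following low surrogate.  */
          uint32_t lo = unit_at (in + i, big_endian);
          if (lo >= 0xDC00 && lo <= 0xDFFF)
            {
              cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
              i += 2;
            }
          else
            cp = 0xFFFD;
        }
      else if (cp >= 0xD800 && cp <= 0xDFFF)
        cp = 0xFFFD;            /* lone surrogate */

      /* Keep one byte in reserve for the terminating NUL.  */
      size_t n = put_utf8 (cp, out + o, outsize - o - 1);
      if (n == 0)
        return (size_t) -1;
      o += n;
    }
  out[o] = '\0';
  return o;
}
```

Choosing the byte order from the BOM alone is the weak point of this sketch: BOM-less data could just as well be big-endian, so a real fix would have to follow whatever convention the selection owner actually uses.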
As for why we decode it as latin-1 (under GNU/Linux, that is; Lars may be
using something else, since he's getting something with a `charset`
property, which I don't get here), it's because:

- selection_data_to_lisp_data (in xselect.c) makes a unibyte string with
  the property `foreign-selection` set to `STRING` when the actual string
  type is not known (basically, anything other than COMPOUND_TEXT and
  UTF8_STRING).
- In gui-get-selection we then map `STRING` to `iso-8859-1` (which is
  apparently the right thing for the official `STRING` target-type in X11).

I can't figure out if/where these aspects of the X11 selection protocol
are described, but at least `xclip` has a hack specifically for this case:

    [...]
    if (html != None && sel_type == html) {
        /* if the buffer contains UCS-2 (UTF-16), convert to
         * UTF-8.  Mozilla-based browsers do this for the
         * text/html target. */
    [...]

and according to the code that follows, it's not even always the same
endianness.

I don't know what the difference is between the `target-type` passed to
x-get-selection-internal and the `foreign-selection` property we get on
the returned string (they seem to be the same in my tests, except when the
type is not one of the known ones, in which case we force
`foreign-selection` to be `STRING`).


        Stefan


In GNU Emacs 27.0.50 (build 1, i686-pc-linux-gnu, GTK+ Version 2.24.32)
 of 2018-03-23 built on ceviche
Repository revision: ef4cd3805771e2cccd395d0f0b35f56816940508
Windowing system distributor 'The X.Org Foundation', version 11.0.11906000
System Description: Debian GNU/Linux buster/sid

Recent messages:
Saving file /home/monnier/src/emacs/work/src/xselect.c...
Wrote /home/monnier/src/emacs/work/src/xselect.c
Mark set
user-error: Minibuffer window is not active
Mark set
Mark saved where search started
Mark set
Making completion list... [2 times]
Quit [2 times]
Mark set

Configured using:
 'configure -C --enable-checking --with-modules
 --enable-check-lisp-object-type 'CFLAGS=-Wall -g3 -Og -Wno-pointer-sign'
 PKG_CONFIG_PATH=/home/monnier/lib/pkgconfig'

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND GPM DBUS GSETTINGS NOTIFY GNUTLS LIBXML2
FREETYPE M17N_FLT LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS GTK2 X11 MODULES
THREADS

Important settings:
  value of $LANG: fr_CH.UTF-8
  locale-coding-system: utf-8-unix

Major mode: InactiveMinibuffer

Minor modes in effect:
  csv-field-index-mode: t
  shell-dirtrack-mode: t
  diff-auto-refine-mode: t
  electric-pair-mode: t
  global-reveal-mode: t
  reveal-mode: t
  auto-insert-mode: t
  savehist-mode: t
  minibuffer-electric-default-mode: t
  global-compact-docstrings-mode: t
  url-handler-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  global-prettify-symbols-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
/home/monnier/src/emacs/elpa/packages/svg/svg hides /home/monnier/src/emacs/work/lisp/svg
/home/monnier/src/emacs/elpa/packages/ada-mode/ada-xref hides /home/monnier/src/emacs/work/lisp/progmodes/ada-xref
/home/monnier/src/emacs/elpa/packages/ada-mode/ada-mode hides /home/monnier/src/emacs/work/lisp/progmodes/ada-mode
/home/monnier/src/emacs/elpa/packages/ada-mode/ada-stmt hides /home/monnier/src/emacs/work/lisp/progmodes/ada-stmt
/home/monnier/src/emacs/elpa/packages/ada-mode/ada-prj hides /home/monnier/src/emacs/work/lisp/progmodes/ada-prj
/home/monnier/src/emacs/elpa/packages/hyperbole/set hides /home/monnier/src/emacs/work/lisp/emacs-lisp/set
/home/monnier/src/emacs/elpa/packages/landmark/landmark hides /home/monnier/src/emacs/work/lisp/obsolete/landmark
/home/monnier/src/emacs/elpa/packages/crisp/crisp hides /home/monnier/src/emacs/work/lisp/obsolete/crisp

Features:
(mule-diag csv-mode mailcap
reporter debian-bug debian-el-loaddefs image-file iimage skeleton html5-schema rng-xsd xsd-regexp rng-cmpct rng-nxml nxml-mode nxml-outln nxml-rap sgml-mode dom reftex-dcr reftex reftex-loaddefs reftex-vars latexenc sort mail-extr emacsbug tildify rst rng-valid refer refer-to-bibtex refbib printing picture nroff-mode enriched ebnf2ps ps-print ps-print-loaddefs ps-def lpr delim-col bib-mode view cal-china lunar solar cal-dst cal-bahai cal-islam cal-hebrew holidays hol-loaddefs cal-french diary-lib diary-loaddefs cal-move battery log-view srecode/document semantic/doc srecode/semantic semantic/senator semantic/decorate semantic/ctxt semantic/format srecode/extract srecode/insert srecode/filters srecode/find srecode/map srecode/ctxt semantic/tag-ls semantic/find srecode/compile semantic/util-modes semantic/util semantic semantic/tag semantic/lex semantic/fw srecode/args ede/speedbar ede/files ede ede/detect ede/base ede/auto ede/source eieio-speedbar eieio-custom cedet srecode/dictionary srecode/table eieio-base srecode mode-local informat texinfo tex-mode vc-dir grep rect gdb-mi bindat gud ffap cl-print ox-odt rng-loc rng-uri rng-parse rng-match rng-dt rng-util rng-pttrn nxml-parse nxml-ns nxml-enc xmltok nxml-util ox-latex ox-icalendar ox-html table ox-ascii ox-publish ox org-protocol org-mouse org-mobile org-agenda org-indent org-feed org-crypt org-capture org-attach org-id org-rmail org-mhe org-irc org-info org-gnus nnir gnus-sum gnus-group gnus-undo gnus-start gnus-cloud nnimap nnmail mail-source tls gnutls utf7 netrc nnoo parse-time gnus-spec gnus-int gnus-range gnus-win gnus nnheader org-docview org-bibtex bibtex org-bbdb org-w3m org-element avl-tree generator org org-macro org-footnote org-pcomplete org-list org-faces org-entities org-version ob-emacs-lisp ob ob-tangle org-src ob-ref ob-lob ob-table ob-keys ob-exp ob-comint ob-core ob-eval org-compat org-macs org-loaddefs cal-menu calendar cal-loaddefs autorevert filenotify doc-view jka-compr image-mode vc-bzr 
vc-src vc-sccs vc-svn vc-cvs vc-rcs dabbrev log-edit message sendmail rmc puny dired dired-loaddefs format-spec rfc822 mml mml-sec gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047 rfc2045 mm-util ietf-drums mail-prsvr mailabbrev mail-utils mailheader pcvs-util bug-reference add-log sh-script make-mode autoload shell pcomplete pulse etags xref project epa-file epa derived epg sm-c-mode smie whitespace misearch multi-isearch eieio-opt speedbar sb-image ezimage dframe cl-extra help-fns radix-tree executable copyright lisp-mnt xscheme unsafep trace testcover shadow scheme re-builder profiler inf-lisp ielm gmm-utils ert pp find-func ewoc debug elp edebug cl-indent cus-edit cus-start cus-load wid-edit vc vc-dispatcher smerge-mode vc-git diff-mode filecache server time-date flymake-proc flymake compile comint ansi-color ring warnings noutline outline easy-mmode flyspell ispell checkdoc thingatpt help-mode load-dir elec-pair reveal autoinsert proof-site proof-autoloads cl pg-vars savehist minibuf-eldef disp-table compact-docstrings cl-seq inline kotl-autoloads advice info realgud-recursive-autoloads finder-inf url-auth package easymenu epg-config url-handlers url-parse auth-source eieio eieio-core cl-macs eieio-loaddefs password-cache json map url-vars seq byte-opt gv bytecomp byte-compile cconv cl-loaddefs cl-lib mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe tabulated-list replace newcomment text-mode elisp-mode lisp-mode prog-mode register page menu-bar rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese composite charscript charprop case-table epa-hook jka-cmpr-hook 
help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs button faces cus-face macroexp files text-properties overlay sha1 md5 base64 format env code-pages mule custom widget hashtable-print-readable backquote dbusbind inotify dynamic-setting system-font-setting font-render-setting move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 8 904625 146270)
 (symbols 24 56914 156)
 (miscs 20 15608 1993)
 (strings 16 269351 14086)
 (string-bytes 1 8339699)
 (vectors 12 109056)
 (vector-slots 4 3333709 279700)
 (floats 8 1341 1410)
 (intervals 28 57426 412)
 (buffers 536 153))