From: Po Lu
Newsgroups: gmane.emacs.devel
Subject: Re: default charset for text/html selection in X11
Date: Thu, 22 Jun 2023 08:56:49 +0800
Message-ID: <875y7g2u26.fsf@yahoo.com>
References: <87mt0sg6fc.fsf@gmail.com>
To: Robert Pluim
Cc: emacs-devel@gnu.org
In-Reply-To: <87mt0sg6fc.fsf@gmail.com> (Robert Pluim's message of "Wed, 21 Jun 2023 17:51:19 +0200")

Robert Pluim writes:

> Hi,
>
> Iʼve been playing around with the `yank-media' stuff Lars added, and
> Iʼve noticed that when yanking a selection with mime-type text/html
> from Chromium, what Iʼm getting is a utf-8 encoded string, which makes
> this:
>
>     (defun html-mode--html-yank-handler (_type html)
>       (save-restriction
>         (insert html)
>         (ignore-errors
>           (sgml-pretty-print (point-min) (point-max)))))
>
> insert any codepoints > 127 as their constituent raw bytes
> instead, eg U+A0 ends up as \xc2\xa0 in the buffer.
>
> I *think* it should be OK to assume utf-8 here, and thus do:
>
>     (defun html-mode--html-yank-handler (_type html)
>       (save-restriction
>         (insert (decode-coding-string html 'utf-8 t))
>         (ignore-errors
>           (sgml-pretty-print (point-min) (point-max)))))
>
> but I canʼt find a normative reference for that (if this was http, the
> default charset would be iso-8859-1, but this isnʼt http).
>
> Robert

What is the type of the string?  IOW, what's

  (get-text-property 0 'foreign-selection html)

?  This should be one of the usual X11 string formats: STRING
(iso-latin-1), COMPOUND_TEXT (compound-text-with-extensions), or
UTF8_STRING (utf-8).  If it's anything else, Emacs should try to detect
the encoding automatically, and fall back to Latin-1 if that fails.

Thanks.
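
The selection-type dispatch described above might be sketched roughly as follows.  This is only an illustration under stated assumptions: `my/decode-foreign-selection' is a hypothetical helper, not an Emacs function, and treating an `undecided', `raw-text' or `no-conversion' result from `detect-coding-string' as "detection failed" is a guess at what the fallback condition should be.

```elisp
;; Hypothetical helper for illustration only; not part of Emacs.
(defun my/decode-foreign-selection (data)
  "Decode unibyte selection DATA according to its `foreign-selection' type.
The type is read from the text property at position 0 of DATA."
  (let* ((type (and (> (length data) 0)
                    (get-text-property 0 'foreign-selection data)))
         (coding
          (pcase type
            ;; The three usual X11 string formats named in the message.
            ('STRING 'iso-latin-1)
            ('COMPOUND_TEXT 'compound-text-with-extensions)
            ('UTF8_STRING 'utf-8)
            (_
             ;; Any other type: try to detect the encoding.  Assumption:
             ;; an undecided or raw result counts as a failed detection,
             ;; in which case fall back to Latin-1.
             (let ((guess (coding-system-base
                           (detect-coding-string data t))))
               (if (memq guess '(undecided raw-text no-conversion))
                   'iso-latin-1
                 guess))))))
    (decode-coding-string data coding t)))
```

With something like this, the yank handler could call (insert (my/decode-foreign-selection html)) rather than hard-coding utf-8, so a selection that arrives as STRING or COMPOUND_TEXT would still be decoded sensibly.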