From: Kenichi Handa
Newsgroups: gmane.emacs.devel,gmane.emacs.pretest.bugs
Subject: Re: Emacs puts binary junk into the clipboard, marking it as text
Date: Tue, 19 Sep 2006 16:14:01 +0900
To: Jan Djärv
Cc: emacs-pretest-bug@gnu.org, ihs_4664@yahoo.com, christopher.ian.moore@gmail.com, emacs-devel@gnu.org, richard.stallman@gnu.org

In article <450F8AF7.5010702@swipnet.se>, Jan Djärv writes:

> > AFAIK, only when TEXT is requested, a selection owner can
> > choose the returning type from STRING, COMPOUND_TEXT, or
> > UTF8_STRING.  When UTF8_STRING is requested, we should
> > return it or return nothing.
> >
> > And, if Emacs owns a unibyte string, perhaps the right thing
> > is to make it multibyte according to the current
> > lang. env. (by string-make-multibyte) at first, then encode
> > it by utf-8.

> What would that do to illegal UTF-8 sequences in the original unibyte string?

The original unibyte string won't be in UTF-8 format.
But string-make-multibyte will convert it to a correct multibyte
string, so encoding that multibyte string by UTF-8 will produce a
correct UTF-8 string ... usually.

> I.e. will this procedure always produce valid UTF-8 data?

No.  If a byte in the original unibyte string is not a valid code
point of the primary charset of the current lang. env.,
string-make-multibyte will produce a multibyte string that contains
an eight-bit-control or eight-bit-graphic character.  Then, encoding
it by UTF-8 will result in an incorrect UTF-8 sequence.

So, for safety, we must delete such eight-bit characters or replace
them with U+FFFD (REPLACEMENT CHARACTER) before encoding by UTF-8.
Or, in such a case, don't return anything (which means Emacs doesn't
hold the requested data).

---
Kenichi Handa
handa@m17n.org
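
The two safe options above (replace the bad bytes with U+FFFD, or
return nothing) can be sketched as follows.  This is Python, not
Emacs Lisp, and only an analogy: a Python charset name stands in for
the primary charset of the lang. env., and the function names
to_utf8_replacing and to_utf8_or_nothing are hypothetical, made up
for this illustration.  It does not show how Emacs itself handles
selections.

```python
from typing import Optional


def to_utf8_replacing(raw: bytes, charset: str) -> bytes:
    """Option 1: bytes that are not valid in CHARSET become U+FFFD.

    CHARSET is an assumed stand-in for the primary charset of the
    current language environment.
    """
    # errors="replace" substitutes U+FFFD for each undecodable byte.
    text = raw.decode(charset, errors="replace")
    # Encoding a str made only of real characters yields valid UTF-8.
    return text.encode("utf-8")


def to_utf8_or_nothing(raw: bytes, charset: str) -> Optional[bytes]:
    """Option 2: if any byte is not valid in CHARSET, return nothing."""
    try:
        text = raw.decode(charset, errors="strict")
    except UnicodeDecodeError:
        # Analogous to answering that no UTF8_STRING data is held.
        return None
    return text.encode("utf-8")
```

With a Greek charset such as iso8859_7, the byte 0xC1 decodes to a
real letter while 0xD2 is unassigned, so the first function replaces
the second byte and the second function returns None for the same
input.  Either way, nothing that is not valid UTF-8 is handed to the
requestor.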