all messages for Emacs-related lists mirrored at yhetil.org
From: Kenichi Handa <handa@m17n.org>
Cc: emacs-pretest-bug@gnu.org, ihs_4664@yahoo.com,
	christopher.ian.moore@gmail.com, emacs-devel@gnu.org,
	richard.stallman@gnu.org
Subject: Re: Emacs puts binary junk into the clipboard, marking it as text
Date: Tue, 19 Sep 2006 16:14:01 +0900
Message-ID: <E1GPZnt-0000jy-00@etlken>
In-Reply-To: <450F8AF7.5010702@swipnet.se> (message from Jan Djärv on Tue, 19 Sep 2006 08:15:19 +0200)

In article <450F8AF7.5010702@swipnet.se>, Jan Djärv <jan.h.d@swipnet.se> writes:

> > AFAIK, only when TEXT is requested can a selection owner
> > choose the return type from STRING, COMPOUND_TEXT, or
> > UTF8_STRING.  When UTF8_STRING is requested, we should
> > return it or return nothing.
> > 
> > And, if Emacs owns a unibyte string, perhaps the right thing
> > is to first make it multibyte according to the current
> > lang. env. (by string-make-multibyte), then encode
> > it with UTF-8.

> What would that do to illegal UTF-8 sequences in the original unibyte string? 

The original unibyte string is not in UTF-8 format.  However,
string-make-multibyte will convert it to a proper multibyte
string, so encoding that multibyte string with UTF-8 will
produce a valid UTF-8 string ... usually.
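The two-step conversion discussed here (interpret each byte under the charset of the current language environment, then encode the result as UTF-8) can be modelled outside Emacs.  The Python sketch below only illustrates the idea: `lang_env_codec` is a hypothetical stand-in for the language environment's primary charset, and Python codecs take the place of string-make-multibyte and Emacs coding systems.

```python
def unibyte_to_utf8(data: bytes, lang_env_codec: str = "latin-1") -> bytes:
    """Rough model of (encode-coding-string (string-make-multibyte DATA) 'utf-8).

    lang_env_codec is a hypothetical stand-in for the primary charset of
    the current language environment.  Decoding is strict, so a byte the
    charset cannot represent raises UnicodeDecodeError; that is the case
    where "usually" does not hold.
    """
    text = data.decode(lang_env_codec)  # step 1: unibyte bytes -> characters
    return text.encode("utf-8")         # step 2: characters -> UTF-8 bytes


# Latin-1 assigns a character to every byte value, so this always succeeds.
print(unibyte_to_utf8(b"caf\xe9"))  # b'caf\xc3\xa9'
```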

>   I.e. will this procedure always produce valid UTF-8 data?

No.  If a byte in the original unibyte string is not a valid
code point of the primary charset of the current lang. env.,
string-make-multibyte will produce a multibyte string that
contains eight-bit-control or eight-bit-graphic characters.
Then, encoding it with UTF-8 results in an invalid UTF-8
sequence.  So, to be safe, we must delete such eight-bit
characters or replace them with U+FFFD (REPLACEMENT
CHARACTER) before encoding with UTF-8.
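The safeguard of deleting the offending eight-bit characters or substituting U+FFFD maps naturally onto Python's decode error handlers.  This is an illustrative sketch, not Emacs code: `errors="ignore"` and `errors="replace"` stand in for deleting or replacing the eight-bit characters, and the codec name is again a hypothetical stand-in for the language environment's charset.

```python
def sanitize_to_utf8(data: bytes, lang_env_codec: str, mode: str = "replace") -> bytes:
    """Convert DATA to valid UTF-8, handling bytes the charset rejects.

    mode="replace" substitutes U+FFFD (REPLACEMENT CHARACTER) for
    undecodable input; mode="ignore" deletes it.  Either way the decoded
    text holds only real characters, so the UTF-8 encoding is valid.
    """
    if mode not in ("replace", "ignore"):
        raise ValueError(f"unsupported mode: {mode!r}")
    text = data.decode(lang_env_codec, errors=mode)
    return text.encode("utf-8")


# Assuming an ASCII-only charset, 0xFF is not a valid code point.
print(sanitize_to_utf8(b"abc\xff", "ascii"))            # b'abc\xef\xbf\xbd'
print(sanitize_to_utf8(b"abc\xff", "ascii", "ignore"))  # b'abc'
```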

Alternatively, in such a case, return nothing (meaning that
Emacs does not hold the requested data).
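Combining this "return nothing" option with the rule quoted earlier (the owner may choose the type only for TEXT, while a UTF8_STRING request gets UTF-8 or nothing) gives a small decision procedure.  The Python sketch below is hypothetical: `convert_selection` is not an Emacs function, `None` models refusing the request, and always answering TEXT with UTF8_STRING is one simplifying choice among the STRING, COMPOUND_TEXT, and UTF8_STRING options.

```python
from typing import Optional, Tuple


def convert_selection(data: bytes, target: str,
                      lang_env_codec: str) -> Optional[Tuple[str, bytes]]:
    """Hypothetical selection converter following the rules in this thread.

    Returns (type, bytes), or None meaning "Emacs does not hold the
    requested data".
    """
    try:
        text = data.decode(lang_env_codec)  # strict: reject undecodable bytes
    except UnicodeDecodeError:
        return None  # refuse rather than hand out invalid UTF-8
    if target == "UTF8_STRING":
        # No choice of type here: UTF-8 or nothing.
        return ("UTF8_STRING", text.encode("utf-8"))
    if target == "TEXT":
        # The owner may pick STRING, COMPOUND_TEXT or UTF8_STRING;
        # this sketch simply picks UTF8_STRING.
        return ("UTF8_STRING", text.encode("utf-8"))
    return None  # other targets are outside the scope of this sketch


print(convert_selection(b"abc\xff", "UTF8_STRING", "ascii"))  # None
```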

---
Kenichi Handa
handa@m17n.org
