From: Kenichi Handa <handa@m17n.org>
Cc: christopher.ian.moore@gmail.com, emacs-pretest-bug@gnu.org,
ihs_4664@yahoo.com, richard.stallman@gnu.org,
emacs-devel@gnu.org
Subject: Re: Emacs puts binary junk into the clipboard, marking it as text
Date: Wed, 20 Sep 2006 11:20:43 +0900
Message-ID: <E1GPrhb-0006Ox-00@etlken>
In-Reply-To: <jwvk63z7s2s.fsf-monnier+emacs@gnu.org> (message from Stefan Monnier on Tue, 19 Sep 2006 12:15:58 -0400)
In article <jwvk63z7s2s.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
> I think we can't know what should be done, so we should strive for
> simplicity and try to avoid losing information. I.e. just return the
> unibyte string as-is.
Even if it doesn't conform to ICCCM? I'll attach the
relevant part of that document.
"Jan D." <jan.h.d@swipnet.se> writes:
> W.r.t the standards, Emacs has two choices, return a valid UTF8-string
> or don't return anything at all. I'm beginning to think the second
> option is the best.
This will be useful for checking UTF-8 validity.
(define-ccl-program ccl-check-utf-8
  '(0
    ((r0 = 1)
     (loop
      (read-if (r1 < #x80) (repeat)
        ((r0 = 0)
         (if (r1 < #xC2) (end))
         (read r2)
         (if ((r2 & #xC0) != #x80) (end))
         (if (r1 < #xE0) ((r0 = 1) (repeat)))
         (read r2)
         (if ((r2 & #xC0) != #x80) (end))
         (if (r1 < #xF0) ((r0 = 1) (repeat)))
         (read r2)
         (if ((r2 & #xC0) != #x80) (end))
         (if (r1 < #xF8) ((r0 = 1) (repeat)))
         (read r2)
         (if ((r2 & #xC0) != #x80) (end))
         (if (r1 == #xF8) ((r0 = 1) (repeat)))
         (end))))))
  "Check if the input unibyte string is a valid UTF-8 sequence or not.
If it is valid, set the register `r0' to 1, else set it to 0.")
(defun string-utf-8-p (string)
  "Return non-nil iff STRING is a unibyte string of valid UTF-8 sequence."
  (if (or (not (stringp string))
          (multibyte-string-p string))
      (error "Not a unibyte string: %s" string))
  (let ((status (make-vector 9 0)))
    (ccl-execute-on-string ccl-check-utf-8 status string)
    (= (aref status 0) 1)))
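For readers less familiar with CCL, the same byte-level checks can be sketched in plain Emacs Lisp. This is only an illustration, not part of the proposal: the name `my-utf-8-bytes-valid-p' is hypothetical, and the function assumes its argument is already a unibyte string. Like the CCL program, it checks only lead-byte ranges and the 10xxxxxx form of trailing bytes, so it does not reject every overlong form (for example, the bytes E0 80 80 pass both).

```elisp
(defun my-utf-8-bytes-valid-p (bytes)
  "Return non-nil if the unibyte string BYTES passes the checks above.
Hypothetical helper mirroring `ccl-check-utf-8' in plain Lisp."
  (let ((i 0)
        (len (length bytes))
        (valid t))
    (while (and valid (< i len))
      (let* ((lead (aref bytes i))
             ;; Number of trailing bytes implied by the lead byte,
             ;; or nil if the byte cannot start a sequence.
             (ntrail (cond ((< lead #x80) 0)
                           ((< lead #xC2) nil)
                           ((< lead #xE0) 1)
                           ((< lead #xF0) 2)
                           ((< lead #xF8) 3)
                           ((= lead #xF8) 4)
                           (t nil))))
        (setq i (1+ i))
        (if (or (null ntrail) (> (+ i ntrail) len))
            ;; Bad lead byte, or the sequence is cut off at the end.
            (setq valid nil)
          (dotimes (_ ntrail)
            ;; Each trailing byte must look like 10xxxxxx.
            (unless (= (logand (aref bytes i) #xC0) #x80)
              (setq valid nil))
            (setq i (1+ i))))))
    valid))

;; Example calls: UTF-8 encoded text, then a lone Latin-1 byte.
(my-utf-8-bytes-valid-p (encode-coding-string "é" 'utf-8))
(my-utf-8-bytes-valid-p "\xE9")
```

The first call is expected to return non-nil and the second nil, because the byte E9 announces a three-byte sequence that never arrives.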
---
Kenichi Handa
handa@m17n.org
Inter-Client Communication Conventions Manual
Version 2.0.xf86.1
[...]
2.7. Use of Selection Properties
The names of the properties used in selection data transfer
are chosen by the requestor. The use of None property
fields in ConvertSelection requests (which request the
selection owner to choose a name) is not permitted by these
conventions.
The selection owner always chooses the type of the property
in the selection data transfer. Some types have special
semantics assigned by convention, and these are reviewed in
the following sections.
In all cases, a request for conversion to a target should
return either a property of one of the types listed in the
previous table for that target or a property of type INCR
and then a property of one of the listed types.
Certain selection properties may contain resource IDs. The
selection owner should ensure that the resource is not
destroyed and that its contents are not changed until after
the selection transfer is complete. Requestors that rely on
the existence or on the proper contents of a resource must
operate on the resource (for example, by copying the contents
of a pixmap) before deleting the selection property.
The selection owner will return a list of zero or more items
of the type indicated by the property type. In general, the
number of items in the list will correspond to the number of
disjoint parts of the selection. Some targets (for example,
side-effect targets) will be of length zero irrespective of
the number of disjoint selection parts. In the case of
fixed-size items, the requestor may determine the number of
items by the property size. Selection property types are
listed in the table below. For variable-length items such
as text, the separators are also listed.
-------------------------------------
Type Atom       Format   Separator
-------------------------------------
APPLE_PICT      8        Self-sizing
ATOM            32       Fixed-size
ATOM_PAIR       32       Fixed-size
BITMAP          32       Fixed-size
C_STRING        8        Zero
COLORMAP        32       Fixed-size
COMPOUND_TEXT   8        Zero
DRAWABLE        32       Fixed-size
INCR            32       Fixed-size
INTEGER         32       Fixed-size
PIXEL           32       Fixed-size
PIXMAP          32       Fixed-size
SPAN            32       Fixed-size
STRING          8        Zero
UTF8_STRING     8        Zero
WINDOW          32       Fixed-size
-------------------------------------
It is expected that this table will grow over time.
2.7.1. TEXT Properties
In general, the encoding for the characters in a text string
property is specified by its type. It is highly desirable
for there to be a simple, invertible mapping between string
property types and any character set names embedded within
font names in any font naming standard adopted by the
Consortium.
The atom TEXT is a polymorphic target. Requesting conversion
into TEXT will convert into whatever encoding is convenient
for the owner. The encoding chosen will be indicated
by the type of the property returned. TEXT is not defined
as a type; it will never be the returned type from a selection
conversion request.
If the requestor wants the owner to return the contents of
the selection in a specific encoding, it should request
conversion into the name of that encoding.
In the table in section 2.6.2, the word TEXT (in the Type
column) is used to indicate one of the registered encoding
names. The type would not actually be TEXT; it would be
STRING or some other ATOM naming the encoding chosen by the
owner.
STRING as a type or a target specifies the ISO Latin-1
character set plus the control characters TAB (hex 09) and
NEWLINE (hex 0A). The spacing interpretation of TAB is context
dependent. Other ASCII control characters are explicitly
not included in STRING at the present time.
COMPOUND_TEXT as a type or a target specifies the Compound
Text interchange format; see the Compound Text Encoding.
UTF8_STRING as a type or a target specifies an UTF-8 encoded
string, with NEWLINE (U+000A, hex 0A) as end-of-line marker.
There are some text objects where the source or intended
user, as the case may be, does not have a specific character
set for the text, but instead merely requires a zero-terminated
sequence of bytes with no other restriction; no element
of the selection mechanism may assume that any byte
value is forbidden or that any two differing sequences are
equivalent. For these objects, the type C_STRING should be
used.
Rationale
An example of the need for C_STRING is to transmit
the names of files; many operating systems do not
interpret filenames as having a character set. For
example, the same character string uses a different
sequence of bytes in ASCII and EBCDIC, and so
most operating systems see these as different
filenames and offer no way to treat them as the
same. Thus no character-set based property type is
suitable.
Type STRING, COMPOUND_TEXT, UTF8_STRING, and C_STRING
properties will consist of a list of elements separated by null
characters; other encodings will need to specify an
appropriate list format.
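As a small illustration of the list format described in the last paragraph, a requestor could split a received property value at the null characters. This is a minimal sketch only: `my-split-selection-elements' is a hypothetical name, not an existing Emacs function, and it assumes the property value has already been fetched as a Lisp string.

```elisp
(defun my-split-selection-elements (value)
  "Split the property value VALUE into elements at null characters.
Hypothetical helper for illustration, not an Emacs function."
  ;; With an explicit separator and no OMIT-NULLS argument,
  ;; `split-string' keeps empty elements, so a trailing null
  ;; produces a final empty string.
  (split-string value "\000"))

;; Example call: a value holding two elements.
(my-split-selection-elements "foo\000bar")
```

Whether a trailing empty element should be kept or dropped is left to the caller here; the excerpt only says that elements are separated by null characters.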