From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Philipp Stephani
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sun, 22 Nov 2015 18:19:29 +0000
> Cc: tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com, emacs-devel@gnu.org
>
>     I already suggested what we should say in the documentation: that
>     these interfaces accept and produce UTF-8 encoded non-ASCII text.
>
>
> If the interface accepts UTF-8, then it must signal an error for invalid
> sequences; the Unicode standard mandates this.
The Unicode standard cannot mandate anything for Emacs, because Emacs
is not subject to Unicode standardization.
> If the interface produces UTF-8, then it must only ever produce valid
> sequences
As I explained, this would violate the basic expectation from a text
editing program.
> That's why I propose to not encode raw bytes as bytes, but as the Emacs integer
> codes used to represent them.
If we do that, no external code will be able to do anything useful
with such "bytes".  Module authors will have to write their own
replacements for library functions.  This will never be accepted by
our users.
> No matter what we expect or tolerate, we need to state that.
No, we don't.  When the callers violate the contract, they cannot
expect to know in detail what will happen.  If they want to know, they
will have to read the source.
> Module authors are not end users.
They are users like anyone who writes Lisp.  They came to expect that
Emacs behaves in certain ways, and modules should follow suit.
> I agree that end users should not see errors on decoding failure,
> but modules use only programmatic access, where we can be more
> strict.
You cannot be more strict, unless you rewrite the whole
encoding/decoding machinery, or write specialized code to detect and
reject invalid UTF-8 before it is passed to a decoder.  There are no
good reasons to do either, so let's not.
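
For illustration only, the "specialized code to detect and reject invalid UTF-8" could look roughly like the sketch below.  It assumes a hypothetical helper, utf8_valid_p, which is not part of Emacs or of the module API.  It checks that a byte buffer is well-formed UTF-8 (no overlong forms, no surrogates, nothing above U+10FFFF) before the buffer would be handed to a decoder.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of Emacs: return true if the LEN bytes
   at S form well-formed UTF-8.  */
bool
utf8_valid_p (const unsigned char *s, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      size_t need;                         /* continuation bytes expected */
      unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */

      if (c < 0x80)
        {
          i++;                             /* plain ASCII */
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF)
        need = 1;
      else if (c >= 0xE0 && c <= 0xEF)
        {
          need = 2;
          if (c == 0xE0)
            lo = 0xA0;                     /* reject overlong forms */
          else if (c == 0xED)
            hi = 0x9F;                     /* reject surrogates */
        }
      else if (c >= 0xF0 && c <= 0xF4)
        {
          need = 3;
          if (c == 0xF0)
            lo = 0x90;                     /* reject overlong forms */
          else if (c == 0xF4)
            hi = 0x8F;                     /* reject code points > U+10FFFF */
        }
      else
        return false;  /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */

      if (len - i <= need)
        return false;                      /* truncated sequence */
      if (s[i + 1] < lo || s[i + 1] > hi)
        return false;
      for (size_t k = 2; k <= need; k++)
        if ((s[i + k] & 0xC0) != 0x80)
          return false;
      i += need + 1;
    }
  return true;
}
```

A caller would run such a check on the module-supplied bytes and signal an error on a false result.  That is an extra pass over every string, which is the cost being weighed here.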
> An Emacs string is a sequence of integers.
No, it's a sequence of bytes.
> I agree that we shouldn't add such limitations. But I disagree that we should
> leave the behavior undocumented in such cases.
OK, so let's agree to disagree.  If that disagreement gets in your way
of fixing the issues related to this discussion, please say so, and I
will fix them myself