From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Philipp Stephani
Newsgroups: gmane.emacs.devel
Subject: Re: Dynamic loading progress
Date: Sun, 22 Nov 2015 19:37:43 +0000
Message-ID:
References: <83k2ptq5t3.fsf@gnu.org> <87h9kxx60e.fsf@lifelogs.com>
 <877flswse5.fsf@lifelogs.com> <8737wgw7kf.fsf@lifelogs.com>
 <87io5bv1it.fsf@lifelogs.com> <87egfzuwca.fsf@lifelogs.com>
 <876118u6f2.fsf@lifelogs.com> <8737w3qero.fsf@lifelogs.com>
 <831tbn9g9j.fsf@gnu.org> <878u5upw7o.fsf@lifelogs.com>
 <83ziya8xph.fsf@gnu.org> <83y4du80xo.fsf@gnu.org>
 <837fld6lps.fsf@gnu.org> <83si3z4s5n.fsf@gnu.org>
 <83mvu74nhm.fsf@gnu.org> <83d1v34hba.fsf@gnu.org>
 <83io4u2aze.fsf@gnu.org> <8337vx3kp6.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=047d7bb04dc49b53d70525263efa
X-Trace: ger.gmane.org 1448221088 7473 80.91.229.3 (22 Nov 2015 19:38:08 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 22 Nov 2015 19:38:08 +0000 (UTC)
Cc: aurelien.aptel+emacs@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
To: Eli Zaretskii
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Nov 22 20:38:07 2015
Return-path:
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1a0aSX-0000EJ-Jp for ged-emacs-devel@m.gmane.org; Sun, 22 Nov 2015 20:38:01 +0100
Original-Received: from localhost ([::1]:57215 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a0aSX-00043T-HV for ged-emacs-devel@m.gmane.org; Sun, 22 Nov 2015 14:38:01 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:37666) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a0aST-00043M-S8 for emacs-devel@gnu.org; Sun, 22 Nov 2015 14:37:59 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a0aSS-0000o8-08 for emacs-devel@gnu.org; Sun, 22 Nov 2015 14:37:57 -0500
Original-Received: from mail-wm0-x232.google.com ([2a00:1450:400c:c09::232]:33274) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a0aSP-0000nk-Kt; Sun, 22 Nov 2015 14:37:53 -0500
Original-Received: by wmec201 with SMTP id c201so134426866wme.0; Sun, 22 Nov 2015 11:37:53 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc:content-type;
 bh=+WNZXhyrnyzuvdz9xDXDJVorav3q6iIhD22G9WJhKOM=;
 b=hh0+3ICcdYhiL8w3uW6pz0obLHKPbQ5y4so+a0WlSV6BX44uwm1SySUEaRjawPkOJ1
 rHhrQ/d5KyfizQs3NZ9eLaJ5vx0RQMuwx5oD+aLOCUF97zCo7b5SZ92dj+VSBhlTybO6
 5UBxmr99xgaGiUPKgrf2rMMmos6PywM3/q33EbdFKTS6eUjrUy8jMZXXvCW6IByKshux
 svZD5Xze7wsOhwTxb4bKnhgGmqY2qbJ7vSpVizQLV7SkOZanYKxZZhqbnuksHWcPt2qm
 oY3s8YZBLJbPCxHXLCdPwY3c0rScHvDeYcwVTiuSNCRCFb7GDRLfKsMhrGfKyj38VC1E
 6ehQ==
X-Received: by 10.195.13.135 with SMTP id ey7mr26123983wjd.25.1448221072902; Sun, 22 Nov 2015 11:37:52 -0800 (PST)
In-Reply-To: <8337vx3kp6.fsf@gnu.org>
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value).
X-Received-From: 2a00:1450:400c:c09::232
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:195056
Archived-At:

--047d7bb04dc49b53d70525263efa
Content-Type: text/plain; charset=UTF-8

Eli Zaretskii <eliz@gnu.org> wrote on Sun, 22 Nov 2015 at 20:20:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sun, 22 Nov 2015 18:19:29 +0000
> > Cc: tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com, emacs-devel@gnu.org
> >
> >     I already suggested what we should say in the documentation: that
> >     these interfaces accept and produce UTF-8 encoded non-ASCII text.
> >
> >
> > If the interface accepts UTF-8, then it must signal an error for invalid
> > sequences; the Unicode standard mandates this.
>
> The Unicode standard cannot mandate anything for Emacs, because Emacs
> is not subject to Unicode standardization.
>

True, but I think we shouldn't make the terminology more confusing. If
we say "UTF-8", we should mean "UTF-8 as defined in the Unicode
standard", not the Emacs extension of UTF-8. That's all.

>
> > If the interface produces UTF-8, then it must only ever produce valid
> > sequences
>
> As I explained, this would violate the basic expectation from a text
> editing program.
>
> > That's why I propose to not encode raw bytes as bytes, but as the Emacs integer
> > codes used to represent them.
>
> If we do that, no external code will be able to do anything useful
> with such "bytes".  Module authors will have to write their own
> replacements for library functions.  This will never be accepted by
> our users.
>

I wouldn't be so pessimistic, but I was convinced by the consistency
with encode-coding-string. So yes, let's use the raw bytes (and
document that).

>
> > If any byte sequence is accepted, then the behavior becomes more complex. We
> > need to exhaustively describe the behavior for any possible byte sequence,
> > otherwise module authors cannot make any assumption.
>
> We say that we accept valid UTF-8 encoded strings; anything else
> might produce invalid UTF-8 on output.
>

Couldn't we just say "it behaves as if encoding and decoding were done
using the utf-8-unix coding system"? Because I think that's what this
boils down to.

>
> > No matter what we expect or tolerate, we need to state that.
>
> No, we don't.  When the callers violate the contract, they cannot
> expect to know in detail what will happen.  If they want to know, they
> will have to read the source.
>

So you want this to be unspecified or undefined behavior? That might be
OK (we already have that in several places), but we still need to state
what the contract is.

>
> > Module authors are not end users.
>
> They are users like anyone who writes Lisp.  They came to expect that
> Emacs behaves in certain ways, and modules should follow suit.
>
> > I agree that end users should not see errors on decoding failure,
> > but modules use only programmatic access, where we can be more
> > strict.
>
> You cannot be more strict, unless you rewrite the whole
> encoding/decoding machinery, or write specialized code to detect and
> reject invalid UTF-8 before it is passed to a decoder.  There are no
> good reasons to do either, so let's not.
>
> > An Emacs string is a sequence of integers.
>
> No, it's a sequence of bytes.
>

From
https://www.gnu.org/software/emacs/manual/html_node/elisp/String-Basics.html :
"In Emacs Lisp, characters are simply integers ... A string is a fixed
sequence of characters". How a string is represented internally
shouldn't be the concern of module authors.

>
> > I agree that we shouldn't add such limitations. But I disagree that we should
> > leave the behavior undocumented in such cases.
>
> OK, so let's agree to disagree.  If that disagreement gets in your way
> of fixing the issues related to this discussion, please say so, and I
> will fix them myself
>
>

No, I will definitely fix it. I think our disagreement is much smaller
than it might look.

--047d7bb04dc49b53d70525263efa
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable


--047d7bb04dc49b53d70525263efa--
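
For readers wondering what the "specialized code to detect and reject invalid UTF-8" mentioned in the message would involve, here is a minimal sketch in C of a strict validator that follows the well-formed byte ranges of the Unicode standard. It is only an illustration under stated assumptions: the names `utf8_is_valid` and `in_range` are hypothetical and are not part of the Emacs module API, and the thread itself leans towards not doing this inside Emacs. A module that wants strictness on its own side could apply a check of this kind to the bytes it receives.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an Emacs API): true if lo <= b <= hi. */
static bool in_range(unsigned char b, unsigned char lo, unsigned char hi)
{
    return b >= lo && b <= hi;
}

/* Hypothetical strict check (not an Emacs API): returns true only if
   s[0..len) is well-formed UTF-8 as defined by the Unicode standard.
   Overlong forms, surrogates (U+D800..U+DFFF), code points above
   U+10FFFF, stray continuation bytes and truncated sequences are
   all rejected. */
bool utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b0 = s[i];
        size_t need;                 /* continuation bytes expected */
        unsigned char lo = 0x80;     /* allowed range for the FIRST */
        unsigned char hi = 0xBF;     /* continuation byte           */

        if (b0 <= 0x7F) {            /* ASCII */
            i++;
            continue;
        } else if (in_range(b0, 0xC2, 0xDF)) {
            need = 1;                /* C0 and C1 would be overlong */
        } else if (b0 == 0xE0) {
            need = 2; lo = 0xA0;     /* excludes overlong 3-byte forms */
        } else if (b0 == 0xED) {
            need = 2; hi = 0x9F;     /* excludes surrogates */
        } else if (in_range(b0, 0xE1, 0xEF)) {
            need = 2;
        } else if (b0 == 0xF0) {
            need = 3; lo = 0x90;     /* excludes overlong 4-byte forms */
        } else if (in_range(b0, 0xF1, 0xF3)) {
            need = 3;
        } else if (b0 == 0xF4) {
            need = 3; hi = 0x8F;     /* excludes code points > U+10FFFF */
        } else {
            return false;            /* 80..C1 or F5..FF as a lead byte */
        }

        if (len - i - 1 < need)      /* sequence cut off by end of input */
            return false;
        if (!in_range(s[i + 1], lo, hi))
            return false;
        for (size_t k = 2; k <= need; k++)
            if (!in_range(s[i + k], 0x80, 0xBF))
                return false;

        i += need + 1;
    }
    return true;
}
```

A caller would pass the byte buffer and its length (without any trailing NUL) and treat a false result as "not UTF-8 in the Unicode sense", which is exactly the case where Emacs's own utf-8-unix decoding would instead preserve the offending bytes as raw bytes.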