From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Dynamic loading progress
Date: Sun, 22 Nov 2015 21:20:21 +0200
Message-ID: <8337vx3kp6.fsf@gnu.org>
References: <83k2ptq5t3.fsf@gnu.org> <87h9kxx60e.fsf@lifelogs.com>
 <877flswse5.fsf@lifelogs.com> <8737wgw7kf.fsf@lifelogs.com>
 <87io5bv1it.fsf@lifelogs.com> <87egfzuwca.fsf@lifelogs.com>
 <876118u6f2.fsf@lifelogs.com> <8737w3qero.fsf@lifelogs.com>
 <831tbn9g9j.fsf@gnu.org> <878u5upw7o.fsf@lifelogs.com>
 <83ziya8xph.fsf@gnu.org> <83y4du80xo.fsf@gnu.org>
 <837fld6lps.fsf@gnu.org> <83si3z4s5n.fsf@gnu.org>
 <83mvu74nhm.fsf@gnu.org> <83d1v34hba.fsf@gnu.org>
 <83io4u2aze.fsf@gnu.org>
Reply-To: Eli Zaretskii
To: Philipp Stephani
Cc: aurelien.aptel+emacs@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
Xref: news.gmane.org gmane.emacs.devel:195054

> From: Philipp Stephani
> Date: Sun, 22 Nov 2015 18:19:29 +0000
> Cc: tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com, emacs-devel@gnu.org
>
> I already suggested what we should say in the documentation: that
> these interfaces accept and produce UTF-8 encoded non-ASCII text.
>
>
> If the interface accepts UTF-8, then it must signal an error for invalid
> sequences; the Unicode standard mandates this.

The Unicode standard cannot mandate anything for Emacs, because Emacs
is not subject to Unicode standardization.

> If the interface produces UTF-8, then it must only ever produce valid
> sequences

As I explained, this would violate the basic expectation from a text
editing program.

> That's why I propose to not encode raw bytes as bytes, but as the Emacs integer
> codes used to represent them.

If we do that, no external code will be able to do anything useful
with such "bytes".  Module authors will have to write their own
replacements for library functions.  This will never be accepted by
our users.

> If any byte sequence is accepted, then the behavior becomes more complex. We
> need to exhaustively describe the behavior for any possible byte sequence,
> otherwise module authors cannot make any assumption.

We say that we accept valid UTF-8 encoded strings; anything else might
produce invalid UTF-8 on output.

> No matter what we expect or tolerate, we need to state that.

No, we don't.  When the callers violate the contract, they cannot
expect to know in detail what will happen.  If they want to know, they
will have to read the source.

> Module authors are not end users.

They are users like anyone who writes Lisp.  They came to expect that
Emacs behaves in certain ways, and modules should follow suit.

> I agree that end users should not see errors on decoding failure,
> but modules use only programmatic access, where we can be more
> strict.

You cannot be more strict, unless you rewrite the whole
encoding/decoding machinery, or write specialized code to detect and
reject invalid UTF-8 before it is passed to a decoder.  There are no
good reasons to do either, so let's not.

> An Emacs string is a sequence of integers.

No, it's a sequence of bytes.

> I agree that we shouldn't add such limitations. But I disagree that we should
> leave the behavior undocumented in such cases.

OK, so let's agree to disagree.  If that disagreement gets in your way
of fixing the issues related to this discussion, please say so, and I
will fix them myself.

Thanks.
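
For reference, the "specialized code to detect and reject invalid
UTF-8" mentioned above could look roughly like the following C sketch.
It is an illustration only, under stated assumptions: utf8_is_valid is
a hypothetical helper name, not something taken from the Emacs sources
or the module API, and it applies the strict well-formedness rules of
the Unicode standard (no overlong forms, no surrogates, nothing above
U+10FFFF).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of Emacs or its module API.
   Return true if the LEN bytes starting at S are well-formed UTF-8:
   no stray or missing continuation bytes, no overlong encodings,
   no surrogate code points, and nothing above U+10FFFF.  */
bool
utf8_is_valid (const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);

  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      size_t need;        /* continuation bytes expected after C */
      unsigned long cp;   /* code point being assembled */
      unsigned long min;  /* smallest code point allowed at this length */

      if (c < 0x80)
        {
          i++;            /* plain ASCII */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { need = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { need = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { need = 3; cp = c & 0x07; min = 0x10000; }
      else
        return false;     /* stray continuation byte or invalid lead byte */

      if (len - i - 1 < need)
        return false;     /* sequence truncated at end of buffer */

      for (size_t k = 1; k <= need; k++)
        {
          unsigned char cc = s[i + k];
          if ((cc & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (cc & 0x3F);
        }

      if (cp < min)
        return false;     /* overlong encoding */
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;     /* UTF-16 surrogate */
      if (cp > 0x10FFFF)
        return false;     /* beyond the Unicode code space */

      i += need + 1;
    }
  return true;
}
```

An interface that wanted the strict behavior would have to run such a
check over the caller's buffer before handing it to the decoder, which
is the extra pass this message argues is not worth adding.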