From: David Kastrup
To: Philipp Stephani
Cc: aurelien.aptel+emacs@gmail.com, Eli Zaretskii, tzz@lifelogs.com, emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Dynamic loading progress
Date: Sun, 22 Nov 2015 20:12:10 +0100
Message-ID: <87mvu597cl.fsf@fencepost.gnu.org>
In-Reply-To: (Philipp Stephani's message of "Sun, 22 Nov 2015 18:19:29 +0000")

Philipp Stephani writes:

> Eli Zaretskii wrote on Sun, 22 Nov 2015 at 18:35:
>
>> > From: Philipp Stephani
>> > Date: Sun, 22 Nov 2015 09:25:08 +0000
>> > Cc: tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com, emacs-devel@gnu.org
>> >
>> > > Fine with me, but how would we then represent Emacs strings
>> > > that are not valid Unicode strings? Just raise an error?
>> >
>> > No need to raise an error. Strings that are returned to modules
>> > should be encoded into UTF-8. That encoding already takes care of
>> > these situations: it either produces the UTF-8 encoding of the
>> > equivalent Unicode characters, or outputs raw bytes.
>> >
>> > Then we should document such a situation and give module authors a
>> > way to detect them.
>>
>> I already suggested what we should say in the documentation: that
>> these interfaces accept and produce UTF-8 encoded non-ASCII text.
>>
>
> If the interface accepts UTF-8, then it must signal an error for invalid
> sequences; the Unicode standard mandates this.
> If the interface produces UTF-8, then it must only ever produce valid
> sequences, this is again required by the Unicode standard.

The Unicode standard does not mandate Emacs' internal string encodings.
Emacs 22 had an entirely different internal string encoding that was
less convenient for a module interface. Emacs' internal encoding has
the property that valid UTF-8 sequences are represented by themselves.
Which is a convenience to the programmer. Not more, not less.

> That's why I propose to not encode raw bytes as bytes, but as the
> Emacs integer codes used to represent them.

So UCS-16 instead of UTF-8? With conversions for every call? That
sounds like shuffling the problem around, and shuffling does not come
for free.

It's ok to provide checking sequences that verify that something is a
valid internal Emacs string (which includes more than Unicode) and flag
an error if it isn't. Also that something is a valid UTF-8 string if
that is desired (and which will in case of error optionally either
convert to the Emacs-internal representation or flag an error).

But making each call gate automatically verify every string for UTF-8
compliance is both wasteful and makes it impossible to process generic
Emacs strings in an external module.

-- 
David Kastrup
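
A minimal sketch of the kind of opt-in UTF-8 check discussed above, assuming a hypothetical helper named `utf8_is_valid` (it is not part of Emacs or of the module interface under discussion). It accepts only strictly valid UTF-8, so a module could call it where strict compliance actually matters instead of having every call gate pay for it. A string in Emacs' extended internal representation that holds non-Unicode content would need a separate, more permissive check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not an Emacs API: return true if the LEN bytes
   at S are strictly valid UTF-8 (no overlong forms, no surrogates,
   nothing above U+10FFFF).  */
bool
utf8_is_valid (const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);

  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      size_t need;        /* continuation bytes expected after C */
      unsigned long cp;   /* code point being assembled */
      unsigned long min;  /* smallest code point legal at this length */

      if (c < 0x80)
        {
          i++;            /* plain ASCII byte */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { need = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { need = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { need = 3; cp = c & 0x07; min = 0x10000; }
      else
        return false;     /* stray continuation byte, or 0xF8..0xFF */

      if (len - i - 1 < need)
        return false;     /* sequence truncated at end of buffer */

      for (size_t k = 1; k <= need; k++)
        {
          unsigned char cc = s[i + k];
          if ((cc & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (cc & 0x3F);
        }

      if (cp < min)
        return false;     /* overlong encoding */
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;     /* UTF-16 surrogate */
      if (cp > 0x10FFFF)
        return false;     /* beyond the Unicode range */

      i += need + 1;
    }
  return true;
}
```

A caller would pass the byte buffer and its length, for example `utf8_is_valid (buf, n)`, and decide for itself whether a false result means signalling an error or falling back to treating the data as a generic Emacs string.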