From mboxrd@z Thu Jan 1 00:00:00 1970
From: Philipp Stephani
Newsgroups: gmane.emacs.devel
Subject: Re: Dynamic loading progress
Date: Sat, 21 Nov 2015 12:11:45 +0000
References: <83k2ptq5t3.fsf@gnu.org> <87h9kxx60e.fsf@lifelogs.com>
 <877flswse5.fsf@lifelogs.com> <8737wgw7kf.fsf@lifelogs.com>
 <87io5bv1it.fsf@lifelogs.com> <87egfzuwca.fsf@lifelogs.com>
 <876118u6f2.fsf@lifelogs.com> <8737w3qero.fsf@lifelogs.com>
 <831tbn9g9j.fsf@gnu.org> <878u5upw7o.fsf@lifelogs.com>
 <83ziya8xph.fsf@gnu.org> <83y4du80xo.fsf@gnu.org>
 <837fld6lps.fsf@gnu.org> <83si3z4s5n.fsf@gnu.org>
 <83mvu74nhm.fsf@gnu.org>
In-Reply-To: <83mvu74nhm.fsf@gnu.org>
To: Eli Zaretskii
Cc: aurelien.aptel+emacs@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=089e0102def4df6bf505250be57d
Xref: news.gmane.org gmane.emacs.devel:194953

--089e0102def4df6bf505250be57d
Content-Type: text/plain; charset=UTF-8

Eli Zaretskii <eliz@gnu.org> wrote on Sat, 21 Nov 2015 at 12:10:

> >     (Btw, I don't think we should worry about changing the internal
> >     representation of characters in Emacs, because
> >     make_multibyte_string will be updated as needed.)
> >
> > This is a crucial point. If the internal encoding never changes,
> > then we can declare that those string parameters are expected to be
> > in the internal encoding.
>
> No, we cannot, or rather should not.  It is unreasonable to expect
> external modules to know the intricacies of the internal
> representation.  Most Emacs hackers don't.

Fine with me, but how would we then represent Emacs strings that are
not valid Unicode strings? Just raise an error?

> > But see the discussion in
> > https://github.com/aaptel/emacs-dynamic-module/issues/37: the
> > comment in mule-conf.el seems to indicate that the internal encoding
> > is not stable.
>
> That discussion is about zero-copy access to Emacs buffer text and
> Emacs strings inside module code.

Partially. The encoding discussion is also part of that, because the
encoding has to be specified before zero-copy access is even possible.

> Such access is indeed impossible
> without either knowing _something_ about the internal representation,
> or having additional APIs in emacs-module.c that allow modules such
> access while hiding the details of the internal representation.  We
> could discuss extending the module functionality to include this.

Yes, there's no need for that in this subthread though.

> But that is a separate issue from what module_make_function and
> module_make_string do.  These two functions are basic, and don't need
> to know about the internal representation or use it.  While direct
> access to Emacs buffer text will be needed by only some modules,
> module_make_function will be used by all of them, and
> module_make_string by many.
>
> So I think we shouldn't conflate these two issues; they are separate.

OK.

> >     This is what my comments were about. I think that you, by
> >     contrast, are talking about the encoding of the _input_ strings,
> >     in this case the 'documentation' argument to module_make_function
> >     and 'str' argument to module_make_string. My assumption was that
> >     these arguments will always have to be in UTF-8 encoding; if that
> >     assumption is true, then no decoding via
> >     code_convert_string_norecord is necessary, since
> >     make_multibyte_string will DTRT. We can (and probably should)
> >     document the fact that all non-ASCII strings must be UTF-8
> >     encoded as a requirement of the emacs-module interface.
> >
> > Or rather, an extension to UTF-8 capable of encoding surrogate code
> > points and numbers that are not code points, as described in
> > https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html.
>
> No, I meant strict UTF-8, not its Emacs extension.

That would be possible and provide a clean interface. However, Emacs
strings are extended, so we'd need to specify how they interact with
UTF-8 strings.

- If a module passes a char sequence that's not a valid UTF-8 string,
  but a valid Emacs multibyte string, what should happen? Error,
  undefined behavior, silently accepted?
- If copy_string_contents is passed an Emacs string that is not a valid
  Unicode string, what should happen? Error, or should the internal
  representation be silently leaked?

> > If it's stable, we can use make_multibyte_string; if not, we can
> > only use make_unibyte_string.
>
> If the arguments strings are in strict UTF-8, then
> make_multibyte_string will DTRT automagically, no matter what the
> internal representation is.  That is their contract.

OK, then we can use that, of course. The question of handling invalid
UTF-8 strings is still open, though, as make_multibyte_string doesn't
enforce valid UTF-8.

If it's the contract of make_multibyte_string that it will always
accept UTF-8, then that should be added as a comment to that function.
Currently I don't see it documented anywhere.

--089e0102def4df6bf505250be57d
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable


--089e0102def4df6bf505250be57d--
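
The open question in the message above is what the module interface should do with input that is not strict UTF-8. As an illustration only, here is a minimal sketch of a strict UTF-8 check that a function such as module_make_string could, hypothetically, run before handing bytes to make_multibyte_string. The name strict_utf8_p is invented for this sketch; it is not part of emacs-module.h or of Emacs, and the thread does not say that such a check was adopted. It rejects overlong forms, surrogate code points, and values above U+10FFFF, that is, the cases where strict UTF-8 differs from the extended form described in the linked Text Representations manual page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (an assumption for illustration, not an Emacs
   API): return true if the LEN bytes at S are strict UTF-8, i.e. no
   overlong forms, no surrogates, nothing above U+10FFFF.  */
static bool
strict_utf8_p (const unsigned char *s, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      size_t n;      /* number of continuation bytes expected */
      uint32_t cp;   /* code point decoded so far */
      uint32_t min;  /* smallest code point valid for this length */

      if (c < 0x80)
        {
          i++;       /* plain ASCII byte */
          continue;
        }
      else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
      else
        return false;  /* stray continuation byte, or 0xF8..0xFF */

      if (len - i <= n)
        return false;  /* sequence truncated at end of input */

      for (size_t k = 1; k <= n; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return false;  /* not a continuation byte */
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }

      if (cp < min)
        return false;  /* overlong encoding */
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;  /* surrogate code point */
      if (cp > 0x10FFFF)
        return false;  /* beyond the Unicode range */

      i += n + 1;
    }
  return true;
}
```

Whether a failed check should signal an error or be accepted silently is exactly the policy question raised in the message; the sketch only shows that the check itself is cheap and independent of the internal representation.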