From mboxrd@z Thu Jan  1 00:00:00 1970
From: Mark H Weaver <mhw@netris.org>
Subject: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Mon, 24 Dec 2018 13:12:23 -0500
Message-ID: <87tvj2yesd.fsf@netris.org>
References: <87r2e8jpfx.fsf@gnu.org> <877eg0i43j.fsf@netris.org>
	<87d0psi1xo.fsf@gnu.org> <874lb3kin6.fsf@ambrevar.xyz>
	<87sgynezha.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <87sgynezha.fsf@gnu.org> ("Ludovic Courtès"'s message of
	"Mon, 24 Dec 2018 16:06:09 +0100")
List-Id: Bug reports for GNU Guix <bug-guix.gnu.org>
To: Ludovic Courtès <ludo@gnu.org>
Cc: Pierre Neidhardt <mail@ambrevar.xyz>, 33848@debbugs.gnu.org

Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Pierre Neidhardt <mail@ambrevar.xyz> skribis:
>
>>> For now I lean towards looking for a way to address the issue
>>> specifically for SBCL.
>>
>> Don't forget that we currently have 5 Lisp compilers.
>> Besides, it's not clear that this can be fixed on the compiler's side; it
>> could very well be that patches will be required on a per-project basis.
>
> I know little about CL but maybe we can find a solution that works for
> all five compilers.  At least that would be the first approach I would
> suggest following.

I can't imagine a solution that would work for all five compilers, but
perhaps that's a failure of imagination on my part.  Of course, you're
welcome to search for such a solution.  Can you give me a rough outline
of what you have in mind?

Of course, the usual reason to choose UTF-32 is to support non-ASCII
characters while retaining fixed-width code points, so that indexing
into a string is straightforward and efficient.  Using UTF-8 improves space
efficiency, but at the cost of extra code complexity.  That extra
complexity is what I guess we would need to add to each program that
currently uses UTF-32.  Alternatively, we could extend the on-disk
format to support UTF-8 and then add some kind of "load hook" that
converts the string to UTF-32 at load time.  Either way, it's likely to
be a can of worms.
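
To make the trade-off concrete, here is a minimal Python sketch of looking
up the N-th code point in each encoding.  The function names are
hypothetical and are not taken from SBCL, Guile, or Guix; the sketch only
illustrates why fixed-width code points make indexing simple, and why a
UTF-8 string generally has to be scanned.

```python
def utf32_code_point(buf, index):
    """Return the code point at `index` in little-endian UTF-32 data.

    Every code point occupies exactly 4 bytes, so the offset is computed
    directly, without looking at the rest of the string.
    """
    start = index * 4
    if index < 0 or start + 4 > len(buf):
        raise IndexError(index)
    return buf[start:start + 4].decode("utf-32-le")


def utf8_code_point(buf, index):
    """Return the code point at `index` in UTF-8 data.

    Code points occupy 1 to 4 bytes, so we scan from the beginning and
    count the bytes that start a code point.
    """
    seen = -1
    for pos, byte in enumerate(buf):
        if byte & 0xC0 != 0x80:          # not a continuation byte
            seen += 1
            if seen == index:
                end = pos + 1
                while end < len(buf) and buf[end] & 0xC0 == 0x80:
                    end += 1
                return buf[pos:end].decode("utf-8")
    raise IndexError(index)
```

For a mostly-ASCII string such as a file name, the UTF-32 form is about four
times larger than the UTF-8 form, which is the space cost mentioned above.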

Consider the case of Guile.  Years ago we agreed to switch to UTF-8 as
its sole internal string encoding, but it hasn't yet been done because
it's a big job, even for those of us already intimately familiar with
the code.

Now imagine how hard it would be for someone who barely uses Guile, but
nevertheless felt compelled to change our internal string representation
to use UTF-8.  Moreover, imagine that they hoped to find a single
solution that would work for several different Scheme implementations.

What would you say to them if they proposed to find a general solution
to convert several Scheme implementations to use UTF-8 as their string
representation, to save themselves the trouble of having to understand
each implementation individually?

I really think it would be a mistake to try to force every program and
language implementation to use our preferred string representation.  I
suspect it would be vastly easier to compromise and support a few other
popular string representations in Guix, namely UTF-16 and UTF-32.

If you don't want to change the daemon, it could be worked around in our
build-side code as follows: we could add a new phase to certain build
systems (or possibly gnu-build-system) that scans each output for
UTF-16/32 encoded store references that are never referenced in UTF-8.
If such references exist, a file with an unobtrusive name would be added
to that output containing those references encoded in UTF-8.  This would
enable our daemon's existing reference scanner to find all of the
references.
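
A rough sketch of what such a phase might compute, written in Python purely
for illustration (a real phase would be Guile code in the build system).
It assumes that the phase is given the hash parts of the candidate store
items and that finding the hash part is enough to count as a reference; the
names `wide_only_references` and `sidecar_contents` are hypothetical.

```python
# Encodings the existing UTF-8/ASCII scan would miss.
WIDE_ENCODINGS = ("utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")


def wide_only_references(blobs, candidate_hashes):
    """Return the hashes that occur in UTF-16/32 but never in UTF-8.

    `blobs` is the contents of each file of one output, as bytes.
    `candidate_hashes` are the hash parts of the store items that the
    output might refer to (an assumption of this sketch).
    """
    blobs = list(blobs)
    result = []
    for h in candidate_hashes:
        narrow = h.encode("ascii")
        if any(narrow in blob for blob in blobs):
            continue  # an ASCII/UTF-8 scan already finds this one
        wide_forms = [h.encode(enc) for enc in WIDE_ENCODINGS]
        if any(w in blob for blob in blobs for w in wide_forms):
            result.append(h)
    return result


def sidecar_contents(hashes, store_dir="/gnu/store"):
    """Build the UTF-8 contents of the extra file, one reference per line."""
    return "".join(f"{store_dir}/{h}\n" for h in hashes).encode("utf-8")
```

The phase would write the result of `sidecar_contents` to a file in the
output only when `wide_only_references` returns a non-empty list, so
outputs without UTF-16/32 references are left untouched.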

Our grafting code would then need to be extended to recognize and
transform store references encoded in UTF-16/32 as well as UTF-8.
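
As an illustration only, the extended rewriting step could look roughly like
the following Python sketch.  It assumes that the old and new store file
names are ASCII and have the same length, so that the replacement keeps the
same byte length in every encoding; the function name `graft` and the
example file names are hypothetical, and this is not the actual Guix
grafting code.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")


def graft(data, old, new):
    """Replace every encoded occurrence of `old` with `new` in `data`.

    `old` and `new` must be ASCII strings of equal length, so that each
    encoded form also has equal length and file offsets are preserved.
    """
    if not (old.isascii() and new.isascii()):
        raise ValueError("store file names are assumed to be ASCII")
    if len(old) != len(new):
        raise ValueError("replacement must have the same length")
    for enc in ENCODINGS:
        data = data.replace(old.encode(enc), new.encode(enc))
    return data
```

For example, `graft(contents, "/gnu/store/" + "a" * 32 + "-foo-1.0",
"/gnu/store/" + "b" * 32 + "-foo-1.1")` rewrites the reference whether it
was stored as UTF-8, UTF-16, or UTF-32, and returns data of the same length.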

What do you think?

      Regards,
        Mark