From: Mark H Weaver
To: Ludovic Courtès
Cc: Pierre Neidhardt, 33848@debbugs.gnu.org
Subject: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Mon, 24 Dec 2018 13:12:23 -0500
Message-ID: <87tvj2yesd.fsf@netris.org>

Hi Ludovic,

Ludovic Courtès writes:

> Pierre Neidhardt skribis:
>
>>> For now I lean towards looking for a way to address the issue
>>> specifically for SBCL.
>>
>> Don't forget that we currently have 5 Lisp compilers.
>> Besides, it's not clear that this can be fixed on the compiler's side, it could
>> very well be that patches will be required on a per-project basis.
>
> I know little about CL but maybe we can find a solution that works for
> all five compilers.  At least that would be the first approach I would
> suggest following.

I can't imagine a solution that would work for all five compilers, but
perhaps that's a failure of imagination on my part.  Of course, you're
welcome to search for such a solution.  Can you give me a rough outline
of what you have in mind?

Of course, the usual reason to choose UTF-32 is to support non-ASCII
characters while retaining fixed-width code points, so that string
lookups are straightforward and efficient.  Using UTF-8 improves space
efficiency, but at the cost of extra code complexity.

That extra complexity is what I guess we would need to add to each
program that currently uses UTF-32.  Alternatively, we could extend the
on-disk format to support UTF-8 and then add some kind of "load hook"
that converts the string to UTF-32 at load time.  Either way, it's
likely to be a can of worms.

Consider the case of Guile.  Years ago we agreed to switch to UTF-8 as
its sole internal string encoding, but it hasn't yet been done because
it's a big job, even for those of us already intimately familiar with
the code.

Now imagine how hard it would be for someone who barely uses Guile, but
nevertheless felt compelled to change our internal string representation
to use UTF-8.  Moreover, imagine that they hoped to find a single
solution that would work for several different Scheme implementations.

What would you say to them if they proposed to find a general solution
to convert several Scheme implementations to use UTF-8 as their string
representation, to save themselves the trouble of having to understand
each implementation individually?

I really think it would be a mistake to try to force every program and
language implementation to use our preferred string representation.  I
suspect it would be vastly easier to compromise and support a few other
popular string representations in Guix, namely UTF-16 and UTF-32.

If you don't want to change the daemon, it could be worked around in our
build-side code as follows: we could add a new phase to certain build
systems (or possibly gnu-build-system) that scans each output for
UTF-16/32 encoded store references that are never referenced in UTF-8.
If such references exist, a file with an unobtrusive name would be added
to that output containing those references encoded in UTF-8.  This would
enable our daemon's existing reference scanner to find all of the
references.

Our grafting code would then need to be extended to recognize and
transform store references encoded in UTF-16/32 as well as UTF-8.

What do you think?

      Regards,
        Mark
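
For illustration only, the scanning phase proposed above might look roughly like the following Python sketch. It is not Guix code (a real phase would live in the build-side Scheme modules), and the function names, the choice of four wide encodings, and the idea of passing candidate store item names as plain strings are all assumptions made for the example. It checks which candidates appear in an output's files only in UTF-16 or UTF-32, and builds the UTF-8 contents of the extra file that would make them visible to a byte-level scanner.

```python
from typing import Iterable, Set

# Assumption: these four BOM-less encodings cover the "UTF-16/32" cases.
WIDE_ENCODINGS = ("utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")


def hidden_references(blobs: Iterable[bytes], candidates: Iterable[str]) -> Set[str]:
    """Return the candidates found only in a wide encoding across all blobs.

    `blobs` holds the raw contents of every file in one output;
    `candidates` holds store item names to look for (hypothetical input).
    """
    candidates = list(candidates)
    visible: Set[str] = set()  # seen as UTF-8 somewhere in the output
    wide: Set[str] = set()     # seen as UTF-16/32 somewhere in the output
    for data in blobs:
        for ref in candidates:
            if ref.encode("utf-8") in data:
                visible.add(ref)
            if any(ref.encode(enc) in data for enc in WIDE_ENCODINGS):
                wide.add(ref)
    # Only references never spelled out in UTF-8 need the extra file.
    return wide - visible


def manifest_bytes(hidden: Iterable[str]) -> bytes:
    """UTF-8 contents of the extra file, one reference per line."""
    return "".join(ref + "\n" for ref in sorted(hidden)).encode("utf-8")
```

A reference that also occurs in UTF-8 anywhere in the same output is left out, matching the "never referenced in UTF-8" condition. The sketch does nothing for grafting, which would still need to rewrite the wide-encoded copies in place.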