Hi Leo,

On Tue, 03 Nov 2020 14:41:31 +0100
Leo Prikler wrote:

> > (note: "-l guix.scm")
> >
> > seems to have fixed most of the problems.
> > (There is no automated diagnostic--so who knows whether it did fix
> > them for real?)
> What diagnostic would you want here?

Whether there exist packages with the same name but different
derivations in the environment (especially if mixing hidden and
non-hidden packages with the same name).

Also, there's probably some way in guix to print the list of files an
environment is built from. Using it, one could list which libgobjects
are in the environment without me having to use an LD_PRELOAD to find
out which they are... Does it exist? If so, how to use it?

> Is that really the problem at hand?  I don't see glib explicitly
> mentioned here, so you should not build glib-with-documentation here.
> Adding glib to one of your --ad-hoc "chains" should yield a different
> profile.

It's not that direct--but yes, something like that is indeed the
problem at hand. If you want more detail, the guile-gi bug report is
pretty detailed on it.

Also, better, the reason I put the "historical" section at the end of
my previous post is that it is reproducible later, so we can find out
what exactly happened here. Following that you will get exactly the
same situation--and using (only) the LD_PRELOAD discussed on guile-gi,
you will find the culprit, and you will see that it is two different
libgobjects being loaded.

We are now in a better situation because I do all my development using
version control--even throwaway stuff. guile-gi devs had this happen
before--but their reproducers got lost to time. Now we have reproducers
that cause it every single time (both the historical version of
guix-gui mentioned and also file test/insanity.scm in guile-gi by now).

> What is the meaning of "doesn't like" here?  Is it explicitly
> discouraged, poorly supported, work in progress, ...?
It means it does not work, and given what they are doing in the source
code, GNOME evidently did not think about this case at all. And at
runtime using guix environment, this does (pretty much always) cause
problems without anything special having to be done to trigger it.

You can read a lot of details in the bug report on the guile-gi website.

> > This was both my fault, and guix's fault for being VERY obtuse.
> I don't think looking for someone to blame is particularly useful.

We are engineers--so finding out what is to blame for a problem and
fixing that place is indeed what we do all day (that includes deciding
which projects should have the change, and which shouldn't. If the
answer is "none" the situation will not improve. So that's right out).

If you mean blaming people, I don't do that--that would be silly.
"Blaming" programs I will do all day every day (including mine).

Problems like this need to be seen, otherwise we guix devs don't know
that it's like this. I know it from my own stuff I put into Guix
master. *I* did get those to work by configuring stuff. But is it
usable for a regular user? For this guix environment stuff, the answer
is a definite no. (There's another thread on the guix-devel list about
improving that--so it's not like I'm the only one--it's just a bad
interface. It happens.)

> Regarding the safety aspect, I don't think that's what they're
> going for.  They might be thinking, that "--pure --no-grafts" is close
> enough to what you'd get inside `guix build`, so that you don't have to
> use `guix build` for debugging.

If grafts are broken (which seems likely according to guile-gi devs),
grafting should be fixed now, before a security problem happens and we
miss it (...but state that we fixed it).
I know that guile-gi argues for doing development with --no-grafts, but

(1) I don't have any readable characters when I do that
(2) If I did, it would still not help the end user because it is not
    the right fix

> > > Do you still encounter Guile-GI#96 after packaging?
> >
> > I've not finished packaging it yet (I tried--but that's pretty
> > difficult, too! Help wanted--see README and guix.scm in the
> > guix-gui repo).
> I'll see what I can do to help, but the only thing I can see are
> "FIXME" strings in guix.scm.  The README even suggests to run `guix
> build`.

Thanks!

If you do the guix build and then run the final script, then you will
see that it does not start up because it cannot find guile modules. I
don't know how to deploy large guile programs, so I'd appreciate help
in doing that. (I can totally find a hacky way to make it work--but I'd
rather use the canonical way to find external guile modules that surely
exists, especially in guix (... which is written in guile and is a
large program with external guile modules))

> One could disagree about that based on the workflow you've posted here
> ;)

I can assure you this is what anyone would try, until he gets burned
like this and then does some of

(1) gives up and deletes guix
(2) posts a bad review to the media
(3) posts bug reports detailing what happened
(4) uses another way than "guix environment" and it finally works (but
    does that really do the right thing? Who knows...)

I prefer people do (3) and (4).

Ideally, getting information for (3) would be automatic (at least more
automatic than " is not compatible with ". Oh yeah? It's not
compatible? Good to know ;)

Also, guile-gi devs always used "--pure -l guix.scm" and it still
happens to them!

> Jokes aside, the Guile-GI devs seem to be hitting a similar issue, but
> as they write, it appears to mainly affect their own development
> environments (i.e. exactly the thing that hurts you the most as a
> developer).
> There is potentially a discrepancy between `guix build`
> and `guix environment`.

Apparently so.

Note that with the current state of guix (no gui...) we mainly can
target only developers--so if the story for developers is bad, well,
that's bad for guix.

> Having a look at Guile-GI specifically, ldd `guix build guile-gi` has
> the following:
> libgobject-2.0.so.0 => /gnu/store/xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libgobject-2.0.so.0
>
> You should recognize that hash by now ;)

Definitely. With how often I have seen gobject hashes in different
contexts by now, I'll probably dream about those ;)

> You can eliminate a huge array of those issues by using pure
> environments.

As I said, "guix environment --pure" did not fix all (or even many) of
these problems. I added "--pure" very soon--and WITH the "--pure" it
still loads two different libgobjects.

Using "-l" helps tremendously. "--pure"? Nope.

> I'm not sure whether they must specifically enforce one and only one
> type registry or whether those can be made to coexist.  You should
> probably talk to gobject-introspection about this.

Yeah. But I will see what guile-gi devs have to say about it first.

> It would also make it extremely easy to shoot yourself in the foot,
> thereby costing you even more time.  It's like shooting flies with a
> cannon.  Sure, you might kill some bugs, but you can just as well tear
> down castle walls with that.

I don't see what is even controversial about it. The same duplicate
symbol checking *always* happens with regular linking. So if you now
dlopen stuff that usually is regularly linked into C programs, the
latter C program developers already ensured that the libraries don't
have duplicate symbols (in a certain sense), otherwise the libraries
wouldn't link with the C programs in the first place.

I'm only suggesting for dlopen to do the same thing in diagnostics
mode. Also, I'm not suggesting to always enable the feature--but it
should *exist*.
But you are right, an easier fix would be guix environment just warning
about packages that have the same name but are different packages in
the same environment. That way you wouldn't have to care about linker
stuff at all.

Bad debugging features are the Achilles' heel of guix--and it's
difficult for me to justify spending time on guix when every time
there's something to debug it is so bad.

Examples:

(1) There is no debug info by default. Ludo fixed that in that now one
can use package transformations to build certain packages with debug
info without having to recompile a guix checkout. Nice. (Previously,
you had to check out guix, manually edit packages (first find out how),
then compile it (which is not easy) and then use ./pre-inst-env to
rebuild whatever you wanted to debug in the first place so you get line
number information when one of the programs segfaults and you are in
gdb. Think joe regular dev would do that when collecting info for
reporting bugs? I don't.)

(2) guix guile backtraces always just say eval.scm and not the actual
source file or line something happened in. That is still the case. I
basically use divide and conquer "write"s in order to find out where in
my program the actual line it is talking about is.

(3) guix environment and the actual guix build daemon set up slightly
different environments. That way, it can happen that a bug that happens
inside the guix build container doesn't happen when you search for it
using guix environment.

> I'm not sure how sane loading another version of GObject is in other GI
> implementations.

Not sane at all. That duplicate loading of libraries with the same
basename is not supposed to be done. I mean it's nice if someone gets
it to work anyway--but I suspect that that won't fly with gobject's
runtime type checking.
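
A minimal sketch of the kind of duplicate-library diagnostic discussed
above, assuming Linux and the text format of /proc/<pid>/maps. It groups
mapped shared objects by basename and reports every basename that is
mapped from more than one path. The function name `duplicate_libraries`,
the sample text, the second glib store hash (all "a"s) and the glibc
path are made up for illustration; this is not an existing guix feature.

```python
import os
from collections import defaultdict

# Hypothetical excerpt in /proc/<pid>/maps format. Only the first glib
# path comes from the ldd output quoted in this thread; the other store
# paths are placeholders.
SAMPLE_MAPS = """\
7f0000000000-7f0000060000 r-xp 00000000 08:01 1001 /gnu/store/xa1vfhfc42x655hi7vxqmbyvwldnz7r0-glib-2.62.6/lib/libgobject-2.0.so.0
7f0000100000-7f0000160000 r-xp 00000000 08:01 1002 /gnu/store/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-glib-2.62.6/lib/libgobject-2.0.so.0
7f0000200000-7f0000380000 r-xp 00000000 08:01 1003 /gnu/store/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb-glibc-2.31/lib/libc.so.6
7f0000380000-7f0000384000 r--p 00180000 08:01 1003 /gnu/store/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb-glibc-2.31/lib/libc.so.6
7f0000400000-7f0000421000 rw-p 00000000 00:00 0
7ffc00000000-7ffc00021000 rw-p 00000000 00:00 0 [stack]
"""


def duplicate_libraries(maps_text):
    """Map each shared-object basename to its paths if it has several.

    Returns {basename: sorted list of distinct paths} for every shared
    object that is mapped from more than one path.
    """
    paths_by_name = defaultdict(set)
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a pathname
        path = fields[5].strip()
        if not path.startswith("/"):
            continue  # pseudo entries such as [heap] or [stack]
        name = os.path.basename(path)
        if ".so" not in name:
            continue  # not a shared object
        paths_by_name[name].add(path)
    return {
        name: sorted(paths)
        for name, paths in paths_by_name.items()
        if len(paths) > 1
    }


if __name__ == "__main__":
    for name, paths in duplicate_libraries(SAMPLE_MAPS).items():
        print(f"warning: {name} is loaded from {len(paths)} different paths:")
        for path in paths:
            print(f"  {path}")
```

For a running process, one could pass the contents of its maps file
instead of the sample, e.g.
`duplicate_libraries(open("/proc/self/maps").read())`. The same
file mapped several times (as libc is in the sample) is not reported,
only the same basename coming from different paths.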
> I don't think I can help you with that here, but if you have something
> usable for debugging, like a stack trace, it might be worth to ask
> around Guile-GI (assuming it is not just a repetition of their #96).

The problem is that the software stack does not fail correctly, and
somewhere the software stack needs to be adapted so that it fails
instead of dragging the error state all the way through the entire
profile without it being detected.

There are only usable stacktraces when you go out of your way using
LD_PRELOAD. That is not a tenable way.

I want to be clear that guile-gi is not at fault, by which I mean that
this part of the software stack should not be changed. If you did,
gobject-introspection would still do it "wrong". If you changed that,
the next GNOME library over there would STILL do it "wrong". That is
not the way to fix it.

Also, what is "fixing it"? Is it preventing duplicate loading for good
or is it changing 29 libraries upstream to support duplicate loading?
Or is it warning if it happens? All of those at different points in
time?

With this, I hope to at least have raised awareness and started the
discussion so progress can be made.

(For example, it makes not much sense to bother guile-gi devs with a
problem that guile-gi did not cause. gobject-introspection has the
exact same design (and is a dependency of guile-gi), so everyone who
uses gobject-introspection inherits that limitation. The guile-gi
people are just nice enough to look at it anyway--even though it's not
something they can reasonably fix unilaterally.)