Hi Paul, Hi Janneke, On Thu, 07 Jan 2021 11:43:39 +0100 Jan Nieuwenhuizen wrote: > > This is really exciting, great job Jan! Do you need any help, e.g. on > > the ARM side, or with optimising the integration? > Thanks! We really could use help with this (Danny?). @Paul: Yeah, help would be nice! We are now pretty far along in bootstrapping--which means you'd ideally set up an armhf machine, install guix on it and then debug gawk-mesboot0 using that in order to help. Can you do that? A writeup of how to debug a current problem we are facing follows. If you can help with fixing that problem, please do :) We currently are at mescc -> tinycc -> tinycc -> tinycc -> gawk-mesboot0 or so. The next steps for us are: * Debug why gawk-mesboot0 doesn't work correctly (see below--and Janneke's E-Mail) * That will entail enabling gdb to work on tcc (TinyCC) executables and/or reading ARM assembly. Since the error does not cause a failure immediately, it will be difficult to pinpoint exactly where the error was introduced. To reproduce the bug: ./gawk 'BEGIN { i = 5; ++i; print(i) }' 5 (yes, it prints 5. Same happens with "--". But i += 1 works fine. So does i *= 2). You will need Guix on armhf. I do NOT recommend using an aarch64 machine with an aarch64 kernel for the time being (I'll fix it eventually--but right now that's a really bad idea with Guix). > To paint the > picture for people listening in: Next to ARM it may need some Guix > skills, or even more work to reproduce our experimental ARM bootstrap > outside of Guix. > A very short summary of where we are, on wip-arm-bootstrap, building > gawk-mesboot0 > > --8<---------------cut here---------------start------------->8--- > ./pre-inst-env guix build -e '(@@ (gnu packages commencement) gawk-mesboot0)' > --8<---------------cut here---------------end--------------->8--- > > produces a gawk binary that fails to increment a variable > > --8<---------------cut here---------------start------------->8--- > 11:35:59 janneke@novena:~/src/wip-arm-bootstrap [env] > $ /gnu/store/f756xxxqj3mabaax5r4ldrxh01a9q54r-gawk-mesboot0-3.0.0/bin/gawk -f add.awk add.awk > i=1 > i=2 > 11:36:14 janneke@novena:~/src/wip-arm-bootstrap [env] > $ /gnu/store/f756xxxqj3mabaax5r4ldrxh01a9q54r-gawk-mesboot0-3.0.0/bin/gawk -f inc.awk inc.awk > i= 0 > i= 0 > 11:36:27 janneke@novena:~/src/wip-arm-bootstrap [env] > --8<---------------cut here---------------end--------------->8--- > > So could be anything, could bin in tcc-boot or in gawk-mesboot0... Next steps: * find function in gawk 3.0.0 that interprets "++" (yacc grammar), * find out why it doesn't work * find out why tcc miscompiles it In order to read the source of the gawk used, invoke guix build -S -e '(@@ (gnu packages commencement) gawk-mesboot0)' . In there, the yacc grammar is in awk.y (and the generated parser is in awktab.c). The AST node types generated for "++" and "--" are Node_preincrement and Node_predecrement, respectively. The evaluator is in eval.c (@janneke: and it uses setjmp. At least longjmp seems not to be entered for our gawk problem here... phiew). The place where Node_preincrement is finally handled is in op_assign in the latter file, which does: case Node_preincrement: case Node_predecrement: unref(*lhs); *lhs = make_number(lval + (tree->type == Node_preincrement ? 1.0 : -1.0)); tree->lnode->flags |= SCALAR; if (after_assign) (*after_assign)(); return *lhs; The case that does work (+= 1) is handled in the same function: lval = force_number(*lhs); rval = force_number(tmp); [...] case Node_assign_plus: *lhs = make_number(lval + rval); break; Debugging gawk on armhf via gdb --args 'BEGIN { ++i; print(i) }' I get: eval.c │ >857 *lhs = make_number(lval + │ │ 858 (tree->type == Node_preincrement ? 1.0 : -1.0)); │ And in asm, after it ascertained that tree->type == Node_preincrement, it does: │ 0x15598 ldr lr, [pc] ; 0x155a0 │ │ 0x1559c b 0x155a4 │ │ 0x155a0 CONSTANT 0 │ 0x155a4 vldr d0, [lr] │ │ 0x155a8 vldr d1, [r11, #-16] │ │ >0x155ac vadd.f64 d0, d1, d0 │ │ 0x155b0 vpush {d0} │ │ 0x155b4 mov r2, #97 ; 0x61 │ │ 0x155b8 pop {r0, r1} │ │ 0x155bc bl 0x23208 │ Registers before the vadd are: d0 {u8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, u16 = {0x0, 0x0, 0x0, 0x0}, u32 = {0x0, 0x0}, u64 = 0x0, f32 = {0x0, 0x0}, f64 = 0x0} d1 {u8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, u16 = {0x0, 0x0, 0x0, 0x0}, u32 = {0x0, 0x0}, u64 = 0x0, f32 = {0x0, 0x0}, f64 = 0x0} And [r11, #-16] seems to be used more often in the assembly, so is probably lval. Unsurprisingly, it's 0. But so is the other constant added to it. WTF! And maybe more useful: gdb --args ../gawk 'BEGIN { i = 5; ++i ; print(i) }': right before it does the ++: │ 0x1555c ldr r1, [r11, #12] │ │ 0x15560 add r1, r1, #24 │ │ 0x15564 ldr r2, [r1] │ │ 0x15568 cmp r2, #10 │ │ 0x1556c moveq r1, #1 │ │ 0x15570 movne r1, #0 │ │ 0x15574 str r0, [r11, #-60] ; 0xffffffc4 │ │ 0x15578 cmp r1, #0 │ │ 0x1557c beq 0x15584 │ │ 0x15580 b 0x15598 │ │ 0x15584 ldr lr, [pc] ; 0x1558c │ │ 0x15588 b 0x15590 │ │ 0x1558c CONSTANT 0x000508ec (little endian) │ 0x15590 vldr d0, [lr] │ │ 0x15594 b 0x155a8 │ │ 0x15598 ldr lr, [pc] ; 0x155a0 │ │ 0x1559c b 0x155a4 │ │ 0x155a0 CONSTANT 0x000508f4 (little endian) │ 0x155a4 vldr d0, [lr] ; instruction: 0xed9e0b00; so sign=1; note: double │ 0x155a8 vldr d1, [r11, #-16] ; instruction: 0xed1b1b04; so sign=0; note: double │ >0x155ac vadd.f64 d0, d1, d0 │ │ 0x155b0 vpush {d0} │ │ 0x155b4 mov r2, #97 ; 0x61 │ │ 0x155b8 pop {r0, r1} │ │ 0x155bc bl 0x23208 │ At address 0x000508ec there is: 0 (64 bit). That is incorrect. At address 0x000508f4 there is: 0 (64 bit). That is incorrect. Registers before the vadd are: d0 {u8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, u16 = {0x0, 0x0, 0x0, 0x0}, u32 = {0x0, 0x0}, u64 = 0x0, f32 = {0x0, 0x0}, f64 = 0x0} d1 {u8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x14, 0x40}, u16 = {0x0, 0x0, 0x0, 0x4014}, u32 = {0x0, 0x40140000}, u64 = 0x4014000000000000, f32 = {0x0, 0x2}, f64 = 0x5} So now we have to find out why tcc put those 0s in those constant literal slots. My wild guess is because it's trying to use 64 bit ints and we only gave it a 32 bit int at some point. But that should have been fixed by compiling tcc using tcc. Weird... tcc does the following (in arm-gen.c): int v, ft, fc, fr, sign; uint32_t op; SValue v1; fr = sv->r; ft = sv->type.t; fc = sv->c.i; if(is_float(ft)) { calcaddr(&base,&fc,&sign,1020,2); #ifdef TCC_ARM_VFP op=0xED100A00; /* flds */ if(!sign) op|=0x800000; if ((ft & VT_BTYPE) != VT_FLOAT) op|=0x100; /* flds -> fldd */ o(op|(vfpr(r)<<12)|(fc>>2)|(base<<16)); Uhhh.... how does a 64 bit double fit into an ARM (32 bit) int (variable "fc")? calcaddr seems to take the second argument, stuff it into a const pool (if too big) and return the address to the item in the second argument again. If it's not too big, then it's just used directly without pool entry. So it might make sense to add printfs to these locations in tcc and see what it does with the constants. @Paul: Could you decode what calcaddr is supposed to do, exactly (see git@gitlab.com:janneke/tinycc.git arm-gen.c)? That would help a lot. Something maybe unrelated: trying to run gawk using qemu-arm -singlestep -d nochain,in_asm,cpu ./gawk I get 0x0004e760: e3511000 cmp r1, #0 R00=00000000 R01=00000020 R02=407ffc58 R03=00000000 R04=00000000 R05=00000000 R06=00000000 R07=00000000 R08=00000000 R09=00000000 R10=0004fb78 R11=407ffc40 R12=407ffc58 R13=407ffc30 R14=0004e9b8 R15=0004e760 PSR=20000010 --C- A usr32 qemu: uncaught target signal 4 (Illegal instruction) - core dumped Illegal instruction The correct encoding of CMP is e3510000. The "illegal" encoding of CMP has "set condition codes" (bit 20) set. @Paul: Is setting the "set condition codes" flag on CMP only deprecated or actually forbidden? (I.e. am I chasing a qemu misemulation (again...) or is this actually a problem?) Attached the buggy gawk executable. I've also searched for other double literals (/= 0) in gawk's source code, and there's one in io.c (in do_getline), and one in builtin.c (rlength = -1.0 if a regexp does not match in do_match). That's it O_o. So there are very few double literals in there apparently. And indeed: $ echo | ./gawk 'BEGIN { print(getline); }' 0 <--- wrong No idea how to test the rlength thing, though. @Janneke: I'm not sure we have automated it far enough for Paul to reproduce this bug from scratch yet, right? Maybe we should already do a Mes for ARM release after all ? (provided the x86 bootstrap still works with that version, of course) I mean release early, release often and all. Otherwise it's really difficult for new people to get started.