From: ludo@gnu.org (Ludovic Courtès)
Subject: bug#29654: Manual database index.db embeds timestamps
Date: Sun, 17 Dec 2017 00:51:53 +0100
Message-ID: <87wp1mcjty.fsf@gnu.org>
References: <87mv2jfzmi.fsf@gnu.org> <87po7fo6gv.fsf@elephly.net>
In-Reply-To: <87po7fo6gv.fsf@elephly.net> (Ricardo Wurmus's message of "Sat, 16 Dec 2017 01:35:12 +0100")
To: Ricardo Wurmus
Cc: Ruud van Asseldonk, 29654@debbugs.gnu.org

Howdy Ricardo,

Ricardo Wurmus writes:

>> Unfortunately, this is not fully deterministic: when running --check
>> several times in a row, I occasionally get different results.  I suspect
>> GDBM’s output is not fully deterministic.
>
> Hmm, I dumped the contents of the generated databases with gdbm_dump and
> couldn’t find any difference aside from the header (which is produced by
> gdbm_dump itself).  Diffoscope shows a lot of differences, though.
>
> I thought that maybe the difference comes from the fact that upon adding
> new entries gdbm grows the hash table.  After setting the initial size
> to a multiple of the number of entries I haven’t been able to generate a
> non-reproducible database.
>
> My only change is in “write-mandb-database”:
>
>     (gdbm-open file GDBM_WRCREAT #:block-size (* 512 (length entries)))
>
> I tried this:
>
>     ./pre-inst-env guix package -p foo -i coreutils guile
>     for i in `seq 30`; do ./pre-inst-env guix build --check -K /gnu/store/pg3684khpj69py40v7p76b90r9q4j2lv-manual-database.drv; done
>
> Seems fine.  Coincidence or did I get lucky?

I checked with the program below.  It helps, but does not entirely fix it:

--8<---------------cut here---------------start------------->8---
(use-modules (guix man-db)
             (guix hash)
             (guix base32))

(define %database "/tmp/index.db")

(let loop ()
  (false-if-exception (delete-file %database))
  (write-mandb-database %database
                        (mandb-entries "/home/ludo/.guix-profile/share/man"))
  (pk (stat:size (stat %database))
      (bytevector->nix-base32-string (file-sha256 %database)))
  (loop))
--8<---------------cut here---------------end--------------->8---

Valgrind reports this:

--8<---------------cut here---------------start------------->8---
==8395== Syscall param write(buf) points to uninitialised byte(s)
==8395==    at 0x53E4A8D: ??? (in /gnu/store/3h31zsqxjjg52da5gp3qmhkh4x8klhah-glibc-2.25/lib/libpthread-2.25.so)
==8395==    by 0xACAF44D: _gdbm_full_write (in /gnu/store/kg8ffb14msfnc9aivxj6djrl51g9b3zz-gdbm-1.13/lib/libgdbm.so.4.0.0)
==8395==    by 0xACAC6AD: gdbm_fd_open (in /gnu/store/kg8ffb14msfnc9aivxj6djrl51g9b3zz-gdbm-1.13/lib/libgdbm.so.4.0.0)
==8395==    by 0x55FA0BF: ffi_call_unix64 (in /gnu/store/kvi64k387hqdrn59gsgd09brxh65jxjj-libffi-3.2.1/lib/libffi.so.6.0.4)
==8395==    by 0x55F8EE0: ffi_call (in /gnu/store/kvi64k387hqdrn59gsgd09brxh65jxjj-libffi-3.2.1/lib/libffi.so.6.0.4)
==8395==    by 0x4E8C23C: scm_i_foreign_call (in /gnu/store/gwspk20b7fbrs4l5rzgaadf8896h12bq-guile-2.2.3/lib/libguile-2.2.so.1.3.0)
==8395==    by 0x4EF9243: vm_regular_engine (in /gnu/store/gwspk20b7fbrs4l5rzgaadf8896h12bq-guile-2.2.3/lib/libguile-2.2.so.1.3.0)
==8395==    by 0x4EFC7B9: scm_call_n (in /gnu/store/gwspk20b7fbrs4l5rzgaadf8896h12bq-guile-2.2.3/lib/libguile-2.2.so.1.3.0)
==8395==    by 0x4E80A06: scm_primitive_eval (in /gnu/store/gwspk20b7fbrs4l5rzgaadf8896h12bq-guile-2.2.3/lib/libguile-2.2.so.1.3.0)
==8395==    by 0x4E80A62: scm_eval (in /gnu/store/gwspk20b7fbrs4l5rzgaadf8896h12bq-guile-2.2.3/lib/libguile-2.2.so.1.3.0)
==8395==    by 0x4ECBA6F: scm_shell (in /gnu/store/gwspk20b7fbrs4l5rzgaadf8896h12bq-guile-2.2.3/lib/libguile-2.2.so.1.3.0)
==8395==    by 0x4E974AC: invoke_main_func (in /gnu/store/gwspk20b7fbrs4l5rzgaadf8896h12bq-guile-2.2.3/lib/libguile-2.2.so.1.3.0)
==8395==  Address 0xced0044 is 4 bytes inside a block of size 8,388,608 alloc'd
==8395==    at 0x4C2AAD6: malloc (in /gnu/store/p2b1rzqlpdqbhn42g76xzgykbivwc063-valgrind-3.12.0/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8395==    by 0xACAC5E6: gdbm_fd_open (in /gnu/store/kg8ffb14msfnc9aivxj6djrl51g9b3zz-gdbm-1.13/lib/libgdbm.so.4.0.0)
==8395==    by 0x55FA0BF: ffi_call_unix64 (in /gnu/store/kvi64k387hqdrn59gsgd09brxh65jxjj-libffi-3.2.1/lib/libffi.so.6.0.4)
==8395==    by 0x55F8EE0: ffi_call (in /gnu/store/kvi64k387hqdrn59gsgd09brxh65jxjj-libffi-3.2.1/lib/libffi.so.6.0.4)
==8395==    by 0x4E8C23C: scm_i_foreign_call (in /gnu/store/gwspk20b7fbrs4l5rzgaadf8896h12bq-guile-2.2.3/lib/libguile-2.2.so.1.3.0)
--8<---------------cut here---------------end--------------->8---

>> +(define (entry->string entry)
>> +  "Return the wire format for ENTRY as a string."
>> +  (match entry
>> +    (($ file name section synopsis)
>> +     (string-append (abbreviate-file-name file) "\t"
>> +                    (number->string section) "\t"
>> +                    (number->string section)
>> +
>> +                    ;; Timestamps, that we always set to the epoch.
>> +                    "\t0\t0"
>> +
>> +                    ;; XXX: Weird things.
>> +                    "\tB\t-\t-\tgz\t"
>
> What’s that?

In db_store.c it’s done like this:

--8<---------------cut here---------------start------------->8---
        MYDBM_SET (cont, xasprintf (
                "%s\t%s\t%s\t%ld\t%ld\t%c\t%s\t%s\t%s\t%s",
                dash_if_unset (in->name),
                in->ext,
                in->sec,
                (long) in->mtime.tv_sec,
                in->mtime.tv_nsec,
                in->id,
                in->pointer,
                in->filter,
                in->comp,
                in->whatis));
--8<---------------cut here---------------end--------------->8---

and db_storage.h says:

--8<---------------cut here---------------start------------->8---
struct mandata {
        struct mandata *next;           /* ptr to next structure, if any */
        char *addr;                     /* ptr to memory containing the fields */

        char *name;                     /* Name of page, if != key */

        /* The following are all const because they should be pointers to
         * parts of strings allocated elsewhere (often the addr field above)
         * and should not be written through or freed themselves.
         */
        const char *ext;                /* Filename ext w/o comp ext */
        const char *sec;                /* Section name/number */
        char id;                        /* id for this entry */
        const char *pointer;            /* id related file pointer */
        const char *comp;               /* Compression extension */
        const char *filter;             /* filters needed for the page */
        const char *whatis;             /* whatis description for page */
        struct timespec mtime;          /* mod time for file */
};
--8<---------------cut here---------------end--------------->8---

The ‘B’ part gives the kind of manual page:

--8<---------------cut here---------------start------------->8---
/* These definitions give an inherent precedence to each particular type
   of manual page:

   ULT_MAN:     ultimate manual page, the full source nroff file.
   SO_MAN:      source nroff file containing .so request to an ULT_MAN.
   WHATIS_MAN:  virtual `whatis referenced' page pointing to an ULT_MAN.
   STRAY_CAT:   pre-formatted manual page with no source.
   WHATIS_CAT:  virtual `whatis referenced' page pointing to a STRAY_CAT. */
--8<---------------cut here---------------end--------------->8---

I’ve updated man-db.scm to handle that better.

Thanks,
Ludo’.

diff --git a/guix/man-db.scm b/guix/man-db.scm
index b42558b06..3ce268547 100644
--- a/guix/man-db.scm
+++ b/guix/man-db.scm
@@ -29,6 +29,7 @@
             mandb-entry-name
             mandb-entry-section
             mandb-entry-synopsis
+            mandb-entry-kind
             mandb-entries
             write-mandb-database))

@@ -47,12 +48,13 @@
   (module-use! (current-module) (resolve-interface '(gdbm)))

 (define-record-type
-  (mandb-entry file-name name section synopsis)
+  (mandb-entry file-name name section synopsis kind)
   mandb-entry?
   (file-name mandb-entry-file-name)               ;e.g., "../abiword.1.gz"
   (name      mandb-entry-name)                    ;e.g., "ABIWORD"
   (section   mandb-entry-section)                 ;number
-  (synopsis  mandb-entry-synopsis))               ;string
+  (synopsis  mandb-entry-synopsis)                ;string
+  (kind      mandb-entry-kind))                   ;'ultimate | 'link

 (define (mandb-entrystring entry)
   "Return the wire format for ENTRY as a string."
   (match entry
-    (($ file name section synopsis)
+    (($ file name section synopsis kind)
+     ;; See db_store.c:make_content in man-db for the format.
      (string-append (abbreviate-file-name file) "\t"
                     (number->string section) "\t"
                     (number->string section)

-                    ;; Timestamps, that we always set to the epoch.
+                    ;; Timestamp that we always set to the epoch.
                     "\t0\t0"

-                    ;; XXX: Weird things.
-                    "\tB\t-\t-\tgz\t"
+                    ;; See "db_storage.h" in man-db for the different kinds.
+                    "\t"
+                    (case kind
+                      ((ultimate) "A")          ;ultimate man page
+                      ((link) "B")              ;".so" link to other man page
+                      (else "A"))               ;something that doesn't matter much
+
+                    "\t-\t-\t"
+
+                    (if (string-suffix? ".gz" file) "gz" "")
+                    "\t"
                     synopsis "\x00"))))

@@ -94,7 +106,8 @@
 (define (write-mandb-database file entries)
   "Write ENTRIES to FILE as a man-db database.  FILE is usually
 \".../index.db\", and is a GDBM database."
-  (let ((db (gdbm-open file GDBM_WRCREAT)))
+  (let ((db (gdbm-open file GDBM_WRCREAT
+                       #:block-size (* 512 (length entries)))))
     (gdbm-set! db %version-key %version-value)

     ;; Write ENTRIES in sorted order so we get deterministic output.
@@ -141,33 +154,37 @@
         (string->number (string-drop (string-drop-right str 1) 1))
         (string->number str)))

+  ;; Note: This works for both gzipped and uncompressed files.
   (call-with-gzip-input-port (open-file file "r0")
     (lambda (port)
       (let loop ((name #f)
                  (section #f)
-                 (synopsis #f))
+                 (synopsis #f)
+                 (kind 'ultimate))
         (if (and name section synopsis)
-            (mandb-entry file name section synopsis)
+            (mandb-entry file name section synopsis kind)
             (let ((line (read-line port)))
               (if (eof-object? line)
-                  (mandb-entry file name (or section 0) (or synopsis ""))
+                  (mandb-entry file name (or section 0) (or synopsis "")
+                               kind)
                   (match (string-tokenize line)
                     ((".TH" name (= string->number* section) _ ...)
-                     (loop name section synopsis))
+                     (loop name section synopsis kind))
                     ((".SH" (or "NAME" "\"NAME\""))
-                     (loop name section (read-synopsis port)))
+                     (loop name section (read-synopsis port) kind))
                     ((".so" link)
                      (match (and=> (resolve link)
                                    (cut man-page->entry <> resolve))
                        (#f
-                        (loop name section synopsis))
+                        (loop name section synopsis 'link))
                        (alias
                         (mandb-entry file
                                      (mandb-entry-name alias)
                                      (mandb-entry-section alias)
-                                     (mandb-entry-synopsis alias)))))
+                                     (mandb-entry-synopsis alias)
+                                     'link))))
                     (_
-                     (loop name section synopsis))))))))))
+                     (loop name section synopsis kind))))))))))

 (define (man-files directory)
   "Return the list of man pages found under DIRECTORY, recursively."
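Comparing the patched `entry->string` with the db_store.c format string shows how the fields line up. The file name goes in the "name" slot, and the section number fills both the "ext" and "sec" slots. Both mtime fields are pinned to 0, the kind letter fills the "id" slot, "-" fills "pointer" and "filter", and then come the compression extension and the synopsis. Below is a minimal C sketch of that mapping. It is a hypothetical helper (`format_entry`, `kind_to_id`, `enum entry_kind` are made-up names, not man-db or Guix API). It assumes only the two kinds the patch uses: 'A' for an ultimate page and 'B' for a ".so" link. The letters for the other kinds are not shown in the excerpt above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: the two kinds used by the patched man-db.scm. */
enum entry_kind { ENTRY_ULTIMATE, ENTRY_LINK };

/* Letter written to the "id" slot, as in the patch:
   "A" for an ultimate page, "B" for a ".so" link (the fallback is "A"). */
static char kind_to_id(enum entry_kind kind)
{
    return kind == ENTRY_LINK ? 'B' : 'A';
}

/* Nonzero if S ends with SUFFIX. */
static int has_suffix(const char *s, const char *suffix)
{
    size_t n = strlen(s), m = strlen(suffix);
    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Hypothetical C counterpart of the patched entry->string: write one
   index.db value into BUF using the field order of db_store.c
   (name, ext, sec, mtime sec, mtime nsec, id, pointer, filter, comp,
   whatis).  Returns what snprintf returns, i.e. the length of the full
   string excluding the terminating NUL.  The Scheme code appends "\x00"
   explicitly; here the NUL that snprintf writes plays that role, assuming
   the caller stores strlen (buf) + 1 bytes. */
static int format_entry(char *buf, size_t size, const char *file,
                        int section, enum entry_kind kind,
                        const char *synopsis)
{
    assert(buf != NULL && file != NULL && synopsis != NULL);
    return snprintf(buf, size, "%s\t%d\t%d\t%ld\t%ld\t%c\t%s\t%s\t%s\t%s",
                    file,                 /* "name" slot: the file name */
                    section, section,     /* "ext" and "sec" slots */
                    0L, 0L,               /* mtime pinned to the epoch */
                    kind_to_id(kind),     /* "id" slot */
                    "-", "-",             /* "pointer" and "filter" */
                    has_suffix(file, ".gz") ? "gz" : "", /* "comp" */
                    synopsis);            /* "whatis" */
}
```

For example, an ultimate page `../ls.1.gz` in section 1 yields `../ls.1.gz\t1\t1\t0\t0\tA\t-\t-\tgz\t` followed by its synopsis. Setting both timestamp fields to 0 is what keeps the value independent of file modification times. It does nothing about the uninitialised bytes that Valgrind reports being written from inside gdbm_fd_open.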