From mboxrd@z Thu Jan 1 00:00:00 1970
From: barry
Newsgroups: gmane.emacs.bugs
Subject: bug#1028: Improvement: Persistent Hash Store with GDBM
Date: Thu, 25 Sep 2008 22:44:56 -0400
Message-ID: <48DC4CA8.5010901@sympatico.ca>
Reply-To: barry, 1028@emacsbugs.donarmstrong.com
To: bug-gnu-emacs@gnu.org
User-Agent: Thunderbird 1.5 (X11/20051201)
X-Emacs-PR-Message: report 1028
X-Emacs-PR-Package: emacs
Xref: news.gmane.org gmane.emacs.bugs:20794

This is not a bug but an enhancement: it implements a persistent hash
store, using gdbm for the storage. The code is in the file gdbm.c
(below), with some minor changes to Makefile.in (in the emacs/src
directory) and emacs.c to include the new functions in the Emacs build.

The new functions are as follows; they mirror the equivalent functions
in the gdbm package itself:

 1. gdbm-open
 2. gdbm-close
 3. gdbm-fetch
 4. gdbm-store
 5. gdbm-delete
 6. gdbm-exists
 7. gdbm-firstkey
 8. gdbm-nextkey
 9. gdbm-sync
10. gdbm-reorganize

The doc strings for each of these functions are included below.

The array gdbm-open-files contains a cons cell for each open gdbm file
(referenced by an integer file ID number). Its maximum size is set by
the configuration parameter max_gdbm_open_files in syms_of_gdbm, which
is set to 10 in gdbm.c below. The car of the cons cell is the gdbm data
file pointer, and the cdr is the name of the opened hash file.

The values written to the hash files are all strings. Normally one
would use prin1-to-string to store arbitrary Lisp expressions and
read-from-string to recover them.

This mechanism allows for simple and fast persistent hash storage of
Lisp data, directly within Emacs Lisp code, without the need to resort
to external databases.

The doc strings for the functions are shown below:

1. gdbm-open is a built-in function in `C source code'.
   (gdbm-open IDNO FILE ACCESS &optional MODE)

    Open FILENAME as a gdbm database and assign it
    the ID IDNO where IDNO is an integer in range
    0 to max-gdbm-open-files - 1
    ACCESS specifies access rights as one of strings:
    r for read
    w for read/write
    c for create (if none exists)
    n for force create a new one even if one exists
    MODE if present on new db create specifies the
    file permissions as a number ala chmod
    Returns: gdbm file reference ID on success or
    nil on failure

2. gdbm-close is a built-in function in `C source code'.
   (gdbm-close DBF)

    Close a gdbm database of the specified number.

3. gdbm-fetch is a built-in function in `C source code'.
   (gdbm-fetch DBF KEY)

    Fetch data from a gdbm database.
    Returns: string data stored under KEY or nil
    if no data under that key.

4. gdbm-store is a built-in function in `C source code'.
   (gdbm-store DBF KEY DATA)

    Store data in a gdbm database.
    KEY and DATA must be strings
    (to save binary data use prin1-to-string on
    key and/or data)
    If KEY already exists in the database it will
    be replaced with the new DATA
    If DATA is nil or empty then KEY will be deleted.
    Returns: 0 on successful insert
    -1 if open for read and tries insert.

5. gdbm-delete is a built-in function in `C source code'.
   (gdbm-delete DBF KEY)

    Delete data from a gdbm database.
    KEY must be a string
    Returns: 0 on successful delete
    -1 if key not in database

6. gdbm-exists is a built-in function in `C source code'.
   (gdbm-exists DBF KEY)

    Returns t if KEY is in the hash otherwise nil

7. gdbm-firstkey is a built-in function in `C source code'.
   (gdbm-firstkey DBF)

    Fetch first key data from a gdbm database.
    Returns: first key in GDBM hash or nil if none

8. gdbm-nextkey is a built-in function in `C source code'.
   (gdbm-nextkey DBF KEY)

    Fetch next key data from a gdbm database.
    Returns: the key following KEY in the gdbm hash table
    or nil if KEY is the last key.

9. gdbm-sync is a built-in function in `C source code'.
   (gdbm-sync DBF)

    Sync a gdbm database.
    Writes all buffered data to disk.

10. gdbm-reorganize is a built-in function in `C source code'.
    (gdbm-reorganize DBF)

    Reorganize a gdbm database.
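The ID bookkeeping described above (a fixed-size table indexed by IDNO,
where opening on an occupied slot closes the old file first, plus the
one-letter ACCESS mapping) can be sketched in plain C without Emacs or
gdbm. This is only an illustration under stated assumptions, not the
code from gdbm.c: the names slot_table, slot_open, slot_close and
parse_access are hypothetical, the access_mode enum merely stands in
for the GDBM_READER/GDBM_WRITER/GDBM_WRCREAT/GDBM_NEWDB flags, and the
"handle" kept in each slot is just a copy of the file name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_OPEN_FILES 10   /* mirrors the default of max-gdbm-open-files */

/* Hypothetical stand-ins for the four gdbm open flags. */
enum access_mode { ACCESS_READER, ACCESS_WRITER, ACCESS_WRCREAT, ACCESS_NEWDB };

/* One slot per ID; name[id] == NULL means nothing is open under that ID. */
struct slot_table {
    char *name[MAX_OPEN_FILES];
    enum access_mode mode[MAX_OPEN_FILES];
};

/* Map the first letter of ACCESS to a mode; anything else means reader. */
enum access_mode parse_access(const char *access)
{
    char c = (access != NULL) ? access[0] : '\0';

    switch (c) {
    case 'w': case 'W': return ACCESS_WRITER;
    case 'c': case 'C': return ACCESS_WRCREAT;
    case 'n': case 'N': return ACCESS_NEWDB;
    default:            return ACCESS_READER;
    }
}

/* Release the slot for ID.  Returns 0 on success, -1 if the ID is out
   of range or nothing is open there. */
int slot_close(struct slot_table *t, int id)
{
    assert(t != NULL);
    if (id < 0 || id >= MAX_OPEN_FILES || t->name[id] == NULL)
        return -1;
    free(t->name[id]);
    t->name[id] = NULL;
    return 0;
}

/* Register FILE under ID, closing whatever was open there before.
   Returns ID on success, -1 on a bad ID, a NULL file or out of memory. */
int slot_open(struct slot_table *t, int id, const char *file, const char *access)
{
    char *copy;

    assert(t != NULL);
    if (id < 0 || id >= MAX_OPEN_FILES || file == NULL)
        return -1;
    copy = malloc(strlen(file) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, file);
    if (t->name[id] != NULL)
        slot_close(t, id);          /* same ID reused: drop the old entry */
    t->name[id] = copy;
    t->mode[id] = parse_access(access);
    return id;
}
```

A caller would zero-initialize a struct slot_table, call slot_open with
an ID between 0 and MAX_OPEN_FILES - 1, and pass that same ID to
slot_close later. Keeping small integers as the only thing handed back
to the caller is the design choice the patch makes as well: Lisp code
never holds a raw pointer, only an index into the table.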
-------------------------------------------------------------
Following is the file emacs-22.1/src/gdbm.c, which implements the
above functions
-------------------------------------------------------------
/* GDBM Library Interface */

#include
#include "lisp.h"
#include "blockinput.h"
#include "commands.h"
#include "keyboard.h"
#include "dispextern.h"
#include "charset.h"
#include "coding.h"
#include
#include

int max_gdbm_open_files;
Lisp_Object Qgdbm_open_files,Vgdbm_open_files;

DEFUN ("gdbm-open", Fgdbm_open, Sgdbm_open, 3, 4, 0,
  "Open FILENAME as a gdbm database and assign it \n\
the ID IDNO where IDNO is an integer in range \n\
0 to max-gdbm-open-files - 1 \n\
ACCESS specifies access rights as one of strings: \n\
r for read \n\
w for read/write \n\
c for create (if none exists)\n\
n for force create a new one even if one exists\n\
MODE if present on new db create specifies the \n\
file permissions as a number ala chmod\n\
Returns: gdbm file reference ID on success or\n\
nil on failure")
  (idno,file,access,mode)
     Lisp_Object idno, file, access, mode;
{
  int imode,iaccess;
  GDBM_FILE dbf;
  unsigned char *caccess;
  struct gcpro gcpro1, gcpro2, gcpro3;
  Lisp_Object ef, ef1, val;

  //both must hold valid objects before they are GC-protected
  ef = Qnil;
  ef1 = Qnil;
  GCPRO3 (file, ef, ef1);

  //ensure id number is in range
  CHECK_NUMBER(idno);
  if((XINT(idno) < 0) || XINT(idno) >= max_gdbm_open_files)
    error("gdbm ID out of range");

  //if we haven't yet set up the open files vector
  //do it now
  if(!VECTORP (Vgdbm_open_files))
    Vgdbm_open_files=Fmake_vector(make_number(max_gdbm_open_files), Qnil);

  //see if there is an open file at the idno
  ef = AREF(Vgdbm_open_files, XINT(idno));
  if(!NILP (ef)){
    if(!CONSP(ef) || !NUMBERP(CAR(ef)))
      error("gdbm-open-files corrupted");
    //if so close it
    gdbm_close((GDBM_FILE) XPNTR(CAR(ef)));
    ASET(Vgdbm_open_files,XINT(idno),Qnil);
  }

  CHECK_STRING (file);
  CHECK_STRING (access);
  if(NILP (file)){
    UNGCPRO;
    return Qnil;
  }
  if(!NILP (mode)){
    CHECK_NUMBER(mode);
    imode = XUINT (mode);
  }
  else imode = 0666;

  ef = Fexpand_file_name (file, Qnil);
  ef1 = ENCODE_FILE (ef);

  //only the first letter of ACCESS is examined
  caccess = XSTRING (access)->data;
  if(NILP (access))iaccess = GDBM_READER;
  else {
    switch (caccess[0]) {
    case 'r':
    case 'R':
      iaccess = GDBM_READER;
      break;
    case 'w':
    case 'W':
      iaccess = GDBM_WRITER;
      break;
    case 'c':
    case 'C':
      iaccess = GDBM_WRCREAT;
      break;
    case 'n':
    case 'N':
      iaccess = GDBM_NEWDB;
      break;
    default:
      iaccess = GDBM_READER;
    }
  }

  dbf = gdbm_open((char *)XSTRING(ef1)->data,0,iaccess,imode,0);
  if(!dbf){
    UNGCPRO;
    return(Qnil);
  }
  val = XPNTR((unsigned)dbf);
  ASET(Vgdbm_open_files,XINT(idno),Fcons(val,ef1));
  UNGCPRO;
  return idno;
}

//map a gdbm ID number to the stored gdbm file pointer,
//signalling an error if the ID is invalid or not open
static Lisp_Object idToGdbmKey(Lisp_Object dbf)
{
  Lisp_Object val;

  //ensure id number is in range
  CHECK_NUMBER(dbf);
  if((XINT(dbf) < 0) || XINT(dbf) >= max_gdbm_open_files)
    error("gdbm ID out of range");

  if(!VECTORP (Vgdbm_open_files))
    error("no open files");

  //see if there is an open file at the idno
  val = AREF(Vgdbm_open_files, XINT(dbf));
  if(NILP(val))error("operation but no gdbm file open");
  if(!CONSP(val) || !NUMBERP(CAR(val)))
    error("gdbm-open-files corrupted");
  return(XPNTR(CAR(val)));
}

DEFUN ("gdbm-close", Fgdbm_close, Sgdbm_close, 1, 1, 0,
  "Close a gdbm database of the specified number.")
  (dbf)
     Lisp_Object dbf;
{
  GDBM_FILE idbf;
  int ival;
  Lisp_Object val;

  val = idToGdbmKey(dbf);
  gdbm_close((GDBM_FILE) val);
  ASET(Vgdbm_open_files,XINT(dbf),Qnil);
  return (Qt);
}

DEFUN ("gdbm-delete", Fgdbm_delete, Sgdbm_delete, 2, 2, 0,
  "Delete data from a gdbm database.\n\
KEY must be a string\n\
Returns: 0 on successful delete \n\
-1 if key not in database")
  (dbf, key)
     Lisp_Object dbf, key;
{
  Lisp_Object val;
  int oval;
  GDBM_FILE odbf;
  datum okey;

  val = idToGdbmKey(dbf);
  CHECK_STRING (key);
  odbf = (GDBM_FILE)XPNTR (val);
  okey.dptr = (char*)XSTRING (key)->data;
  //okey.dsize = XINT(Flength(key));
  okey.dsize = STRING_BYTES(XSTRING (key));
  oval = gdbm_delete(odbf, okey);
  val = make_number(oval);
  return(val);
}

DEFUN ("gdbm-store", Fgdbm_store, Sgdbm_store, 3, 3, 0,
  "Store data in a gdbm database.\n\
KEY and DATA must be strings \n\
(to save binary data use prin1-to-string on \n\
key and/or data)\n\
If KEY already exists in the database it will\n\
be replaced with the new DATA \n\
If DATA is nil or empty then KEY will be deleted.\n\
Returns: 0 on successful insert\n\
-1 if open for read and tries insert.")
  (dbf, key, data)
     Lisp_Object dbf, key, data;
{
  Lisp_Object val;
  datum okey, odata;
  GDBM_FILE odbf;
  int ival;

  val = idToGdbmKey(dbf);
  CHECK_STRING (key);
  if(NILP(data))return(Fgdbm_delete(dbf, key));
  CHECK_STRING (data);
  odbf = (GDBM_FILE)XPNTR (val);
  okey.dptr = (char *)XSTRING (key)->data;
  okey.dsize = STRING_BYTES(XSTRING (key));
  odata.dptr = (char *)XSTRING (data)->data;
  odata.dsize = STRING_BYTES(XSTRING (data));
  if(okey.dsize == 0)ival=0;
  else ival = gdbm_store(odbf,okey,odata,GDBM_REPLACE);
  //ival is a plain C int, so convert it directly;
  //this keeps the documented -1 on failure intact
  val = make_number(ival);
  return val;
}

DEFUN ("gdbm-fetch", Fgdbm_fetch, Sgdbm_fetch, 2, 2, 0,
  "Fetch data from a gdbm database.\n\
Returns: string data stored under KEY or nil \n\
if no data under that key.")
  (dbf, key)
     Lisp_Object dbf, key;
{
  Lisp_Object val;
  GDBM_FILE odbf;
  datum okey,oval;

  val = idToGdbmKey(dbf);
  CHECK_STRING (key);
  odbf = (GDBM_FILE)XPNTR (val);
  okey.dptr = (char *)XSTRING (key)->data;
  okey.dsize = STRING_BYTES(XSTRING (key));
  oval = gdbm_fetch(odbf, okey);
  if(oval.dptr == NULL)return Qnil;
  //copy into a Lisp string, then free gdbm's malloc'ed buffer
  val = make_string(oval.dptr, oval.dsize);
  free(oval.dptr);
  return val;
}

DEFUN ("gdbm-firstkey", Fgdbm_firstkey, Sgdbm_firstkey, 1, 1, 0,
  "Fetch first key data from a gdbm database.\n\
Returns: first key in GDBM hash or nil if none")
  (dbf)
     Lisp_Object dbf;
{
  Lisp_Object val;
  GDBM_FILE odbf;
  datum oval;

  val = idToGdbmKey(dbf);
  odbf = (GDBM_FILE)XPNTR (val);
  oval = gdbm_firstkey(odbf);
  if(oval.dptr == NULL)return Qnil;
  val = make_string(oval.dptr, oval.dsize);
  free(oval.dptr);
  return val;
}

DEFUN ("gdbm-nextkey", Fgdbm_nextkey, Sgdbm_nextkey, 2, 2, 0,
  "Fetch next key data from a gdbm database.\n\
Returns: the key following KEY in the gdbm hash table\n\
or nil if KEY is the last key.")
  (dbf, key)
     Lisp_Object dbf, key;
{
  Lisp_Object val;
  GDBM_FILE odbf;
  datum okey,oval;
  struct gcpro gcpro1;

  val = idToGdbmKey(dbf);
  GCPRO1 (val);
  CHECK_STRING (key);
  odbf = (GDBM_FILE)XPNTR (val);
  okey.dptr = (char *)XSTRING (key)->data;
  okey.dsize = STRING_BYTES(XSTRING (key));
  oval = gdbm_nextkey(odbf, okey);
  if(oval.dptr == NULL){
    UNGCPRO;
    return Qnil;
  }
  val = make_string(oval.dptr, oval.dsize);
  free(oval.dptr);
  UNGCPRO;
  return val;
}

DEFUN ("gdbm-exists", Fgdbm_exists, Sgdbm_exists, 2, 2, 0,
  "Returns t if KEY is in the hash otherwise nil")
  (dbf, key)
     Lisp_Object dbf, key;
{
  Lisp_Object val;
  GDBM_FILE odbf;
  datum okey;
  int oval;

  val = idToGdbmKey(dbf);
  CHECK_STRING (key);
  odbf = (GDBM_FILE) XPNTR (val);
  okey.dptr = (char *)XSTRING (key)->data;
  okey.dsize = STRING_BYTES(XSTRING (key));
  oval = gdbm_exists(odbf, okey);
  if(oval)return Qt;
  return Qnil;
}

DEFUN ("gdbm-reorganize", Fgdbm_reorganize, Sgdbm_reorganize, 1, 1, 0,
  "Reorganize a gdbm database.")
  (dbf)
     Lisp_Object dbf;
{
  Lisp_Object val;
  int ival;
  GDBM_FILE odbf;

  val = idToGdbmKey(dbf);
  odbf = (GDBM_FILE)XPNTR (val);
  ival = gdbm_reorganize(odbf);
  val = make_number(ival);
  return val;
}

DEFUN ("gdbm-sync", Fgdbm_sync, Sgdbm_sync, 1, 1, 0,
  "Sync a gdbm database.\n\
Writes all buffered data to disk.")
  (dbf)
     Lisp_Object dbf;
{
  Lisp_Object val;
  int ival;
  GDBM_FILE odbf;

  val = idToGdbmKey(dbf);
  odbf = (GDBM_FILE)XPNTR (val);
  gdbm_sync(odbf);
  return Qt;
}

void
syms_of_gdbm ()
{
  DEFVAR_INT ("max-gdbm-open-files", &max_gdbm_open_files,
              "*Maximum number of open gdbm files.");
  max_gdbm_open_files=10;

  DEFVAR_INT ("gdbm_errno",(int *)&gdbm_errno,
              "*GDBM returned error number");

  DEFVAR_LISP ("gdbm-open-files", &Vgdbm_open_files,
               "List of open GDBM files");
  Vgdbm_open_files = Fmake_vector(make_number(max_gdbm_open_files),Qnil);

  Qgdbm_open_files = intern("gdbm-open-files");
  staticpro(&Qgdbm_open_files);

  defsubr (&Sgdbm_open);
  defsubr (&Sgdbm_close);
  defsubr (&Sgdbm_store);
  defsubr (&Sgdbm_fetch);
  defsubr (&Sgdbm_delete);
  defsubr (&Sgdbm_firstkey);
  defsubr (&Sgdbm_nextkey);
  defsubr (&Sgdbm_exists);
  defsubr (&Sgdbm_reorganize);
  defsubr (&Sgdbm_sync);
}
-------------------------------------------------------------
Following are the changes to Makefile.in in emacs-22.1/src to include
the gdbm.c module and the gdbm library in the build.
(Note that this could be handled better, along with
max_gdbm_open_files, as a configuration parameter/option.)
--------------------------------------------------------------
diff -r emacs-22.1/src/Makefile.in /users/barry/emacs-special/emacs-22.1/src/Makefile.in
589c589
< minibuf.o fileio.o dired.o filemode.o \
---
> minibuf.o fileio.o dired.o filemode.o gdbm.o\
938c938
< LIBS_DEBUG $(GETLOADAVG_LIBS) $(GNULIB_VAR) LIB_MATH LIB_STANDARD \
---
> LIBS_DEBUG -lgdbm $(GETLOADAVG_LIBS) $(GNULIB_VAR) LIB_MATH LIB_STANDARD \
1141a1142
> gdbm.o: gdbm.c $(config_h) blockinput.h commands.h keyboard.h dispextern.h charset.h coding.h
---------------------------------------------------------------
Following are the changes to emacs.c to reference the gdbm.c module in
the build:
----------------------------------------------------------------
diff -r emacs-22.1/src/emacs.c /users/barry/emacs-special/emacs-22.1/src/emacs.c
1562a1563
> syms_of_gdbm ();
----------------------------------------------------------------
Changelog entry:

2008-09-21  Barry Krofchick

        * gdbm.c: Added built-in gdbm-based persistent hash tables for
        lisp and other data
----------------------------------------------------------------

That's it.

Thanks for all the great work on Emacs, a beautiful piece of software.
I hope you can include the gdbm hash tables in future releases. They
are extremely useful for managing large persistent Lisp knowledge
bases quickly and easily from within Emacs Lisp code. Prior to this
implementation I had used an external custom server to do the same
job, and it performed significantly worse.

Thanks,
Barry
barry.krofchick@sympatico.ca

------------------------------------------------------------------------

In GNU Emacs 22.1.1 (i686-pc-linux-gnu, X toolkit)
 of 2008-01-23 on benny
Windowing system distributor `The XFree86 Project, Inc', version 11.0.40500000
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: nil
  locale-coding-system: nil
  default-enable-multibyte-characters: t

Major mode: Info

Minor modes in effect:
  shell-dirtrack-mode: t
  tooltip-mode: t
  tool-bar-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  unify-8859-on-encoding-mode: t
  utf-translate-cjk-mode: t
  auto-compression-mode: t
  line-number-mode: t
  abbrev-mode: t