From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Karl Fogel
Newsgroups: gmane.emacs.devel
Subject: Extending the ecomplete.el data store.
Date: Sun, 04 Feb 2018 00:16:32 -0600
Message-ID: <87fu6hcm9r.fsf@red-bean.com>
Reply-To: Karl Fogel
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)
To: Emacs Devel
List-Id: "Emacs development discussions."

This post's primary audience is Lars Ingebrigtsen -- we agreed to move
this thread over here from Emacs Tangents [1] -- though of course
anyone's welcome to join in.

Some context for everyone else: after I wrote mailaprop [2] to do
prioritized autofill for email addresses, Lars mentioned his
ecomplete.el, which is part of Emacs.  Ecomplete offers similar
functionality, although its UI is minibuffer-based rather than
tooltip-based, and it uses a different address prioritization
algorithm from mailaprop.

Lars, I'd like to propose extending the data stored by ecomplete.el so
that it supports the union of the data needed by ecomplete and that
needed by mailaprop.  (Mostly what mailaprop stores is a superset of
what ecomplete stores, with one exception; more on that below.)

An ecomplete record looks like this:

  (KEY TIMES_USED LAST_TIME_USED STRING)

Here is an example (the `mail' at the front is so that you could have
one alist of things for `mail' and another for, say, `twitter', etc):

  ((mail
    ("larsi@example.com" 381 1516109510 "Lars Ingebrigtsen ")
    ("kfogel@example.com" 10 1516065455 "Karl Fogel ")
    ...
    ))

Meanwhile, a mailaprop on-disk record looks like this:

  (KEY ((VARIANT LAST_TIME_USED SENT_COUNT RECEIVED_COUNT) ...))

Here's an example of a key with three variants:

  ("a.szymanowski@example.com"
   (("a.szymanowski@example.com" "2017 Jun 12" 29 31)
    ("A. Szymanowski " "2017 Sep 03" 1 0)
    ("Abilene Szymanowski " "2018 Jan 15" 8 7)))

Let's ignore the fact that ecomplete stores dates as
seconds-since-epoch while mailaprop uses human-readable strings; I'd
be happy to switch mailaprop to the ecomplete way for that.  We'll
just focus on substantive differences here.
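
To make the two existing layouts concrete, here is a minimal Emacs Lisp
sketch of reading them.  It is only an illustration: every `my-' name is
hypothetical and belongs to neither package, the angle-bracket address
parts inside the sample strings are assumed, and treating a mailaprop
date as midnight UTC is likewise an assumption.

```elisp
;; Hypothetical sample data modeled on the examples above.
(defvar my-ecomplete-sample
  '((mail
     ("larsi@example.com" 381 1516109510
      "Lars Ingebrigtsen <larsi@example.com>")
     ("kfogel@example.com" 10 1516065455
      "Karl Fogel <kfogel@example.com>"))))

(defvar my-mailaprop-sample
  '("a.szymanowski@example.com"
    (("a.szymanowski@example.com" "2017 Jun 12" 29 31)
     ("A. Szymanowski <a.szymanowski@example.com>" "2017 Sep 03" 1 0)
     ("Abilene Szymanowski <a.szymanowski@example.com>" "2018 Jan 15" 8 7))))

(defun my-ecomplete-lookup (db type key)
  "Return the (KEY TIMES_USED LAST_TIME_USED STRING) entry for KEY.
DB is an ecomplete-style alist and TYPE a symbol such as `mail'.
Return nil when TYPE or KEY is absent."
  (assoc key (cdr (assq type db))))

(defun my-mailaprop-total-uses (record)
  "Sum SENT_COUNT and RECEIVED_COUNT over every variant of RECORD."
  (let ((total 0))
    (dolist (variant (cadr record) total)
      (setq total (+ total (nth 2 variant) (nth 3 variant))))))

(defconst my-month-numbers
  '(("Jan" . 1) ("Feb" . 2) ("Mar" . 3) ("Apr" . 4)
    ("May" . 5) ("Jun" . 6) ("Jul" . 7) ("Aug" . 8)
    ("Sep" . 9) ("Oct" . 10) ("Nov" . 11) ("Dec" . 12))
  "Month abbreviations as they appear in the mailaprop example dates.")

(defun my-mailaprop-date-to-epoch (date)
  "Convert DATE, a string like \"2017 Jun 12\", to seconds since the epoch.
The date is read as midnight UTC, which is an assumption of this sketch."
  (let* ((parts (split-string date " " t))
         (year (string-to-number (nth 0 parts)))
         (month (cdr (assoc (nth 1 parts) my-month-numbers)))
         (day (string-to-number (nth 2 parts))))
    ;; Six-argument form of `encode-time'; the trailing t selects UTC.
    (truncate (float-time (encode-time 0 0 0 day month year t)))))
```

With the sample data, `(my-mailaprop-total-uses my-mailaprop-sample)'
adds up all three variants under the one case-folded key and gives 76,
while `(my-ecomplete-lookup my-ecomplete-sample 'mail "kfogel@example.com")'
returns the single entry ecomplete keeps for that address.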

At the individual record level, the mailaprop information is a
superset of the ecomplete information in two ways:

  * Mailaprop remembers all the real-name variations and case
    variations individually, including case variations in the email
    address portion as well as in the real name portion.  So each
    variation gets its own record, but they're all tied together under
    the same case-folded KEY so they can be scored together.
    (Contrast with ecomplete, where I believe `ecomplete-add-item'
    just remembers the most recently-seen variant for a given key.)

  * Mailaprop splits TIMES_USED into SENT_COUNT and RECEIVED_COUNT,
    that is, the number of times the user has sent to the address in
    question, and the number of times the user has received mail from
    that address.

At the next level up, ecomplete stores a piece of information that
mailaprop does not:

  * Ecomplete starts the alist with a symbol that offers the
    possibility of multiple types of records, e.g., `mail', `twitter',
    etc.

So, here's a proposal for a unified format that supports both
packages -- this format is more verbose but more extensible:

  (KEY                  ; string: downcased email addr
   ((VARIANT            ; string: case-preserving address w/ real name
     (TYPE              ; symbol: `mail', etc
      ('last-sent LAST_TIME_SENT_TO)        ; int: seconds since epoch
      ('last-recv LAST_TIME_RECEIVED_FROM)  ; int: seconds since epoch
      ('sent-count SENT_COUNT)              ; int: total times sent
      ('recv-count RECEIVED_COUNT)          ; int: total times received
      )
     ...further TYPEs could go here...
     )
    ...further VARIANTs here...
    )
   ...[reserved, in case we ever need something other than VARIANTs]...
   )

That's the format for one record; the master record file is just a
list of elements of the above type.

This format offers many possibilities for creative scoring mechanisms,
and is more easily extensible than either package's current format.

If we unify the format, we should probably unify on one default record
file too.
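
As an illustration of the unified record format proposed above, here is
a minimal sketch of one record and two accessors.  The `my-' names are
hypothetical, the timestamps are made up, and the sketch assumes the
field keys are stored as bare symbols (`last-sent', not `'last-sent')
once the record is written as literal data, since a quote inside an
already-quoted list would read as `(quote last-sent)'.

```elisp
;; One record in the proposed layout: (KEY ((VARIANT (TYPE FIELDS...))...)).
;; Counts echo the mailaprop example; the timestamps are invented.
(defvar my-unified-sample
  '("a.szymanowski@example.com"
    (("Abilene Szymanowski <a.szymanowski@example.com>"
      (mail
       (last-sent 1516000000)
       (last-recv 1515900000)
       (sent-count 8)
       (recv-count 7)))
     ("a.szymanowski@example.com"
      (mail
       (last-sent 1497200000)
       (last-recv 1497100000)
       (sent-count 29)
       (recv-count 31))))))

(defun my-unified-field (variant type field)
  "Return the value of FIELD for TYPE in VARIANT, or nil if absent.
VARIANT has the shape (STRING (TYPE (FIELD VALUE)...)...)."
  (cadr (assq field (cdr (assq type (cdr variant))))))

(defun my-unified-total (record type field)
  "Sum FIELD for TYPE across every variant of RECORD.
Variants that lack TYPE or FIELD contribute 0."
  (let ((total 0))
    (dolist (variant (cadr record) total)
      (setq total (+ total (or (my-unified-field variant type field) 0))))))
```

For the sample record, `(my-unified-total my-unified-sample 'mail 'sent-count)'
gives 37, the combined sent count for both variants of the key.  Keeping
each field as a tagged pair means a scoring function can ignore fields
it does not know about, which is the extensibility the format is after.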
Right now, `ecomplete-database-file' defaults to ~/.ecompleterc
or ~/.emacs.d/ecompleterc, whereas `mailaprop-address-file' doesn't
default to anything -- the user must set it manually: email addresses
are pretty private, and I didn't want to guess about what locations
would be confidential enough.  I'd be happy to just have mailaprop use
ecomplete's defaults for the database file, though.  The privacy
concern can be addressed with documentation.

Now, about database maintenance:

Mailaprop adds new addresses to the database using a different
mechanism than ecomplete uses.  Mailaprop users run an asynchronous
script that reads all of their email and generates the database.
Ecomplete watches email as it comes and goes in Emacs, and
automagically keeps its database up-to-date.  (I don't think ecomplete
has any way to "catch up to the present" when you start using it; you
just start out with no email addresses, and it watches everything you
do from then on.)

These two methods of database maintenance are basically compatible.
In fact, one could use mailaprop's script to generate the database the
first time, and then depend on ecomplete to keep it up-to-date after
that.  As long as we document what's going on, and each package uses
its current defaults, I think we're fine.  Those who use ecomplete
will still get what they've been getting, and those who use mailaprop
can either use the mailaprop way of periodically updating the
database, or they can ask ecomplete to maintain it in real time for
them (this might necessitate a trivial flag in ecomplete to get it to
maintain the database while not offering completion, for those who
want a mailaprop-style popup-autofill UI, but that's easy to do).

I guess we would also switch to UTF-8 for the coding system for the
database?  (Right now `ecomplete-database-file-coding-system' defaults
to `iso-2022-7bit'.)

Note that ecomplete would have to add code to convert the new on-disk
format to the in-memory format that ecomplete currently uses.
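
A rough sketch of such a conversion follows.  It is not taken from
either package: the `my-conv-' names are hypothetical, the field keys
are assumed to be bare symbols, and the collapsing policy (TIMES_USED is
sent plus received over all variants, LAST_TIME_USED is the latest
timestamp seen, STRING is the most recently used variant) is just one
plausible choice.

```elisp
(defun my-conv-fields (variant type)
  "Return the list of (FIELD VALUE) pairs for TYPE in VARIANT, or nil."
  (cdr (assq type (cdr variant))))

(defun my-conv-uses (variant type)
  "Return sent-count plus recv-count for TYPE in VARIANT."
  (let ((fields (my-conv-fields variant type)))
    (+ (or (cadr (assq 'sent-count fields)) 0)
       (or (cadr (assq 'recv-count fields)) 0))))

(defun my-conv-last-used (variant type)
  "Return the later of last-sent and last-recv for TYPE in VARIANT."
  (let ((fields (my-conv-fields variant type)))
    (max (or (cadr (assq 'last-sent fields)) 0)
         (or (cadr (assq 'last-recv fields)) 0))))

(defun my-conv-entry (record type)
  "Collapse unified RECORD to (KEY TIMES_USED LAST_TIME_USED STRING).
Only variants carrying data for TYPE are considered; return nil when
there are none."
  (let ((times 0) (latest 0) (string nil))
    (dolist (variant (cadr record))
      (when (assq type (cdr variant))
        (setq times (+ times (my-conv-uses variant type)))
        (let ((used (my-conv-last-used variant type)))
          ;; Keep the display string of the most recently used variant.
          (when (or (null string) (> used latest))
            (setq latest used
                  string (car variant))))))
    (when string
      (list (car record) times latest string))))

(defun my-conv-to-ecomplete (records type)
  "Build an ecomplete-style alist ((TYPE ENTRY...)) from unified RECORDS."
  (list (cons type
              (delq nil
                    (mapcar (lambda (record) (my-conv-entry record type))
                            records)))))
```

The reverse direction would lose nothing that ecomplete stores today,
though it could only fill in one variant and a combined count per key.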
That is, this function...

  (defun ecomplete-setup ()
    "Read the .ecompleterc file."
    (when (file-exists-p ecomplete-database-file)
      (with-temp-buffer
        (let ((coding-system-for-read ecomplete-database-file-coding-system))
          (insert-file-contents ecomplete-database-file)
          (setq ecomplete-database (read (current-buffer)))))))

...would need to be supplemented with something that does what
`mailaprop-digest-raw-addresses' does in mailaprop, and the reverse
for writing the data out.  Obviously, this proposed new format is
pretty easily convertible to and from ecomplete's in-memory
representation.

Whew, okay, those are my thoughts.  I'm not sure whether it makes
sense to unify the two packages themselves ever, but in any case using
the same on-disk format would be a good move.

Modifications or counterproposals welcome of course, and it's also
perfectly okay to say "Thanks, but this isn't worth the trouble." :-).
These two packages are so close in functionality and data that it
seems a shame for them not to share a datastore, but we may just
decide it's too much effort.  If we decide that, we should at least
put pointers in each package mentioning the other, and this thread, so
future programmers at least have their attention drawn to the
redundancy before making further enhancements.

Best regards,
-Karl

[1] https://lists.gnu.org/archive/html/emacs-tangents/2018-01/msg00023.html
[2] https://lists.gnu.org/archive/html/emacs-tangents/2018-01/msg00003.html