unofficial mirror of emacs-devel@gnu.org 
* Extending the ecomplete.el data store.
@ 2018-02-04  6:16 Karl Fogel
From: Karl Fogel @ 2018-02-04  6:16 UTC
  To: Emacs Devel

This post's primary audience is Lars Ingebrigtsen -- we agreed to move this thread over here from Emacs Tangents [1] -- though of course anyone's welcome to join in.

Some context for everyone else: after I wrote mailaprop [2] to do prioritized autofill for email addresses, Lars mentioned his ecomplete.el, which is part of Emacs.  Ecomplete offers similar functionality, although its UI is minibuffer-based rather than tooltip-based, and it uses a different address prioritization algorithm from mailaprop.

Lars, I'd like to propose extending the data stored by ecomplete.el so that it supports the union of the data needed by ecomplete and that needed by mailaprop.  (Mostly what mailaprop stores is a superset of what ecomplete stores, with one exception; more on that below.)

An ecomplete record looks like this:

  (KEY  TIMES_USED  LAST_TIME_USED  STRING)

Here is an example (the `mail' at the front is there so that you could have one alist of things for `mail' and another for, say, `twitter', etc.):

  ((mail
   ("larsi@example.com" 381 1516109510 "Lars Ingebrigtsen <larsi@example.com>")
   ("kfogel@example.com" 10 1516065455 "Karl Fogel <kfogel@example.com>")
   ...
   ))
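To make the layout concrete, here is a minimal sketch of looking up one record in a structure shaped like the example above.  The `my-sample-' names are hypothetical and are not part of ecomplete.el; the data is just the sample from this message.

```elisp
;; Hypothetical sample data in the record layout shown above.
(defvar my-sample-ecomplete-db
  '((mail
     ("larsi@example.com" 381 1516109510
      "Lars Ingebrigtsen <larsi@example.com>")
     ("kfogel@example.com" 10 1516065455
      "Karl Fogel <kfogel@example.com>")))
  "Sample alist of TYPE -> list of (KEY TIMES_USED LAST_TIME_USED STRING).")

(defun my-sample-ecomplete-lookup (db type key)
  "Return the record for KEY under TYPE in DB, or nil if there is none."
  (assoc key (cdr (assq type db))))

(defun my-sample-ecomplete-times-used (db type key)
  "Return TIMES_USED for KEY under TYPE in DB, or 0 if there is no record."
  (or (nth 1 (my-sample-ecomplete-lookup db type key)) 0))

;; (my-sample-ecomplete-times-used my-sample-ecomplete-db 'mail
;;                                 "larsi@example.com")  ; => 381
```

Because the outer list is keyed by a symbol, `assq' is enough for the type, while the string keys need `assoc'.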

Meanwhile, a mailaprop on-disk record looks like this:

  (KEY
   ((VARIANT  LAST_TIME_USED  SENT_COUNT  RECEIVED_COUNT)
    ...))

Here's an example of a key with three variants:

  ("a.szymanowski@example.com"
   (("a.szymanowski@example.com"                       "2017 Jun 12"  29 31)
    ("A. Szymanowski <a.szymanowski@example.com>"      "2017 Sep 03"   1  0)
    ("Abilene Szymanowski <A.Szymanowski@example.com>" "2018 Jan 15"   8  7)))
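Since all variants hang off one key, they can be aggregated in one pass.  The following is only a sketch of that idea, not mailaprop's actual scoring code; the `my-sample-' names are hypothetical.

```elisp
;; The three-variant example from above, as hypothetical sample data.
(defvar my-sample-mailaprop-record
  '("a.szymanowski@example.com"
    (("a.szymanowski@example.com"                       "2017 Jun 12" 29 31)
     ("A. Szymanowski <a.szymanowski@example.com>"      "2017 Sep 03"  1  0)
     ("Abilene Szymanowski <A.Szymanowski@example.com>" "2018 Jan 15"  8  7)))
  "Sample (KEY ((VARIANT LAST_TIME_USED SENT_COUNT RECEIVED_COUNT) ...)).")

(defun my-sample-mailaprop-totals (record)
  "Return (SENT . RECEIVED), summed over every variant in RECORD."
  (let ((sent 0)
        (received 0))
    (dolist (variant (nth 1 record))
      (setq sent (+ sent (nth 2 variant))
            received (+ received (nth 3 variant))))
    (cons sent received)))

;; (my-sample-mailaprop-totals my-sample-mailaprop-record)  ; => (38 . 38)
```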

Let's ignore the fact that ecomplete stores dates as seconds-since-epoch while mailaprop uses human-readable strings; I'd be happy to switch mailaprop to the ecomplete way for that.  We'll just focus on substantive differences here.
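For what it's worth, that switch would be a small one-time conversion.  Here is a minimal sketch, assuming the strings always look like "YYYY Mon DD" with English month abbreviations and that treating each date as midnight UTC is acceptable; the `my-sample-' names are hypothetical.

```elisp
(defconst my-sample-month-numbers
  '(("Jan" . 1) ("Feb" . 2) ("Mar" . 3) ("Apr" . 4)
    ("May" . 5) ("Jun" . 6) ("Jul" . 7) ("Aug" . 8)
    ("Sep" . 9) ("Oct" . 10) ("Nov" . 11) ("Dec" . 12))
  "English month abbreviations mapped to month numbers.")

(defun my-sample-date-to-epoch (date)
  "Convert DATE, a string like \"2017 Jun 12\", to seconds since the epoch.
The date is interpreted as midnight UTC (an assumption of this sketch)."
  (let* ((parts (split-string date " " t))
         (year (string-to-number (or (nth 0 parts) "")))
         (month (cdr (assoc (nth 1 parts) my-sample-month-numbers)))
         (day (string-to-number (or (nth 2 parts) ""))))
    (unless (and month (= (length parts) 3))
      (error "Unrecognized date: %S" date))
    ;; A ZONE of t means Universal Time.
    (truncate (float-time (encode-time 0 0 0 day month year t)))))

;; (my-sample-date-to-epoch "2017 Jun 12")  ; => 1497225600
```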

At the individual record level, the mailaprop information is a superset of the ecomplete information in two ways:

* Mailaprop remembers all the real-name variations and case variations individually, including case variations in the email address portion as well as in the real name portion.  So each variation gets its own record, but they're all tied together under the same case-folded KEY so they can be scored together.  (Contrast with ecomplete, where I believe `ecomplete-add-item' just remembers the most recently seen variant for a given key.)

* Mailaprop splits TIMES_USED into SENT_COUNT and RECEIVED_COUNT, that is, the number of times the user has sent mail to the address in question, and the number of times the user has received mail from that address.
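The two points above can be sketched together: group variants under a downcased key, and keep separate sent and received counters per variant.  This is a simplified illustration, not mailaprop's code; it leaves out the date field, and the function name and the `sent'/`received' symbols are hypothetical.

```elisp
(defun my-sample-note-address (db address variant direction)
  "Return DB updated with one more occurrence of VARIANT.
DB is an alist mapping a downcased address to a list of
\(VARIANT SENT_COUNT RECEIVED_COUNT) entries.  ADDRESS is the bare
address as seen; DIRECTION is the symbol `sent' or `received'."
  (let* ((key (downcase address))
         ;; Find the entry for KEY, creating an empty one if needed.
         (entry (or (assoc key db)
                    (car (setq db (cons (list key) db)))))
         ;; Find the counters for this exact VARIANT, creating them if needed.
         (counts (or (assoc variant (cdr entry))
                     (car (setcdr entry (cons (list variant 0 0)
                                              (cdr entry)))))))
    (if (eq direction 'sent)
        (setcar (cdr counts) (1+ (nth 1 counts)))
      (setcar (cddr counts) (1+ (nth 2 counts))))
    db))

;; Usage: start from an empty database and feed it observations.
;; (let ((db nil))
;;   (setq db (my-sample-note-address
;;             db "A.Szymanowski@example.com"
;;             "Abilene Szymanowski <A.Szymanowski@example.com>" 'sent))
;;   (setq db (my-sample-note-address
;;             db "a.szymanowski@example.com"
;;             "a.szymanowski@example.com" 'received))
;;   db)
```

Both observations in the usage example land under the single key "a.szymanowski@example.com", each with its own counters.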

At the next level up, ecomplete stores a piece of information that mailaprop does not:

* Ecomplete starts the alist with a symbol that offers the possibility of multiple types of records, e.g., `mail', `twitter', etc.

So, here's a proposal for a unified format that supports both packages -- this format is more verbose but more extensible:

  (KEY          ; string: downcased email addr
    ((VARIANT   ; string: case-preserving address w/ real name
       (TYPE                                     ; symbol: `mail', etc
         ('last-sent  LAST_TIME_SENT_TO)          ; int: seconds since epoch
         ('last-recv  LAST_TIME_RECEIVED_FROM)    ; int: seconds since epoch
         ('sent-count SENT_COUNT)                 ; int: total times sent
         ('recv-count RECEIVED_COUNT)             ; int: total times received
       )
       ...further TYPEs could go here...
     )
     ...further VARIANTs here...
    )
    ...[reserved, in case we ever need something other than VARIANTs]...
  )

That's the format for one record; the master record file is just a list of elements of the above type.
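As an illustration, here is what one record in this format might look like, plus a small accessor.  Treat it as a sketch under assumptions: the timestamps are made up, the `my-sample-' names are hypothetical, and the field symbols are written without the quote characters, because the Lisp reader would turn 'last-sent inside a data file into (quote last-sent).

```elisp
;; Hypothetical record in the proposed layout; the timestamps are invented.
(defvar my-sample-unified-record
  '("a.szymanowski@example.com"
    (("Abilene Szymanowski <A.Szymanowski@example.com>"
      (mail
       (last-sent  1516000000)
       (last-recv  1515900000)
       (sent-count 8)
       (recv-count 7)))
     ("a.szymanowski@example.com"
      (mail
       (last-sent  1497225600)
       (last-recv  1497000000)
       (sent-count 29)
       (recv-count 31)))))
  "Sample (KEY ((VARIANT (TYPE (FIELD VALUE) ...) ...) ...)) record.")

(defun my-sample-unified-field (record variant type field)
  "Return FIELD for VARIANT under TYPE in RECORD, or nil if absent."
  (let* ((variants (nth 1 record))
         (types (cdr (assoc variant variants)))
         (fields (cdr (assq type types))))
    (car (cdr (assq field fields)))))

;; (my-sample-unified-field my-sample-unified-record
;;                          "a.szymanowski@example.com"
;;                          'mail 'recv-count)  ; => 31
```

Since every level is an alist, a new TYPE or a new field can be added without disturbing readers that only look up the names they know about.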

This format offers many possibilities for creative scoring mechanisms, and is more easily extensible than either package's current format.

If we unify the format, we should probably unify on one default record file too.  Right now, `ecomplete-database-file' defaults to ~/.ecompleterc or ~/.emacs.d/ecompleterc, whereas `mailaprop-address-file' doesn't default to anything -- the user must set it manually: email addresses are pretty private, and I didn't want to guess about what locations would be confidential enough.  I'd be happy to just have mailaprop use ecomplete's defaults for the database file, though.  The privacy concern can be addressed with documentation.

Now, about database maintenance:

Mailaprop adds new addresses to the database using a different mechanism than ecomplete uses.  Mailaprop users run an asynchronous script that reads all of their email and generates the database.  Ecomplete watches email as it comes and goes in Emacs, and automagically keeps its database up-to-date.  (I don't think ecomplete has any way to "catch up to the present" when you start using it; you just start out with no email addresses, and it watches everything you do from then on.)

These two methods of database maintenance are basically compatible.  In fact, one could use mailaprop's script to generate the database the first time, and then depend on ecomplete to keep it up-to-date after that.  As long as we document what's going on, and each package uses its current defaults, I think we're fine.  Those who use ecomplete will still get what they've been getting.  Those who use mailaprop can either keep updating the database periodically the mailaprop way, or ask ecomplete to maintain it in real time for them.  (The latter might necessitate a trivial flag in ecomplete to get it to maintain the database while not offering completion, for those who want a mailaprop-style popup-autofill UI, but that's easy to do.)

I guess we would also switch to UTF-8 for the coding system for the database?  (Right now `ecomplete-database-file-coding-system' defaults to `iso-2022-7bit'.)

Note that ecomplete would have to add code to convert the new on-disk format to the in-memory format that ecomplete currently uses.  That is, this function...

  (defun ecomplete-setup ()
    "Read the .ecompleterc file."
    (when (file-exists-p ecomplete-database-file)
      (with-temp-buffer
        (let ((coding-system-for-read ecomplete-database-file-coding-system))
          (insert-file-contents ecomplete-database-file)
          (setq ecomplete-database (read (current-buffer)))))))
  
...would need to be supplemented with something that does what `mailaprop-digest-raw-addresses' does in mailaprop, and the reverse for writing the data out.  Obviously, this proposed new format is pretty easily convertible to and from ecomplete's in-memory representation.
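To give a feel for the size of that conversion, here is a minimal sketch of the reading direction.  It is not code from either package.  It assumes field symbols stored unquoted (last-sent, last-recv, sent-count, recv-count), sums sent and received counts into TIMES_USED, and picks the most recently used variant as STRING; those choices and the function name are hypothetical.

```elisp
(defun my-sample-unified-to-ecomplete (records type)
  "Convert RECORDS in the proposed unified format to ecomplete's layout.
Return ((TYPE (KEY TIMES_USED LAST_TIME_USED STRING) ...)), keeping only
records that have at least one variant with data for TYPE."
  (let (entries)
    (dolist (record records)
      (let ((times-used 0)
            (last-used 0)
            (display nil))
        (dolist (variant (nth 1 record))
          (let ((fields (cdr (assq type (cdr variant)))))
            (when fields
              (let ((newest (max (or (cadr (assq 'last-sent fields)) 0)
                                 (or (cadr (assq 'last-recv fields)) 0))))
                (setq times-used
                      (+ times-used
                         (or (cadr (assq 'sent-count fields)) 0)
                         (or (cadr (assq 'recv-count fields)) 0)))
                ;; Remember the variant with the most recent activity.
                (when (or (null display) (> newest last-used))
                  (setq last-used newest
                        display (car variant)))))))
        (when display
          (push (list (car record) times-used last-used display) entries))))
    (list (cons type (nreverse entries)))))

;; Usage, with invented numbers:
;; (my-sample-unified-to-ecomplete
;;  '(("kfogel@example.com"
;;     (("Karl Fogel <kfogel@example.com>"
;;       (mail (last-sent 1516065455) (sent-count 10))))))
;;  'mail)
;; => ((mail ("kfogel@example.com" 10 1516065455
;;            "Karl Fogel <kfogel@example.com>")))
```

The writing direction would need the opposite decision: where to put ecomplete's single TIMES_USED and LAST_TIME_USED when there is no sent/received split to preserve.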

Whew, okay, those are my thoughts.  I'm not sure whether it makes sense to unify the two packages themselves ever, but in any case using the same on-disk format would be a good move.

Modifications or counterproposals welcome of course, and it's also perfectly okay to say "Thanks, but this isn't worth the trouble." :-).  These two packages are so close in functionality and data that it seems a shame for them not to share a datastore, but we may just decide it's too much effort.  If we decide that, we should at least put pointers in each package mentioning the other, and this thread, so future programmers at least have their attention drawn to the redundancy before making further enhancements.

Best regards,
-Karl

[1] https://lists.gnu.org/archive/html/emacs-tangents/2018-01/msg00023.html

[2] https://lists.gnu.org/archive/html/emacs-tangents/2018-01/msg00003.html


