From: excalamus--- via Users list for the GNU Emacs text editor <help-gnu-emacs@gnu.org>
To: help-gnu-emacs@gnu.org
Subject: alist keys: strings or symbols
Date: Sun, 19 Jul 2020 18:23:52 +0200 (CEST)
Message-ID: <MCbssBv--3-2@tutanota.com>

Some questions about alists:

- Is it a better practice to convert string keys to symbols?  Is
  =intern= best for this?  What about handling illegal symbol names?
- If a symbol is used as a key and that symbol is already in use
  elsewhere, is there potential for conflict with the existing symbol?

I have an alist created from parsing meta data from a file.  The file
looks like:

#+begin_src emacs-lisp :results verbatim :session exc
(defvar exc-post-meta-data
  (concat
   "#+TITLE: Test post\n"
   "#+AUTHOR: Excalamus\n"
   "#+DATE: 2020-07-17\n"
   "#+TAGS: blogging tests\n"
   "\n")
  "Sample post meta information.")

(defvar exc-post-content
  (concat
   "* Header\n"
   "** Subheader\n"
   "Hello, world!\n\n"
   "#+begin_src python\n"
   "    print('Goodbye, cruel world...')\n"
   "#+end_src\n")
  "Sample post file without meta information.")

(defvar exc-post
  (concat
   exc-post-meta-data
   exc-post-content)
  "Sample post file.")

(message "%s" exc-post)
#+end_src

#+RESULTS:
#+begin_example
"#+TITLE: Test post
,#+AUTHOR: Excalamus
,#+DATE: 2020-07-17
,#+TAGS: blogging tests

,* Header
,** Subheader
Hello, world!

,#+begin_src python
    print('Goodbye, cruel world...')
,#+end_src
"
#+end_example

The meta data is parsed into an alist:

#+begin_src emacs-lisp :results verbatim :session exc
(defun exc-parse-org-meta-data (data)
  "Parse Org formatted meta DATA into an alist.

Keywords are the '#+' options given within an Org file.  These
are things like TITLE, DATE, and FILETAGS.  Keywords are
case-sensitive!  Values are whatever remains on that line."
  (with-temp-buffer
    (insert data)
    (org-element-map (org-element-parse-buffer 'element) 'keyword
      (lambda (x) (cons (org-element-property :key x)
                        (org-element-property :value x))))))

(setq exc-alist (exc-parse-org-meta-data exc-post))
exc-alist
#+end_src

#+RESULTS:
: (("TITLE" . "Test post") ("AUTHOR" . "Excalamus") ("DATE" . "2020-07-17") ("TAGS" . "blogging tests"))

Notice that the keys are strings.  =alist-get= compares keys with
=eq= by default, so retrieving a string key requires an explicit
equality predicate like ='string-equal=, unless I use =assoc= (which
compares with =equal=) and =cdr=:

#+begin_src emacs-lisp :results verbatim :session exc
(alist-get "TITLE" exc-alist)
#+end_src

#+RESULTS:
: nil

#+begin_src emacs-lisp :results verbatim :session exc
(cdr (assoc "TITLE" exc-alist))
#+end_src

#+RESULTS:
: "Test post"
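
A minimal sketch of why the two lookups above differ, assuming Emacs
26 or later (where =alist-get= accepts a TESTFN argument).  The alist
here is a stand-in built just for illustration; the key is created
with =copy-sequence= so that it is guaranteed to be a different
string object from the literal used in the lookup.

#+begin_src emacs-lisp :results verbatim
;; Fresh key object: `eq' compares identity, `equal' compares contents.
(let ((alist (list (cons (copy-sequence "TITLE") "Test post"))))
  (list (alist-get "TITLE" alist)                  ; default `eq' test: no match
        (alist-get "TITLE" alist nil nil #'equal)  ; explicit `equal' test: match
        (cdr (assoc "TITLE" alist))))              ; `assoc' defaults to `equal'
#+end_src

#+RESULTS:
: (nil "Test post" "Test post")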

I can use =assoc/cdr= well enough.  The bother starts when I need
a default.  It looks like =alist-get= is what I need.

#+begin_src emacs-lisp :results verbatim :session exc
(alist-get "TYPE" exc-alist 'post nil 'string-equal)
#+end_src

#+RESULTS:
: post

This works, but now the code is getting messy.  There are two forms of
lookup: the verbose =alist-get= and the brute-force =assoc/cdr=.  One
requires ='string-equal=, the other does not.  If I forget the
predicate, the lookup fails silently: it returns nil (or the default)
instead of signaling an error.
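
For comparison, a default can also be bolted onto =assoc/cdr= with
=or=.  This is only a sketch with a made-up alist (the "DRAFT" and
"TYPE" keys are hypothetical), and it is not equivalent to
=alist-get=: =or= also falls back to the default when the key is
present but its value is nil.

#+begin_src emacs-lisp :results verbatim
(let ((alist (list (cons "TITLE" "Test post")
                   (cons "DRAFT" nil))))
  (list (or (cdr (assoc "TYPE" alist)) 'post)         ; key missing: default
        (or (cdr (assoc "DRAFT" alist)) 'post)        ; key present, value nil: default too
        (alist-get "DRAFT" alist 'post nil #'equal))) ; key present: stored nil is returned
#+end_src

#+RESULTS:
: (post post nil)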

I could create a wrapper for =alist-get= which uses =string-equal=:

#+begin_src emacs-lisp :results none :session exc
(defun exc-alist-get (key alist &optional default remove)
  "Get value associated with KEY in ALIST using `string-equal'.

See `alist-get' for explanation of DEFAULT and REMOVE."
  (alist-get key alist default remove #'string-equal))
#+end_src

Now my calls are uniform and a bit more safe:

#+begin_src emacs-lisp :results verbatim :session exc
(exc-alist-get "TITLE" exc-alist)
#+end_src

#+RESULTS:
: "Test post"

#+begin_src emacs-lisp :results verbatim :session exc
(exc-alist-get "TYPE" exc-alist 'post)
#+end_src

#+RESULTS:
: post

This works, but it feels like a code smell.  All of these problems
trace back to using strings as keys.  Maybe there's a better way?

I could convert the keys to symbols using =intern=.  

#+begin_src emacs-lisp :results verbatim :session exc
(defun exc-parse-org-meta-data-intern (data)
  "Parse Org formatted meta DATA into an alist.

Keywords are the '#+' options given within an Org file.  These
are things like TITLE, DATE, and FILETAGS.  Keywords are
case-sensitive!  Values are whatever remains on that line."
  (with-temp-buffer
    (insert data)
    (org-element-map (org-element-parse-buffer 'element) 'keyword
      (lambda (x) (cons (intern (org-element-property :key x))
                        (org-element-property :value x))))))

(setq exc-alist-i (exc-parse-org-meta-data-intern exc-post))
exc-alist-i
#+end_src

#+RESULTS:
: ((TITLE . "Test post") (AUTHOR . "Excalamus") (DATE . "2020-07-17") (TAGS . "blogging tests"))

This has several apparent problems.

As I understand it, this would pollute the global obarray.  Is that a
real concern?  I know the symbol is only being used as a lookup key;
its value, function, and property-list cells shouldn't change.
Regardless, I don't want my package to unknowingly conflict with
(i.e. overwrite) something in a person's environment.
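
A small check of that expectation, as a sketch rather than a
guarantee about every case: =intern= on a name that already exists
returns the existing symbol, and using that symbol as an alist key
does not touch its function cell.  The name "car" is chosen only
because it is certain to be defined already.

#+begin_src emacs-lisp :results verbatim
(let* ((before (symbol-function 'car))                  ; remember the definition
       (alist (list (cons (intern "car") "a value"))))  ; reuse the existing symbol as a key
  (list (alist-get 'car alist)                          ; lookup works with the plain symbol
        (eq before (symbol-function 'car))))            ; definition is unchanged
#+end_src

#+RESULTS:
: ("a value" t)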

The string may also contain characters, such as spaces, that I assume
are illegal in a symbol name.  Here is what happens when the keys
contain spaces:
#+begin_src emacs-lisp :results verbatim :session exc
(setq exc-bad-meta-data
  (concat
   "#+THE TITLE: Test post\n"
   "#+AUTHOR: Excalamus\n"
   "#+DATE: 2020-07-17\n"
   "#+POST TAGS: blogging tests\n"
   "\n"))

(setq exc-alist-i-bad (exc-parse-org-meta-data-intern exc-bad-meta-data))
exc-alist-i-bad
#+end_src

#+RESULTS:
: ((AUTHOR . "Excalamus") (DATE . "2020-07-17"))
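
To separate the two possible causes, here is a sketch that calls
=intern= directly on a name containing a space, without going through
the Org parser.  As far as I can tell =intern= accepts any string,
which would suggest the entries missing above are dropped before
=intern= is ever called (plausibly because =#+THE TITLE:= is not
recognized as a keyword); that last part is an assumption, not
something this snippet verifies.

#+begin_src emacs-lisp :results verbatim
(let* ((key (intern "THE TITLE"))                ; symbol whose name contains a space
       (alist (list (cons key "Test post"))))
  (list (symbol-name key)                        ; the name round-trips unchanged
        (alist-get (intern "THE TITLE") alist))) ; same symbol, so the `eq' lookup matches
#+end_src

#+RESULTS:
: ("THE TITLE" "Test post")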

How are situations like these best handled?


