* alist keys: strings or symbols
@ 2020-07-19 16:23 excalamus--- via Users list for the GNU Emacs text editor
  2020-07-19 23:23 ` Dmitry Alexandrov
  2020-07-20  9:01 ` tomas
  0 siblings, 2 replies; 3+ messages in thread
From: excalamus--- via Users list for the GNU Emacs text editor @ 2020-07-19 16:23 UTC
  To: help-gnu-emacs

Some questions about alists:

- Is it a better practice to convert string keys to symbols?  Is
  =intern= best for this?  What about handling illegal symbol names?
- If a symbol is used as a key and that symbol is already in use
  elsewhere, is there potential for conflict with the existing symbol?

I have an alist created from parsing meta data from a file.  The file
looks like:

#+begin_src emacs-lisp :results verbatim :session exc
(defvar exc-post-meta-data
  (concat
   "#+TITLE: Test post\n"
   "#+AUTHOR: Excalamus\n"
   "#+DATE: 2020-07-17\n"
   "#+TAGS: blogging tests\n"
   "\n")
  "Sample post meta information.")

(defvar exc-post-content
  (concat
   "* Header\n"
   "** Subheader\n"
   "Hello, world!\n\n"
   "#+begin_src python\n"
   "    print('Goodbye, cruel world...')\n"
   "#+end_src\n")
  "Sample post file without meta information.")

(defvar exc-post
  (concat
   exc-post-meta-data
   exc-post-content)
  "Sample post file.")

(message "%s" exc-post)
#+end_src

#+RESULTS:
#+begin_example
"#+TITLE: Test post
,#+AUTHOR: Excalamus
,#+DATE: 2020-07-17
,#+TAGS: blogging tests

,* Header
,** Subheader
Hello, world!

,#+begin_src python
    print('Goodbye, cruel world...')
,#+end_src
"
#+end_example

The meta data is parsed into an alist:

#+begin_src emacs-lisp :results verbatim :session exc
(defun exc-parse-org-meta-data (data)
  "Parse Org formatted meta DATA into an alist.

Keywords are the '#+' options given within an Org file.  These
are things like TITLE, DATE, and FILETAGS.  Keywords are
case-sensitive!  Values are whatever remains on that line."
  (with-temp-buffer
    (insert data)
    (org-element-map (org-element-parse-buffer 'element) 'keyword
      (lambda (x) (cons (org-element-property :key x)
                        (org-element-property :value x))))))

(setq exc-alist (exc-parse-org-meta-data exc-post))
exc-alist
#+end_src

#+RESULTS:
: (("TITLE" . "Test post") ("AUTHOR" . "Excalamus") ("DATE" . "2020-07-17") ("TAGS" . "blogging tests"))

Notice that the keys are strings.  This means that they require
an equality predicate like ='string-equal= to retrieve unless I use
=assoc= and =cdr=:

#+begin_src emacs-lisp :results verbatim :session exc
(alist-get "TITLE" exc-alist)
#+end_src

#+RESULTS:
: nil

#+begin_src emacs-lisp :results verbatim :session exc
(cdr (assoc "TITLE" exc-alist))
#+end_src

#+RESULTS:
: "Test post"

I can use =assoc/cdr= well enough.  The bother starts when I need
a default.  It looks like =alist-get= is what I need.

#+begin_src emacs-lisp :results verbatim :session exc
(alist-get "TYPE" exc-alist 'post nil 'string-equal)
#+end_src

#+RESULTS:
: post

This works, but now the code is getting messy. There are two forms of
lookup: the verbose =alist-get= and the brute force =assoc/cdr=.  One
requires ='string-equal=, the other does not.  If I forget the
predicate, the lookup will fail silently.
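
The asymmetry comes from the default tests: =assoc= compares keys with
=equal= unless told otherwise, while =alist-get= defaults to =eq=.  A
minimal sketch, using a made-up throwaway alist =exc-demo-alist=
instead of the parsed one:

#+begin_src emacs-lisp
;; Hypothetical stand-in for the parsed alist.  The key is a freshly
;; allocated string, as it would be coming out of a parser.
(setq exc-demo-alist
      (list (cons (concat "TI" "TLE") "Test post")))

;; `assoc' defaults to `equal', which compares string contents.
(cdr (assoc "TITLE" exc-demo-alist))                ; "Test post"

;; `alist-get' defaults to `eq', which compares object identity, so a
;; different string object with the same contents does not match.
(alist-get "TITLE" exc-demo-alist)                  ; nil

;; Passing a content-comparing TESTFN makes `alist-get' find the key.
(alist-get "TITLE" exc-demo-alist nil nil #'equal)  ; "Test post"
#+end_src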

I could create a wrapper for =alist-get= which uses =string-equal=:

#+begin_src emacs-lisp :results none :session exc
(defun exc-alist-get (key alist &optional default remove)
  "Get value associated with KEY in ALIST using `string-equal'.

See `alist-get' for explanation of DEFAULT and REMOVE."
  (alist-get key alist default remove 'string-equal))
#+end_src

Now my calls are uniform and a bit more safe:

#+begin_src emacs-lisp :results verbatim :session exc
(exc-alist-get "TITLE" exc-alist)
#+end_src

#+RESULTS:
: "Test post"

#+begin_src emacs-lisp :results verbatim :session exc
(exc-alist-get "TYPE" exc-alist 'post)
#+end_src

#+RESULTS:
: post

This works, but seems like a smell.  All these problems go
back to strings as keys.  Maybe there's a better way?

I could convert the keys to symbols using =intern=.  

#+begin_src emacs-lisp :results verbatim :session exc
(defun exc-parse-org-meta-data-intern (data)
  "Parse Org formatted meta DATA into an alist.

Keywords are the '#+' options given within an Org file.  These
are things like TITLE, DATE, and FILETAGS.  Keywords are
case-sensitive!  Values are whatever remains on that line."
  (with-temp-buffer
    (insert data)
    (org-element-map (org-element-parse-buffer 'element) 'keyword
      (lambda (x) (cons (intern (org-element-property :key x))
                        (org-element-property :value x))))))

(setq exc-alist-i (exc-parse-org-meta-data-intern exc-post))
exc-alist-i
#+end_src

#+RESULTS:
: ((TITLE . "Test post") (AUTHOR . "Excalamus") (DATE . "2020-07-17") (TAGS . "blogging tests"))

This has several apparent problems.

As I understand it, this would pollute the global obarray. Is that a
real concern?  I know the symbol is only being used as a lookup; the
variable, function, and properties shouldn't change.  Regardless, I
don't want my package to conflict with (i.e. overwrite) a person's
environment unknowingly.

The string may also have characters illegal for use as a symbol.  
Here's what happens with illegal symbol characters in the string.
#+begin_src emacs-lisp :results verbatim :session exc
(setq exc-bad-meta-data
  (concat
   "#+THE TITLE: Test post\n"
   "#+AUTHOR: Excalamus\n"
   "#+DATE: 2020-07-17\n"
   "#+POST TAGS: blogging tests\n"
   "\n"))

(setq exc-alist-i-bad (exc-parse-org-meta-data-intern exc-bad-meta-data))
exc-alist-i-bad
#+end_src

#+RESULTS:
: ((AUTHOR . "Excalamus") (DATE . "2020-07-17"))

How are situations like these best handled?




* Re: alist keys: strings or symbols
  2020-07-19 16:23 alist keys: strings or symbols excalamus--- via Users list for the GNU Emacs text editor
@ 2020-07-19 23:23 ` Dmitry Alexandrov
  2020-07-20  9:01 ` tomas
  1 sibling, 0 replies; 3+ messages in thread
From: Dmitry Alexandrov @ 2020-07-19 23:23 UTC
  To: excalamus; +Cc: help-gnu-emacs

excalamus@tutanota.com wrote:
> The string may also have characters illegal for use as a symbol.

Namely?

> Here's what happens with illegal symbol characters in the string.
>
> #+begin_src emacs-lisp :results verbatim :session exc
> (setq exc-bad-meta-data
>   (concat
>    "#+THE TITLE: Test post\n"
>    "#+AUTHOR: Excalamus\n"
>    "#+DATE: 2020-07-17\n"
>    "#+POST TAGS: blogging tests\n"
>    "\n"))
>
> (setq exc-alist-i-bad (exc-parse-org-meta-data-intern exc-bad-meta-data))
> exc-alist-i-bad
> #+end_src
>
> #+RESULTS:
> : ((AUTHOR . "Excalamus") (DATE . "2020-07-17"))
>
> How are situations like these best handled?

You mean space?  Space is a perfectly valid character in a symbol name.

I suppose the result above is due to space being an invalid character in an Org-mode keyword. ;-)
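
For illustration, a small sketch of how a name containing a space
behaves once interned (=exc-demo-spaced= is a made-up variable name,
not something from the original code):

#+begin_src emacs-lisp
;; `intern' accepts any string as a symbol name, spaces included.
(setq exc-demo-spaced (intern "THE TITLE"))

;; The name round-trips unchanged.
(symbol-name exc-demo-spaced)        ; "THE TITLE"

;; The printer escapes the space so the reader can read it back.
(prin1-to-string exc-demo-spaced)    ; "THE\\ TITLE"

;; The same symbol can be written literally with that escape.
(eq exc-demo-spaced 'THE\ TITLE)     ; t
#+end_src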


* Re: alist keys: strings or symbols
  2020-07-19 16:23 alist keys: strings or symbols excalamus--- via Users list for the GNU Emacs text editor
  2020-07-19 23:23 ` Dmitry Alexandrov
@ 2020-07-20  9:01 ` tomas
  1 sibling, 0 replies; 3+ messages in thread
From: tomas @ 2020-07-20  9:01 UTC
  To: help-gnu-emacs

On Sun, Jul 19, 2020 at 06:23:52PM +0200, excalamus--- via Users list for the GNU Emacs text editor wrote:
> Some questions about alists:
> 
> - Is it a better practice to convert string keys to symbols?

It depends. Strings have an "inner life", i.e. are sequences
of characters, symbols are atomic and have no innards (but
see below).

So if you just want to know whether two keys are equal or not,
symbols are the more appropriate choice, and the comparison will be
faster, too.  If you find yourself asking whether one key is "greater"
(lexicographically, I guess) or "less" than another, or whether it
has such-and-such a prefix, you'd rather want a string.

The borders are somewhat fuzzy, since it's possible to extract
the string representation of a symbol. In Emacs Lisp they are
even fuzzier, since, given the right context, you can treat a
symbol as a string. This works in Emacs Lisp:

  (string< 'boo "far")
  => t

Emacs Lisp transforms 'boo to "boo" and compares the strings
lexicographically.
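
A short sketch of that fuzziness, using only built-in functions:
=symbol-name= gives the string behind a symbol explicitly, while
=string<= and =string-equal= accept symbols directly and use their
names.  Functions that insist on real strings need the explicit
conversion.

#+begin_src emacs-lisp
;; Explicit conversion from symbol to string.
(symbol-name 'boo)                           ; "boo"

;; Implicit conversion: symbols are compared by their names.
(string< 'boo "far")                         ; t
(string-equal 'TITLE "TITLE")                ; t

;; `string-prefix-p' wants strings, so convert first.
(string-prefix-p "TI" (symbol-name 'TITLE))  ; t
#+end_src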

* Different equalities:

What you have to bear in mind is that there are different measures
of equality: if you are comparing just the "objects" (if you come
from C, that's --basically-- the object's addresses), you use eq.
In that case, asking for "greater" or "less" doesn't make much sense.

If you are comparing the object's "innards", you use =equal=.
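
As a rough illustration of the two measures (=exc-demo-compare= is a
hypothetical helper written only for this sketch):

#+begin_src emacs-lisp
(defun exc-demo-compare (a b)
  "Return a list of the results of (eq A B) and (equal A B)."
  (list (eq a b) (equal a b)))

;; Two separately built strings: different objects, same contents.
(exc-demo-compare (concat "TI" "TLE") (concat "TI" "TLE"))  ; (nil t)

;; Interning the same name twice yields the very same symbol.
(exc-demo-compare (intern "TITLE") (intern "TITLE"))        ; (t t)
#+end_src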

>  Is =intern= best for this?  What about handling illegal symbol names?

Yes. And... there are few, if any, illegal symbol names. Try

  (setq foo (intern ".("))

It works. It's a funny symbol, but who cares ;-)

> - If a symbol is used as a key and that symbol is already in use
>   elsewhere, is there potential for conflict with the existing symbol?

No. Interning something gives you an address (well, there's a type
tag attached to it). If it's used somewhere else, it'll reuse that,
otherwise, a new symbol is created. Since those things are immutable,
you don't care.
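
A minimal sketch of that reuse.  The symbol name below is assumed to
be unused in the running session and is purely illustrative, as is the
variable =exc-demo-sym=:

#+begin_src emacs-lisp
;; `intern-soft' only looks a name up; it never creates a symbol.
;; On first evaluation nothing by this name exists yet.
(intern-soft "exc-demo-never-seen-before")                    ; nil

;; The first `intern' creates the symbol ...
(setq exc-demo-sym (intern "exc-demo-never-seen-before"))

;; ... and every later lookup of the same name returns that object.
(eq exc-demo-sym (intern "exc-demo-never-seen-before"))       ; t
(eq exc-demo-sym (intern-soft "exc-demo-never-seen-before"))  ; t
#+end_src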

[...]

> Notice that the keys are strings.  This means that they require
> an equality predicate like ='string-equal= to retrieve unless I use
> =assoc= and =cdr=:

They only require it because you want them compared _as strings_. Had
you put symbols in there, then you could have used =eq= as comparison,
which is the default (so you can leave it out).
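
With symbol keys the lookups from the original post would reduce to
something like the following sketch (=exc-demo-meta-i= is a made-up
variable holding a hand-written alist, not the parsed one):

#+begin_src emacs-lisp
(setq exc-demo-meta-i
      '((TITLE . "Test post") (AUTHOR . "Excalamus")))

;; No TESTFN needed: the default `eq' matches symbols.
(alist-get 'TITLE exc-demo-meta-i)        ; "Test post"

;; A missing key falls back to the DEFAULT argument.
(alist-get 'TYPE exc-demo-meta-i 'post)   ; post
#+end_src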

[...]

> This works, but now the code is getting messy. There are two forms of
> lookup: the verbose =alist-get= and the brute force =assoc/cdr=.  One
> requires ='string-equal=, the other does not.  If I forget the
> predicate, the lookup will fail silently.

"fail silently" meaning that it's looking for the wrong thing in your
assoc list and not finding it.

> I could convert the keys to symbols using =intern=.  

All that said, I think you should go with this... unless you find
yourself looking at the innards of your keys too often (extracting
prefixes, doing case-insensitive search, that kind of thing). Remember that
=eq= is just one comparison (address, basically), whereas =equal=
has to first dereference the string and then compare character by
character.
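
For contrast, a sketch of the kind of "innards" work where string keys
are the more natural fit.  The alist =exc-demo-str-meta= is invented
for the example; =assoc-string= and =string-prefix-p= are built in.

#+begin_src emacs-lisp
(require 'seq)

(setq exc-demo-str-meta
      '(("TITLE" . "Test post") ("TAGS" . "blogging tests")))

;; Case-insensitive lookup: a non-nil third argument folds case.
(cdr (assoc-string "title" exc-demo-str-meta t))   ; "Test post"

;; Select every entry whose key starts with a given prefix.
(seq-filter (lambda (cell) (string-prefix-p "TA" (car cell)))
            exc-demo-str-meta)                     ; (("TAGS" . "blogging tests"))
#+end_src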

Your keywords are a choice from a limited set, and are immutable,
so to me, they /look/ like symbols. That seems to be the fitting
representation.

> This has several apparent problems.
> 
> As I understand it, this would pollute the global obarray. Is that a
> real concern?

Shouldn't be. The global obarray is built for this.

> [...]  Regardless, I
> don't want my package to conflict with (i.e. overwrite) a person's
> environment unknowingly.

It won't. The obarray just maps a string to some immutable thingy
(basically a pointer with some decorations). This thingy can be
used for many things in different contexts. If some package out
there, say =shiny-widgets.el= binds some variable to the symbol
named "THE TITLE", that won't interfere with your usage. You just
happen to both use the symbol =0xdeadbeef-plus-some-type-tags=
(which points to the symbol "THE TITLE" in the obarray) for
different things.
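
A sketch of that scenario.  The =set= call stands in for whatever the
imagined =shiny-widgets.el= might do, and =exc-demo-meta= is a made-up
variable; neither comes from real code.

#+begin_src emacs-lisp
;; Stand-in for another package giving the symbol a variable value.
(set (intern "THE TITLE") "value owned by shiny-widgets")

;; Our alist merely holds the same symbol as a key.
(setq exc-demo-meta (list (cons (intern "THE TITLE") "Test post")))

;; The lookup sees only the alist entry ...
(alist-get (intern "THE TITLE") exc-demo-meta)   ; "Test post"

;; ... and the other package's value is untouched.
(symbol-value (intern "THE TITLE"))              ; "value owned by shiny-widgets"
#+end_src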

> 
> The string may also have characters illegal for use as a symbol.  
> Here's what happens with illegal symbol characters in the string.
> #+begin_src emacs-lisp :results verbatim :session exc
> (setq exc-bad-meta-data
>   (concat
>    "#+THE TITLE: Test post\n"
>    "#+AUTHOR: Excalamus\n"
>    "#+DATE: 2020-07-17\n"
>    "#+POST TAGS: blogging tests\n"
>    "\n"))
> 
> (setq exc-alist-i-bad (exc-parse-org-meta-data-intern exc-bad-meta-data))

I haven't had a look at your code, but "THE TITLE" interns fine as a
symbol here.

The important thing is that you make a choice and stick consistently
to it. That includes being aware of the comparison functions used.

Cheers
-- t
