all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: "Pascal J. Bourguignon" <pjb@informatimago.com>
To: help-gnu-emacs@gnu.org
Subject: Re: How the backquote and the comma really work?
Date: Wed, 12 Aug 2015 18:30:52 +0200	[thread overview]
Message-ID: <878u9geaf7.fsf@kuiper.lan.informatimago.com> (raw)
In-Reply-To: mailman.8207.1439393377.904.help-gnu-emacs@gnu.org

Michael Heerdegen <michael_heerdegen@web.de> writes:

> Marcin Borkowski <mbork@mbork.pl> writes:
>
>> Interestingly, there's a lot of buzz about Lisp /interpreter/ written
>> in Lisp, but not so much about Lisp /reader/ written in Lisp.  In
>> fact, I didn't find one on the Internet.

Not looking good enough.

https://gitlab.com/com-informatimago/com-informatimago/tree/master/common-lisp/lisp-reader

and of course, there's one in each lisp implementation.


> Good question.  Maybe it's because doing such things is mainly for
> educational reasons, and when you want to learn how a language works,
> studying the interpreter is more beneficial.

But also, it's assumed that by teaching the most complex subjects,
people will be able to deal with the less complex subjects by
themselves. 

Sometimes indeed it looks like not.


>> Now I'm wondering: is my approach (read one token at a time, but never
>> go back, so that I can't really "peek" at the next one) reasonable?
>> Maybe I should just read all tokens in a list?  I do not like this
>> approach very much.  I could also set up a buffer, which would contain
>> zero or one tokens to read, and put the already read token in that
>> buffer in some cases (pretty much what TeX's \futurelet does.  Now
>> I appreciate why it's there...).

Most languages are designed to be (= to have a grammar that is) LL(1);
there are also LR(0), SLR(1), LALR(1) languages, but as you can see, the
parameter is at most 1 in general.   What this means is that the parser
can work my looking ahead at most 1 token.  That is, it reads the
current tokens, and it may look the next token, before deciding what
grammar rule to apply.  Theorically, we could design languages that
require a bigger look-ahead, but in practice it's not useful; in the
case where the grammar would require longer look ahead, we often can
easily add some syntax (a prefix keyword) to make it back into LL(1) (or
LALR(1) if you're into that kind of grammar).

Why is it useful?  Because it allows to read, scan and parse the source
code by leaving it in a file and loading only one or two tokens in
memory at once: it is basically an optimization for when you're
inventing parsers on computers that don't have a lot of memory in the 60s.

And then! Even the first FORTRAN compiler, the one in 63 passes,
actually kept the program source in memory (4 Kw), and instead loaded
alternatively the passes of the compiler to process the data structures
of the program that remained in memory!

So indeed, there's very little reason to use short look-ahead, only that
we have a theorical body well developped to generate parsers
automatically from grammar of these forms.

So, reading the whole source file in memory (or actually, already having
it in memory, eg. in editor/compiler IDEs), is also a natural solution.

Also for some languages, the processing of the source is defined in
phases such as you end up easily having the whole sequence of tokens in
memory. For example, the C preprocessor (but that's another story).

Finally, parser generators such as PACKRAT being able to process
grammars with unlimited lookahead, can benefit from pre-loading the
whole source in memory.


In any case, it's rather an immaterial question, since on one side, you
have abstractions such as lazy streams that let you process sequences
(finite or infinite) as an I/O stream where you get each element in
sequence and of course, you can copy a finite stream back into a
sequence.  Both abstractions can be useful and used to write elegant
algorithms.  So it doesn't matter.  Just have a pair of functions to
convert buffers into streams and streams into buffer and use whichever
you need for the current algorithm!



> I really don't get the point in which way the Python example would have
> advantages over yours.  The only difference is that your version
> combines the two steps that are separate in the Python example.  Your
> version is more efficient, since it avoids building a very long list
> that is not really needed and will cause a lot of garbage collection to
> be done afterwards.

Nowadays sources, even of complete OS such as Android, are much smaller
than the available RAM.  Therefore loading the whole file in RAM and
building an index of tokens into it will be more efficient than
performing O(n) I/O syscalls.

-- 
__Pascal Bourguignon__                 http://www.informatimago.com/
“The factory of the future will have only two employees, a man and a
dog. The man will be there to feed the dog. The dog will be there to
keep the man from touching the equipment.” -- Carl Bass CEO Autodesk


  parent reply	other threads:[~2015-08-12 16:30 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-25 17:09 How the backquote and the comma really work? Marcin Borkowski
2015-06-25 17:33 ` Michael Heerdegen
2015-06-25 18:06   ` Marcin Borkowski
2015-06-25 18:22     ` Michael Heerdegen
2015-06-25 18:39       ` Marcin Borkowski
2015-06-25 18:44         ` Marcin Borkowski
2015-06-25 19:06           ` Michael Heerdegen
2015-07-10 11:36         ` Marcin Borkowski
2015-07-12 15:54           ` Michael Heerdegen
2015-07-12 19:55             ` Marcin Borkowski
2015-07-12 20:33               ` Marcin Borkowski
2015-07-14 18:17                 ` Marcin Borkowski
2015-07-14 22:08                   ` Emanuel Berg
2015-07-21 22:08                   ` Michael Heerdegen
2015-07-24 13:01                     ` Michael Heerdegen
2015-08-11 11:41                       ` Marcin Borkowski
2015-08-12 15:29                         ` Michael Heerdegen
     [not found]                         ` <mailman.8207.1439393377.904.help-gnu-emacs@gnu.org>
2015-08-12 16:30                           ` Pascal J. Bourguignon [this message]
2015-08-23  8:30                             ` Marcin Borkowski
     [not found]                             ` <mailman.110.1440318650.11330.help-gnu-emacs@gnu.org>
2015-08-23 16:46                               ` Pascal J. Bourguignon
2015-07-21 21:54                 ` Michael Heerdegen
2015-08-11 10:15                   ` Marcin Borkowski
2015-08-11 17:20                     ` Thorsten Jolitz
2015-08-12 15:01                       ` Michael Heerdegen
2015-07-21 21:50               ` Michael Heerdegen
2015-06-25 18:10 ` Drew Adams
2015-06-25 18:40   ` Michael Heerdegen
2015-06-25 18:53     ` Marcin Borkowski
2015-06-25 19:39       ` Michael Heerdegen
2015-06-25 20:05         ` Drew Adams
2015-06-25 20:18           ` Marcin Borkowski
2015-06-25 20:37             ` Drew Adams
2015-06-25 23:55     ` Robert Thorpe
     [not found]     ` <mailman.5697.1435276533.904.help-gnu-emacs@gnu.org>
2015-06-26  1:41       ` Rusi
2015-06-26 14:24         ` Michael Heerdegen
     [not found]         ` <mailman.5716.1435328741.904.help-gnu-emacs@gnu.org>
2015-06-26 14:35           ` Rusi
2015-06-26 14:51             ` Michael Heerdegen
2015-06-25 18:46   ` Marcin Borkowski
2015-06-26  7:31 ` tomas
2015-06-26 13:48   ` Drew Adams
2015-06-26 14:06     ` tomas
2015-06-26 15:06 ` Emanuel Berg
2015-07-12 17:38 ` Vaidheeswaran C
     [not found] <mailman.5657.1435252169.904.help-gnu-emacs@gnu.org>
2015-06-30 16:27 ` sokobania.01

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878u9geaf7.fsf@kuiper.lan.informatimago.com \
    --to=pjb@informatimago.com \
    --cc=help-gnu-emacs@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.