all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Marcin Borkowski <mbork@mbork.pl>
To: help-gnu-emacs@gnu.org
Subject: Re: How the backquote and the comma really work?
Date: Sun, 23 Aug 2015 10:30:23 +0200	[thread overview]
Message-ID: <87y4h2h0f4.fsf@mbork.pl> (raw)
In-Reply-To: <878u9geaf7.fsf@kuiper.lan.informatimago.com>


On 2015-08-12, at 18:30, Pascal J. Bourguignon <pjb@informatimago.com> wrote:

> Michael Heerdegen <michael_heerdegen@web.de> writes:
>
>> Marcin Borkowski <mbork@mbork.pl> writes:
>>
>>> Interestingly, there's a lot of buzz about Lisp /interpreter/ written
>>> in Lisp, but not so much about Lisp /reader/ written in Lisp.  In
>>> fact, I didn't find one on the Internet.
>
> Not looking good enough.
>
> https://gitlab.com/com-informatimago/com-informatimago/tree/master/common-lisp/lisp-reader

Thanks!

> and of course, there's one in each lisp implementation.

But often in C or something, not in Lisp.

>> Good question.  Maybe it's because doing such things is mainly for
>> educational reasons, and when you want to learn how a language works,
>> studying the interpreter is more beneficial.
>
> But also, it's assumed that by teaching the most complex subjects,
> people will be able to deal with the less complex subjects by
> themselves. 
>
> Sometimes indeed it looks like not.

Especially if one doesn't have a CS background, and is mostly
self-taught.

Also, it's not that I'm unable to deal with that; after a few
iterations, I usually succeed.  My problem was not that I can't do it,
my problem was that I felt I was doing it suboptimally, and wanted to
see how smarter/more knowledgeable people deal with that.

>>> Now I'm wondering: is my approach (read one token at a time, but never
>>> go back, so that I can't really "peek" at the next one) reasonable?
>>> Maybe I should just read all tokens in a list?  I do not like this
>>> approach very much.  I could also set up a buffer, which would contain
>>> zero or one tokens to read, and put the already read token in that
>>> buffer in some cases (pretty much what TeX's \futurelet does.  Now
>>> I appreciate why it's there...).
>
> Most languages are designed to be (= to have a grammar that is) LL(1);
> there are also LR(0), SLR(1), LALR(1) languages, but as you can see, the
> parameter is at most 1 in general.   What this means is that the parser
> can work my looking ahead at most 1 token.  That is, it reads the
> current tokens, and it may look the next token, before deciding what
> grammar rule to apply.  Theorically, we could design languages that
> require a bigger look-ahead, but in practice it's not useful; in the
> case where the grammar would require longer look ahead, we often can
> easily add some syntax (a prefix keyword) to make it back into LL(1) (or
> LALR(1) if you're into that kind of grammar).

Now my lack of education is easily seen.  I only heard about formal
grammars (well, I had one class about them - I mean, /one class/, 90
minutes, some 15 years ago).

> Why is it useful?  Because it allows to read, scan and parse the source
> code by leaving it in a file and loading only one or two tokens in
> memory at once: it is basically an optimization for when you're
> inventing parsers on computers that don't have a lot of memory in the 60s.

And basically, this confirms my intuition that reading one token at
a time is not necessarily a stupid thing to do.

> And then! Even the first FORTRAN compiler, the one in 63 passes,
> actually kept the program source in memory (4 Kw), and instead loaded
> alternatively the passes of the compiler to process the data structures
> of the program that remained in memory!

Interesting!

> So indeed, there's very little reason to use short look-ahead, only that
> we have a theorical body well developped to generate parsers
> automatically from grammar of these forms.

I see.

> So, reading the whole source file in memory (or actually, already having
> it in memory, eg. in editor/compiler IDEs), is also a natural solution.
>
> Also for some languages, the processing of the source is defined in
> phases such as you end up easily having the whole sequence of tokens in
> memory. For example, the C preprocessor (but that's another story).
>
> Finally, parser generators such as PACKRAT being able to process
> grammars with unlimited lookahead, can benefit from pre-loading the
> whole source in memory.

Thanks for sharing - as hinted above, I have a lot to learn!

> In any case, it's rather an immaterial question, since on one side, you
> have abstractions such as lazy streams that let you process sequences
> (finite or infinite) as an I/O stream where you get each element in
> sequence and of course, you can copy a finite stream back into a
> sequence.  Both abstractions can be useful and used to write elegant
> algorithms.  So it doesn't matter.  Just have a pair of functions to
> convert buffers into streams and streams into buffer and use whichever
> you need for the current algorithm!

And most probably I'll end up coding an abstraction like this, with
a function for looking at the next token without “consuming” it, and
a function for “popping” the next token.  Converting between buffers and
streams wouldn’t be very useful for me, since I would either lose the
whole text structure (line-breaks, comments), or have to do a lot of
work to actually preserve it.

>> I really don't get the point in which way the Python example would have
>> advantages over yours.  The only difference is that your version
>> combines the two steps that are separate in the Python example.  Your
>> version is more efficient, since it avoids building a very long list
>> that is not really needed and will cause a lot of garbage collection to
>> be done afterwards.
>
> Nowadays sources, even of complete OS such as Android, are much smaller
> than the available RAM.  Therefore loading the whole file in RAM and
> building an index of tokens into it will be more efficient than
> performing O(n) I/O syscalls.

OTOH, here I walk an Emacs buffer and not an external file.  Moreover,
as I said, I don’t want to lose info on where I am in the source.

Thanks!

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University



  reply	other threads:[~2015-08-23  8:30 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-25 17:09 How the backquote and the comma really work? Marcin Borkowski
2015-06-25 17:33 ` Michael Heerdegen
2015-06-25 18:06   ` Marcin Borkowski
2015-06-25 18:22     ` Michael Heerdegen
2015-06-25 18:39       ` Marcin Borkowski
2015-06-25 18:44         ` Marcin Borkowski
2015-06-25 19:06           ` Michael Heerdegen
2015-07-10 11:36         ` Marcin Borkowski
2015-07-12 15:54           ` Michael Heerdegen
2015-07-12 19:55             ` Marcin Borkowski
2015-07-12 20:33               ` Marcin Borkowski
2015-07-14 18:17                 ` Marcin Borkowski
2015-07-14 22:08                   ` Emanuel Berg
2015-07-21 22:08                   ` Michael Heerdegen
2015-07-24 13:01                     ` Michael Heerdegen
2015-08-11 11:41                       ` Marcin Borkowski
2015-08-12 15:29                         ` Michael Heerdegen
     [not found]                         ` <mailman.8207.1439393377.904.help-gnu-emacs@gnu.org>
2015-08-12 16:30                           ` Pascal J. Bourguignon
2015-08-23  8:30                             ` Marcin Borkowski [this message]
     [not found]                             ` <mailman.110.1440318650.11330.help-gnu-emacs@gnu.org>
2015-08-23 16:46                               ` Pascal J. Bourguignon
2015-07-21 21:54                 ` Michael Heerdegen
2015-08-11 10:15                   ` Marcin Borkowski
2015-08-11 17:20                     ` Thorsten Jolitz
2015-08-12 15:01                       ` Michael Heerdegen
2015-07-21 21:50               ` Michael Heerdegen
2015-06-25 18:10 ` Drew Adams
2015-06-25 18:40   ` Michael Heerdegen
2015-06-25 18:53     ` Marcin Borkowski
2015-06-25 19:39       ` Michael Heerdegen
2015-06-25 20:05         ` Drew Adams
2015-06-25 20:18           ` Marcin Borkowski
2015-06-25 20:37             ` Drew Adams
2015-06-25 23:55     ` Robert Thorpe
     [not found]     ` <mailman.5697.1435276533.904.help-gnu-emacs@gnu.org>
2015-06-26  1:41       ` Rusi
2015-06-26 14:24         ` Michael Heerdegen
     [not found]         ` <mailman.5716.1435328741.904.help-gnu-emacs@gnu.org>
2015-06-26 14:35           ` Rusi
2015-06-26 14:51             ` Michael Heerdegen
2015-06-25 18:46   ` Marcin Borkowski
2015-06-26  7:31 ` tomas
2015-06-26 13:48   ` Drew Adams
2015-06-26 14:06     ` tomas
2015-06-26 15:06 ` Emanuel Berg
2015-07-12 17:38 ` Vaidheeswaran C
     [not found] <mailman.5657.1435252169.904.help-gnu-emacs@gnu.org>
2015-06-30 16:27 ` sokobania.01

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y4h2h0f4.fsf@mbork.pl \
    --to=mbork@mbork.pl \
    --cc=help-gnu-emacs@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.