From: Marcin Borkowski <mbork@mbork.pl>
Newsgroups: gmane.emacs.help
Subject: Re: How the backquote and the comma really work?
Date: Sun, 23 Aug 2015 10:30:23 +0200
Message-ID: <87y4h2h0f4.fsf@mbork.pl>
References: <87vbebg1fs.fsf@mbork.pl> <87r3ozy9pf.fsf@web.de>
	<87si9ffys0.fsf@mbork.pl> <87d20jbqbj.fsf@web.de>
	<87pp4jfx9y.fsf@mbork.pl> <87615sxn1a.fsf@mbork.pl>
	<87zj318j7z.fsf@web.de> <87mvz1b16h.fsf@mbork.pl>
	<87k2u5azfi.fsf@mbork.pl> <87615mbo3z.fsf@mbork.pl>
	<877fptnoyu.fsf@web.de> <87k2tpyajy.fsf@web.de>
	<87h9o6gihr.fsf@mbork.pl>
	<mailman.8207.1439393377.904.help-gnu-emacs@gnu.org>
	<878u9geaf7.fsf@kuiper.lan.informatimago.com>
To: help-gnu-emacs@gnu.org
In-reply-to: <878u9geaf7.fsf@kuiper.lan.informatimago.com>
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/106776>


On 2015-08-12, at 18:30, Pascal J. Bourguignon <pjb@informatimago.com> wrote:

> Michael Heerdegen <michael_heerdegen@web.de> writes:
>
>> Marcin Borkowski <mbork@mbork.pl> writes:
>>
>>> Interestingly, there's a lot of buzz about Lisp /interpreter/ written
>>> in Lisp, but not so much about Lisp /reader/ written in Lisp.  In
>>> fact, I didn't find one on the Internet.
>
> Not looking good enough.
>
> https://gitlab.com/com-informatimago/com-informatimago/tree/master/common-lisp/lisp-reader

Thanks!

> and of course, there's one in each lisp implementation.

But often in C or something, not in Lisp.

>> Good question.  Maybe it's because doing such things is mainly for
>> educational reasons, and when you want to learn how a language works,
>> studying the interpreter is more beneficial.
>
> But also, it's assumed that by teaching the most complex subjects,
> people will be able to deal with the less complex subjects by
> themselves.
>
> Sometimes indeed it looks like not.

Especially if one doesn't have a CS background, and is mostly
self-taught.

Also, it's not that I'm unable to deal with that; after a few
iterations, I usually succeed.  My problem was not that I couldn't do
it; it was that I felt I was doing it suboptimally, and wanted to see
how smarter, more knowledgeable people deal with that.

>>> Now I'm wondering: is my approach (read one token at a time, but never
>>> go back, so that I can't really "peek" at the next one) reasonable?
>>> Maybe I should just read all tokens in a list?  I do not like this
>>> approach very much.  I could also set up a buffer, which would contain
>>> zero or one tokens to read, and put the already read token in that
>>> buffer in some cases (pretty much what TeX's \futurelet does.  Now
>>> I appreciate why it's there...).
>
> Most languages are designed to be (= to have a grammar that is) LL(1);
> there are also LR(0), SLR(1), LALR(1) languages, but as you can see, the
> parameter is at most 1 in general.   What this means is that the parser
> can work by looking ahead at most 1 token.  That is, it reads the
> current token, and it may look at the next token, before deciding what
> grammar rule to apply.  Theoretically, we could design languages that
> require a bigger look-ahead, but in practice it's not useful; in the
> case where the grammar would require longer look-ahead, we often can
> easily add some syntax (a prefix keyword) to make it back into LL(1) (or
> LALR(1) if you're into that kind of grammar).

Now my lack of education is easily seen.  I only heard about formal
grammars (well, I had one class about them - I mean, /one class/, 90
minutes, some 15 years ago).

> Why is it useful?  Because it allows one to read, scan and parse the
> source code by leaving it in a file and loading only one or two tokens
> in memory at once: it is basically an optimization for when you're
> inventing parsers on computers that don't have a lot of memory in the
> 60s.

And basically, this confirms my intuition that reading one token at
a time is not necessarily a stupid thing to do.
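A minimal sketch of the "one token at a time, never go back" approach,
in Python purely for illustration.  The token pattern and all names
below are hypothetical; this is not code from this thread.  The trick
is that the function building a form receives the token that has
already been read, so nothing ever needs to be pushed back:

```python
import re

# Hypothetical token pattern: a parenthesis, or a run of other non-space characters.
TOKEN_RE = re.compile(r"[()]|[^\s()]+")


def tokens(text):
    """Yield tokens one at a time; nothing is ever pushed back."""
    for match in TOKEN_RE.finditer(text):
        yield match.group()


def read_form(tok, rest):
    """Build the form that starts with the already-read token `tok`.

    `rest` is the shared token iterator; nested calls advance the same one.
    """
    if tok == "(":
        items = []
        for tok in rest:
            if tok == ")":
                return items
            # The token is already consumed, so hand it to the recursive call.
            items.append(read_form(tok, rest))
        raise SyntaxError("unexpected end of input inside a list")
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    return tok


def read(text):
    """Read the first form found in `text`."""
    rest = tokens(text)
    return read_form(next(rest), rest)
```

For example, `read("(a (b c) d)")` gives a nested Python list with the
symbols as strings.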

> And then! Even the first FORTRAN compiler, the one in 63 passes,
> actually kept the program source in memory (4 Kw), and instead loaded
> alternatively the passes of the compiler to process the data structures
> of the program that remained in memory!

Interesting!

> So indeed, there's very little reason to use short look-ahead, only that
> we have a well-developed body of theory for generating parsers
> automatically from grammars of these forms.

I see.

> So, reading the whole source file in memory (or actually, already having
> it in memory, eg. in editor/compiler IDEs), is also a natural solution.
>
> Also for some languages, the processing of the source is defined in
> phases such that you easily end up having the whole sequence of tokens
> in memory. For example, the C preprocessor (but that's another story).
>
> Finally, parser generators such as PACKRAT being able to process
> grammars with unlimited lookahead, can benefit from pre-loading the
> whole source in memory.

Thanks for sharing - as hinted above, I have a lot to learn!

> In any case, it's rather an immaterial question, since on one side, you
> have abstractions such as lazy streams that let you process sequences
> (finite or infinite) as an I/O stream where you get each element in
> sequence and of course, you can copy a finite stream back into a
> sequence.  Both abstractions can be useful and used to write elegant
> algorithms.  So it doesn't matter.  Just have a pair of functions to
> convert buffers into streams and streams into buffer and use whichever
> you need for the current algorithm!

And most probably I'll end up coding an abstraction like this, with
a function for looking at the next token without “consuming” it, and
a function for “popping” the next token.  Converting between buffers and
streams wouldn’t be very useful for me, since I would either lose the
whole text structure (line-breaks, comments), or have to do a lot of
work to actually preserve it.
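
A rough sketch of such a peek/pop abstraction, in Python purely for
illustration.  The class and method names are hypothetical, and it
assumes that None is never a valid token, since None is used to signal
the end of input.  It keeps a buffer holding zero or one tokens:

```python
class TokenStream:
    """Wrap any iterator of tokens with a one-slot lookahead buffer."""

    _EMPTY = object()  # sentinel meaning "the buffer holds no token"

    def __init__(self, tokens):
        self._tokens = iter(tokens)
        self._buffer = self._EMPTY

    def peek(self):
        """Return the next token without consuming it (None at end of input)."""
        if self._buffer is self._EMPTY:
            # Fill the buffer from the underlying iterator, at most one token.
            self._buffer = next(self._tokens, None)
        return self._buffer

    def pop(self):
        """Return the next token and consume it (None at end of input)."""
        tok = self.peek()
        self._buffer = self._EMPTY
        return tok
```

Calling `peek()` twice in a row returns the same token; only `pop()`
advances the stream.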

>> I really don't get the point in which way the Python example would have
>> advantages over yours.  The only difference is that your version
>> combines the two steps that are separate in the Python example.  Your
>> version is more efficient, since it avoids building a very long list
>> that is not really needed and will cause a lot of garbage collection to
>> be done afterwards.
>
> Nowadays sources, even of a complete OS such as Android, are much smaller
> than the available RAM.  Therefore loading the whole file in RAM and
> building an index of tokens into it will be more efficient than
> performing O(n) I/O syscalls.

OTOH, here I walk an Emacs buffer and not an external file.  Moreover,
as I said, I don’t want to lose info on where I am in the source.
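
One way to keep that information, sketched in Python purely for
illustration: let every token carry the offsets it came from, so that
comments and line breaks can always be recovered from the original
text.  The token pattern and all names are hypothetical; in an Emacs
buffer, buffer positions or markers would play the role of the
offsets.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    text: str
    start: int  # offset of the token's first character in the source
    end: int    # offset just past the token's last character


# Hypothetical token pattern: a comment up to the end of the line,
# a parenthesis, or a run of other characters.
TOKEN_RE = re.compile(r";[^\n]*|[()]|[^\s();]+")


def tokens_with_positions(source: str) -> Iterator[Token]:
    """Yield tokens lazily, each remembering where it sits in `source`."""
    for m in TOKEN_RE.finditer(source):
        yield Token(m.group(), m.start(), m.end())
```

Since `source[tok.start:tok.end] == tok.text` holds for every token,
the text between two consecutive tokens (whitespace, line breaks) is
never lost; it is simply `source[prev.end:tok.start]`.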

Thanks!

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University