From: Marcin Borkowski
To: help-gnu-emacs@gnu.org
Newsgroups: gmane.emacs.help
Subject: Re: How the backquote and the comma really work?
Date: Sun, 23 Aug 2015 10:30:23 +0200
Message-ID: <87y4h2h0f4.fsf@mbork.pl>
In-reply-to: <878u9geaf7.fsf@kuiper.lan.informatimago.com>

On 2015-08-12, at 18:30, Pascal J. Bourguignon wrote:

> Michael Heerdegen writes:
>
>> Marcin Borkowski writes:
>>
>>> Interestingly, there's a lot of buzz about Lisp /interpreter/ written
>>> in Lisp, but not so much about Lisp /reader/ written in Lisp.  In
>>> fact, I didn't find one on the Internet.
>
> Not looking good enough.
>
> https://gitlab.com/com-informatimago/com-informatimago/tree/master/common-lisp/lisp-reader

Thanks!

> and of course, there's one in each lisp implementation.

But often in C or something, not in Lisp.

>> Good question.
>> Maybe it's because doing such things is mainly for
>> educational reasons, and when you want to learn how a language works,
>> studying the interpreter is more beneficial.
>
> But also, it's assumed that by teaching the most complex subjects,
> people will be able to deal with the less complex subjects by
> themselves.
>
> Sometimes indeed it looks like not.

Especially if one doesn't have a CS background, and is mostly
self-taught.

Also, it's not that I'm unable to deal with that; after a few
iterations, I usually succeed.  My problem was not that I couldn't do
it; my problem was that I felt I was doing it suboptimally, and wanted
to see how smarter/more knowledgeable people deal with that.

>>> Now I'm wondering: is my approach (read one token at a time, but never
>>> go back, so that I can't really "peek" at the next one) reasonable?
>>> Maybe I should just read all tokens in a list?  I do not like this
>>> approach very much.  I could also set up a buffer, which would contain
>>> zero or one tokens to read, and put the already read token in that
>>> buffer in some cases (pretty much what TeX's \futurelet does.  Now
>>> I appreciate why it's there...).
>
> Most languages are designed to be (= to have a grammar that is) LL(1);
> there are also LR(0), SLR(1), LALR(1) languages, but as you can see, the
> parameter is at most 1 in general.  What this means is that the parser
> can work by looking ahead at most 1 token.  That is, it reads the
> current token, and it may look at the next token, before deciding what
> grammar rule to apply.  Theoretically, we could design languages that
> require a bigger look-ahead, but in practice it's not useful; in the
> case where the grammar would require longer look-ahead, we often can
> easily add some syntax (a prefix keyword) to make it back into LL(1) (or
> LALR(1) if you're into that kind of grammar).

Now my lack of education is easily seen.
I have only heard about formal grammars (well, I had one class about
them - I mean, /one class/, 90 minutes, some 15 years ago).

> Why is it useful?  Because it allows one to read, scan and parse the
> source code by leaving it in a file and loading only one or two tokens
> in memory at once: it is basically an optimization for when you're
> inventing parsers on computers that don't have a lot of memory in the 60s.

And basically, this confirms my intuition that reading one token at
a time is not necessarily a stupid thing to do.

> And then!  Even the first FORTRAN compiler, the one in 63 passes,
> actually kept the program source in memory (4 Kw), and instead loaded
> the passes of the compiler in turn to process the data structures
> of the program that remained in memory!

Interesting!

> So indeed, there's very little reason to use short look-ahead, only that
> we have a well-developed body of theory for generating parsers
> automatically from grammars of these forms.

I see.

> So, reading the whole source file in memory (or actually, already having
> it in memory, e.g. in editor/compiler IDEs), is also a natural solution.
>
> Also for some languages, the processing of the source is defined in
> phases such that you end up easily having the whole sequence of tokens
> in memory.  For example, the C preprocessor (but that's another story).
>
> Finally, parser generators such as PACKRAT, being able to process
> grammars with unlimited lookahead, can benefit from pre-loading the
> whole source in memory.

Thanks for sharing - as hinted above, I have a lot to learn!

> In any case, it's rather an immaterial question, since on one side, you
> have abstractions such as lazy streams that let you process sequences
> (finite or infinite) as an I/O stream where you get each element in
> sequence and of course, you can copy a finite stream back into a
> sequence.  Both abstractions can be useful and used to write elegant
> algorithms.  So it doesn't matter.
> Just have a pair of functions to
> convert buffers into streams and streams into buffer and use whichever
> you need for the current algorithm!

And most probably I'll end up coding an abstraction like this, with
a function for looking at the next token without “consuming” it, and
a function for “popping” the next token.  Converting between buffers and
streams wouldn’t be very useful for me, since I would either lose the
whole text structure (line-breaks, comments), or have to do a lot of
work to actually preserve it.

>> I really don't get the point in which way the Python example would have
>> advantages over yours.  The only difference is that your version
>> combines the two steps that are separate in the Python example.  Your
>> version is more efficient, since it avoids building a very long list
>> that is not really needed and will cause a lot of garbage collection to
>> be done afterwards.
>
> Nowadays sources, even of complete OS such as Android, are much smaller
> than the available RAM.  Therefore loading the whole file in RAM and
> building an index of tokens into it will be more efficient than
> performing O(n) I/O syscalls.

OTOH, here I walk an Emacs buffer and not an external file.  Moreover,
as I said, I don’t want to lose info on where I am in the source.

Thanks!

--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University
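
For illustration only, here is a rough sketch of the "look at the next
token without consuming it" / "pop the next token" abstraction discussed
above.  It is written in Python rather than Emacs Lisp and walks a string
instead of an Emacs buffer, so it is not the actual code from this thread.
The names Token, tokenize, TokenStream and read_form, as well as the tiny
Lisp-like token grammar, are all assumptions made up for the example.
Each token keeps its start and end offsets, so the position in the source
is not lost, and the look-ahead buffer holds zero or one token.

```python
import re
from typing import Iterator, NamedTuple, Optional


class Token(NamedTuple):
    kind: str   # "comment", "open", "close", "quote" or "atom"
    text: str
    start: int  # offset of the first character in the source
    end: int    # offset just past the last character


# Hypothetical, deliberately tiny Lisp-like token grammar (no strings).
TOKEN_RE = re.compile(r"""
    (?P<comment>;[^\n]*)
  | (?P<open>\()
  | (?P<close>\))
  | (?P<quote>,@|[`',])
  | (?P<atom>[^\s()`',;]+)
""", re.VERBOSE)


def tokenize(source: str) -> Iterator[Token]:
    """Lazily yield tokens; whitespace is skipped, offsets are kept."""
    for m in TOKEN_RE.finditer(source):
        yield Token(m.lastgroup, m.group(), m.start(), m.end())


class TokenStream:
    """One-token look-ahead buffer over a lazy token iterator."""

    def __init__(self, tokens: Iterator[Token]) -> None:
        self._tokens = tokens
        self._buffer: Optional[Token] = None  # holds zero or one token

    def peek(self) -> Optional[Token]:
        """Return the next token without consuming it (None at the end)."""
        if self._buffer is None:
            self._buffer = next(self._tokens, None)
        return self._buffer

    def pop(self) -> Optional[Token]:
        """Return the next token and consume it (None at the end)."""
        token = self.peek()
        self._buffer = None
        return token


def read_form(stream: TokenStream):
    """Read one form: an atom Token, a list of forms, or [quote Token, form]."""
    token = stream.pop()
    while token is not None and token.kind == "comment":
        token = stream.pop()  # this sketch skips comments between forms
    if token is None:
        raise SyntaxError("unexpected end of input")
    if token.kind == "close":
        raise SyntaxError(f"unexpected ')' at offset {token.start}")
    if token.kind == "quote":
        return [token, read_form(stream)]
    if token.kind == "atom":
        return token
    # token.kind == "open": peeking decides whether the list ends here.
    items = []
    while True:
        nxt = stream.peek()
        if nxt is None:
            raise SyntaxError(f"unclosed '(' at offset {token.start}")
        if nxt.kind == "close":
            stream.pop()
            return items
        if nxt.kind == "comment":
            stream.pop()
            continue
        items.append(read_form(stream))
```

Calling read_form(TokenStream(tokenize(text))) reads a single form and
leaves the stream positioned right after it.  Since the tokens are
produced lazily, no long token list is built, and since every Token
carries its offsets, the original text (line breaks, comments) can still
be found from the tokens.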