From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Pascal J. Bourguignon"
Newsgroups: gmane.emacs.help
Subject: Re: How the backquote and the comma really work?
Date: Wed, 12 Aug 2015 18:30:52 +0200
Organization: Informatimago
Message-ID: <878u9geaf7.fsf@kuiper.lan.informatimago.com>
References: <87vbebg1fs.fsf@mbork.pl> <87r3ozy9pf.fsf@web.de>
 <87si9ffys0.fsf@mbork.pl> <87d20jbqbj.fsf@web.de>
 <87pp4jfx9y.fsf@mbork.pl> <87615sxn1a.fsf@mbork.pl>
 <87zj318j7z.fsf@web.de> <87mvz1b16h.fsf@mbork.pl>
 <87k2u5azfi.fsf@mbork.pl> <87615mbo3z.fsf@mbork.pl>
 <877fptnoyu.fsf@web.de> <87k2tpyajy.fsf@web.de>
 <87h9o6gihr.fsf@mbork.pl>
To: help-gnu-emacs@gnu.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Michael Heerdegen writes:

> Marcin Borkowski writes:
>
>> Interestingly, there's a lot of buzz about a Lisp /interpreter/ written
>> in Lisp, but not so much about a Lisp /reader/ written in Lisp.  In
>> fact, I didn't find one on the Internet.

Not looking hard enough:

https://gitlab.com/com-informatimago/com-informatimago/tree/master/common-lisp/lisp-reader

and of course, there's one in each Lisp implementation.

> Good question.  Maybe it's because doing such things is mainly for
> educational reasons, and when you want to learn how a language works,
> studying the interpreter is more beneficial.

But it's also assumed that by teaching the most complex subjects, people
will be able to deal with the less complex ones by themselves.
Sometimes, indeed, it turns out otherwise.

>> Now I'm wondering: is my approach (read one token at a time, but never
>> go back, so that I can't really "peek" at the next one) reasonable?
>> Maybe I should just read all tokens into a list?  I do not like this
>> approach very much.
>> I could also set up a buffer, which would contain zero or one tokens
>> to read, and put the already-read token in that buffer in some cases
>> (pretty much what TeX's \futurelet does.  Now I appreciate why it's
>> there...).

Most languages are designed to be (that is, to have a grammar that is)
LL(1); there are also LR(0), SLR(1), and LALR(1) languages, but as you
can see, the parameter is at most 1 in general.  What this means is that
the parser can work by looking ahead at most one token: it reads the
current token, and it may look at the next token, before deciding which
grammar rule to apply.

Theoretically, we could design languages that require a bigger
look-ahead, but in practice it's not useful; where a grammar would
require a longer look-ahead, we can often easily add some syntax (a
prefix keyword) to bring it back into LL(1) (or LALR(1), if you're into
that kind of grammar).

Why is this useful?  Because it lets you read, scan, and parse the
source code while leaving it in a file, loading only one or two tokens
into memory at a time: it is basically an optimization from the 1960s,
when parsers were being invented on computers that didn't have a lot of
memory.  And then!  Even the first FORTRAN compiler, the one in 63
passes, actually kept the program source in memory (4 Kw), and instead
loaded the passes of the compiler alternately, to process the data
structures of the program that remained in memory!

So indeed, there's very little reason to use short look-ahead today,
except that we have a well-developed body of theory for generating
parsers automatically from grammars of these forms.

So, reading the whole source file into memory (or already having it in
memory, e.g. in editor/compiler IDEs) is also a natural solution.  Also,
for some languages, the processing of the source is defined in phases
such that you easily end up having the whole sequence of tokens in
memory anyway: for example, the C preprocessor (but that's another
story).
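To make the one-token look-ahead concrete, here is a minimal sketch (in
Python, since the thread compares against a Python example; all names
and the token syntax are mine, not from the thread) of Marcin's
"zero-or-one token buffer" idea, driving an LL(1) recursive-descent
parser for a tiny Lisp-like syntax:

```python
import re

# One token is a paren or a run of non-space, non-paren characters.
# (Illustrative only; a real Lisp reader handles strings, quotes, etc.)
TOKEN_RE = re.compile(r"\s*([()]|[^\s()]+)")

class Tokenizer:
    """Reads tokens one at a time, with a one-slot pushback buffer,
    pretty much the \\futurelet trick described above."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.buffer = None  # holds zero or one already-read tokens

    def _read(self):
        m = TOKEN_RE.match(self.text, self.pos)
        if m is None:
            return None  # end of input (or only trailing whitespace)
        self.pos = m.end()
        return m.group(1)

    def peek(self):
        # Fill the buffer if empty; peeking never consumes the token.
        if self.buffer is None:
            self.buffer = self._read()
        return self.buffer

    def next(self):
        if self.buffer is not None:
            tok, self.buffer = self.buffer, None
            return tok
        return self._read()

def parse(t):
    # LL(1): every decision below needs at most one peeked token.
    tok = t.next()
    if tok == '(':
        items = []
        while t.peek() != ')':
            if t.peek() is None:
                raise SyntaxError("unclosed list")
            items.append(parse(t))
        t.next()  # consume the ')'
        return items
    if tok is None:
        raise SyntaxError("unexpected end of input")
    return tok  # an atom
```

For example, `parse(Tokenizer("(a (b c) d)"))` returns
`['a', ['b', 'c'], 'd']`; at no point did the parser need more than the
current token plus one peeked token, which is exactly what makes the
grammar LL(1).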
Finally, parser generators such as PACKRAT, being able to process
grammars with unlimited look-ahead, can benefit from pre-loading the
whole source into memory.

In any case, it's rather an immaterial question, since on one side you
have abstractions such as lazy streams that let you process sequences
(finite or infinite) as an I/O stream, where you get each element in
sequence; and of course, you can copy a finite stream back into a
sequence.  Both abstractions can be useful, and used to write elegant
algorithms.  So it doesn't matter: just have a pair of functions to
convert buffers into streams and streams into buffers, and use whichever
you need for the current algorithm!

> I really don't get the point in which way the Python example would have
> advantages over yours.  The only difference is that your version
> combines the two steps that are separate in the Python example.  Your
> version is more efficient, since it avoids building a very long list
> that is not really needed and will cause a lot of garbage collection to
> be done afterwards.

Nowadays sources, even of complete OSes such as Android, are much
smaller than the available RAM.  Therefore loading the whole file into
RAM and building an index of tokens into it will be more efficient than
performing O(n) I/O syscalls.

-- 
__Pascal Bourguignon__                 http://www.informatimago.com/
“The factory of the future will have only two employees, a man and a
dog.  The man will be there to feed the dog.  The dog will be there to
keep the man from touching the equipment.” -- Carl Bass CEO Autodesk