From mboxrd@z Thu Jan 1 00:00:00 1970
From: A Soare
Newsgroups: gmane.emacs.devel
Subject: Embedding HTML in Lisp.
Date: Sun, 22 Jun 2008 22:53:21 +0200 (CEST)
Message-ID: <25112253.8571481214168001329.JavaMail.www@wwinf4622>
Reply-To: alinsoar@voila.fr
To: emacs-devel@gnu.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8

Hello everyone.

I have to prepare for an exam and I needed good access to a French dictionary. I decided that the best choice is http://www.cnrtl.fr/, a database of the "Centre national de la recherche scientifique". This dictionary is provided in HTML format.

So I thought about how to get access to an HTML page.

I tried to define HTML. I define it as « HTML is a programming language like any other programming language ». Being a language, it can be transformed into Lisp and evaluated using the Lisp evaluator.

Apart from a few special tags (like BR, HR, etc.), the grammar of HTML is identical (via an isomorphism) to the Lisp grammar:

<tag (expr)> text </tag>  <---->  (tag (expr) text)

So I transformed HTML into Lisp via this isomorphism.
<table>
  <tr> <td>x</td> <td>y</td> </tr>
  <tr> <td>a</td> <td>b</td> </tr>
</table>
will become

(table (tr (td "x") (td "y"))
       (tr (td "a") (td "b")))

and this form will be evaluated according to the rules of the Emacs Lisp evaluator.

Now I have to give the tags' definitions. I observe that the Lisp evaluator will first call the function `td' with a string as its parameter, then call the function `tr' with the outputs of `td' as parameters, and finally call `table' with a list of elements that are the outputs of the `tr' calls.

Hence HTML is a subset of Lisp, in which the functions are symbols like table, div, input, center, I, B, et cetera.

So I did the following:

1. I downloaded the page from CNRTL.
2. I filtered it to extract just the definition and cut out the useless information (using a signal->filter->accumulate library, not included in the example I send here).
3. I transformed the result using the grammar isomorphism.
4. I evaluated the resulting Lisp structure.

The result of the 4th step is the content of the filtered HTML page (just the definition).

So that was my implementation of HTML.

Now if I look in the dictionary to see the definition of HTML:

http://fr.wikipedia.org/wiki/Hypertext_Markup_Language

« Hypertext Markup Language, generally abbreviated HTML, is the **data format** designed to represent web pages. In particular, it makes it possible to embed hypertext in the content of pages, and it is based on a **markup language**, hence its name »

Moreover, look at the definition of "markup language" (« langage de balisage »):

http://fr.wikipedia.org/wiki/Langage_de_balisage

« Including tags makes it possible to transfer both the structure of the document and its content. This structure can be understood by a **computer program**, which allows a customized display according to pre-established rules »

So, in the dictionaries, HTML is considered a data structure that can be rendered by a program.
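The evaluation order described above (`td' first, then `tr', then `table') can be tried with a minimal sketch. The three definitions below are assumptions made for illustration; they are not the ones from the cnrtl code, and the " | " and newline separators are arbitrary choices.

```elisp
;; Hypothetical tag definitions, for illustration only.
;; Each tag is an ordinary Lisp function, so `eval' renders the
;; tree from the innermost forms outwards.

(defun td (text)
  "Return the cell TEXT unchanged."
  text)

(defun tr (&rest cells)
  "Join the already rendered CELLS into one row string."
  (mapconcat #'identity cells " | "))

(defun table (&rest rows)
  "Join the already rendered ROWS into one multi-line string."
  (mapconcat #'identity rows "\n"))

;; The form obtained from the HTML table in this message.
(eval '(table (tr (td "x") (td "y"))
              (tr (td "a") (td "b"))))
;; => "x | y\na | b"
```

Note that these `defun's occupy the global names td, tr and table, which is the price of letting the plain Lisp evaluator do the work.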
Maybe the Lisp evaluator is that program in my code... and the data structure is the Lisp structure obtained by html->lisp...

When I first read this definition, it sounded very strange to me.

I redefined HTML as a programming language as such (« en tant que tel »), like any other programming language, neither as a markup language (« langage de balisage ») nor as a data format (« format de données »).

In this new implementation of HTML, the Emacs Lisp user has full freedom to redefine the rendering of every HTML (= Lisp) symbol as he wishes. For example, if the user does not like the default implementation of the tag `table', he will know that the tag `table' is a Lisp function that receives as its parameter a list containing several lists (rows), and each such list contains strings that are the contents of the columns (depending on the implementation of TD and TR). The function must return a string that is the image of the table.

More than that, one could attach any semantics to this tag, like a cond form in Lisp, or progn. But in that case the HTML standard would no longer be respected.

In Lisp a procedure must always return an object (Lisp_Object), which can be of a few types. In HTML we return an object that must be a string or a list, depending on the HTML function (that is, we trim the Lisp evaluator, like Edward Scissorhands, and so obtain an HTML evaluator).

When a table is inside another table (or any object inside another object), the inner table is evaluated first, then the outer table. The outer object (function) may not like the dimensions of the returned image of the inner object and may want to rescale it.
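A small sketch makes the nesting issue visible. It assumes the contract described above (`tr' returns a list of cell strings, `table' receives the rows and returns a string); the definitions themselves are hypothetical and are not taken from the cnrtl code.

```elisp
;; Hypothetical definitions, for illustration only.

(defun td (content)
  "Return CONTENT, the already evaluated content of a cell."
  content)

(defun tr (&rest cells)
  "Return the row as the list of its CELLS."
  cells)

(defun table (&rest rows)
  "Return the image of ROWS, a list of lists of strings, as one string."
  (mapconcat (lambda (row) (mapconcat #'identity row " | "))
             rows "\n"))

;; The inner table is evaluated first and reaches the outer `td'
;; as a plain string.
(eval '(table (tr (td "x")
                  (td (table (tr (td "a") (td "b")))))))
;; => "x | a | b"
```

By the time the outer `td' and `table' run, the inner table is only the string "a | b": its structure is gone, so the outer function has nothing left to rescale.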
To solve this, during parsing one can add the quoted text of every form after the form itself, for example:

(table
 (tr (td "x") '(td "x") (td "y") '(td "y"))
 '(tr (td "x") '(td "x") (td "y") '(td "y"))
 (tr
  (td
   (table
    (tr (td "a") '(td "a") (td "b") '(td "b"))
    (tr (td "x") '(td "x") (td "b") '(td "b")))
   '(table
     (tr (td "a") '(td "a") (td "b") '(td "b"))
     (tr (td "x") '(td "x") (td "b") '(td "b"))))
  (td "o")))

and, in that case, the table "x y" will be able to rescale the inner table "a b". This has the disadvantage that the Lisp code grows exponentially with the depth of the HTML tree.

Another solution is to insert the percentage width of the current object during the html->lisp transformation. For example:

(table (tr (td 50 "x") (td 50 "y"))
       (tr (td 40 "a") (td 60 "b")))

When the Lisp evaluator calls each function, it will know exactly how much the width of that element must be.

There are many possible ways to solve this problem.

The transformation of structures from HTML to Lisp in html->lisp is very inefficient in my implementation (it is just a test) and has to be implemented in C (with DEFUN) for good speed.

For the rest, in the near future I will write for myself a few more filters for a few French newspapers that I read daily, and I will add some more tag definitions. But I will not define all the tags of the HTML language; I will define just the ones that I need for these pages.

The current definitions of div, span, etc. are adapted to the CNRTL dictionary. That was just what I needed here.

One can test it using the call (cnrtl-get 'french-word).

I believe that the ideas from this implementation of HTML (cnrtl) can be used in the future to implement a complete web browser embedded in Emacs Lisp, more customizable than any other Emacs browser (in fact Emacs does not have a built-in web browser).

Html is lisp.

Finally, I wish to dedicate this implementation of HTML as a programming language to Julie B. White from San Diego, with all my gratitude for her encouragement and for a beautiful friendship.

Alin C. Soare

PS: I promised a year ago that I would send the indentation of Lisp code in O(n) time complexity, and I will send it, I hope soon; I have a lot of work and do not have time at the moment to check it first.