From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: vasan999@hotmail.com
Newsgroups: gmane.emacs.help
Subject: Re: What is the best html to latex program on the market or the internet ?
Date: Tue, 23 Oct 2007 00:05:22 -0000
Organization: http://groups.google.com
Message-ID: <1193097922.257490.266390@i13g2000prf.googlegroups.com>
References: <1193090235.827063.9090@v29g2000prd.googlegroups.com>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
X-Trace: ger.gmane.org 1193100042 28395 80.91.229.12 (23 Oct 2007 00:40:42 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 23 Oct 2007 00:40:42 +0000 (UTC)
To: help-gnu-emacs@gnu.org
Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Tue Oct 23 02:40:41 2007
Return-path:
Envelope-to: geh-help-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.50) id 1Ik7p1-0003Ir-TX for geh-help-gnu-emacs@m.gmane.org; Tue, 23 Oct 2007 02:40:40 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1Ik7ou-0007nS-5g for geh-help-gnu-emacs@m.gmane.org; Mon, 22 Oct 2007 20:40:32 -0400
Original-Path: shelby.stanford.edu!newsfeed.stanford.edu!postnews.google.com!i13g2000prf.googlegroups.com!not-for-mail
Original-Newsgroups: comp.text.tex, alt.html, comp.lang.scheme, comp.unix.shell, gnu.emacs.help
Original-Lines: 127
Original-NNTP-Posting-Host: 75.30.150.242
Original-X-Trace: posting.google.com 1193097922 342 127.0.0.1 (23 Oct 2007 00:05:22 GMT)
Original-X-Complaints-To: groups-abuse@google.com
Original-NNTP-Posting-Date: Tue, 23 Oct 2007 00:05:22 +0000 (UTC)
In-Reply-To: <1193090235.827063.9090@v29g2000prd.googlegroups.com>
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.8) Gecko/20071008 Firefox/2.0.0.8,gzip(gfe),gzip(gfe)
Complaints-To: groups-abuse@google.com
Injection-Info: i13g2000prf.googlegroups.com; posting-host=75.30.150.242; posting-account=ps2QrAMAAAA6_jCuRt2JEIpn5Otqf_w0
Original-Xref: shelby.stanford.edu comp.text.tex:360555 comp.lang.scheme:74493 comp.unix.shell:211020 gnu.emacs.help:153202
X-BeenThere: help-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Users list for the GNU Emacs text editor
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Original-Sender: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.help:48702
Archived-At:

The site says that this will convert HTML to LaTeX. Can anyone explain
this code to me? I am not familiar with such difficult commands,
especially since there are no comments giving a line-by-line
explanation or describing the overall operation.

1i\
\\documentstyle{article}
1i\
\\begin{document}
$a\
\\end{document}
# Too bad there's no way to make sed ignore case!
/<[Xx][Mm][Pp]>/,/<.[Xx][Mm][Pp]>/b lit
/<.[Xx][Mm][Pp]>/b lit
/<[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>/,/<.[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>/b lit
/<.[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>/b lit
/<[Pp][Rr][Ee]>/,/<.[Pp][Rr][Ee]>/b pre
/<.[Pp][Rr][Ee]>/b pre
# Stuff to ignore
s?<[Ii][Ss][Ii][Nn][Dd][Ee][Xx]>??
s???g
s?<[Nn][Ee][Xx][Tt][Ii][Dd][^>]*>??g
# character set translations for LaTex special chars
s?>.?>?g
s?<.??\\par ?g
s???g
# Headings
s?<[Tt][Ii][Tt][Ll][Ee]>\([^<]*\)?\ \section*{\1}?g
s?<[Hh]n>?\\part{?g
s??}?g
s?<[Hh]1>?\\section*{?g
s??}?g
s?<[Hh]2>?\\subsection*{?g
s?<[Hh]3>?\\subsubsection*{?g
s?<[Hh]4>?\\subsubsection*{?g
s?<[Hh]5>?\\paragraph{?g
s?<[Hh]6>?\\subparagraph{?g
# UL is itemize
s?<[Uu][Ll]>?\\begin{itemize}?g
s??\\end{itemize}?g
s?<[Ll][Ii]>?\\item ?g
# DL is description
s?<[Dd][Ll]>?\\begin{description}?g
s??\\end{description}?g
# closing delimiter for DT is first < or end of line which ever comes first NO
#s?<[Dd][Tt]>\([^<]*\)\([^<]*\)$?\\item[\1]?g
#s?<[Dd][Dd]>??g
s?<[Dd][Tt]>?\\item[?]?g
# Other common SGML markup. this is ad-hoc
s???
s???g
# Italics
s?\([^<]*\)?{\\it \1 }?g
# Get rid of Anchors
:pre
s?<[Aa][^>]*>??g
s???g
# This is a subroutine in sed, in case you are not a sed guru
: lit
s?<[Xx][Mm][Pp]>?\\begin{verbatim}?g
s??\\end{verbatim}?
s?<[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>?\\begin{verbatim}?g
s??\\end{verbatim}?

On Oct 22, 2:57 pm, vasan...@hotmail.com wrote:
> Basically, it should do all that any of the tools below and in
> addition,
>
> 1/
> human readable output that maintains the text lines of the source, ie
> does not scramble the text lines or insert newlines unnecessarily or
> removes them. inserts minimal latex elements.
>
> 2/
> maintains cross-links, ie convert
> but if the set of htmls is incomplete proceed with the assumption that
> the reference is there, ie dont delete the links or try to modify them
> or their addresses. One of the tool I tested is too smart in this
> respect and actually ruins the result.
>
> 3/
> proper conversion of images, tables, etc. No math mode involved in
> html.
>
> 4/
> Even an emacs lisp function could be written by a guru that can do the
> job.
>
> 5/
> Is there any commercial wysiwig tool ?
>
> LaTeX etc
>
> * html2latex is a program based on the NCSA html parser. Contact:
> Nathan.Torking...@vuw.ac.nz.
> * Another html2latex can combine several HTML files into a single
> LaTeX file, converting links between the files to references. External
> URL's can be converted into footnotes or into a bibliography sorted on
> URL. Contact: F.J.Fa...@cs.utwente.nl (Frans J. Faase)
> * Another html2latex implemented on Linux by yacc+lex+C. Also
> available from the TSX-11 Linux FTP site as nc-html2latex-0.97.tar.gz.
> Contact: naoc...@naochan.com (Naoya Tozuka)
> * htmlatex.pl is a perl script to do the conversion (may be moving
> soon). Contact: n9146...@cc.wwu.edu (Jake Kesinger)
> * There is also a sed script to convert HTML into LaTeX.
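
For anyone trying to read the sed script in this message, here is a much smaller sketch of the two ideas it seems to rely on. This is not the script from the site, only an illustration under stated assumptions. A range address such as /<[Pp][Rr][Ee]>/,/<.[Pp][Rr][Ee]>/ selects every line from an opening tag through its closing tag, and "b lit" jumps to the label ": lit" so that those lines skip the ordinary tag substitutions. The function name html2latex_sketch, the closing-tag patterns written with "." in place of "/" (copied from the form of the range addresses), and the bare "b" that ends the main path are all assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical cut-down sketch, NOT the script posted above.
# It handles only <h1>, <li> and <pre>, to show range addresses,
# branching to a label, and "?" used as the s-command delimiter.
html2latex_sketch() {
  sed -e '/<[Pp][Rr][Ee]>/,/<.[Pp][Rr][Ee]>/b lit' \
      -e 's?<[Hh]1>?\\section*{?g' \
      -e 's?<.[Hh]1>?}?g' \
      -e 's?<[Ll][Ii]>?\\item ?g' \
      -e 'b' \
      -e ': lit' \
      -e 's?<[Pp][Rr][Ee]>?\\begin{verbatim}?' \
      -e 's?<.[Pp][Rr][Ee]>?\\end{verbatim}?'
}

# Lines outside <pre>...</pre> get the heading and list substitutions;
# lines inside the range only have the <pre> tags themselves replaced.
html2latex_sketch <<'EOF'
<h1>Intro</h1>
<li>first
<pre>
<li>kept
</pre>
EOF
```

Each line of input is run through the commands from top to bottom. A line inside the <pre> range jumps straight to ": lit", so its "<li>" is left alone. Every other line gets the substitutions and then hits the bare "b", which ends processing for that line before the "lit" section is reached. Note that the range form only works when the opening and closing tags are on different lines, because sed looks for the end of a range starting on the line after the one that opened it.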