* When is a text file not a text file?
From: sebyte @ 2004-01-02 17:24 UTC
Hi all,
I often use the command 'html2text <htmlfile>' in a *shell* buffer to view HTML
files, and nine times out of ten the file is displayed beautifully (with all
the HTML tags removed etc.). However, the command 'html2text -o <newfilename>
<htmlfile>' writes a file to disk which, when opened in an Emacs buffer (or any
text editor for that matter), displays more formatting tags than actual text!
In fact, it appears as if only some of the text is to be found buried amongst
all the tags! Yet returning to the *shell* buffer and issuing the command 'cat
<newfilename>' displays the file as it is meant to be seen once more.
No doubt there is a simple explanation for this, but damned if I know where to
even start! I don't believe it is an html2text issue, as I have observed similar
behaviour before with HTML files I've downloaded.
TIA for any explanations of what's actually going on here.
sebyte
P.S. html2text is available through Fink, (for potentially interested OS X users
out there).
* Re: When is a text file not a text file?
From: Oliver Scholz @ 2004-01-09 9:43 UTC
sebyte <sdt133@netscape.net> writes:
[...]
> However, the command 'html2text -o <newfilename> <htmlfile>', writes
> a file to disk which when opened in an Emacs buffer, (or any text
> editor for that matter), displays more formatting tags than actual
> text!
[...]
> Yet returning to the *shell* buffer and issuing the command 'cat
> <newfilename>' displays the file as it is meant to be seen once more.
[...]
What "tags" are these? I don't know the actual program you are
using. But I seem to recall that I once had a program "html2text" or
"htmltotxt" or whatever that produced a text file as output
*containing ANSI escape sequences for colours*.
Could that be the case? It would explain why dumping such a file on
the tty---unlike visiting it with a text editor---would display it
correctly.
If this *is* the case, then you probably could do something with
ansi-color.el, though I don't know offhand how exactly.
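To make the idea of "ANSI escape sequences for colours" concrete, here is a minimal Python sketch of what such a sequence looks like as raw bytes and how it could be filtered out. The sample string and the helper name `strip_sgr` are made up for illustration; nothing here is taken from html2text itself, and the pattern only covers the colour/attribute ("SGR") form `ESC [ ... m`.

```python
import re

# SGR ("Select Graphic Rendition") sequences have the form ESC [ <digits;...> m.
# A terminal interprets them as colour/attribute changes; a text editor
# that does not interpret them shows the raw characters instead.
_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_sgr(text: str) -> str:
    """Return text with SGR colour/attribute sequences removed."""
    return _SGR.sub("", text)


# Hypothetical sample: the word "warning" wrapped in "red" ... "reset".
coloured = "\x1b[31mwarning\x1b[0m: disk almost full"

print(repr(coloured))       # the raw form an editor would be given
print(strip_sgr(coloured))  # warning: disk almost full
```

If a file really did contain sequences of this kind, that would account for the difference between 'cat' on a tty and visiting the file in an editor.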
Oliver
--
20 Nivôse an 212 de la Révolution
Liberté, Egalité, Fraternité!
* Re: When is a text file not a text file?
From: sebyte @ 2004-01-09 18:17 UTC
> What "tags" are these? I don't know the actual program you are
> using. But I seem to recall that I once had a program "html2text" or
> "htmltotxt" or whatever that produced a text file as output
> *containing ANSI escape sequences for colours*.
>
> Could that be the case? It would explain why dumping such a file on
> the tty---unlike visiting it with a text editor---would display it
> correctly.
>
> If this *is* the case, then you probably could do something with
> ansi-color.el, though I don't know offhand how exactly.
>
> Oliver
Hi Oliver,
Thanks for your time. Here's an example of html2text's output, displayed in an
Emacs buffer:
C^HCo^Hop^Hpy^Hyr^Hri^Hig^Hgh^Hht^Ht n^Hno^Hot^Hti^Hic^Hce^He:^H: All
reader-contributed material on freshmeat.net is the
property and responsibility of its author; for reprint rights, please contact
the author directly.
-----------------------------------------------------------------------------
Let me repeat that: OS X is not Unix.
Consider the following: all of Apple.com's
_^Hm_^Ha_^Hr_^Hk_^He_^Ht_^Hi_^Hn_^Hg_^H _^Hp_^Ha_^Hg_^He_^Hs on the subject of
their darling new operating system are extremely careful to note that OS X is
"_^HU_^HN_^HI_^HX_^H-_^Hb_^Ha_^Hs_^He_^Hd".
Here is how it looks on a tty or in an Emacs *shell* buffer:
Copyright notice: All reader-contributed material on freshmeat.net is the
property and responsibility of its author; for reprint rights, please contact
the author directly.
-----------------------------------------------------------------------------
Let me repeat that: OS X is not Unix.
Consider the following: all of Apple.com's marketing pages on the subject of
their darling new operating system are extremely careful to note that OS X is
"UNIX-based".
I had thought that they might be remnants of HTML tags, (I must admit I didn't
look very closely), but I have found out since they are actually ANSI 'backspace
control sequences', used to preserve things like underlining and boldface. The
html2text option '-nobs' gets rid of them.
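For a file that has already been written with the backspaces in it, the same clean-up can be sketched in a few lines of Python. This assumes the pattern visible in the sample above: a character, then a backspace (^H, byte 0x08), then the character that should remain visible. The function name `strip_overstrikes` is hypothetical, and this is only an approximation of what '-nobs' presumably achieves, not html2text's own code.

```python
import re

# Any single character followed by a backspace (0x08). The character
# before the backspace is the one that gets overprinted on a terminal,
# so deleting "char + backspace" leaves only the visible text.
_OVERSTRIKE = re.compile(r".\x08", re.DOTALL)


def strip_overstrikes(text: str) -> str:
    """Return text with every 'character, backspace' pair removed."""
    return _OVERSTRIKE.sub("", text)


# Shortened stand-in for the output shown above: doubled letters for
# boldface, underscore + backspace for underlining.
sample = "C\bCo\bop\bpy\by n\bno\bot\bte\be: _\bU_\bN_\bI_\bX"

print(strip_overstrikes(sample))  # Copy note: UNIX
```

On systems that have it, piping the file through 'col -b' does much the same job from the shell.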
(After days spent looking for information, I discovered that html2text comes
with a manpage and all was revealed. DOH!)
sebyte
* Re: When is a text file not a text file?
From: Oliver Scholz @ 2004-01-09 18:48 UTC
sebyte <sdt133@netscape.net> writes:
>> What "tags" are these? I don't know the actual program you are
>> using. But I seem to recall that I once had a program "html2text" or
>> "htmltotxt" or whatever that produced a text file as output
>> *containing ANSI escape sequences for colours*.
[...]
> Thanks for your time. Here's an example of html2text's output,
> displayed in an Emacs buffer:
>
>
> C^HCo^Hop^Hpy^Hyr^Hri^Hig^Hgh^Hht^Ht n^Hno^Hot^Hti^Hic^Hce^He:^H:
> All reader-contributed material on freshmeat.net is the
[...]
> I had thought that they might be remnants of HTML tags, (I must admit
> I didn't look very closely), but I have found out since they are
> actually ANSI 'backspace control sequences', used to preserve things
> like underlining and boldface. The html2text option '-nobs' gets rid
> of them.
[...]
Untested: You could also try to call
(ansi-color-apply-on-region (point-min) (point-max))
on a buffer containing such ANSI control sequences (after a `(require
'ansi-color)' that is).
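One caveat, offered tentatively: the sample quoted above contains backspace (^H) overstrikes rather than ESC-based colour sequences, and as far as I know ansi-color.el only interprets the latter, so the call may leave the ^H characters in place. A possible workaround is to rewrite the overstrikes as ANSI bold/underline sequences first. The Python sketch below shows that conversion under stated assumptions: "x ^H x" means bold, "_ ^H x" means underline, and the helper name `overstrike_to_sgr` is invented for this example. Combined bold-plus-underline overstrikes are not handled.

```python
import re

BOLD, UNDERLINE, RESET = "\x1b[1m", "\x1b[4m", "\x1b[0m"

# Either "_ BS x" (underlined x) or "x BS x" (x struck twice, i.e. bold).
# The underline alternative is tried first, so "_ BS _" counts as underline.
_OVERSTRIKE = re.compile(r"_\x08(?P<ul>.)|(?P<b>.)\x08(?P=b)", re.DOTALL)


def _to_sgr(match: re.Match) -> str:
    """Wrap the visible character in the matching ANSI attribute."""
    if match.group("ul") is not None:
        return UNDERLINE + match.group("ul") + RESET
    return BOLD + match.group("b") + RESET


def overstrike_to_sgr(text: str) -> str:
    """Rewrite backspace overstrikes as ANSI bold/underline sequences."""
    return _OVERSTRIKE.sub(_to_sgr, text)


print(repr(overstrike_to_sgr("C\bCo\bo: _\bU_\bN")))
```

The converted text contains only ESC-based sequences, which is the kind of input the ansi-color call above is meant for.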
Oliver
--
20 Nivôse an 212 de la Révolution
Liberté, Egalité, Fraternité!