* making "$BEZ20(B $B2mL-(B" say "TSUCHIYA Masatoshi" in Japanese
From: Joe Corneli @ 2005-04-06 6:18 UTC
The question is: how to make sense of the output from
lynx -dump http://emacs-w3m.namazu.org/ml/msg07861.html
This looks harder than Chinese UTF translations, though I think that
once I get the pattern down, it won't really be any harder (see
http://lists.gnu.org/archive/html/help-gnu-emacs/2005-03/msg00689.html).
But what is the pattern?
* Re: making "$BEZ20(B $B2mL-(B" say "TSUCHIYA Masatoshi" in Japanese
From: Kevin Rodgers @ 2005-04-06 15:45 UTC
Joe Corneli wrote:
> The question is: how to make sense of the output from
>
> lynx -dump http://emacs-w3m.namazu.org/ml/msg07861.html
>
> This looks harder than Chinese UTF translations, though I think that
> once I get the pattern down, it won't really be any harder (see
> http://lists.gnu.org/archive/html/help-gnu-emacs/2005-03/msg00689.html).
>
> But what is the pattern?
You have to pay attention to the HTTP headers sent by the server, in
particular:
Content-Type: text/html; charset=iso-2022-jp
According to the lynx man page, the -mime_header option "prints the MIME
header of a fetched document along with its source". Alternatively, you
can call lynx twice, first with the -head option to get just the HTTP
headers and then with -dump to get the content.
You should also try to handle Content-Transfer-Encoding headers.
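A minimal sketch of the header-driven approach described above, written in Python rather than with lynx and Emacs (a swapped-in tool, not what this thread used). It assumes you already have the raw Content-Type value and the undecoded body bytes. The helper names `charset_from_content_type` and `decode_body` are hypothetical, and the ISO-8859-1 fallback is an assumption of this sketch, not something the server promises.

```python
from email.message import Message


def charset_from_content_type(value, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type value, lowercased.

    Falls back to `default` (an assumption of this sketch) when the
    header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset() or default


def decode_body(raw, content_type):
    """Decode raw body bytes using the charset named in the header."""
    charset = charset_from_content_type(content_type)
    # errors="replace" keeps going on bad bytes instead of raising.
    return raw.decode(charset, errors="replace")


if __name__ == "__main__":
    header = "text/html; charset=iso-2022-jp"
    print(charset_from_content_type(header))
```

This only helps if the body bytes are the ones the server actually sent; a dump that has already been filtered by the browser may no longer be decodable.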
--
Kevin Rodgers
* Re: making "$BEZ20(B $B2mL-(B" say "TSUCHIYA Masatoshi" in Japanese
From: Miles Bader @ 2005-04-07 0:20 UTC
Kevin Rodgers <ihs_4664@yahoo.com> writes:
> > The question is: how to make sense of the output from
> >
> > lynx -dump http://emacs-w3m.namazu.org/ml/msg07861.html
>
> You have to pay attention to the HTTP headers sent by the server, in
> particular:
>
> Content-Type: text/html; charset=iso-2022-jp
In addition, you have to make sure you actually get the raw text --
the "encoded" text in Joe's original post is missing crucial ESC (\033)
characters[*], so there's no way to decode it correctly.
I tried using "lynx -dump ..." myself, and it looks like it's lynx
which is doing the stripping; the documentation seems to indicate that
the "-raw" option will disable this stripping, but I couldn't get it
to work (maybe this only works for terminal display, not with -dump?).
-Miles
[*] The correct binary text, with the escape characters replaced
by the two-character sequence "^[" to make Gnus happy, is:
"^[$BEZ20^[(B ^[$B2mL-^[(B" say "TSUCHIYA Masatoshi"
Emacs can decode this using the above charset to:
"土屋 雅稔" say "TSUCHIYA Masatoshi"
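The same decoding can be checked outside Emacs. This is a small Python sketch, assuming Python's built-in `iso2022_jp` codec as a stand-in for the Emacs decoder; the variable names are illustrative only. It turns each "^[" from the footnote back into a real ESC byte and compares the result with the stripped form from the original post.

```python
ESC = b"\x1b"

# Footnote text with each two-character "^[" turned back into an ESC byte.
with_esc = b"^[$BEZ20^[(B ^[$B2mL-^[(B".replace(b"^[", ESC)

# The form seen in the original post: same bytes, ESC characters missing.
stripped = with_esc.replace(ESC, b"")

# With ESC present, the codec switches between JIS X 0208 and ASCII.
decoded = with_esc.decode("iso2022_jp")

# Without ESC, the input is plain ASCII and passes through unchanged.
not_decoded = stripped.decode("iso2022_jp")

print(decoded)
print(not_decoded)
```

Without the ESC bytes the decoder never sees a switch into the Japanese character set, so the name stays as literal ASCII.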
--
If you can't beat them, arrange to have them beaten. [George Carlin]