unofficial mirror of help-gnu-emacs@gnu.org
* making "$BEZ20(B $B2mL-(B" say "TSUCHIYA Masatoshi" in Japanese
@ 2005-04-06  6:18 Joe Corneli
  2005-04-06 15:45 ` Kevin Rodgers
       [not found] ` <mailman.452.1112800777.2895.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 3+ messages in thread
From: Joe Corneli @ 2005-04-06  6:18 UTC



The question is: how to make sense of the output from

 lynx -dump http://emacs-w3m.namazu.org/ml/msg07861.html

This looks harder than Chinese UTF translations, though I think that
once I get the pattern down, it won't really be any harder (see
http://lists.gnu.org/archive/html/help-gnu-emacs/2005-03/msg00689.html).

But what is the pattern?


* Re: making "$BEZ20(B $B2mL-(B" say "TSUCHIYA Masatoshi" in Japanese
From: Kevin Rodgers @ 2005-04-06 15:45 UTC


Joe Corneli wrote:
 > The question is: how to make sense of the output from
 >
 >  lynx -dump http://emacs-w3m.namazu.org/ml/msg07861.html
 >
 > This looks harder than Chinese UTF translations, though I think that
 > once I get the pattern down, it won't really be any harder (see
 > http://lists.gnu.org/archive/html/help-gnu-emacs/2005-03/msg00689.html).
 >
 > But what is the pattern?

You have to pay attention to the HTTP headers sent by the server, in
particular:

Content-Type: text/html; charset=iso-2022-jp

According to the lynx man page, the -mime_header option "prints the MIME
header of a fetched document along with its source".  Alternatively, you
can call lynx twice, first with the -head option to get just the HTTP
headers and then with -dump to get the content.

You should also try to handle Content-Transfer-Encoding headers.
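
As a rough illustration of the point above, here is a minimal Python
sketch that reads the charset out of a Content-Type header and uses it
to decode the body, undoing a Content-Transfer-Encoding first if one is
given.  The function names and the latin-1 fallback are assumptions
made for this example; nothing here comes from lynx itself.

```python
import base64
import quopri
from email.message import Message


def charset_from_content_type(header_value, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the lowercased charset, or None.
    return msg.get_content_charset() or default


def decode_body(raw, content_type, transfer_encoding=None):
    """Decode raw body bytes to text (hypothetical helper)."""
    encoding = (transfer_encoding or "").lower()
    if encoding == "base64":
        raw = base64.b64decode(raw)
    elif encoding == "quoted-printable":
        raw = quopri.decodestring(raw)
    # 7bit, 8bit and binary need no transformation.
    return raw.decode(charset_from_content_type(content_type), errors="replace")


# The name from the subject line, with its ESC bytes intact.
raw_name = b"\x1b$BEZ20\x1b(B \x1b$B2mL-\x1b(B"
print(decode_body(raw_name, "text/html; charset=iso-2022-jp"))
```

The header value and the body bytes would come from whatever fetches
the page; the sketch only covers the decoding step.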

-- 
Kevin Rodgers


* Re: making "$BEZ20(B $B2mL-(B" say "TSUCHIYA Masatoshi" in Japanese
From: Miles Bader @ 2005-04-07  0:20 UTC


Kevin Rodgers <ihs_4664@yahoo.com> writes:
>  > The question is: how to make sense of the output from
>  >
>  >  lynx -dump http://emacs-w3m.namazu.org/ml/msg07861.html
>
> You have to pay attention to the HTTP headers sent by the server, in
> particular:
>
> Content-Type: text/html; charset=iso-2022-jp

In addition, you have to make sure you actually get the raw text --
the "encoded" text in Joe's original post is missing crucial ESC (\033)
characters[*], so there's no way to decode it correctly.

I tried using "lynx -dump ..." myself, and it looks like it's lynx
which is doing the stripping; the documentation seems to indicate that
the "-raw" option will disable this stripping, but I couldn't get it
to work (maybe this only works for terminal display, not with -dump?).

-Miles


[*] The correct binary text, with the escape characters replaced
by the two-character sequence "^[" to make Gnus happy, is:

   "^[$BEZ20^[(B ^[$B2mL-^[(B" say "TSUCHIYA Masatoshi"

Emacs can decode this using the above charset to:

   "土屋 雅稔" say "TSUCHIYA Masatoshi"
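
For text that has already lost its ESC characters, a heuristic repair
is sometimes possible.  The Python sketch below assumes the only
designators involved are ESC $ B (switch to JIS X 0208) and ESC ( B
(switch back to ASCII), and that "$B" never occurs literally in the
ASCII portions; if either assumption is wrong it will produce garbage.
The function name restore_escapes is made up for this example.

```python
ESC = "\x1b"


def restore_escapes(text):
    """Re-insert ESC before the "$B" and "(B" designators (heuristic)."""
    out = []
    i = 0
    in_jis = False
    while i < len(text):
        if not in_jis:
            if text.startswith("$B", i):
                out.append(ESC + "$B")
                i += 2
                in_jis = True
            else:
                out.append(text[i])
                i += 1
        else:
            # In JIS X 0208 mode every character is a two-byte pair, so
            # "(B" is only treated as a designator on a pair boundary.
            if text.startswith("(B", i):
                out.append(ESC + "(B")
                i += 2
                in_jis = False
            else:
                out.append(text[i:i + 2])
                i += 2
    return "".join(out)


stripped = '"$BEZ20(B $B2mL-(B" say "TSUCHIYA Masatoshi"'
repaired = restore_escapes(stripped)
decoded = repaired.encode("ascii").decode("iso2022_jp")
print(decoded)
```

Getting the raw bytes from the server is still the more reliable
route; this only helps when the stripped text is all that is left.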


-- 
If you can't beat them, arrange to have them beaten.  [George Carlin]
