unofficial mirror of help-gnu-emacs@gnu.org
* convert whole website from iso-8859-2/1 to utf-8
@ 2004-07-07  2:18 Miroslav Rovis
From: Miroslav Rovis @ 2004-07-07  2:18 UTC (permalink / raw)


The files all have either:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
if in Croatian and
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
if in English or Italian, just as they're supposed to.

The iso-8859-2 files put up a lot of resistance and frayed a few nerves
here at my desktop and around...
Until, that is, I found:
http://lists.gnu.org/archive/html/help-gnu-emacs/2003-09/msg00467.html
(
 > prefer-coding-system utf-8
 > C-x C-f filename
After these two lines, what does Emacs print if you say
   M-: buffer-file-coding-system RET
?
)
Mine said iso-latin-1-dos instead of iso-latin-2-dos (the pages were made
long ago, in the days when mule wasn't yet an option for simple Emacs
users... and Linux was truly hard to understand as well; now I use it
much better...)
So, as per:
http://lists.gnu.org/archive/html/help-gnu-emacs/2002-09/msg00181.html
I set:
(add-hook 'find-file-hooks
           '(lambda ()
              (if (equal buffer-file-coding-system 'iso-latin-2)
                  (set-language-environment "Latin-2"))))
in my .emacs file.
But I am not certain that it worked for me...
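
One possible reason, offered only as an assumption: the mode line shows
a (DOS) suffix, which suggests the buffer's value is something like
iso-latin-2-dos, and that symbol is never `equal' to the bare
iso-latin-2. A minimal sketch of a test that ignores the end-of-line
suffix follows. The name sketch-latin-2-p is made up for illustration;
coding-system-base is a standard Emacs function.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun sketch-latin-2-p (coding-system)
  "Return non-nil if CODING-SYSTEM is Latin-2, ignoring any -unix/-dos/-mac suffix."
  (and coding-system
       (eq (coding-system-base coding-system)
           (coding-system-base 'iso-latin-2))))

(sketch-latin-2-p 'iso-latin-2-dos)    ; non-nil
(equal 'iso-latin-2-dos 'iso-latin-2)  ; nil, so the hook's test would not fire
```

In newer Emacs versions the hook variable is spelled find-file-hook;
find-file-hooks is kept as an obsolete alias.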

Therefore I also made sure I learned to just run
M-x set-buffer-file-coding-system RET latin-2 RET

in case the modeline (at the very left) read -1(DOS)--
instead of -2(DOS)--.

After that, the advice from:
http://lists.gnu.org/archive/html/help-gnu-emacs/2004-06/msg00508.html
(Perhaps it is easiest to tell Emacs to save as UTF-8, by doing C-x
RET f utf-8 RET.)
worked fine.

(There was just no way to get it right when the
buffer-file-coding-system variable was set to latin-1; the "meta ...
charset=iso-8859-2" in the head of the html file made no difference.)
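
To illustrate why the choice of coding system matters so much here: the
same byte stands for different characters in the two charsets, so
nothing in the byte itself says which one was meant. A small sketch,
with a made-up helper name (sketch-decode-byte); decode-coding-string
and unibyte-string are standard functions in recent Emacs versions.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun sketch-decode-byte (byte coding)
  "Return the character that the single BYTE decodes to under CODING."
  (string-to-char (decode-coding-string (unibyte-string byte) coding)))

;; Byte #xB9 is what an iso-8859-2 file stores for s-with-caron.
(sketch-decode-byte #xB9 'iso-8859-2) ; U+0161, LATIN SMALL LETTER S WITH CARON
(sketch-decode-byte #xB9 'iso-8859-1) ; U+00B9, SUPERSCRIPT ONE
```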

But doing this by hand is not an option, since I have a couple of
hundred pages, and I would also like to be able to do it on other
similar occasions.

Is there a lisp program for such conversion that could be run in batch mode?
If I took this on myself, I would certainly need at least a week to learn
that much lisp, if I managed it at all.
Does anyone have any suggestions?
Or at least, which functions should I begin considering for this task?
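
As a possible starting point for the question above, here is a minimal
sketch of a per-file conversion. It is an illustration under stated
assumptions, not a tested solution for the whole site: the function
name sketch-convert-to-utf-8 and its arguments are invented, and it
assumes the caller already knows each file's source charset. The
functions it calls (insert-file-contents, re-search-forward,
replace-match, write-region) and the variables coding-system-for-read
and coding-system-for-write are standard Emacs.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun sketch-convert-to-utf-8 (in out source-coding)
  "Decode file IN as SOURCE-CODING, fix its charset label, write OUT as UTF-8."
  (with-temp-buffer
    ;; Decode explicitly rather than relying on auto-detection.
    (let ((coding-system-for-read source-coding))
      (insert-file-contents in))
    (goto-char (point-min))
    ;; Update the first charset=iso-8859-1/2 label, if there is one.
    (when (re-search-forward "charset=iso-8859-[12]" nil t)
      (replace-match "charset=utf-8" t t))
    ;; Encode as UTF-8 when writing the result.
    (let ((coding-system-for-write 'utf-8))
      (write-region (point-min) (point-max) out))))
```

A call might look like
(sketch-convert-to-utf-8 "/test/schoolih.htm" "/test/schoolih_u.htm" 'iso-8859-2),
and a loop over the result of directory-files could apply it to every
page in a directory.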

I did find e.g.:
http://lists.gnu.org/archive/html/help-gnu-emacs/2003-11/msg00436.html
(
#!/usr/bin/emacs --script
;; And after that you can use regular elisp code:
(princ "Hello world!")
;; end.
)
and I have already experimented along the lines of:
emacs -batch file -f function -l lisp-code-from-file
but I am still unsure how these kinds of things are really done.
If someone supplies a few hints, I'll delve with some more hope into
all those volumes of Emacs lore...
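
For orientation, one common batch pattern is to load the lisp file
first with -l and then call a function with -f; that function can take
the remaining file names from the variable command-line-args-left. The
sketch below only reports what Emacs detected for each file. The file
name report-files.el and the function name sketch-report-files are
made up; command-line-args-left, last-coding-system-used and
insert-file-contents are standard.

```elisp
;; Assumed invocation (file names are placeholders):
;;   emacs --batch -l report-files.el -f sketch-report-files a.htm b.htm
(defun sketch-report-files ()
  "Report the coding system detected for each remaining command-line file.
Return a list of (FILE . CODING-SYSTEM) pairs."
  (let (results)
    (while command-line-args-left
      ;; Popping the name keeps Emacs from processing it again later.
      (let ((file (pop command-line-args-left)))
        (with-temp-buffer
          (insert-file-contents file)
          ;; insert-file-contents records the coding system it used here.
          (push (cons file last-coding-system-used) results)
          (message "%s: %s" file last-coding-system-used))))
    (nreverse results)))
```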

So thank you, knowledgeable and kind reader, if you care to help!
Miroslav Rovis
www.exDeo.com


* convert whole website from iso-8859-2/1 to utf-8
@ 2004-07-08 22:22 Miroslav Rovis
From: Miroslav Rovis @ 2004-07-08 22:22 UTC (permalink / raw)


No one has helped yet...
So I went scuba diving into the huge Emacs lisping ocean... Breathe! Breathe!
Two days! First modest success...
----------------------------------------------------------------------
----------------------------------------------------------------------
An HTML file (schoolih.htm):
----------------------------------------------------------------------
<html>
<head>
<title>Virtualna Škola Supero!</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
</head>

<body>
   <h3>Dobrodošli u Virtualnu školu Supero! </h3>
   <p>Podučavam engleski, hrvatski i talijanski kao strane jezike. I u 
tome sam
     istinski vješt.</p>
   <p>Nudim također poduku početnika u informatici. I pisanju na Mreži.</p>
   <p>No ovo je ustvari virtualna škola. Nije &quot;prava&quot;.</p>
   <p>Naravno, budem li ikad imao stvarno malo čedo u obliku 
institucije, tako će se zvati.</p>
   <p>Podučavam brojne đake u stvarnom životu već lijepi broj godina.</p>
</body>
</html>
----------------------------------------------------------------------
----------------------------------------------------------------------
My first lisp program (latin2-utf8.sh):
(use at your own risk ;-)
----------------------------------------------------------------------
#!/usr/bin/emacs --script
;; step
(find-file "/test/schoolih.htm")
(princ "buffer-name is: ")
(princ (buffer-name))
(princ "\n")
(princ "buffer-file-name is: ")
(princ (buffer-file-name))
(princ "\n")
(princ "buffer-file-coding-system is: ")
(princ buffer-file-coding-system)
(princ "\n")
(princ "coding-system-for-write is: ")
(princ coding-system-for-write)
(princ "\n")
(princ "\n")
;; step
(set-visited-file-name "/test/schoolih_u.htm")
;; step
;; Only replace when the search actually found the charset label.
(when (search-forward "iso-8859-2" nil t)
  (replace-match "utf-8" nil t))
;; step
(let ((coding-system-for-write 'utf-8))
  (princ "buffer-name is: ")
  (princ (buffer-name))
  (princ "\n")
  (princ "buffer-file-name is: ")
  (princ (buffer-file-name))
  (princ "\n")
  (princ "buffer-file-coding-system is: ")
  (princ buffer-file-coding-system)
  (princ "\n")
  (princ "coding-system-for-write is: ")
  (princ coding-system-for-write)
  (princ "\n")
  (princ "\n")
  ;; step
  ;;(revert-buffer-with-coding-system 'utf-8)
  ;; save-buffer always saves the current buffer; its optional argument
  ;; is a backup-related prefix argument, not a buffer, so none is passed.
  (save-buffer))
----------------------------------------------------------------------
((Of course, most of it is nothing other than my groping for solutions 
in this entirely new territory for me and can be cut out and forgotten.))
----------------------------------------------------------------------
----------------------------------------------------------------------
The program writes a file schoolih_u.htm which is identical in every
respect to schoolih.htm, except for:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
and the fact that those:

š č đ ž ć

characters are in genuine utf-8 flavour!
(any other characters peculiar to utf-8 would have been so in just the
same fashion -- anyone interested is encouraged to try)...
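
For anyone who wants to check this at the byte level, a small sketch:
it encodes one of the characters above both ways and lists the
resulting bytes. The helper name sketch-encoded-bytes is made up;
encode-coding-string is a standard Emacs function.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun sketch-encoded-bytes (char coding)
  "Return the list of bytes that CHAR becomes when encoded with CODING."
  (append (encode-coding-string (string char) coding) nil))

;; #x161 is the code point of s-with-caron.
(sketch-encoded-bytes #x161 'iso-8859-2) ; (185), one byte, #xB9
(sketch-encoded-bytes #x161 'utf-8)      ; (197 161), two bytes, #xC5 #xA1
```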

----------------------------------------------------------------------
----------------------------------------------------------------------
OK. Enough braggadocio (oh, I know how little and puny this is, I
  know...).
This is a very small part of the whole project.

Any help is still appreciated.

May all lispers stay well and healthy, esp. in their souls!
Miroslav Rovis
www.rovis.org

----------------------------------------------------------------------


* Re: convert whole website from iso-8859-2/1 to utf-8
       [not found] <mailman.290.1089318338.22971.help-gnu-emacs@gnu.org>
@ 2004-07-21  9:55 ` Thien-Thi Nguyen
From: Thien-Thi Nguyen @ 2004-07-21  9:55 UTC (permalink / raw)


Miroslav Rovis <m.rovis@inet.hr> writes:

> Any help is still appreciated.

you may wish to use `message' instead of `princ'.
w/ `message' you can use %-style formatting, etc.
(output appears on stderr, however.)
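
a minimal sketch of that suggestion, assuming a made-up helper name
(sketch-report): one `message' call with %-style formatting can stand
in for each group of three `princ' calls in the script.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun sketch-report (name value)
  "Report NAME and VALUE with one `message' call; return the formatted text."
  (message "%s is: %s" name value))

(sketch-report "buffer-file-coding-system" buffer-file-coding-system)
```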

thi
