* convert whole website from iso-8859-2/1 to utf-8
From: Miroslav Rovis @ 2004-07-07 2:18 UTC
The files all have either:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
if in Croatian and
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
if in English or Italian, just as they're supposed to.
The iso-8859-2 files put up a lot of resistance and frayed a few nerves
here at my desktop and around...
Until, that is, I found:
http://lists.gnu.org/archive/html/help-gnu-emacs/2003-09/msg00467.html
(
> prefer-coding-system utf-8
> C-x C-f filename
After these two lines, what does Emacs print if you say
M-: buffer-file-coding-system RET
?
)
Mine said iso-latin-1-dos instead of iso-latin-2-dos (the pages were made
long ago, in the days when mule wasn't yet an option for simple Emacs
users... and Linux was truly hard to understand as well; now I use it
much better...)
So, as per:
http://lists.gnu.org/archive/html/help-gnu-emacs/2002-09/msg00181.html
I set:
(add-hook 'find-file-hooks
          (lambda ()
            (if (equal buffer-file-coding-system 'iso-latin-2)
                (set-language-environment "Latin-2"))))
in my .emacs file.
But I am not certain that it worked for me...
Therefore I also made sure I learned to just run
M-x set-buffer-file-coding-system RET
and enter
latin-2
whenever the modeline (at the very left) read -1(DOS)--
instead of -2(DOS)--.
After that, the advice from:
http://lists.gnu.org/archive/html/help-gnu-emacs/2004-06/msg00508.html
(Perhaps it is easiest to tell Emacs to save as UTF-8, by doing C-x
RET f utf-8 RET.)
worked fine.
(There was just no way to get it right while the
buffer-file-coding-system variable was set to latin-1; the "meta ...
charset=iso-8859-2" in the head of the HTML file made no difference.)
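One way to sidestep the wrong guess described above is to name the coding system explicitly when reading, instead of letting Emacs detect it. What follows is only a minimal sketch: the helper name `my-recode-file` and its argument order are made up for illustration, and it assumes the caller already knows the file's real encoding.

```elisp
;; Hypothetical helper; the name and argument order are illustrative only.
(defun my-recode-file (in out from to)
  "Read IN decoded as coding system FROM and write it to OUT encoded as TO."
  (with-temp-buffer
    ;; Binding coding-system-for-read makes Emacs skip auto-detection
    ;; and decode the bytes exactly as FROM.
    (let ((coding-system-for-read from))
      (insert-file-contents in))
    ;; Binding coding-system-for-write forces the target encoding as well.
    (let ((coding-system-for-write to))
      (write-region (point-min) (point-max) out))))

;; Example call (file names are placeholders):
;; (my-recode-file "in.htm" "out.htm" 'iso-8859-2 'utf-8)
```

Because FROM is given without a line-ending suffix, DOS line endings will probably come out in the platform's default style; passing `utf-8-dos` as TO should keep CRLF, though that is worth checking on a copy first.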
But converting by hand is not an option, since I have a couple of hundred
pages, and I would also like to be able to do this on other similar
occasions.
Is there a lisp program for such conversion that could be run in batch mode?
If I took this on myself, it would certainly take me at least a week to
learn that much Lisp, if I managed it at all.
Does anyone have any suggestion?
At least which functions to begin considering for this task?
I did find e.g.:
http://lists.gnu.org/archive/html/help-gnu-emacs/2003-11/msg00436.html
(
#!/usr/bin/emacs --script
;; And after that you can use regular elisp code:
(princ "Hello world!")
;; end.
)
and I have already experimented along the lines of:
emacs -batch file -f function -l lisp-code-from-file
but I remained disoriented as to how this kind of thing is really
done.
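For orientation, the batch options are processed left to right: `-l FILE` loads a Lisp file, `-f FUNCTION` calls a function with no arguments, and whatever is still unprocessed is available to that function in `command-line-args-left`. The sketch below builds on that. The file name recode-site.el and both function names are hypothetical, `directory-files-recursively` needs Emacs 25 or later, and the code assumes every matching file really is Latin-2.

```elisp
;; recode-site.el (hypothetical file name)

(defun my-recode-in-place (file from to)
  "Re-encode FILE from coding system FROM to TO, overwriting it."
  (with-temp-buffer
    (let ((coding-system-for-read from))
      (insert-file-contents file))
    (let ((coding-system-for-write to))
      (write-region (point-min) (point-max) file))))

(defun my-recode-site ()
  "Batch entry point: recode every .htm/.html file under a root directory.
The root is popped from `command-line-args-left', so Emacs does not
try to visit it as a file afterwards.  Defaults to the current directory."
  (let ((root (or (pop command-line-args-left) ".")))
    (dolist (file (directory-files-recursively root "\\.html?\\'"))
      (my-recode-in-place file 'iso-8859-2 'utf-8)
      (princ (format "converted %s\n" file)))))
```

It would be invoked roughly as

emacs --batch -l recode-site.el -f my-recode-site /path/to/site

with `-l` before `-f`, so that the function is defined by the time it is called. The files are overwritten, and running it twice would encode them twice, so it is best tried on a copy of the site.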
But if someone supplies a few hints, I'll delve with some more hope into
all those volumes of Emacs lore...
So thank you, knowledgeable and kind reader, if you care to help!
Miroslav Rovis
www.exDeo.com
* convert whole website from iso-8859-2/1 to utf-8
From: Miroslav Rovis @ 2004-07-08 22:22 UTC
No one has helped yet...
So I went scuba diving into the huge Emacs lisping ocean... Breathe! Breathe!
Two days! First modest success...
----------------------------------------------------------------------
----------------------------------------------------------------------
An HTML file (schoolih.htm):
----------------------------------------------------------------------
<html>
<head>
<title>Virtualna Škola Supero!</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
</head>
<body>
<h3>Dobrodošli u Virtualnu školu Supero! </h3>
<p>Podučavam engleski, hrvatski i talijanski kao strane jezike. I u tome sam
istinski vješt.</p>
<p>Nudim također poduku početnika u informatici. I pisanju na Mreži.</p>
<p>No ovo je ustvari virtualna škola. Nije "prava".</p>
<p>Naravno, budem li ikad imao stvarno malo čedo u obliku
institucije, tako će se zvati.</p>
<p>Podučavam brojne đake u stvarnom životu već lijepi broj godina.</p>
</body>
</html>
----------------------------------------------------------------------
----------------------------------------------------------------------
My first lisp program (latin2-utf8.sh):
(use at your own risk ;-)
----------------------------------------------------------------------
#!/usr/bin/emacs --script
;; step 1: visit the source file and print what Emacs decided about it
(find-file "/test/schoolih.htm")
(princ "buffer-name is: ")
(princ (buffer-name))
(princ "\n")
(princ "buffer-file-name is: ")
(princ (buffer-file-name))
(princ "\n")
(princ "buffer-file-coding-system is: ")
(princ buffer-file-coding-system)
(princ "\n")
(princ "coding-system-for-write is: ")
(princ coding-system-for-write)
(princ "\n")
(princ "\n")
;; step 2: point the buffer at a new output file
(set-visited-file-name "/test/schoolih_u.htm")
;; step 3: update the charset named in the meta tag, but only if the
;; search actually found it (replace-match needs a successful match)
(when (search-forward "iso-8859-2" nil t)
  (replace-match "utf-8" nil t))
;; step 4: print the same report again, then save with UTF-8 forced
(let ((coding-system-for-write 'utf-8))
  (princ "buffer-name is: ")
  (princ (buffer-name))
  (princ "\n")
  (princ "buffer-file-name is: ")
  (princ (buffer-file-name))
  (princ "\n")
  (princ "buffer-file-coding-system is: ")
  (princ buffer-file-coding-system)
  (princ "\n")
  (princ "coding-system-for-write is: ")
  (princ coding-system-for-write)
  (princ "\n")
  (princ "\n")
  ;;(revert-buffer-with-coding-system 'utf-8)
  ;; save-buffer acts on the current buffer; it takes no buffer argument
  (save-buffer))
----------------------------------------------------------------------
((Of course, most of it is nothing other than my groping for solutions
in this entirely new territory for me and can be cut out and forgotten.))
----------------------------------------------------------------------
----------------------------------------------------------------------
The program writes a file schoolih_u.htm which is identical in every
respect to schoolih.htm, except for:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
and the fact that those:
š č đ ž ć
characters are in genuine utf-8 flavour!
(any other characters peculiar to utf-8 would have been so in just the
same fashion -- anyone interested is encouraged to try)...
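Since the site mixes iso-8859-2 and iso-8859-1 pages, one possible next step is to let each file's own meta tag decide how it is decoded. This is a sketch under stated assumptions, not a tested tool: both function names are hypothetical, the regexp assumes the charset appears as `charset=NAME` somewhere in the file, and the file is overwritten in place.

```elisp
(defun my-declared-charset (file)
  "Return the charset named in FILE's meta tag as a lowercase symbol, or nil."
  (with-temp-buffer
    ;; Read the raw bytes; the tag itself is plain ASCII in either encoding.
    (insert-file-contents-literally file)
    (let ((case-fold-search t))
      (when (re-search-forward "charset=\\([-a-z0-9_]+\\)" nil t)
        (intern (downcase (match-string 1)))))))

(defun my-convert-by-meta (file)
  "Recode FILE to UTF-8 using the charset its own meta tag declares.
Files with no tag, an unknown charset, or a tag that already says
utf-8 are left untouched.  Return the old charset, or nil if skipped."
  (let ((from (my-declared-charset file)))
    (when (and from (not (eq from 'utf-8)) (coding-system-p from))
      (with-temp-buffer
        ;; Decode according to the declared charset, not Emacs's guess.
        (let ((coding-system-for-read from))
          (insert-file-contents file))
        (goto-char (point-min))
        ;; Rewrite only the charset name inside the tag.
        (let ((case-fold-search t))
          (when (re-search-forward "charset=\\([-a-z0-9_]+\\)" nil t)
            (replace-match "utf-8" t t nil 1)))
        (let ((coding-system-for-write 'utf-8))
          (write-region (point-min) (point-max) file))
        from))))
```

Skipping files that already declare utf-8 means a second run over the same tree should leave converted pages alone.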
----------------------------------------------------------------------
----------------------------------------------------------------------
OK. Enough braggadocio (oh, I know how little and puny this is, I
know...).
This is a very small part of the whole project.
Any help is still appreciated.
May all lispers stay well and healthy, esp. in their souls!
Miroslav Rovis
www.rovis.org
----------------------------------------------------------------------
* Re: convert whole website from iso-8859-2/1 to utf-8
From: Thien-Thi Nguyen @ 2004-07-21 9:55 UTC
Miroslav Rovis <m.rovis@inet.hr> writes:
> Any help is still appreciated.
you may wish to use `message' instead of `princ'.
w/ `message' you can use %-style formatting, etc.
(output appears on stderr, however.)
thi
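To illustrate the suggestion, each three-line `princ` group in the script could collapse into a single call. The helper name `my-report` is hypothetical; `message` itself takes a format string, appends the newline, and returns the formatted text.

```elisp
;; Hypothetical helper showing %-style formatting with `message'.
(defun my-report (name value)
  "Print NAME and VALUE on one line and return the formatted string.
In batch mode the line goes to standard error, not standard output."
  (message "%s is: %s" name value))

;; Example calls mirroring the original script's reports:
;; (my-report "buffer-file-name" (buffer-file-name))
;; (my-report "buffer-file-coding-system" buffer-file-coding-system)
```

If the report needs to stay on standard output, `(princ (format "%s is: %s\n" name value))` gives the same formatting there.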