From: tomas@tuxteam.de (Tomas Zerolo)
Cc: Piet van Oostrum <piet@cs.uu.nl>, emacs-devel@gnu.org
Subject: Re: Problem with national characters in XHTML
Date: Sat, 1 Oct 2005 06:29:16 +0200
Message-ID: <20051001042916.GA29675@www.trapp.net>
In-Reply-To: <433DC407.1070208@student.lu.se>
On Sat, Oct 01, 2005 at 01:02:31AM +0200, Lennart Borgman wrote:
> Piet van Oostrum wrote:
[...]
> >That is just the internal representation of the character in Emacs. It's
> >not important. What matters is what Emacs writes to your file. When you
> >write out utf-8 (for example by giving the command
[...]
> So you mean that at a - what should I call it? - "text semantic level"
> the utf-8 char and the latin-1 char have the same meaning?
Yes. You put that nicely. The *character* (ä, a with dieresis) stays the same.
The *representation* (loosely referred to as `encoding') changes.
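To make the character/representation split concrete, here is a small sketch in Python 3 (my choice for illustration; nothing in this thread uses it). The same character, U+00E4 (ä), is written out as different bytes under Latin-1 and utf-8, yet both byte sequences decode back to the same character:

```python
# The character a-with-dieresis (U+00E4), independent of any file encoding.
ch = "\u00e4"

# Its representation differs per encoding.
latin1_bytes = ch.encode("latin-1")  # one byte: 0xE4
utf8_bytes = ch.encode("utf-8")      # two bytes: 0xC3 0xA4

print(latin1_bytes.hex(), utf8_bytes.hex())  # e4 c3a4

# Decoding each representation yields the same character again.
assert latin1_bytes.decode("latin-1") == utf8_bytes.decode("utf-8") == ch
```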
I said loosely, because with more complex schemes such as utf-8 there are
actually two layers: the `character set', which maps each character to an
integer (its `code point'; here the character set is UNICODE or ISO-10646,
which nowadays are equivalent), and the representation of those integers in
a file, which may be utf-8 (most common), utf-16 or whatnot.
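The two layers can be separated in a few lines of Python 3 (an illustrative sketch, not anything from the thread; `bytes.hex` with a separator argument needs Python 3.8 or later). `ord` gives the code point, and each call to `encode` picks a different file representation of that same integer:

```python
ch = "\u00e4"  # a with dieresis

# Layer 1: the character set maps the character to an integer code point.
code_point = ord(ch)
print(f"U+{code_point:04X}")  # U+00E4

# Layer 2: an encoding turns the code point into bytes for a file.
# The -le/-be variants are used so that no byte-order mark is prepended.
for encoding in ("utf-8", "utf-16-le", "utf-16-be"):
    print(encoding, ch.encode(encoding).hex(" "))
```

The loop prints `c3 a4` for utf-8, `e4 00` for little-endian utf-16 and `00 e4` for big-endian utf-16: one code point, three representations.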
Now the advantage of utf-8: it is a variable-width encoding that uses just
one byte per ASCII character (for ASCII it uses the same byte values). So
you can read an ASCII file ``as-is'' as a utf-8 file.
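A quick Python 3 check of that ASCII compatibility (the sample string is arbitrary, chosen only for illustration): for pure ASCII text the ASCII and utf-8 encodings produce identical bytes.

```python
text = "Hello, Emacs!"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For pure ASCII text the two representations are byte-for-byte identical,
# so an ASCII file can be read as utf-8 unchanged.
assert ascii_bytes == utf8_bytes
assert len(utf8_bytes) == len(text)  # one byte per ASCII character
```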
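Higher characters (for example those with codes >127 in iso-8859-1, aka
Latin-1) need more than one byte in utf-8; the Latin-1 ones take exactly
two. The original utf-8 design allowed sequences of up to 6 bytes, but since
RFC 3629 (2003) utf-8 is limited to 4 bytes, which covers all of UNICODE.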
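How many bytes a character needs depends only on its code point. Below is a Python 3 sketch of the ranges under the RFC 3629 limit of 4 bytes (the original design allowed up to 6). `utf8_length` is a hypothetical helper written for this illustration, cross-checked against Python's own encoder:

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes utf-8 (RFC 3629) needs for one code point.

    Note: this sketch does not reject surrogates (U+D800..U+DFFF),
    which are not valid characters to encode.
    """
    if code_point < 0:
        raise ValueError("negative code point")
    if code_point < 0x80:
        return 1  # ASCII
    if code_point < 0x800:
        return 2  # includes Latin-1 0x80..0xFF
    if code_point < 0x10000:
        return 3  # rest of the Basic Multilingual Plane
    if code_point <= 0x10FFFF:
        return 4  # supplementary planes
    raise ValueError("beyond U+10FFFF, not a Unicode code point")

# 'a', a-with-dieresis, euro sign, and an emoji outside the BMP.
for ch in ("a", "\u00e4", "\u20ac", "\U0001F600"):
    # Cross-check the helper against Python's built-in encoder.
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}", utf8_length(ord(ch)))
```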
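The disadvantage is: it is a variable-width encoding, so you have to
process text sequentially, byte by byte, to get the character boundaries
right (it's designed to re-synchronize gracefully, though: continuation
bytes are always recognizable, so you can find the nearest character
boundary from any position).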
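The re-synchronization works because every continuation byte in utf-8 has the bit pattern `10xxxxxx`, which no lead byte or ASCII byte shares. A Python 3 sketch of stepping back from an arbitrary byte offset to the start of its character (`is_continuation` and `char_start` are hypothetical helper names, not a library API):

```python
def is_continuation(byte: int) -> bool:
    # utf-8 continuation bytes always have the bit pattern 10xxxxxx.
    return byte & 0xC0 == 0x80

def char_start(data: bytes, pos: int) -> int:
    """Step back from an arbitrary byte offset to the start of its character."""
    while pos > 0 and is_continuation(data[pos]):
        pos -= 1
    return pos

data = "K\u00e4se \u20ac".encode("utf-8")  # b'K\xc3\xa4se \xe2\x82\xac'

# Offset 2 lands in the middle of the two-byte sequence for U+00E4;
# stepping back finds its first byte at offset 1.
start = char_start(data, 2)
print(start)  # 1
print(data[start:].decode("utf-8") == "\u00e4se \u20ac")  # True
```

At most three steps back are ever needed, so a reader dropped into the middle of a stream loses at most one character rather than the rest of the text.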
If you want the whole story (on UNICODE, ISO-10646, utf-8), see:
<http://www.cl.cam.ac.uk/~mgk25/unicode.html>
(highly recommended). From the perspective of a web slave, see:
<http://www.w3.org/TR/REC-html40/charset.html>
HTH
-- tomas
_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel