From: tomas@tuxteam.de (Tomas Zerolo)
Cc: Piet van Oostrum <piet@cs.uu.nl>, emacs-devel@gnu.org
Subject: Re: Problem with national characters in XHTML
Date: Sat, 1 Oct 2005 06:29:16 +0200
Message-ID: <20051001042916.GA29675@www.trapp.net>
In-Reply-To: <433DC407.1070208@student.lu.se>


On Sat, Oct 01, 2005 at 01:02:31AM +0200, Lennart Borgman wrote:
> Piet van Oostrum wrote:
[...]
> >That is just the internal representation of the character in Emacs. It's
> >not important. What matters is what Emacs writes to your file. When you
> >write out utf-8 (for example by giving the command
[...]
> So you mean that at a - what should I call it? - "text semantic level" 
> the utf-8 char and the latin-1 char has the same meaning?

Yes. You put that nicely. The *character* (a dieresis) stays the same.
The *representation* (loosely referred to as `encoding') changes.
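
To make that concrete, here is a small Python sketch (my own
illustration, not from the thread). It assumes the character in question
is "ä" (a with dieresis); any Latin-1 character above 127 behaves the
same way.

```python
# Assumption: the character under discussion is "ä" (U+00E4).
ch = "\u00e4"

latin1_bytes = ch.encode("latin-1")  # one byte in Latin-1
utf8_bytes = ch.encode("utf-8")      # two bytes in utf-8

# The byte sequences differ ...
bytes_differ = latin1_bytes != utf8_bytes

# ... but decoding each with its own encoding gives back the same character.
same_char = latin1_bytes.decode("latin-1") == utf8_bytes.decode("utf-8") == ch

print(latin1_bytes.hex(), utf8_bytes.hex(), bytes_differ, same_char)
```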

I said loosely, because with more complex schemes such as utf-8 there are
actually two layers: the `character set', which maps each character to an
integer (aka `code point'; here the character set is UNICODE or
ISO-10646, which nowadays are equivalent), and the representation in a
file, which may be utf-8 (most common), utf-16 or whatnot.
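
A rough Python sketch of the two layers, again assuming "ä" as the
sample character: the code point is a single integer, while the bytes
written to a file depend on which encoding you pick. The encoding names
are the ones Python's codecs use.

```python
ch = "\u00e4"  # assumed sample character: a with dieresis

# Layer 1: the character set maps the character to an integer.
code_point = ord(ch)

# Layer 2: the same code point, serialized in different encodings.
encoded = {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

print(hex(code_point))
for name, raw in encoded.items():
    print(name, raw.hex())
```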

Now the advantage of utf-8: it is a variable-width encoding, and uses
just one byte per ASCII character (for ASCII, the byte values are
identical). So you can interpret an ASCII file ``as-is'' as a utf-8 file.
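
You can check the ASCII compatibility yourself with a few lines of
Python; the sample string is of course arbitrary.

```python
text = "Hello, Emacs!"  # arbitrary pure-ASCII sample

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# Pure ASCII text produces identical bytes under both encodings,
# so ASCII bytes can be decoded as utf-8 unchanged.
identical = ascii_bytes == utf8_bytes
round_trip = ascii_bytes.decode("utf-8")

print(identical, round_trip)
```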

For higher characters (for example, the ones with codes >127 in
iso-8859-1 (aka Latin1)), you need more than one byte in utf-8: up to
4 bytes under the current definition (RFC 3629); the original design
allowed up to 6.
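
A quick way to see the byte counts, sketched in Python. The sample
characters are my own picks, one for each length; the longest sequences
valid under today's utf-8 (RFC 3629) are four bytes.

```python
# Assumed samples: ASCII letter, a with dieresis, euro sign, an emoji.
samples = ["A", "\u00e4", "\u20ac", "\U0001F600"]

# Number of utf-8 bytes each character needs.
lengths = [len(s.encode("utf-8")) for s in samples]

for s, n in zip(samples, lengths):
    print(f"U+{ord(s):04X} -> {n} byte(s)")
```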

The disadvantage is: it is a variable-width encoding, so you have to
process a text sequentially, byte by byte, to get the character
boundaries right (it's designed to re-synchronize gracefully, though).
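
The re-synchronization works because continuation bytes always have the
bit pattern 10xxxxxx, which no leading byte uses. A minimal sketch in
Python, assuming well-formed input; the helper name `next_boundary` is
made up for this example.

```python
def next_boundary(data: bytes, pos: int) -> int:
    """Return the first index >= pos where a character starts.

    Hypothetical helper: skips utf-8 continuation bytes (10xxxxxx).
    """
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


# "a", a with dieresis, euro sign, "z": 1 + 2 + 3 + 1 bytes.
data = "a\u00e4\u20acz".encode("utf-8")

# Landing in the middle of a character, we skip ahead to the next start.
start = next_boundary(data, 2)   # index 2 is inside the 2-byte character
tail = data[start:].decode("utf-8")
print(start, tail)
```

So jumping to an arbitrary offset costs at most a few bytes of scanning
before decoding can resume.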

If you want the whole story (on UNICODE, ISO10646, UTF8), see here:

  <http://www.cl.cam.ac.uk/~mgk25/unicode.html>

(highly recommended). From the perspective of a web slave, see:

  <http://www.w3.org/TR/REC-html40/charset.html>

HTH
-- tomas
