unofficial mirror of emacs-devel@gnu.org 
* Re: charsets and character sets (was: Re: 21.1: list-charset-chars)
From: Janusz S. Bień @ 2002-02-19 18:42 UTC (permalink / raw)
  Cc: Ulrich.Windl, bug-gnu-emacs, emacs-devel


I quote my letter in full, as I intended to send it to emacs-devel as
well but forgot to add it to the addressee list.

On 19 Feb 2002  jsbien@mimuw.edu.pl (Janusz S. Bień) wrote:

> On Mon, 18 Feb 2002  "Eli Zaretskii" <eliz@is.elta.co.il> wrote:
> 
> > > From: "Ulrich Windl" <Ulrich.Windl@rz.uni-regensburg.de>
> > > Date: Mon, 18 Feb 2002 15:58:51 +0100
> > > 
> > > I found out that the result of list-charset-chars (e.g. for latin15) is 
> > > contrary to the documentation: Only characters > 127 are displayed, but 
> > > the name and documentation creates the impression that all characters 
> > > are listed.
> > 
> > What led you to believe that ASCII characters with codes below 128
> > belong to the other charsets?  Whatever gave you that impression is
> > the place where the documentation should be improved, because ASCII
> > characters are a separate charset in Emacs.
> 
> On Tue, 19 Feb 2002  "Ulrich Windl" <Ulrich.Windl@rz.uni-regensburg.de> wrote:
> 
> [...]
> 
> > "list charset chars": What else than listing the characters in the 
> > charset could be expected?
> > 
> > Regards,
> > Ulrich
> 
> The Emacs documentation fails to make a clear distinction between Emacs
> charsets and character sets in the sense of ISO and related
> standards. 
> 
> The charset named e.g. latin15 *is not* the ISO/IEC Latin 15 character
> set; it is just its right-hand part, registered as such in the ISO
> International Register (available online) as ISO-IR 203. However, the
> iso-8859-15 *coding system* is equivalent to ISO/IEC Latin 15, cf. the
> output of `describe-coding-system':
> 
> ------------------------------------------------------------------------------
> 0 -- iso-8859-15 (alias of iso-latin-9)
>   ISO 2022 based 8-bit encoding for Latin-9 (MIME:ISO-8859-15)
> Type: 2 (variant of ISO-2022)
> Initial designations:
>   G0 -- ascii:ASCII (ISO646 IRV)
>   G1 -- latin-iso8859-15:Right-Hand Part of Latin Alphabet 9 (ISO/IEC 8859-15): ISO-IR-203
> -----------------------------------------------------------------------------
> 
> Long, long ago I proposed changing the names of the charsets
> accordingly, but my suggestion was rejected and I didn't press the
> point. I think now is the right time to come back to the
> problem, as correct terminology is important for the development
> work.
> 
> My current proposal is:
> 
> - make explicit in the manuals and documentation strings that
>   charsets are Emacs-specific technical terms,
> 
> - add `describe-charset', analogous to `describe-coding-system', to
>   minimize the chance of user confusion,
> 
> - on the first convenient occasion rename `latin-15' and related
>   charsets to something more adequate, e.g. `latin-no9-rp' (15 is the
>   number of the part of the ISO/IEC 8859 standard which contains the
>   definition of Latin alphabet number 9, while `latin-15' suggests Latin
>   alphabet number 15; `rp' stands for `right-hand part of',
>   which is an ISO/IEC technical term).
> 
> Best regards
> 
> Janusz
> 
> -- 
>                      ,   
> dr hab. Janusz S. Bien, prof. UW
> Prof. Janusz S. Bien, Warsaw University
> http://www.orient.uw.edu.pl/~jsbien/
> ---------------------------------------------------------------------
> Na tym koncie czytam i wysylam poczte i wiadomosci offline.
> On this account I read/post mail/news offline.

On Tue, 19 Feb 2002  "Eli Zaretskii" <eliz@is.elta.co.il> wrote:


[...]

> > I don't have a v21 Emacs at hand at the moment, but an ISO 8859-15 
> > charset is a superset of US-ASCII
> 
> Not in Emacs, it isn't.  

Because a charset *is not* a character set.

> The full name of latin-iso8859-15 in Emacs
> is this:
> 
>   "Right-Hand Part of Latin Alphabet 9 (ISO/IEC 8859-15): ISO-IR-203."
> 
> See mule-conf.el for more information.  The ``right-hand part'' thing
> means that characters below 128 are not included.

In other words, the charset name is not adequate.

> What I'm asking is where you would suggest explaining this
> fundamental fact so that it becomes clear.

For example, after

-------------------------------------------------------------------------
International Character Set Support
***********************************

   Emacs supports a wide variety of international character sets,
including European variants of the Latin alphabet, as well as Chinese,
Cyrillic, Devanagari (Hindi and Marathi), Ethiopic, Greek, Hebrew, IPA,
Japanese, Korean, Lao, Thai, Tibetan, and Vietnamese scripts.  These
features have been merged from the modified version of Emacs known as
MULE (for "MULti-lingual Enhancement to GNU Emacs")
------------------------------------------------------------------------

add

        To implement character set support, Emacs uses the notion
        of a charset. For historical reasons, most 8-bit character codes
        are considered to consist of two separate 7-bit charsets,
        namely ASCII and the so-called right-hand part of the appropriate
        character code, for example...

        Please note also that characters belonging to different
        charsets are always different, even if they look the same: the
        letter o with acute accent from Latin alphabet no 1 (charset
        `latin-no1-rp', intended to be used e.g. for French) is
        different from the letter o with acute accent from Latin
        alphabet no 2 (charset `latin-no2-rp', intended to be used
        e.g. for Polish).
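
The note above on characters from different charsets can be sketched in Python. This is only an illustrative model, not Emacs code: the `MuleChar` type is hypothetical, the charset names `latin-iso8859-1` and `latin-iso8859-2` are used as plain labels, and the 7-bit code is assumed to be the 8-bit code with the high bit cleared.

```python
from typing import NamedTuple


class MuleChar(NamedTuple):
    """Hypothetical model of a character as a (charset, code) pair."""
    charset: str  # charset name, used here only as a label
    code: int     # assumed 7-bit code within that charset


# Byte 0xF3 is "o with acute accent" in both ISO 8859-1 and ISO 8859-2.
BYTE = b"\xf3"
glyph_latin1 = BYTE.decode("iso-8859-1")
glyph_latin2 = BYTE.decode("iso-8859-2")

# Python decodes both to the same Unicode character, so they compare equal.
same_glyph = glyph_latin1 == glyph_latin2

# In the (charset, code) model the two letters stay distinct, because
# the charset is part of the character's identity.
o_acute_latin1 = MuleChar("latin-iso8859-1", 0xF3 & 0x7F)
o_acute_latin2 = MuleChar("latin-iso8859-2", 0xF3 & 0x7F)
distinct = o_acute_latin1 != o_acute_latin2

print(same_glyph, distinct)
```

The point of the model is that two characters can share a code and a glyph and still be unequal once the charset is included in the comparison.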

Best regards

Janusz

-- 
                     ,   
dr hab. Janusz S. Bien, prof. UW
Prof. Janusz S. Bien, Warsaw University
http://www.orient.uw.edu.pl/~jsbien/
---------------------------------------------------------------------
Na tym koncie czytam i wysylam poczte i wiadomosci offline.
On this account I read/post mail/news offline.

_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/emacs-devel



* Re: charsets and character sets (was: Re: 21.1: list-charset-chars)
From: Richard Stallman @ 2002-02-20 22:13 UTC (permalink / raw)
  Cc: eliz, Ulrich.Windl, bug-gnu-emacs, emacs-devel

I added this section to mule.texi.  Thanks.  Any comments?

@node Charsets
@section Charsets
@cindex charsets

  Emacs groups all supported characters into disjoint @dfn{charsets}.
Each character code belongs to one and only one charset.  For
historical reasons, Emacs typically divides an 8-bit character code
for an extended version of ASCII into two charsets: ASCII, which
covers the codes 0 through 127, plus another charset which covers the
``right-hand part'' (the codes 128 and up).  For instance, the
characters of Latin-1 include the Emacs charset @code{ascii} plus the
Emacs charset @code{latin-iso8859-1}.

  Emacs characters belonging to different charsets may look the same,
but they are still different characters.  For example, the letter
@samp{o} with acute accent in charset @code{latin-iso8859-1}, used for
Latin-1, is different from the letter @samp{o} with acute accent in
charset @code{latin-iso8859-2}, used for Latin-2.

@findex list-charset-chars
@cindex characters in a certain charset
@findex describe-character-set
  There are two commands for obtaining information about Emacs
charsets.  The command @kbd{M-x list-charset-chars} prompts for a
charset name and displays all the characters in that charset.  The
command @kbd{M-x describe-character-set} prompts for a charset name
and displays information about that charset, including its internal
representation within Emacs.

  To find out which charset a character in the buffer belongs to,
put point before it and type @kbd{C-u C-x =}.
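
The split of an 8-bit code into two charsets, as described in the section above, might be sketched in Python as follows. The helper `split_8bit_code` is hypothetical and is not an Emacs function; the charset name `latin-iso8859-15` is only a label, the 7-bit code is assumed to be the byte with the high bit cleared, and the sketch makes no special case for control codes.

```python
from typing import List, Tuple


def split_8bit_code(byte: int, right_hand_charset: str) -> Tuple[str, int]:
    """Map an 8-bit code to a (charset, 7-bit code) pair.

    Codes 0 through 127 are assigned to "ascii"; codes 128 and up are
    assigned to the given right-hand part charset with the high bit cleared.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit code")
    if byte < 0x80:
        return ("ascii", byte)
    return (right_hand_charset, byte & 0x7F)


# "A" followed by the euro sign, which ISO 8859-15 encodes as byte 0xA4.
encoded = "A\u20ac".encode("iso-8859-15")

pairs: List[Tuple[str, int]] = [
    split_8bit_code(b, "latin-iso8859-15") for b in encoded
]
print(pairs)
```

Under these assumptions the letter `A` lands in `ascii` and the euro sign in the right-hand part charset, which is why a listing of the right-hand charset alone shows no characters below 128.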

