Re: 27.0.50; Use utf-8 is all our Elisp files

unofficial mirror of emacs-devel@gnu.org 

* Re: 27.0.50; Use utf-8 is all our Elisp files
       [not found]         ` <83va3nban3.fsf@gnu.org>
@ 2018-12-21 21:07           ` Paul Eggert
  2018-12-22  1:19             ` Eric Lindblad
  2018-12-22  8:12             ` etc/HELLO markup etc Eli Zaretskii
  0 siblings, 2 replies; 19+ messages in thread
From: Paul Eggert @ 2018-12-21 21:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: handa, monnier, Emacs Development

[removing 33796@debbugs.gnu.org and adding emacs-devel@gnu.org to cc list]

Eli Zaretskii wrote:
> Which markup is not necessary for display, in your opinion?

At most all that's useful is markup that distinguishes Chinese and Japanese 
variants of Han characters; this might also include hanja (Korean) and Chữ Nôm 
(Vietnamese) variants if we ever added such characters to etc/HELLO. Such markup 
might be useful because a significant set of east Asian users dislike Unicode's 
Han unification and prefer specific variants of Han characters. I'm not aware of 
any other set of users who dislike unification in that way.

> That markup is precisely what keeps the charset properties on the
> corresponding greetings.  Removing it would be losing information that
> HELLO is trying to preserve.

Although the etc/HELLO markup might be of interest to those who care about 
annotating languages in the text, it's irrelevant to the ordinary purpose of 
that file, which is to show textual translations of "Hello", as examples, to an 
audience that doesn't know all those languages, but who can easily see the 
language names in the English (or native-language) parts of the text without 
involving any of the markup.

It's a bit like reading a translation of (say) "War and Peace". Most people just 
want to read the translated text. A small fraction might want to know which part 
of the original was written in Russian, which in French, which in English, etc. 
Markup can help that small fraction, but just gets in the way of the primary use.

> Is it possible that you are looking
> at a file/buffer that was modified from its original contents?

No, I was using Emacs 26 by mistake. Sorry about the noise.

It's still not a good user interface, though, as it is difficult to see the 
markup's effect when visiting etc/HELLO in the usual way, and this makes it hard 
to see mistakes in the markup. etc/HELLO is littered with so much useless 
markup, and the effect of markup errors is so subtle, and it's so much of a pain 
to edit the markup in its ordinary form of display, that the file is not a good 
showroom for how to maintain multilingual text. It's not a good sign that there 
seem to be errors in the possibly-useful (i.e., CJ) markup that nobody has 
noticed since the markup was introduced in May, and that I noticed these errors 
now only because I was visiting the file literally.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 27.0.50; Use utf-8 is all our Elisp files
  2018-12-21 21:07           ` 27.0.50; Use utf-8 is all our Elisp files Paul Eggert
@ 2018-12-22  1:19             ` Eric Lindblad
  2018-12-22  7:56               ` etc/HELLO markup etc. (Was: 27.0.50; Use utf-8 is all our Elisp files) Eli Zaretskii
  2018-12-22  8:12             ` etc/HELLO markup etc Eli Zaretskii
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Lindblad @ 2018-12-22  1:19 UTC (permalink / raw)
  To: Emacs-devel

[-- Attachment #1: Type: text/html, Size: 450 bytes --]


* Re: etc/HELLO markup etc. (Was: 27.0.50; Use utf-8 is all our Elisp files)
  2018-12-22  1:19             ` Eric Lindblad
@ 2018-12-22  7:56               ` Eli Zaretskii
  0 siblings, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2018-12-22  7:56 UTC (permalink / raw)
  To: Eric Lindblad; +Cc: Emacs-devel

> From: "Eric Lindblad" <lindblad@gmx.com>
> Date: Sat, 22 Dec 2018 02:19:47 +0100
> Sensitivity: Normal
> 
> Would there be any sympathy to adding a link to this webpage in the etc/HELLO file?
>  
> See also: UTF-8 SAMPLER
> http://kermitproject.org/utf8.html

Thanks, I looked at that file when I added a few scripts to HELLO.

The goals of that file are different from what we try doing in HELLO.
Our goal is to show the different scripts, not different languages or
fonts.  For that reason, many languages are absent from HELLO if they
use the same scripts which are already present in the file (for other
languages).  IOW, the different languages in HELLO are just the means
to a certain end: we need a language using a script to say "hello" for
that script.


* Re: etc/HELLO markup etc.
  2018-12-21 21:07           ` 27.0.50; Use utf-8 is all our Elisp files Paul Eggert
  2018-12-22  1:19             ` Eric Lindblad
@ 2018-12-22  8:12             ` Eli Zaretskii
  2018-12-22 19:41               ` Paul Eggert
                                 ` (3 more replies)
  1 sibling, 4 replies; 19+ messages in thread
From: Eli Zaretskii @ 2018-12-22  8:12 UTC (permalink / raw)
  To: Paul Eggert; +Cc: handa, monnier, Emacs-devel

> Cc: handa@gnu.org, monnier@iro.umontreal.ca,
>  Emacs Development <Emacs-devel@gnu.org>
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Fri, 21 Dec 2018 13:07:09 -0800
> 
> [removing 33796@debbugs.gnu.org and adding emacs-devel@gnu.org to cc list]

I've changed the Subject, as the original one was too similar to the
bug report.

> Eli Zaretskii wrote:
> > Which markup is not necessary for display, in your opinion?
> 
> At most all that's useful is markup that distinguishes Chinese and Japanese 
> variants of Han characters; this might also include hanja (Korean) and Chữ Nôm 
> (Vietnamese) variants if we ever added such characters to etc/HELLO. Such markup 
> might be useful because a significant set of east Asian users dislike Unicode's 
> Han unification and prefer specific variants of Han characters. I'm not aware of 
> any other set of users who dislike unification in that way.

I'm not yet sure this is only about Han unification.  Using charsets
for specifying fonts is a general feature in Emacs, which can be used
to control which fonts are selected independently of what the OS
facilities such as fontconfig do.

I hope Handa-san will be able to comment on this stuff.

If Han unification is the only important user of the charset property,
then yes, we could remove the rest of the charset info from HELLO.
But please realize that the current HELLO just keeps the information
that was there before recoding it in UTF-8, nothing was added.  It is
just kept in a different form, which makes the charset info
human-readable, where previously it was encoded in the ISO 2022
sequences.

> > That markup is precisely what keeps the charset properties on the
> > corresponding greetings.  Removing it would be losing information that
> > HELLO is trying to preserve.
> 
> Although the etc/HELLO markup might be of interest to those who care about 
> annotating languages in the text, it's irrelevant to the ordinary purpose of 
> that file, which is to show textual translations of "Hello"

That's not the original purpose of that file.  The purpose is to show
scripts, not languages, and to show how we display different scripts
in the same buffer.

> It's still not a good user interface, though, as it is difficult to see the 
> markup's effect when visiting etc/HELLO in the usual way

If the usual way is via find-file and its ilk, then you should see the
same results as with "C-h h", so I'm not sure I understand what you
mean here.

> etc/HELLO is littered with so much useless markup

I disagree that it's useless.  Most of it is useful.

> the effect of markup errors is so subtle, and it's so much of a pain
> to edit the markup in its ordinary form of display

If you mean manually editing the markup, then you aren't supposed to
do that.

In what way is most of what you say not applicable to etc/enriched.txt
in general?  If you just dislike what Enriched mode produces on disk,
then let's stop this argument, as you seem to be arguing against files
with markup in general, and that's a non-starter for me.

> the file is not a good showroom for how to maintain multilingual
> text.

What other facilities are you aware of or can suggest for showing
multilingual text with such level of detail and precision?

> It's not a good sign that there seem to be errors in the
> possibly-useful (i.e., CJ) markup that nobody has noticed since the
> markup was introduced in May, and that I noticed these errors now
> only because I was visiting the file literally.

Which errors?  I don't think we discovered any errors.  We may have
discovered some markup on whitespace where we perhaps could do without
it (I'm not yet sure of that), but that's all, and is not necessarily
an error.


* Re: etc/HELLO markup etc.
  2018-12-22  8:12             ` etc/HELLO markup etc Eli Zaretskii
@ 2018-12-22 19:41               ` Paul Eggert
  2018-12-22 20:42                 ` Eli Zaretskii
  2018-12-23  7:47               ` Yuri Khan
                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Paul Eggert @ 2018-12-22 19:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: handa, monnier, Emacs-devel

Eli Zaretskii wrote:

> If Han unification is the only important user of the charset property,
> then yes, we could remove the rest of the charset info from HELLO.

Yes, that's the case.

> the current HELLO just keeps the information
> that was there before recoding it in UTF-8, nothing was added.

Sure, but the non-Han markup is merely a relic of that file's old method of 
encoding, which avoided Unicode and instead used ISO 2022 escape sequences to 
switch among various 8- and 16-bit encodings, as that was the only way to show 
text in (say) Russian under the constraints of the old method. The non-Han 
markup is completely unnecessary now that the file uses UTF-8. (The Han markup 
probably isn't needed either, though I also would like Handa's opinion on that.)
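
To make the difference concrete, here is a minimal sketch using Python's
standard codecs rather than Emacs itself (the choice of Python and of the
"iso2022_jp" codec is an assumption made for illustration).  It shows that an
ISO 2022 byte stream names the charset in-band through escape sequences,
whereas the UTF-8 bytes for the same text carry no such information, so any
charset annotation has to live elsewhere, e.g. in markup:

```python
# Illustration only: Python's standard codecs, not Emacs internals.
text = "日本語"

iso2022 = text.encode("iso2022_jp")
utf8 = text.encode("utf-8")

ESC = b"\x1b"

# ISO-2022-JP announces the charset in-band: ESC $ B switches to JIS X 0208
# before the Han characters, and ESC ( B switches back to ASCII afterwards.
print(iso2022)
print(ESC in iso2022)   # True
print(ESC in utf8)      # False

# Both byte strings decode to exactly the same code points.
assert iso2022.decode("iso2022_jp") == utf8.decode("utf-8") == text
```

Once the text is decoded, nothing in the code points themselves records which
charset they came from.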

>> Although the etc/HELLO markup might be of interest to those who care about
>> annotating languages in the text, it's irrelevant to the ordinary purpose of
>> that file, which is to show textual translations of "Hello"
> 
> That's not the original purpose of that file.  The purpose is to show
> scripts, not languages, and to show how we display different scripts
> in the same buffer.

OK, but either way the non-Han markup is irrelevant to the ordinary purpose of 
the file.

>> It's still not a good user interface, though, as it is difficult to see the
>> markup's effect when visiting etc/HELLO in the usual way
> 
> If the usual way is via find-file and its ilk, then you should see the
> same results as with "C-h h", so I'm not sure I understand what you
> mean here.

I meant that one cannot see the markup's effect when visiting the file with 
either C-h h or find-file in the usual way. It's useless markup.

> In what way is most of what you say not applicable to etc/enriched.txt
> in general?

Other forms of enriched-text markup are typically easily visible. If I visit 
etc/enriched.txt I can easily see which parts are marked white on blue 
background, which parts are marked italic, etc. Invisible enriched-text markup 
is much harder to deal with when editing an enriched-text file.

>> the file is not a good showroom for how to maintain multilingual
>> text.
> 
> What other facilities are you aware of or can suggest for showing
> multilingual text with such level of detail and precision?

In practice the most common and often the best way to deal with the situation is 
to do what the non-markup part of etc/HELLO is already doing: indicate within 
the text itself what language or script is being used, to help the reader who 
may be unacquainted with them, and with enough punctuation within the text so 
that the reader can easily see what's going on. This technique has been used for 
centuries, it's by far the most popular technique in common practice today, and 
it suffices for this particular application (with the possible exception of its 
Chinese and Japanese text).

>> It's not a good sign that there seem to be errors in the
>> possibly-useful (i.e., CJ) markup that nobody has noticed since the
>> markup was introduced in May, and that I noticed these errors now
>> only because I was visiting the file literally.
> 
> Which errors?  I don't think we discovered any errors.

Yes, and that's the point! The approach we're taking is not good for dealing 
with the situation.

One example of such an error is that "日本語" has no charset properties even 
though it's obviously intended to use a Japanese script (since it follows the 
word "Japanese"). I'm sure there are others.


* Re: etc/HELLO markup etc.
  2018-12-22 19:41               ` Paul Eggert
@ 2018-12-22 20:42                 ` Eli Zaretskii
  0 siblings, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2018-12-22 20:42 UTC (permalink / raw)
  To: Paul Eggert; +Cc: handa, monnier, Emacs-devel

> Cc: handa@gnu.org, monnier@iro.umontreal.ca, Emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 22 Dec 2018 11:41:05 -0800
> 
> Eli Zaretskii wrote:
> 
> > If Han unification is the only important user of the charset property,
> > then yes, we could remove the rest of the charset info from HELLO.
> 
> Yes, that's the case.

Says you.  The issue at hand is precisely whether that is so, or just
your opinion and tendency.

> the non-Han markup is merely a relic of that file's old method of 
> encoding

It could be both a relic and an important piece of information.

> one cannot see the markup's effect when visiting the file with
> either C-h h or find-file in the usual way.

Of course, one can: via the fonts used to display the various scripts.

> > In what way is most of what you say not applicable to etc/enriched.txt
> > in general?
> 
> Other forms of enriched-text markup are typically easily visible.

Typically, but not exclusively.  There's read-only property, there's
the 'display' property, and to some extent even the "fixed" face.

> > What other facilities are you aware of or can suggest for showing
> > multilingual text with such level of detail and precision?
> 
> In practice the most common and often the best way to deal with the situation is 
> to do what the non-markup part of etc/HELLO is already doing: indicate within 
> the text itself what language or script is being used, to help the reader who 
> may be unacquainted with them, and with enough punctuation within the text so 
> that the reader can easily see what's going on.

That's useless for preserving text properties, so won't fit the bill.

> One example of such an error is that "日本語" has no charset properties even 
> though it's obviously intended to use a Japanese script (since it follows the 
> word "Japanese").

Thanks, I fixed that.

> I'm sure there are others.

Please report them if you find them.




* Re: etc/HELLO markup etc.
  2018-12-22  8:12             ` etc/HELLO markup etc Eli Zaretskii
  2018-12-22 19:41               ` Paul Eggert
@ 2018-12-23  7:47               ` Yuri Khan
  2018-12-23 15:42                 ` Eli Zaretskii
  2018-12-28  7:10               ` Eli Zaretskii
  2018-12-29  7:23               ` handa
  3 siblings, 1 reply; 19+ messages in thread
From: Yuri Khan @ 2018-12-23  7:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: handa, Paul Eggert, Stefan Monnier, Emacs developers

On Sat, Dec 22, 2018 at 3:13 PM Eli Zaretskii <eliz@gnu.org> wrote:

> I'm not yet sure this is only about Han unification.  Using charsets
> for specifying fonts is a general feature in Emacs, which can be used
> to control which fonts are selected independently of what the OS
> facilities such as fontconfig do.

There is at least one more situation where different glyphs
could/should be selected for the same Unicode code points, which
charset markup does not solve.

I’m talking about italic shapes of Cyrillic letters. For some of them,
Russian and Bulgarian use one shape but Serbian and Macedonian use
another shape[1]. There are no examples of Bulgarian, Serbian, or
Macedonian in HELLO, but Russian, Ukrainian and Mongolian examples are
all marked up as “cyrillic-iso8859-5”, which is an encoding that does
not carry language information.

So: charset markup is not the right solution to the problem of
rendering the same Unicode code point with different glyphs.
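
A small sketch of why the charset cannot carry this distinction, using
Python's "iso8859_5" codec as a stand-in for the cyrillic-iso8859-5 charset
(an assumption for illustration; the Serbian greeting below is likewise an
example of mine and is not in HELLO).  A letter shared by both languages
encodes to the same byte either way, so the charset alone cannot tell a
renderer which italic shape to pick:

```python
# Illustration only: Python's iso8859_5 codec stands in for the
# "cyrillic-iso8859-5" charset; the greetings are sample data.
greetings = {
    "Russian": "Здравствуйте",
    "Serbian": "Здраво",
}

def shared_letters(a, b):
    """Return the letters that occur in both strings, ignoring case."""
    return sorted(set(a.lower()) & set(b.lower()))

for language, text in greetings.items():
    # The encoded bytes contain letters only; no language marker appears.
    print(language, text.encode("iso8859_5").hex(" "))

common = shared_letters(greetings["Russian"], greetings["Serbian"])
print(common)

# One byte per letter, identical whichever language the text is written in.
for letter in common:
    assert len(letter.encode("iso8859_5")) == 1
```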

[1]: https://en.wikipedia.org/wiki/Cyrillic_script#/media/File:Cyrillic_cursive.svg


* Re: etc/HELLO markup etc.
  2018-12-23  7:47               ` Yuri Khan
@ 2018-12-23 15:42                 ` Eli Zaretskii
  2018-12-23 15:53                   ` Werner LEMBERG
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2018-12-23 15:42 UTC (permalink / raw)
  To: Yuri Khan; +Cc: handa, eggert, monnier, Emacs-devel

> From: Yuri Khan <yurivkhan@gmail.com>
> Date: Sun, 23 Dec 2018 14:47:39 +0700
> Cc: Paul Eggert <eggert@cs.ucla.edu>, handa@gnu.org, 
> 	Stefan Monnier <monnier@iro.umontreal.ca>, Emacs developers <Emacs-devel@gnu.org>
> 
> There is at least one more situation where different glyphs
> could/should be selected for the same Unicode code points, which
> charset markup does not solve.
> 
> I’m talking about italic shapes of Cyrillic letters. For some of them,
> Russian and Bulgarian use one shape but Serbian and Macedonian use
> another shape[1]. There are no examples of Bulgarian, Serbian, or
> Macedonian in HELLO, but Russian, Ukrainian and Mongolian examples are
> all marked up as “cyrillic-iso8859-5”, which is an encoding that does
> not carry language information.
> 
> So: charset markup is not the right solution to the problem of
> rendering the same Unicode code point with different glyphs.

You mean, it's not a perfect solution, right?  Because in the "good"
department, it's "good enough" to solve at least part of the problem.
No one says we need to reject a solution because it is only partial.

I would also like to point out that, as far as the 'charset' property
is concerned, HELLO is just an example of what _can_ be done, it
doesn't pretend to show _everything_ that you could do.  E.g., if it's
important to be able to display Ukrainian in a font different from
that used for Russian, we could use the koi8-u charset for the
Ukrainian greeting, and tweak our default fontset to use special fonts
for that.  We could even invent additional charsets (see
define-charset) and then use them for some greetings.  Of course, this
machinery works best when a charset is unequivocally determined by the
prevalent encoding used for text that uses that charset, and that
isn't always the case.  But still, the feature is there, and it can be
extended if needed.

Finally, regarding the special handling of italics in Serbian: is
there _any_ application out there that solves this problem
satisfactorily in multilingual environment?  I'm not sure how you
could go about that, since fonts generally cover scripts, and there's
no special Serbian Cyrillic script, there's just Cyrl to cover them
all.


* Re: etc/HELLO markup etc.
  2018-12-23 15:42                 ` Eli Zaretskii
@ 2018-12-23 15:53                   ` Werner LEMBERG
  2018-12-23 16:04                     ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: Werner LEMBERG @ 2018-12-23 15:53 UTC (permalink / raw)
  To: eliz; +Cc: yurivkhan, eggert, Emacs-devel, monnier, handa


>> So: charset markup is not the right solution to the problem of
>> rendering the same Unicode code point with different glyphs.
>
> Finally, regarding the special handling of italics in Serbian: is
> there _any_ application out there that solves this problem
> satisfactorily in multilingual environment?  I'm not sure how you
> could go about that, since fonts generally cover scripts, and
> there's no special Serbian Cyrillic script, there's just Cyrl to
> cover them all.

OpenType fonts provide a language tag (in addition to a script tag) to
handle this.  XeTeX and luatex support language tags – I don't know
whether there is an editor with such a capability.


    Werner


* Re: etc/HELLO markup etc.
  2018-12-23 15:53                   ` Werner LEMBERG
@ 2018-12-23 16:04                     ` Eli Zaretskii
  2018-12-23 21:11                       ` Werner LEMBERG
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2018-12-23 16:04 UTC (permalink / raw)
  To: Werner LEMBERG; +Cc: yurivkhan, eggert, Emacs-devel, monnier, handa

> Date: Sun, 23 Dec 2018 16:53:14 +0100 (CET)
> Cc: yurivkhan@gmail.com, handa@gnu.org, eggert@cs.ucla.edu,
>  monnier@iro.umontreal.ca, Emacs-devel@gnu.org
> From: Werner LEMBERG <wl@gnu.org>
> 
> > Finally, regarding the special handling of italics in Serbian: is
> > there _any_ application out there that solves this problem
> > satisfactorily in multilingual environment?  I'm not sure how you
> > could go about that, since fonts generally cover scripts, and
> > there's no special Serbian Cyrillic script, there's just Cyrl to
> > cover them all.
> 
> OpenType fonts provide a language tag (in addition to a script tag) to
> handle this.

Yes, but aren't these tags used only to select fonts that have
features required by the language's shaping requirements?  That's what
Emacs does with those.




* Re: etc/HELLO markup etc.
  2018-12-23 16:04                     ` Eli Zaretskii
@ 2018-12-23 21:11                       ` Werner LEMBERG
  0 siblings, 0 replies; 19+ messages in thread
From: Werner LEMBERG @ 2018-12-23 21:11 UTC (permalink / raw)
  To: eliz; +Cc: yurivkhan, eggert, Emacs-devel, monnier, handa

>> > Finally, regarding the special handling of italics in Serbian: is
>> > there _any_ application out there that solves this problem
>> > satisfactorily in multilingual environment?  I'm not sure how you
>> > could go about that, since fonts generally cover scripts, and
>> > there's no special Serbian Cyrillic script, there's just Cyrl to
>> > cover them all.
>> 
>> OpenType fonts provide a language tag (in addition to a script tag)
>> to handle this.
> 
> Yes, but aren't these tags used only to select fonts that have
> features required by the language's shaping requirements?  That's
> what Emacs does with those.

Well, I could imagine the following use case: Within Emacs, you
activate a Serbian language environment.  This passes the script tag
`Cyrl' and the language tag `SRB' to the current font (which must be
reloaded).

Within a document, the language tag must be explicitly passed to the
text snippet in question (using some sort of markup or text
properties); while it might be possible to algorithmically deduce a
language tag for longer texts, this certainly doesn't work for just a
few characters.
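
A minimal sketch of that idea, written in Python purely for illustration:
each run of text carries an explicit language annotation, which is then
mapped to an OpenType script tag and language system tag.  The `Run` class,
the `shaping_tags` function and the lookup table are hypothetical names of
mine, not an Emacs or font-library API; only the tag values ("cyrl", "RUS ",
"SRB ") come from the OpenType tag registry, which spells the script tag in
lower case and pads language tags to four characters.

```python
from dataclasses import dataclass

# Hypothetical lookup table: language annotation -> (script tag, language tag).
OPENTYPE_TAGS = {
    "ru": ("cyrl", "RUS "),
    "sr": ("cyrl", "SRB "),
}

@dataclass
class Run:
    text: str
    language: str  # explicit annotation; too short to detect automatically

def shaping_tags(run):
    """Return the (script, language) tag pair to request for this run."""
    return OPENTYPE_TAGS[run.language]

runs = [Run("Здравствуйте", "ru"), Run("Здраво", "sr")]
for run in runs:
    print(run.text, shaping_tags(run))
```

Both runs share the script tag and differ only in the language tag, which is
the piece of information a charset such as cyrillic-iso8859-5 cannot supply.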

    Werner


* Re: etc/HELLO markup etc.
  2018-12-22  8:12             ` etc/HELLO markup etc Eli Zaretskii
  2018-12-22 19:41               ` Paul Eggert
  2018-12-23  7:47               ` Yuri Khan
@ 2018-12-28  7:10               ` Eli Zaretskii
  2018-12-29  7:23               ` handa
  3 siblings, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2018-12-28  7:10 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: eggert, monnier, Emacs-devel

Ping!

Kenichi, could you please comment on this issue?  TIA.


* Re: etc/HELLO markup etc.
  2018-12-22  8:12             ` etc/HELLO markup etc Eli Zaretskii
                                 ` (2 preceding siblings ...)
  2018-12-28  7:10               ` Eli Zaretskii
@ 2018-12-29  7:23               ` handa
  2018-12-29  7:37                 ` Eli Zaretskii
  3 siblings, 1 reply; 19+ messages in thread
From: handa @ 2018-12-29  7:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, monnier, Emacs-devel

In article <838t0iasju.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > Eli Zaretskii wrote:
> > > Which markup is not necessary for display, in your opinion?
> > 
> > At most all that's useful is markup that distinguishes Chinese and Japanese 
> > variants of Han characters; this might also include hanja (Korean) and Chữ Nôm 
> > (Vietnamese) variants if we ever added such characters to etc/HELLO. Such markup 
> > might be useful because a significant set of east Asian users dislike Unicode's 
> > Han unification and prefer specific variants of Han characters. I'm not aware of 
> > any other set of users who dislike unification in that way.

> I'm not yet sure this is only about Han unification.  Using charsets
> for specifying fonts is a general feature in Emacs, which can be used
> to control which fonts are selected independently of what the OS
> facilities such as fontconfig do.

> I hope Handa-san will be able to comment on this stuff.

> If Han unification is the only important user of the charset property,
> then yes, we could remove the rest of the charset info from HELLO.

Long ago, the quality of fonts designed for a specific legacy charset
was far better than that of so-called Unicode fonts, even for non-Han
characters.  So the charset information for non-Han charsets did have some
meaning.  But I don't know the current situation.  Perhaps it would be good
to remove it and wait for complaints from users.

---
K. Handa
handa@gnu.org




* Re: etc/HELLO markup etc.
  2018-12-29  7:23               ` handa
@ 2018-12-29  7:37                 ` Eli Zaretskii
  2019-01-06 12:06                   ` handa
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2018-12-29  7:37 UTC (permalink / raw)
  To: handa; +Cc: eggert, monnier, Emacs-devel

> From: handa <handa@gnu.org>
> Cc: eggert@cs.ucla.edu, monnier@iro.umontreal.ca, Emacs-devel@gnu.org
> Date: Sat, 29 Dec 2018 16:23:24 +0900
> 
> > I'm not yet sure this is only about Han unification.  Using charsets
> > for specifying fonts is a general feature in Emacs, which can be used
> > to control which fonts are selected independently of what the OS
> > facilities such as fontconfig do.
> 
> > I hope Handa-san will be able to comment on this stuff.
> 
> > If Han unification is the only important user of the charset property,
> > then yes, we could remove the rest of the charset info from HELLO.
> 
> Long ago, the quality of fonts designed for a specific legacy charset
> was far better than that of so-called Unicode fonts, even for non-Han
> characters.  So the charset information for non-Han charsets did have some
> meaning.  But I don't know the current situation.  Perhaps it would be good
> to remove it and wait for complaints from users.

Thanks.

What about using the charset information in general for font
selection?  Do you think this is a valuable feature, or was it again
designed only due to the issues you mention above with fonts designed
for legacy charsets?




* Re: etc/HELLO markup etc.
  2018-12-29  7:37                 ` Eli Zaretskii
@ 2019-01-06 12:06                   ` handa
  2019-01-06 15:29                     ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: handa @ 2019-01-06 12:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, monnier, Emacs-devel

In article <83lg486awy.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> What about using the charset information in general for font
> selection?  Do you think this is a valuable feature, or was it again
> designed only due to the issues you mention above with fonts designed
> for legacy charsets?

The latter.  As an OpenType font has shaping rules for script and/or
language, script and language information is more useful than charset.

---
K. Handa
handa@gnu.org




* Re: etc/HELLO markup etc.
  2019-01-06 12:06                   ` handa
@ 2019-01-06 15:29                     ` Eli Zaretskii
  2019-01-06 17:26                       ` Stefan Monnier
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2019-01-06 15:29 UTC (permalink / raw)
  To: handa; +Cc: eggert, monnier, Emacs-devel

> From: handa <handa@gnu.org>
> Cc: eggert@cs.ucla.edu, monnier@iro.umontreal.ca, Emacs-devel@gnu.org
> Date: Sun, 06 Jan 2019 21:06:22 +0900
> 
> In article <83lg486awy.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > What about using the charset information in general for font
> > selection?  Do you think this is a valuable feature, or was it again
> > designed only due to the issues you mention above with fonts designed
> > for legacy charsets?
> 
> The latter.  As an OpenType font has shaping rules for script and/or
> language, script and language information is more useful than charset.

Thanks.  I guess we can remove most of the charset markup from HELLO,
leaving only one or two instances as examples of the facility.




* Re: etc/HELLO markup etc.
  2019-01-06 15:29                     ` Eli Zaretskii
@ 2019-01-06 17:26                       ` Stefan Monnier
  2019-01-06 17:39                         ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Monnier @ 2019-01-06 17:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: handa, eggert, Emacs-devel

> Thanks.  I guess we can remove most of charset markup from HELLO,
> leaving only one or two as an example of the facility.

And to get back to bug#33796: does that mean I can install a change to
convert those Elisp files to utf-8?


        Stefan




* Re: etc/HELLO markup etc.
  2019-01-06 17:26                       ` Stefan Monnier
@ 2019-01-06 17:39                         ` Eli Zaretskii
  2019-01-06 18:08                           ` Stefan Monnier
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2019-01-06 17:39 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: handa, eggert, Emacs-devel

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: handa <handa@gnu.org>, eggert@cs.ucla.edu, Emacs-devel@gnu.org
> Date: Sun, 06 Jan 2019 12:26:39 -0500
> 
> > Thanks.  I guess we can remove most of charset markup from HELLO,
> > leaving only one or two as an example of the facility.
> 
> And to get back to bug#33796: does that mean I can install a change to
> convert those Elisp files to utf-8?

Yes, I think so.  Except that I'd prefer not to mix code changes and
encoding changes.  Can you do that in two separate patches?




* Re: etc/HELLO markup etc.
  2019-01-06 17:39                         ` Eli Zaretskii
@ 2019-01-06 18:08                           ` Stefan Monnier
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Monnier @ 2019-01-06 18:08 UTC (permalink / raw)
  To: emacs-devel

>> > Thanks.  I guess we can remove most of charset markup from HELLO,
>> > leaving only one or two as an example of the facility.
>> 
>> And to get back to bug#33796: does that mean I can install a change to
>> convert those Elisp files to utf-8?
>
> Yes, I think so.  Except that I'd prefer not to mix code changes and
> encoding changes.  Can you do that in two separate patches?

Yes, of course,


        Stefan





end of thread, other threads:[~2019-01-06 18:08 UTC | newest]

Thread overview: 19+ messages
     [not found] <3fd27fe5-e650-b207-fdd4-36f805b89b4d@cs.ucla.edu>
     [not found] ` <83bm5hcroa.fsf@gnu.org>
     [not found]   ` <9f33127d-f01b-b138-7a0c-ffeac7b77938@cs.ucla.edu>
     [not found]     ` <835zvochdj.fsf@gnu.org>
     [not found]       ` <5f113128-36c9-30c6-3413-8dc36051e058@cs.ucla.edu>
     [not found]         ` <83va3nban3.fsf@gnu.org>
2018-12-21 21:07           ` 27.0.50; Use utf-8 is all our Elisp files Paul Eggert
2018-12-22  1:19             ` Eric Lindblad
2018-12-22  7:56               ` etc/HELLO markup etc. (Was: 27.0.50; Use utf-8 is all our Elisp files) Eli Zaretskii
2018-12-22  8:12             ` etc/HELLO markup etc Eli Zaretskii
2018-12-22 19:41               ` Paul Eggert
2018-12-22 20:42                 ` Eli Zaretskii
2018-12-23  7:47               ` Yuri Khan
2018-12-23 15:42                 ` Eli Zaretskii
2018-12-23 15:53                   ` Werner LEMBERG
2018-12-23 16:04                     ` Eli Zaretskii
2018-12-23 21:11                       ` Werner LEMBERG
2018-12-28  7:10               ` Eli Zaretskii
2018-12-29  7:23               ` handa
2018-12-29  7:37                 ` Eli Zaretskii
2019-01-06 12:06                   ` handa
2019-01-06 15:29                     ` Eli Zaretskii
2019-01-06 17:26                       ` Stefan Monnier
2019-01-06 17:39                         ` Eli Zaretskii
2019-01-06 18:08                           ` Stefan Monnier
