unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#27505: LC_CTYPE affects tutorial language
@ 2017-06-27 14:48 ` Leonard Lausen
  2017-06-27 15:05   ` Eli Zaretskii
                     ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Leonard Lausen @ 2017-06-27 14:48 UTC (permalink / raw)
  To: 27505

Dear all,

as far as I know the environment variable LC_CTYPE applies to
classification and conversion of characters, and to multibyte and wide
characters. So setting it should not influence the interface language,
correct?

However, with the following locale:
LANG=en_US.UTF-8
LC_CTYPE=zh_CN.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

I find that the emacs tutorial (C-h t) is displayed in Chinese.

Is this expected behavior or a bug? This may or may not be related to
bug#27312 where I reported that I can't activate fcitx even though env
is set up correctly.

(I.e. the following is the first line of the displayed tutorial:
Emacs 快速指南.(查看版权声明请至本文末尾))


> 
> In GNU Emacs 25.2.1 (x86_64-pc-linux-gnu, GTK+ Version 3.22.15)
>  of 2017-06-10 built on leonard-xps13
> Windowing system distributor 'The X.Org Foundation', version 11.0.11903000
> Configured using:
>  'configure --prefix=/usr --build=x86_64-pc-linux-gnu
>  --host=x86_64-pc-linux-gnu --mandir=/usr/share/man
>  --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc
>  --localstatedir=/var/lib --disable-dependency-tracking
>  --disable-silent-rules --docdir=/usr/share/doc/emacs-25.2
>  --htmldir=/usr/share/doc/emacs-25.2/html --libdir=/usr/lib64
>  --program-suffix=-emacs-25 --infodir=/usr/share/info/emacs-25
>  --localstatedir=/var
>  --enable-locallisppath=/etc/emacs:/usr/share/emacs/site-lisp
>  --with-gameuser=:gamestat --without-compress-install
>  --with-file-notification=inotify --enable-acl --with-dbus
>  --with-modules --with-gpm --without-hesiod --without-kerberos
>  --without-kerberos5 --with-xml2 --without-selinux --with-gnutls
>  --without-wide-int --with-zlib --with-sound=alsa --with-x --without-ns
>  --with-gconf --with-gsettings --without-toolkit-scroll-bars --with-gif
>  --with-jpeg --with-png --with-rsvg --with-tiff --with-xpm
>  --with-imagemagick --with-xft --without-cairo --with-libotf
>  --with-m17n-flt --with-x-toolkit=gtk3 --without-xwidgets
>  GENTOO_PACKAGE=app-editors/emacs-25.2 'CFLAGS=-march=native
>  -mtune=native -O2 -pipe' CPPFLAGS= 'LDFLAGS=-Wl,-O1 -Wl,--as-needed''
> 
> Configured features:
> XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GCONF GSETTINGS
> NOTIFY ACL GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB GTK3 X11
> MODULES
> 
> Important settings:
>   value of $LC_COLLATE: C
>   value of $LC_CTYPE: zh_CN.UTF-8
>   value of $LANG: en_US.UTF-8
>   value of $XMODIFIERS: @im=fcitx
>   locale-coding-system: utf-8-unix
> 
> Major mode: Lisp Interaction
> 
> Minor modes in effect:
>   tooltip-mode: t
>   global-eldoc-mode: t
>   electric-indent-mode: t
>   mouse-wheel-mode: t
>   tool-bar-mode: t
>   menu-bar-mode: t
>   file-name-shadow-mode: t
>   global-font-lock-mode: t
>   font-lock-mode: t
>   blink-cursor-mode: t
>   auto-composition-mode: t
>   auto-encryption-mode: t
>   auto-compression-mode: t
>   line-number-mode: t
>   transient-mark-mode: t
> 
> Recent messages:
> For information about GNU Emacs and the GNU system, type C-h C-a.
> Making completion list... [2 times]
> delete-backward-char: Text is read-only [3 times]
> Making completion list...
> 
> Load-path shadows:
> None found.
> 
> Features:
> (shadow sort mail-extr emacsbug message dired format-spec rfc822 mml
> mml-sec password-cache epg epg-config gnus-util mm-decode mm-bodies
> mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail
> rfc2047 rfc2045 ietf-drums mm-util help-fns help-mode easymenu
> cl-loaddefs pcase cl-lib mail-prsvr mail-utils time-date mule-util
> china-util tooltip eldoc electric uniquify ediff-hook vc-hooks
> lisp-float-type mwheel x-win term/common-win x-dnd tool-bar dnd fontset
> image regexp-opt fringe tabulated-list newcomment elisp-mode lisp-mode
> prog-mode register page menu-bar rfn-eshadow timer select scroll-bar
> mouse jit-lock font-lock syntax facemenu font-core frame cl-generic cham
> georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
> korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
> european ethiopic indian cyrillic chinese charscript case-table epa-hook
> jka-cmpr-hook help simple abbrev minibuffer cl-preloaded nadvice
> loaddefs button faces cus-face macroexp files text-properties overlay
> sha1 md5 base64 format env code-pages mule custom widget
> hashtable-print-readable backquote dbusbind inotify dynamic-setting
> system-font-setting font-render-setting move-toolbar gtk x-toolkit x
> multi-tty make-network-process emacs)
> 
> Memory information:
> ((conses 16 86605 6233)
>  (symbols 48 19787 0)
>  (miscs 40 46 96)
>  (strings 32 14398 4574)
>  (string-bytes 1 414247)
>  (vectors 16 12192)
>  (vector-slots 8 484142 16017)
>  (floats 8 167 9)
>  (intervals 56 279 0)
>  (buffers 976 19)
>  (heap 1024 16015 1078))






* bug#27505: LC_CTYPE affects tutorial language
  2017-06-27 14:48 ` bug#27505: LC_CTYPE affects tutorial language Leonard Lausen
@ 2017-06-27 15:05   ` Eli Zaretskii
  2017-06-27 15:13   ` Andreas Schwab
       [not found]   ` <handler.27505.C.150189707129878.notifdonectrl.0@debbugs.gnu.org>
  2 siblings, 0 replies; 18+ messages in thread
From: Eli Zaretskii @ 2017-06-27 15:05 UTC (permalink / raw)
  To: Leonard Lausen; +Cc: 27505

> From: Leonard Lausen <leonard@lausen.nl>
> Date: Tue, 27 Jun 2017 23:48:41 +0900
> 
> as far as I know the environment variable LC_CTYPE applies to
> classification and conversion of characters, and to multibyte and wide
> characters. So setting it should not influence the interface language,
> correct?
> 
> However, with the following locale:
> LANG=en_US.UTF-8
> LC_CTYPE=zh_CN.UTF-8
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE=C
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_PAPER="en_US.UTF-8"
> LC_NAME="en_US.UTF-8"
> LC_ADDRESS="en_US.UTF-8"
> LC_TELEPHONE="en_US.UTF-8"
> LC_MEASUREMENT="en_US.UTF-8"
> LC_IDENTIFICATION="en_US.UTF-8"
> LC_ALL=
> 
> I find that the emacs tutorial (C-h t) is displayed in Chinese.
> 
> Is this expected behavior or a bug?

It's the intended behavior: LC_CTYPE affects the language environment
which Emacs sets up by default.  From the Emacs manual:

     Some operating systems let you specify the character-set locale you
  are using by setting the locale environment variables ‘LC_ALL’,
  ‘LC_CTYPE’, or ‘LANG’.  (If more than one of these is set, the first one
  that is nonempty specifies your locale for this purpose.)  During
  startup, Emacs looks up your character-set locale’s name in the system
  locale alias table, matches its canonical name against entries in the
  value of the variables ‘locale-charset-language-names’ and
  ‘locale-language-names’ (the former overrides the latter), and selects
  the corresponding language environment if a match is found.  It also
  adjusts the display table and terminal coding system, the locale coding
  system, the preferred coding system as needed for the locale, and—last
  but not least—the way Emacs decodes non-ASCII characters sent by your
  keyboard.

And the language environment includes a setting for the default
tutorial.
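
The lookup order quoted from the manual can be sketched in shell.  This
is only an illustration of "the first one that is nonempty", not the
actual Emacs startup code; the function name ctype_locale and the
fallback to "C" are assumptions made for the example.

```sh
#!/bin/sh
# Sketch only: mimic the lookup order from the manual excerpt
# (first non-empty of LC_ALL, LC_CTYPE, LANG).
# ctype_locale is a hypothetical name, not part of Emacs.
ctype_locale() {
    # $1 = LC_ALL, $2 = LC_CTYPE, $3 = LANG (any of them may be empty)
    for value in "$1" "$2" "$3"; do
        if [ -n "$value" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    # Assumption: nothing set means the plain C locale.
    printf 'C\n'
}

# Settings from the original report: LC_ALL empty, LC_CTYPE and LANG set.
ctype_locale "" "zh_CN.UTF-8" "en_US.UTF-8"   # prints zh_CN.UTF-8
```

With the values from the report the sketch picks zh_CN.UTF-8, which is
consistent with the Chinese tutorial being selected.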






* bug#27505: LC_CTYPE affects tutorial language
  2017-06-27 14:48 ` bug#27505: LC_CTYPE affects tutorial language Leonard Lausen
  2017-06-27 15:05   ` Eli Zaretskii
@ 2017-06-27 15:13   ` Andreas Schwab
       [not found]   ` <handler.27505.C.150189707129878.notifdonectrl.0@debbugs.gnu.org>
  2 siblings, 0 replies; 18+ messages in thread
From: Andreas Schwab @ 2017-06-27 15:13 UTC (permalink / raw)
  To: Leonard Lausen; +Cc: 27505

On Jun 27 2017, Leonard Lausen <leonard@lausen.nl> wrote:

> as far as I know the environment variable LC_CTYPE applies to
> classification and conversion of characters, and to multibyte and wide
> characters. So setting it should not influence the interface language,
> correct?

current-language-environment is set from LC_CTYPE, which also controls
the tutorial language.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."






* bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language)
       [not found]   ` <handler.27505.C.150189707129878.notifdonectrl.0@debbugs.gnu.org>
@ 2017-08-05  1:54     ` Leonard Lausen
  2017-08-05  2:06       ` npostavs
  2017-08-05  7:06       ` Eli Zaretskii
  0 siblings, 2 replies; 18+ messages in thread
From: Leonard Lausen @ 2017-08-05  1:54 UTC (permalink / raw)
  To: 27505

Please reopen this bug. Unfortunately my previous reply was only sent to
Andreas, but not to the bug list. I am attaching it below. A short
summary is that emacs is assuming the language I occasionally need to
input is also the language I want to read by default, which is a wrong
assumption. Note that it's not possible to input Chinese characters in
emacs without setting LC_CTYPE to zh_CN.


Thanks Andreas and Eli for the prompt reply.

In that case though I believe the intended emacs behavior does not make
sense. The fact that I need to set LC_CTYPE=zh_CN.UTF-8 just to make it
possible to use system input methods for Chinese characters
doesn't mean I want to actually use a Chinese language interface.

Or concretely, I am learning Chinese and am comfortable typing it or
having daily conversations, however I don't feel comfortable reading the
emacs manual in Chinese. For my language learning I also tend to keep
some notes in Chinese which I would like to edit with emacs.

Shouldn't there be a way to allow people to input Chinese (or other
non-European languages) without affecting the language environment? The
current behavior seems to discriminate against language learners.

What do you think?

Thanks!

Best regards
Leonard






* bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language)
  2017-08-05  1:54     ` bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language) Leonard Lausen
@ 2017-08-05  2:06       ` npostavs
  2017-08-05  5:59         ` Leonard Lausen
  2017-08-05  7:10         ` Eli Zaretskii
  2017-08-05  7:06       ` Eli Zaretskii
  1 sibling, 2 replies; 18+ messages in thread
From: npostavs @ 2017-08-05  2:06 UTC (permalink / raw)
  To: Leonard Lausen; +Cc: 27505

reopen 27505
tags 27505 - notabug
quit

Leonard Lausen <leonard@lausen.nl> writes:

> Please reopen this bug. Unfortunately my previous reply was only sent to
> Andreas, but not to the bug list. I am attaching it below. A short
> summary is that emacs is assuming the language I occasionally need to
> input is also the language I want to read by default, which is a wrong
> assumption. Note that it's not possible to input Chinese characters in
> emacs without setting LC_CTYPE to zh_CN.

Does setting LC_ALL=en_US.UTF-8 and LC_CTYPE=zh_CN.UTF-8 work?

     Some operating systems let you specify the character-set locale you
  are using by setting the locale environment variables ‘LC_ALL’,
  ‘LC_CTYPE’, or ‘LANG’.  (If more than one of these is set, the first one
  that is nonempty specifies your locale for this purpose.)

Or should LANG take precedence over LC_CTYPE perhaps?






* bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language)
  2017-08-05  2:06       ` npostavs
@ 2017-08-05  5:59         ` Leonard Lausen
  2017-08-05  7:10         ` Eli Zaretskii
  1 sibling, 0 replies; 18+ messages in thread
From: Leonard Lausen @ 2017-08-05  5:59 UTC (permalink / raw)
  To: npostavs; +Cc: 27505

> Does setting LC_ALL=en_US.UTF-8 and LC_CTYPE=zh_CN.UTF-8 work?

Due to bug 27312 unfortunately I can't test if setting
LC_ALL=en_US.UTF-8 and LC_CTYPE=zh_CN.UTF-8 works (i.e. still allows
using the X input method). But I would expect it not to work, as LC_ALL
is supposed to override LC_CTYPE (see also below).

Arguably #10867 should be fixed directly and the X input method should
work independently of the setting of LC_CTYPE.

> Or should LANG take precedence over LC_CTYPE perhaps?

Using LANG to decide the language of the tutorial should also work. But
would changing the precedence order to set current-language-environment
based on LANG not interfere with bug #10867? I.e. does emacs directly
check LC_CTYPE to decide if it supports using the X input method, or does
it check current-language-environment (#10867)?

In the latter case simply changing the precedence order wouldn't fix the
problem, as it would still require me to have
current-language-environment set to Chinese just to input Chinese
characters.






* bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language)
  2017-08-05  1:54     ` bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language) Leonard Lausen
  2017-08-05  2:06       ` npostavs
@ 2017-08-05  7:06       ` Eli Zaretskii
  2017-08-05  8:17         ` Leonard Lausen
  2017-08-05  8:18         ` bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language) Leonard Lausen
  1 sibling, 2 replies; 18+ messages in thread
From: Eli Zaretskii @ 2017-08-05  7:06 UTC (permalink / raw)
  To: Leonard Lausen; +Cc: 27505

> From: Leonard Lausen <leonard@lausen.nl>
> Date: Sat, 5 Aug 2017 10:54:37 +0900
> 
> Please reopen this bug.

Continuing the discussions doesn't require reopening the bug, as long
as we don't intend to make any changes for it.

> In that case though I believe the intended emacs behavior does not make
> sense. The fact that I need to set LC_CTYPE=zh_CN.UTF-8 just to make it
> possible to use system input methods for Chinese characters
> doesn't mean I want to actually use a Chinese language interface.
> 
> Or concretely, I am learning Chinese and am comfortable typing it or
> having daily conversations, however I don't feel comfortable reading the
> emacs manual in Chinese. For my language learning I also tend to keep
> some notes in Chinese which I would like to edit with emacs.
> 
> Shouldn't there be a way to allow people to input Chinese (or other
> non-European languages) without affecting the language environment? The
> current behavior seems to discriminate against language learners.

Yes, there should be such a way, and in fact it is already, and always
was, implemented in Emacs.  The values of LC_CTYPE etc. environment
variables are only used to set up the _defaults_; users can use
commands and options to override those defaults in many ways.  For
example, "C-h t" can be invoked with a numeric argument ("C-u C-h t")
in which case Emacs will ask you in what language to display the
tutorial.  As another example, input method of your choosing can be
invoked at any moment with "C-u C-\"; then you can switch it back off
as soon as you've finished typing characters that are not directly
accessible from your system keyboard.  Finally, the language
environment of your choosing can be set with "C-x RET l", and doing
that will set many other defaults according to the language
environment you select.

Given all these facilities, I'm not sure I understand what exactly is
your problem.  The original report was about the tutorial language,
but you never explained why you set LC_CTYPE to the value that
specified Chinese.  If you did that for some reason other than for
using Chinese in your programs, then perhaps you shouldn't set
LC_CTYPE, and instead should use the above-mentioned, more focused,
Emacs features to specify Chinese where you want it?






* bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language)
  2017-08-05  2:06       ` npostavs
  2017-08-05  5:59         ` Leonard Lausen
@ 2017-08-05  7:10         ` Eli Zaretskii
  1 sibling, 0 replies; 18+ messages in thread
From: Eli Zaretskii @ 2017-08-05  7:10 UTC (permalink / raw)
  To: npostavs; +Cc: leonard, 27505

> From: npostavs@users.sourceforge.net
> Date: Fri, 04 Aug 2017 22:06:30 -0400
> Cc: 27505@debbugs.gnu.org
> 
> Or should LANG take precedence over LC_CTYPE perhaps?

That'd go against the Posix semantics of these variables, so we
shouldn't do that, because it might not be what is expected by users
who set both LANG and other LC_* variables.

As I wrote previously, I don't really understand the exact problem we
are asked to solve here.  I don't think we should be discussing
solutions before we understand the actual problem.  Right now, I
believe that Emacs already provides features to resolve any such
problems.






* bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language)
  2017-08-05  7:06       ` Eli Zaretskii
@ 2017-08-05  8:17         ` Leonard Lausen
  2017-08-05  9:17           ` Eli Zaretskii
  2017-08-05  8:18         ` bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language) Leonard Lausen
  1 sibling, 1 reply; 18+ messages in thread
From: Leonard Lausen @ 2017-08-05  8:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 27505

Hey Eli,

thanks for your reply.

> Yes, there should be such a way, and in fact it is already, and
> always was, implemented in Emacs.  The values of LC_CTYPE etc.
> environment variables are only used to set up the _defaults_; users
> can use commands and options to override those defaults in many ways.
> For example, "C-h t" can be invoked with a numeric argument ("C-u C-h
> t") in which case Emacs will ask you in what language to display the 
> tutorial.  As another example, input method of your choosing can be 
> invoked at any moment with "C-u C-\"; then you can switch it back
> off as soon as you've finished typing characters that are not
> directly accessible from your system keyboard.  Finally, the
> language environment of your choosing can be set with "C-x RET l",
> and doing that will set many other defaults according to the
> language environment you select.

I was not aware of the feature to change the tutorial language via "C-u
C-h t". Thanks for pointing that out.

> Given all these facilities, I'm not sure I understand what exactly
> is your problem.  The original report was about the tutorial
> language, but you never explained why you set LC_CTYPE to the
> value that specified Chinese.  If you did that for some reason other
> than for using Chinese in your programs, then perhaps you shouldn't
> set LC_CTYPE, and instead should use the above-mentioned, more
> focused, Emacs features to specify Chinese where you want it?

Sorry for not being clear about it. To input Chinese, Japanese or Korean
(CJK) on Linux people usually rely on tools such as fcitx or ibus, which
allow inputting CJK characters in any application. They are also
supported by emacs via the X Input Method (XIM) protocol.

Unfortunately XIM is only supported in emacs when LC_CTYPE is set to a
CJK locale (#10867: must export LC_CTYPE to zh_CN.UTF-8 or similar CJK
locale to use X input method).

Compared to using emacs input methods, fcitx provides the same
experience for all desktop applications and arguably better statistical
matching methods to match the user input (Latin characters) to the
target CJK characters, so it is preferable to the emacs input methods
("C-u C-\").

I would be more than happy to not set LC_CTYPE to Chinese, if #10867
gets fixed. Until then it seems the only way to get XIM working. If I
remember correctly though, #10867 is intended behavior and won't be
fixed (which is not sensible IMO).

My problem is that just because I would like to use XIM doesn't mean
that I would like to see any of the emacs interface in the LC_CTYPE
language. So given that #10867 seems to be intended behavior, at least
emacs shouldn't rely on LC_CTYPE to change the interface language in any
user-visible way. From my perspective it would make more sense to fix
#10867 though.

> That'd go against the Posix semantics of these variables, so we 
> shouldn't do that, because it might not be what is expected by users 
> who set both LANG and other LC_* variables.
> 
> As I wrote previously, I don't really understand the exact problem
> we are asked to solve here.  I don't think we should be discussing 
> solutions before we understand the actual problem.  Right now, I 
> believe that Emacs already provides features to resolve any such 
> problems.

As far as I understand the current behavior of emacs to change the
interface language based on LC_CTYPE is application defined behavior
that is not part of Posix. Posix only says:

> This variable determines the locale category for character handling
> functions, such as tolower(), toupper() and isalpha(). This
> environment variable determines the interpretation of sequences of
> bytes of text data as characters (for example, single- as opposed to
> multi-byte characters), the classification of characters (for
> example, alpha, digit, graph) and the behaviour of character classes.
> Additional semantics of this variable, if any, are
> implementation-dependent.

So I see no problem with LANG defining the interface language and
LC_CTYPE taking care of the character handling.

Best regards
Leonard

PS: Besides emacs bug 10867 there is also an Ubuntu bug from 2009
https://bugs.launchpad.net/ubuntu/+source/emacs-snapshot/+bug/434730
Or in Chinese forums https://emacs-china.org/t/emacs-gui/1271






* bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language)
  2017-08-05  7:06       ` Eli Zaretskii
  2017-08-05  8:17         ` Leonard Lausen
@ 2017-08-05  8:18         ` Leonard Lausen
  1 sibling, 0 replies; 18+ messages in thread
From: Leonard Lausen @ 2017-08-05  8:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 27505

Hey Eli,

thanks for your reply.

> Yes, there should be such a way, and in fact it is already, and
> always was, implemented in Emacs.  The values of LC_CTYPE etc.
> environment variables are only used to set up the _defaults_; users
> can use commands and options to override those defaults in many ways.
> For example, "C-h t" can be invoked with a numeric argument ("C-u C-h
> t") in which case Emacs will ask you in what language to display the 
> tutorial.  As another example, input method of your choosing can be 
> invoked at any moment with "C-u C-\"; then you can switch it back
> off as soon as you've finished typing characters that are not
> directly accessible from your system keyboard.  Finally, the
> language environment of your choosing can be set with "C-x RET l",
> and doing that will set many other defaults according to the
> language environment you select.

I was not aware of the feature to change the tutorial language via "C-u
C-h t". Thanks for pointing that out.

> Given all these facilities, I'm not sure I understand what exactly
> is your problem.  The original report was about the tutorial
> language, but you never explained why you set LC_CTYPE to the
> value that specified Chinese.  If you did that for some reason other
> than for using Chinese in your programs, then perhaps you shouldn't
> set LC_CTYPE, and instead should use the above-mentioned, more
> focused, Emacs features to specify Chinese where you want it?

Sorry for not being clear about it. To input Chinese, Japanese or Korean
(CJK) on Linux people usually rely on tools such as fcitx or ibus, which
allow inputting CJK characters in any application. They are also
supported by emacs via the X Input Method (XIM) protocol.

Unfortunately XIM is only supported in emacs when LC_CTYPE is set to a
CJK locale (#10867: must export LC_CTYPE to zh_CN.UTF-8 or similar CJK
locale to use X input method).

Compared to using emacs input methods, fcitx provides the same
experience for all desktop applications and arguably better statistical
matching methods to match the user input (Latin characters) to the
target CJK characters, so it is preferable to the emacs input methods
("C-u C-\").

I would be more than happy to not set LC_CTYPE to Chinese, if #10867
gets fixed. Until then it seems the only way to get XIM working. If I
remember correctly though, #10867 is intended behavior and won't be
fixed (which is not sensible IMO).

My problem is that just because I would like to use XIM doesn't mean
that I would like to see any of the emacs interface in the LC_CTYPE
language. So given that #10867 seems to be intended behavior, at least
emacs shouldn't rely on LC_CTYPE to change the interface language in any
user-visible way. From my perspective it would make more sense to fix
#10867 though.

> That'd go against the Posix semantics of these variables, so we 
> shouldn't do that, because it might not be what is expected by users 
> who set both LANG and other LC_* variables.
> 
> As I wrote previously, I don't really understand the exact problem
> we are asked to solve here.  I don't think we should be discussing 
> solutions before we understand the actual problem.  Right now, I 
> believe that Emacs already provides features to resolve any such 
> problems.

As far as I understand the current behavior of emacs to change the
interface language based on LC_CTYPE is application defined behavior
that is not part of Posix. At least according to
http://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html :

> This variable determines the locale category for character handling
> functions, such as tolower(), toupper() and isalpha(). This
> environment variable determines the interpretation of sequences of
> bytes of text data as characters (for example, single- as opposed to
> multi-byte characters), the classification of characters (for
> example, alpha, digit, graph) and the behaviour of character classes.
> Additional semantics of this variable, if any, are
> implementation-dependent.

So I see no problem with LANG defining the interface language and
LC_CTYPE taking care of the character handling.

Best regards
Leonard

PS: Besides emacs bug 10867 there is also an Ubuntu bug from 2009
https://bugs.launchpad.net/ubuntu/+source/emacs-snapshot/+bug/434730
Or in Chinese forums https://emacs-china.org/t/emacs-gui/1271






* bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language)
  2017-08-05  8:17         ` Leonard Lausen
@ 2017-08-05  9:17           ` Eli Zaretskii
  2017-08-05  9:52             ` Leonard Lausen
  0 siblings, 1 reply; 18+ messages in thread
From: Eli Zaretskii @ 2017-08-05  9:17 UTC (permalink / raw)
  To: Leonard Lausen; +Cc: 27505

> Cc: 27505@debbugs.gnu.org
> From: Leonard Lausen <leonard@lausen.nl>
> Date: Sat, 5 Aug 2017 17:17:47 +0900
> 
> I would be more than happy to not set LC_CTYPE to Chinese, if #10867
> gets fixed. Until then it seems the only way to get XIM working. If I
> remember correctly though, #10867 is intended behavior and won't be
> fixed (which is not sensible IMO).
> 
> My problem is that just because I would like to use XIM doesn't mean
> that I would like to see any of the emacs interface in the LC_CTYPE
> language. So given that #10867 seems to be intended behavior, at least
> emacs shouldn't rely on LC_CTYPE to change the interface language in any
> user-visible way. From my perspective it would make more sense to fix
> #10867 though.

I don't see any experts we have who could fix that, unfortunately.

But I don't see why that would be a problem for you: if you don't want
the Emacs language environment to be Chinese when you use XIM, you
should be able to invoke set-language-environment inside Emacs after
starting it, to set the language environment to something other than
Chinese.  Does that work for you?
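
One way to apply this suggestion automatically is a small shell wrapper
that keeps LC_CTYPE set for the X input method but resets the language
environment while Emacs starts.  This is a sketch under assumptions: the
function name emacs_with_xim, the EMACS_BIN and XIM_LOCALE variables and
the choice of the "UTF-8" language environment are made up for
illustration, and nothing in this thread confirms that XIM keeps working
after the language environment is changed.

```sh
#!/bin/sh
# Hypothetical wrapper, not part of Emacs.
# EMACS_BIN and XIM_LOCALE are illustrative variables, not standard ones.
emacs_with_xim() {
    # LC_CTYPE is set to a CJK locale only for this Emacs process;
    # --eval then calls set-language-environment while Emacs processes
    # its command line, after the locale defaults have been applied.
    env LC_CTYPE="${XIM_LOCALE:-zh_CN.UTF-8}" \
        "${EMACS_BIN:-emacs}" \
        --eval '(set-language-environment "UTF-8")' \
        "$@"
}

# Usage (starts Emacs, so it is left commented out here):
# emacs_with_xim notes.txt
```

The same set-language-environment call could instead go into the init
file, which avoids the wrapper but applies to every Emacs session.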

> As far as I understand the current behavior of emacs to change the
> interface language based on LC_CTYPE is application defined behavior
> that is not part of Posix. Posix only says:
> 
> > This variable determines the locale category for character handling
> > functions, such as tolower(), toupper() and isalpha(). This
> > environment variable determines the interpretation of sequences of
> > bytes of text data as characters (for example, single- as opposed to
> > multi-byte characters), the classification of characters (for
> > example, alpha, digit, graph) and the behaviour of character classes.
> > Additional semantics of this variable, if any, are
> > implementation-dependent.

See the "interpretation of sequences of bytes of text data as
characters" parts: that's what causes Emacs to use LC_CTYPE to set up
the language environment.  So we do follow Posix, AFAIU.






* bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language)
  2017-08-05  9:17           ` Eli Zaretskii
@ 2017-08-05  9:52             ` Leonard Lausen
  2017-08-05 10:15               ` Eli Zaretskii
  0 siblings, 1 reply; 18+ messages in thread
From: Leonard Lausen @ 2017-08-05  9:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 27505

> I don't see any experts we have who could fix that, unfortunately.

> But I don't see why that would be a problem for you: if you don't want
> the Emacs language environment to be Chinese when you use XIM, you
> should be able to invoke set-language-environment inside Emacs after
> starting it, to set the language environment to something other than
> Chinese.  Does that work for you?

That is a good workaround. I created this bug report, as I would expect
this as default behavior though.

Unfortunately XIM currently does not work for me at all. So I can't
confirm that calling set-language-environment won't stop XIM from
working. (Though XIM worked for me before making a switch from
Debian-based to Gentoo; see bug
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=27312 ).

>> As far as I understand the current behavior of emacs to change the
>> interface language based on LC_CTYPE is application defined behavior
>> that is not part of Posix. Posix only says:
>>
>>> This variable determines the locale category for character handling
>>> functions, such as tolower(), toupper() and isalpha(). This
>>> environment variable determines the interpretation of sequences of
>>> bytes of text data as characters (for example, single- as opposed to
>>> multi-byte characters), the classification of characters (for
>>> example, alpha, digit, graph) and the behaviour of character classes.
>>> Additional semantics of this variable, if any, are
>>> implementation-dependent.
> 
> See the "interpretation of sequences of bytes of text data as
> characters" parts: that's what causes Emacs to use LC_CTYPE to set up
> the language environment.  So we do follow Posix, AFAIU

Hm, as long as LANG and LC_CTYPE both are UTF-8 locales, the
interpretation of bytes would be the same. In principle the interface
language is independent of the interpretation of bytes, right? One
could just parse the first part of LANG (i.e. "en_US") to decide the
display language but follow LC_CTYPE for the interpretation of bytes.
This also seems to be what the majority of applications do, given
that I set LC_CTYPE to Chinese system-wide, but only Emacs (and Dropbox)
change their interface language (more specifically, the tutorial
language).
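
The suggestion above can be sketched in a few lines of Python. This is
only an illustration, not Emacs code: split_locale and same_codeset are
hypothetical helper names, and the sketch assumes locale names of the
common language[_territory][.codeset][@modifier] form.

```python
import re

# Assumed shape of a locale name: language[_territory][.codeset][@modifier]
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def split_locale(name):
    """Split a locale name into its parts (hypothetical helper)."""
    match = _LOCALE_RE.match(name)
    if match is None:
        raise ValueError("unrecognized locale name: %r" % name)
    return match.groupdict()


def same_codeset(a, b):
    """True if both locale names use the same codeset, ignoring case and '-'."""
    def norm(name):
        codeset = split_locale(name)["codeset"] or ""
        return codeset.replace("-", "").lower()
    return norm(a) == norm(b)


lang = split_locale("en_US.UTF-8")    # the reporter's LANG
ctype = split_locale("zh_CN.UTF-8")   # the reporter's LC_CTYPE
print(lang["language"], ctype["language"])
print(same_codeset("en_US.UTF-8", "zh_CN.UTF-8"))
```

With the values from the report, the two names differ in their language
part but share the codeset, which is the distinction the paragraph above
relies on.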






* bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language)
  2017-08-05  9:52             ` Leonard Lausen
@ 2017-08-05 10:15               ` Eli Zaretskii
  2017-08-05 10:50                 ` Leonard Lausen
  2022-04-17 19:44                 ` bug#27505: LC_CTYPE affects tutorial language Lars Ingebrigtsen
  0 siblings, 2 replies; 18+ messages in thread
From: Eli Zaretskii @ 2017-08-05 10:15 UTC (permalink / raw)
  To: Leonard Lausen; +Cc: 27505

> Cc: 27505@debbugs.gnu.org
> From: Leonard Lausen <leonard@lausen.nl>
> Date: Sat, 5 Aug 2017 18:52:38 +0900
> 
> > But I don't see why that would be a problem for you: if you don't want
> > that Emacs language environment be Chinese when you use XIM, you
> > should be able to invoke set-language-environment inside Emacs after
> > starting it, to set the language environment to something other than
> > Chinese.  Does that work for you?
> 
> That is a good workaround. I created this bug report because I would
> expect this to be the default behavior, though.

The default behavior is very unlikely to change, sorry.  It took us
many years to arrive at the current behavior, so changing that for a
single use case, even if it's deemed important, makes little sense to
me.

> > See the "interpretation of sequences of bytes of text data as
> > characters" parts: that's what causes Emacs to use LC_CTYPE to setup
> > the language environment.  So we do follow Posix, AFAIU
> 
> Hm, as long as LANG and LC_CTYPE both are UTF-8 locales, the
> interpretation of bytes would be the same.

Yes, but LANG is the fallback in case LC_* are not defined, so I don't
see how setting LANG to a different language than LC_CTYPE could be in
accordance with POSIX.

> In principle the interface
> language is independent of the interpretation of bytes, right? One
> could just parse the first part of LANG (i.e. "en_US") to decide the
> display language but follow LC_CTYPE for the interpretation of bytes.
> This also seems to be what the majority of applications do, given
> that I set LC_CTYPE to Chinese system-wide, but only Emacs (and Dropbox)
> change their interface language (more specifically, the tutorial
> language).

In Emacs, "display language" is just one aspect of the multi-lingual
environment.  So I'm afraid if the default is not to your liking, you
will have to customize the individual aspects of the language
environment separately, as you see fit.  That's why those variables
exist in the first place -- to tailor the Emacs operation to even rare
and non-typical use cases.






* bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language)
  2017-08-05 10:15               ` Eli Zaretskii
@ 2017-08-05 10:50                 ` Leonard Lausen
  2017-08-05 11:09                   ` Andreas Schwab
  2022-04-17 19:44                 ` bug#27505: LC_CTYPE affects tutorial language Lars Ingebrigtsen
  1 sibling, 1 reply; 18+ messages in thread
From: Leonard Lausen @ 2017-08-05 10:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 27505

>>> See the "interpretation of sequences of bytes of text data as
>>> characters" parts: that's what causes Emacs to use LC_CTYPE to setup
>>> the language environment.  So we do follow Posix, AFAIU
>>
>> Hm, as long as LANG and LC_CTYPE both are UTF-8 locales, the
>> interpretation of bytes would be the same.
> 
> Yes, but LANG is the fallback in case LC_* are not defined, so I don't
> see how setting LANG to a different language than LC_CTYPE could be in
> accordance with POSIX.

Well, it is a fallback only for the things that the respective undefined
LC_* variable would define. So the argument here is that LC_CTYPE,
according to POSIX, does not define the interface language.

The current behavior of Emacs can only be justified by the "Additional
semantics of this variable, if any, are implementation-dependent."
clause for the LC_CTYPE variable. Note, though, that besides Dropbox I
have not found a single program that uses LC_CTYPE to set the interface
language. Instead, those other programs rely on LANG. You may try
"LANG=zh_CN.utf8 vim" compared to "LC_CTYPE=zh_CN.utf8 vim" as an
example. Also, the name LANG, compared to LC_CTYPE, suggests to me that
it should define the interface language, whereas CTYPE should define the
"character types" (?) ;)

So I agree with the previous comment that LANG should take precedence
over LC_CTYPE with regard to the interface language. I am not sure
whether the current Emacs implementation allows that change without
affecting the settings where LC_CTYPE does take precedence over LANG.

But never mind if you prefer to keep the current behavior. You taught me
how to override the language environment manually, so while I am still
unhappy about Emacs behaving differently from most applications, my
immediate concern is resolved ;)






* bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language)
  2017-08-05 10:50                 ` Leonard Lausen
@ 2017-08-05 11:09                   ` Andreas Schwab
  2017-08-05 11:20                     ` Leonard Lausen
  0 siblings, 1 reply; 18+ messages in thread
From: Andreas Schwab @ 2017-08-05 11:09 UTC (permalink / raw)
  To: Leonard Lausen; +Cc: 27505

On Aug 05 2017, Leonard Lausen <leonard@lausen.nl> wrote:

> So I agree with the previous comment that LANG should take precedence
> over LC_CTYPE with regard to the interface language. I am not sure
> whether the current Emacs implementation allows that change without
> affecting the settings where LC_CTYPE does take precedence over LANG.

LANG never takes precedence over other LC_* values, it only serves as
the default for them.  An interface that uses LC_CTYPE must ignore LANG
when LC_CTYPE is set.
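
The precedence rule stated above can be sketched as a small Python
function. This is an illustration under stated assumptions, not code
from Emacs or from any C library: effective_locale is a hypothetical
name, the environment is passed in as a plain dictionary, and the final
"C" fallback stands in for whatever implementation-defined default
applies when nothing is set.

```python
def effective_locale(category, env):
    """Return the locale name in effect for one category, e.g. "LC_CTYPE".

    Lookup order: LC_ALL, then the category's own variable, then LANG.
    LANG is consulted only when the more specific variables are unset.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:  # an empty value is treated like an unset variable
            return value
    return "C"  # assumption: stand-in for the implementation-defined default


# Variables set explicitly in the original report (LC_ALL was empty).
env = {
    "LANG": "en_US.UTF-8",
    "LC_CTYPE": "zh_CN.UTF-8",
    "LC_COLLATE": "C",
    "LC_ALL": "",
}

for category in ("LC_CTYPE", "LC_MESSAGES", "LC_COLLATE"):
    print(category, "->", effective_locale(category, env))
```

With this environment, LC_CTYPE resolves to its own value and LANG is
never consulted for it, while LC_MESSAGES, which is not set, falls back
to LANG.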

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."






* bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language)
  2017-08-05 11:09                   ` Andreas Schwab
@ 2017-08-05 11:20                     ` Leonard Lausen
  2017-08-05 11:22                       ` Leonard Lausen
  0 siblings, 1 reply; 18+ messages in thread
From: Leonard Lausen @ 2017-08-05 11:20 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: 27505

>> So I agree with the previous comment that LANG should take precedence
>> over LC_CTYPE with regard to the interface language. I am not sure
>> whether the current Emacs implementation allows that change without
>> affecting the settings where LC_CTYPE does take precedence over LANG.
> 
> LANG never takes precedence over other LC_* values, it only serves as
> the default for them.  An interface that uses LC_CTYPE must ignore LANG
> when LC_CTYPE is set.

I agree that LC_CTYPE always takes precedence for the things that
LC_CTYPE defines according to the POSIX standard. However, as far as I
understand, the display language is not defined by LC_CTYPE. LC_CTYPE
defines "Character classification and case conversion".

The closest would be LC_MESSAGES ("Formats of informative and diagnostic
messages and interactive responses."). I just tried, and vim, for
example, does indeed use LC_MESSAGES to decide on the interface
language. So do Chromium and KDE applications such as Okular.







* bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language)
  2017-08-05 11:20                     ` Leonard Lausen
@ 2017-08-05 11:22                       ` Leonard Lausen
  0 siblings, 0 replies; 18+ messages in thread
From: Leonard Lausen @ 2017-08-05 11:22 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: 27505

On 08/05/2017 08:20 PM, Leonard Lausen wrote:
>>> So I agree with the previous comment that LANG should take precedence
>>> over LC_CTYPE with regard to the interface language. I am not sure
>>> whether the current Emacs implementation allows that change without
>>> affecting the settings where LC_CTYPE does take precedence over LANG.
>>
>> LANG never takes precedence over other LC_* values, it only serves as
>> the default for them.  An interface that uses LC_CTYPE must ignore LANG
>> when LC_CTYPE is set.
> 
> I agree that LC_CTYPE always takes precedence for the things that
> LC_CTYPE defines according to the POSIX standard. However, as far as I
> understand, the display language is not defined by LC_CTYPE. LC_CTYPE
> defines "Character classification and case conversion".
> 
> The closest would be LC_MESSAGES ("Formats of informative and diagnostic
> messages and interactive responses."). I just tried, and vim, for
> example, does indeed use LC_MESSAGES to decide on the interface
> language. So do Chromium and KDE applications such as Okular.

So what I mean to say is that LANG should take precedence over LC_CTYPE
with respect to the interface language, at least as long as LC_MESSAGES
is not defined. Of course, you can argue about whether the interface
language is the same as "Formats of informative and diagnostic messages
and interactive responses.". But if you disagree with that, then LANG
should always take precedence.
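
The difference between the behavior reported in this thread and the
proposal above can be sketched in Python. All names here are
hypothetical, and language_as_reported only mirrors the behavior
described in this bug report (the tutorial language follows LC_CTYPE);
it is not a description of how Emacs implements it.

```python
def locale_for(category, env):
    """Locale in effect for a category: LC_ALL, then the category, then LANG."""
    for name in ("LC_ALL", category, "LANG"):
        if env.get(name):
            return env[name]
    return "C"  # assumption: stand-in for the implementation-defined default


def language_of(locale_name):
    """Language part of a locale name, e.g. "zh" for "zh_CN.UTF-8"."""
    return locale_name.split(".")[0].split("@")[0].split("_")[0]


def language_as_reported(env):
    # Behavior described in this report: the language follows LC_CTYPE.
    return language_of(locale_for("LC_CTYPE", env))


def language_as_proposed(env):
    # Proposal above: use LC_MESSAGES, which falls back to LANG when unset.
    return language_of(locale_for("LC_MESSAGES", env))


env = {"LANG": "en_US.UTF-8", "LC_CTYPE": "zh_CN.UTF-8", "LC_ALL": ""}
print(language_as_reported(env), language_as_proposed(env))
```

For the environment from the report, the first policy yields Chinese and
the second yields English; the two agree whenever LC_CTYPE and
LC_MESSAGES resolve to the same language.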






* bug#27505: LC_CTYPE affects tutorial language
  2017-08-05 10:15               ` Eli Zaretskii
  2017-08-05 10:50                 ` Leonard Lausen
@ 2022-04-17 19:44                 ` Lars Ingebrigtsen
  1 sibling, 0 replies; 18+ messages in thread
From: Lars Ingebrigtsen @ 2022-04-17 19:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Leonard Lausen, 27505

Eli Zaretskii <eliz@gnu.org> writes:

> The default behavior is very unlikely to change, sorry.  It took us
> many years to arrive at the current behavior, so changing that for a
> single use case, even if it's deemed important, makes little sense to
> me.

(I'm going through old bug reports that unfortunately weren't resolved
at the time.)

Skimming this bug report, I think the conclusion was that we don't want
to change how these variables work in Emacs at this point, so I'm
closing this bug report.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






Thread overview: 18+ messages
     [not found] <871soq7pyr.fsf@users.sourceforge.net>
2017-06-27 14:48 ` bug#27505: LC_CTYPE affects tutorial language Leonard Lausen
2017-06-27 15:05   ` Eli Zaretskii
2017-06-27 15:13   ` Andreas Schwab
     [not found]   ` <handler.27505.C.150189707129878.notifdonectrl.0@debbugs.gnu.org>
2017-08-05  1:54     ` bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language) Leonard Lausen
2017-08-05  2:06       ` npostavs
2017-08-05  5:59         ` Leonard Lausen
2017-08-05  7:10         ` Eli Zaretskii
2017-08-05  7:06       ` Eli Zaretskii
2017-08-05  8:17         ` Leonard Lausen
2017-08-05  9:17           ` Eli Zaretskii
2017-08-05  9:52             ` Leonard Lausen
2017-08-05 10:15               ` Eli Zaretskii
2017-08-05 10:50                 ` Leonard Lausen
2017-08-05 11:09                   ` Andreas Schwab
2017-08-05 11:20                     ` Leonard Lausen
2017-08-05 11:22                       ` Leonard Lausen
2022-04-17 19:44                 ` bug#27505: LC_CTYPE affects tutorial language Lars Ingebrigtsen
2017-08-05  8:18         ` bug#27505: acknowledged by developer (Re: bug#27505: LC_CTYPE affects tutorial language) Leonard Lausen
