unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#51832: Piping unicode text in `shell-command'
@ 2021-11-14  3:10 Tor Kringeland
  2021-11-14  7:26 ` Eli Zaretskii
  0 siblings, 1 reply; 24+ messages in thread
From: Tor Kringeland @ 2021-11-14  3:10 UTC (permalink / raw)
  To: 51832

Running

  (shell-command "echo -n '悟' | pbcopy")

or

  (shell-command "echo -n 'øøøø' | pbcopy")

fills the clipboard with `ÊÇü' and `√∏', respectively, while if I run
the same commands in a terminal emulator outside Emacs I get back the
original input.  The same happens if I run the same shell commands in
`eshell'.  This happens when I run a recent build of Emacs 29 with `-Q'
on macOS Catalina.
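
The two garbled results are consistent with the UTF-8 bytes of the input being decoded as Mac OS Roman somewhere along the way. 悟 (U+609F) is E6 82 9F in UTF-8 and ø (U+00F8) is C3 B8, and in Mac OS Roman those bytes read as Ê Ç ü and √ ∏. That Mac OS Roman is really the encoding involved is an assumption; this report does not establish it. The minimal C sketch below reproduces the mapping. `utf8_encode`, `macroman_partial` and `misdecode` are hypothetical helpers, and the Mac OS Roman table covers only the five bytes these examples need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point from the Basic Multilingual Plane as UTF-8.
   Writes 1-3 bytes to OUT and returns how many were written.
   (Sketch only: surrogates are not rejected.)  */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(cp <= 0xFFFF);
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char) (0xE0 | (cp >> 12));
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
}

/* Deliberately partial Mac OS Roman decoder: ASCII maps to itself and
   only the five high bytes from the report are listed.  Returns the
   Unicode code point for byte B, or 0 if B is not in this table.  */
uint32_t macroman_partial(unsigned char b)
{
    switch (b) {
    case 0x82: return 0x00C7;  /* Ç */
    case 0x9F: return 0x00FC;  /* ü */
    case 0xB8: return 0x220F;  /* ∏ */
    case 0xC3: return 0x221A;  /* √ */
    case 0xE6: return 0x00CA;  /* Ê */
    default:   return b < 0x80 ? b : 0;
    }
}

/* Encode CP as UTF-8, then read each byte back as Mac OS Roman.
   Stores the resulting code points in OUT (room for 3) and returns
   how many there are.  */
size_t misdecode(uint32_t cp, uint32_t out[3])
{
    unsigned char bytes[3];
    size_t n = utf8_encode(cp, bytes);
    for (size_t i = 0; i < n; i++)
        out[i] = macroman_partial(bytes[i]);
    return n;
}
```

For example, misdecode (0x609F, buf) fills buf with U+00CA U+00C7 U+00FC (Ê Ç ü), and misdecode (0x00F8, buf) fills it with U+221A U+220F (√ ∏).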





^ permalink raw reply	[flat|nested] 24+ messages in thread

* bug#51832: Piping unicode text in `shell-command'
  2021-11-14  3:10 bug#51832: Piping unicode text in `shell-command' Tor Kringeland
@ 2021-11-14  7:26 ` Eli Zaretskii
  2021-11-14  7:53   ` Lars Ingebrigtsen
  0 siblings, 1 reply; 24+ messages in thread
From: Eli Zaretskii @ 2021-11-14  7:26 UTC (permalink / raw)
  To: Tor Kringeland; +Cc: 51832

> From: Tor Kringeland <tor.a.s.kringeland@ntnu.no>
> Date: Sun, 14 Nov 2021 04:10:10 +0100
> 
> Running
> 
>   (shell-command "echo -n '悟' | pbcopy")
> 
> or
> 
>   (shell-command "echo -n 'øøøø' | pbcopy")
> 
> fills the clipboard with `ÊÇü' and `√∏', respectively, while if I run
> the same commands in a terminal emulator outside Emacs I get back the
> original input.  The same happens if I run the same shell commands in
> `eshell'.  This happens when I run a recent build of Emacs 29 with `-Q'
> on macOS Catalina.

Please be specific about the "recent build" part: which commit are you
using?  There were some problems with the clipboard that were recently
fixed.

Also, do older versions of Emacs behave differently with that command?





* bug#51832: Piping unicode text in `shell-command'
  2021-11-14  7:26 ` Eli Zaretskii
@ 2021-11-14  7:53   ` Lars Ingebrigtsen
  2021-11-14  8:13     ` Eli Zaretskii
  0 siblings, 1 reply; 24+ messages in thread
From: Lars Ingebrigtsen @ 2021-11-14  7:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Tor Kringeland, 51832

Eli Zaretskii <eliz@gnu.org> writes:

>> Running
>> 
>>   (shell-command "echo -n '悟' | pbcopy")
>> 
>> or
>> 
>>   (shell-command "echo -n 'øøøø' | pbcopy")
>> 
>> fills the clipboard with `ÊÇü' and `√∏', respectively, while if I run
>> the same commands in a terminal emulator outside Emacs I get back the
>> original input.  The same happens if I run the same shell commands in
>> `eshell'.  This happens when I run a recent build of Emacs 29 with `-Q'
>> on macOS Catalina.
>
> Please be specific about the "recent build" part: which commit are you
> using?

I'm seeing the same issue with the current tree on Macos.

> There were some problems with the clipboard that were recently fixed.

This doesn't involve Emacs' interactions with the clipboard, though --
the pbcopy command is what's putting things on the clipboard.  But
pbcopy's apparently misinterpreting the bytes it's getting over the pipe
somehow, which is surprising, because I assumed shell-command just sent
the entire string to a shell for execution.  (But I haven't read the
code.)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





* bug#51832: Piping unicode text in `shell-command'
  2021-11-14  7:53   ` Lars Ingebrigtsen
@ 2021-11-14  8:13     ` Eli Zaretskii
  2021-11-14  8:18       ` Lars Ingebrigtsen
  0 siblings, 1 reply; 24+ messages in thread
From: Eli Zaretskii @ 2021-11-14  8:13 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: tor.a.s.kringeland, 51832

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Tor Kringeland <tor.a.s.kringeland@ntnu.no>,  51832@debbugs.gnu.org
> Date: Sun, 14 Nov 2021 08:53:51 +0100
> 
> >>   (shell-command "echo -n '悟' | pbcopy")
> >> 
> >> or
> >> 
> >>   (shell-command "echo -n 'øøøø' | pbcopy")
> >> 
> >> fills the clipboard with `ÊÇü' and `√∏', respectively, while if I run
> >> the same commands in a terminal emulator outside Emacs I get back the
> >> original input.  The same happens if I run the same shell commands in
> >> `eshell'.  This happens when I run a recent build of Emacs 29 with `-Q'
> >> on macOS Catalina.
> >
> > Please be specific about the "recent build" part: which commit are you
> > using?
> 
> I'm seeing the same issue with the current tree on Macos.
> 
> > There were some problems with the clipboard that were recently fixed.
> 
> This doesn't involve Emacs' interactions with the clipboard, though --
> the pbcopy command is what's putting things on the clipboard.  But
> pbcopy's apparently misinterpreting the bytes it's getting over the pipe
> somehow, which is surprising, because I assumed shell-command just sent
> the entire string to a shell for execution.  (But I haven't read the
> code.)

It could be useful to replace the pipe with redirection to a file, and
see what you get when invoking the command from Emacs and from a shell
prompt outside Emacs.





* bug#51832: Piping unicode text in `shell-command'
  2021-11-14  8:13     ` Eli Zaretskii
@ 2021-11-14  8:18       ` Lars Ingebrigtsen
  2021-11-14  8:25         ` Eli Zaretskii
  0 siblings, 1 reply; 24+ messages in thread
From: Lars Ingebrigtsen @ 2021-11-14  8:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: tor.a.s.kringeland, 51832, Alan Third

Eli Zaretskii <eliz@gnu.org> writes:

> It could be useful to replace the pipe with redirection to a file, and
> see what you get when invoking the command from Emacs and from a shell
> prompt outside Emacs.

Good point.  I tried that now (with "| cat > /tmp/" to get a pipe in
there), and the contents that were written to file were correct utf-8.
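
A file written this way can also be checked mechanically instead of by eye. The sketch below is a strict UTF-8 validator in C; `is_valid_utf8` is a hypothetical name, not an Emacs function. Following the UTF-8 definition in RFC 3629, it rejects stray continuation bytes, truncated sequences, overlong encodings, UTF-16 surrogates and anything above U+10FFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if BUF[0..LEN) is well-formed UTF-8: no overlong forms,
   no surrogates, nothing above U+10FFFF, no truncated sequences.  */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t need;        /* continuation bytes still expected */
        unsigned long cp;   /* code point being assembled */
        unsigned long min;  /* smallest code point valid at this length */

        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return false;  /* stray continuation byte or 0xF8..0xFF */

        if (len - i - 1 < need)
            return false;   /* sequence cut off at end of buffer */
        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += need + 1;
    }
    return true;
}
```

Reading the redirected file into a buffer and passing it to is_valid_utf8 turns "correct utf-8" into a yes-or-no answer; for the 悟 example the expected bytes are E6 82 9F.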

Mysterious.  Could the problem be in pbcopy -- that's assuming something
about the coding system when run from inside Emacs somehow?  That
doesn't sound very likely, but...

I've added Alan to the CCs; perhaps he has some insights here.

* bug#51832: Piping unicode text in `shell-command'
  2021-11-14  8:18       ` Lars Ingebrigtsen
@ 2021-11-14  8:25         ` Eli Zaretskii
  2021-11-14  9:19           ` Lars Ingebrigtsen
  2021-11-14  9:32           ` Lars Ingebrigtsen
  0 siblings, 2 replies; 24+ messages in thread
From: Eli Zaretskii @ 2021-11-14  8:25 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: tor.a.s.kringeland, 51832, alan

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: tor.a.s.kringeland@ntnu.no,  51832@debbugs.gnu.org, Alan Third
>  <alan@idiocy.org>
> Date: Sun, 14 Nov 2021 09:18:08 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > It could be useful to replace the pipe with redirection to a file, and
> > see what you get when invoking the command from Emacs and from a shell
> > prompt outside Emacs.
> 
> Good point.  I tried that now (with "| cat > /tmp/" to get a pipe in
> there), and the contents that were written to file were correct utf-8.
> 
> Mysterious.  Could the problem be in pbcopy -- that's assuming something
> about the coding system when run from inside Emacs somehow?  That
> doesn't sound very likely, but...

Maybe we set some locale-related environment variable, and that
confuses pbcopy when it is run from Emacs?





* bug#51832: Piping unicode text in `shell-command'
  2021-11-14  8:25         ` Eli Zaretskii
@ 2021-11-14  9:19           ` Lars Ingebrigtsen
  2021-11-14  9:32           ` Lars Ingebrigtsen
  1 sibling, 0 replies; 24+ messages in thread
From: Lars Ingebrigtsen @ 2021-11-14  9:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: tor.a.s.kringeland, 51832, alan

Eli Zaretskii <eliz@gnu.org> writes:

> Maybe we set some locale-related environment variable, and that
> confuses pbcopy when it is run from Emacs?

I've now followed the call tree, and we end up doing:

(call-process-region (point) (point) shell-file-name nil
                     (current-buffer) nil shell-command-switch
                     "echo foo😀bar | pbcopy")

And that fails, too.  So it's not something that shell-command sets up
(if it's a locale-related thing).

Hm...  Oh!  I thought the original report said that this worked if run
under M-x shell.  But it doesn't -- I get the same garbled selection.
(And it works fine in a shell outside Emacs.)

So it could indeed be a locale setting in Emacs that's making pbcopy do
the wrong thing. 

* bug#51832: Piping unicode text in `shell-command'
  2021-11-14  8:25         ` Eli Zaretskii
  2021-11-14  9:19           ` Lars Ingebrigtsen
@ 2021-11-14  9:32           ` Lars Ingebrigtsen
  2021-11-14  9:46             ` Lars Ingebrigtsen
  1 sibling, 1 reply; 24+ messages in thread
From: Lars Ingebrigtsen @ 2021-11-14  9:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: tor.a.s.kringeland, 51832, alan

It's a bug in...  the locale settings.  Testing in the console,

echo fóo | LANG=en_US.utf-8 pbcopy

works fine, but

echo fóo | LANG=en_NO.utf-8 pbcopy

doesn't.  And that's the setting in Emacs for me.  It's correct that I
am in Norway and that I'm using the English locale, but there's no such
locale as en_NO.utf-8.

Didn't Emacs on Macos recently get some locale-related changes?

* bug#51832: Piping unicode text in `shell-command'
  2021-11-14  9:32           ` Lars Ingebrigtsen
@ 2021-11-14  9:46             ` Lars Ingebrigtsen
  2021-11-14 10:31               ` Eli Zaretskii
  0 siblings, 1 reply; 24+ messages in thread
From: Lars Ingebrigtsen @ 2021-11-14  9:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: tor.a.s.kringeland, 51832, alan

Lars Ingebrigtsen <larsi@gnus.org> writes:

> doesn't.  And that's the setting in Emacs for me.  It's correct that I
> am in Norway and that I'm using the English locale, but there's no such
> locale as en_NO.utf-8.
>
> Didn't Emacs on Macos recently get some locale-related changes?

It's this code, I guess, from 2016, so it's not recent:

  NSLocale *locale = [NSLocale currentLocale];

  NSTRACE ("ns_init_locale");

  @try
    {
      /* It seems macOS should probably use UTF-8 everywhere.
         'localeIdentifier' does not specify the encoding, and I can't
         find any way to get the OS to tell us which encoding to use,
         so hard-code '.UTF-8'.  */
      NSString *localeID = [NSString stringWithFormat:@"%@.UTF-8",
                                     [locale localeIdentifier]];

      /* Set LANG to locale, but not if LANG is already set.  */
      setenv("LANG", [localeID UTF8String], 0);
    }

And...  it's a Macos bug?  Googling a bit seems to say that this does
indeed return invalid locale identifiers -- just language glued together
with the country, resulting in identifiers that don't match any
locales the OS knows about.

So...  I don't know what to do about that.  Is there a way to check that
the identifier is valid?
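
One possible answer, assuming the question is whether the C library (rather than Foundation) recognizes the name: ask `newlocale` to load it. Unlike `setlocale`, `newlocale` does not change the process-wide locale, so it works as a side-effect-free check. This is a minimal POSIX sketch; `locale_name_is_valid` is a hypothetical helper, and how strictly unknown names are rejected depends on the C library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdbool.h>

/* Return true if the C library can load NAME as a locale for every
   category.  newlocale () returns (locale_t) 0 for names it cannot
   load, and it leaves the process-wide locale untouched.  */
bool locale_name_is_valid(const char *name)
{
    locale_t loc = newlocale(LC_ALL_MASK, name, (locale_t) 0);
    if (loc == (locale_t) 0)
        return false;
    freelocale(loc);  /* only needed it for the check */
    return true;
}
```

On a stock macOS system, locale_name_is_valid ("en_US.UTF-8") would be expected to succeed; given Lars's finding that no en_NO.UTF-8 locale exists, that name should fail.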

* bug#51832: Piping unicode text in `shell-command'
  2021-11-14  9:46             ` Lars Ingebrigtsen
@ 2021-11-14 10:31               ` Eli Zaretskii
  2021-11-14 10:41                 ` Philipp
  0 siblings, 1 reply; 24+ messages in thread
From: Eli Zaretskii @ 2021-11-14 10:31 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: tor.a.s.kringeland, 51832, alan

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: tor.a.s.kringeland@ntnu.no,  51832@debbugs.gnu.org,  alan@idiocy.org
> Date: Sun, 14 Nov 2021 10:46:05 +0100
> 
>   NSLocale *locale = [NSLocale currentLocale];
> 
>   NSTRACE ("ns_init_locale");
> 
>   @try
>     {
>       /* It seems macOS should probably use UTF-8 everywhere.
>          'localeIdentifier' does not specify the encoding, and I can't
>          find any way to get the OS to tell us which encoding to use,
>          so hard-code '.UTF-8'.  */
>       NSString *localeID = [NSString stringWithFormat:@"%@.UTF-8",
>                                      [locale localeIdentifier]];
> 
>       /* Set LANG to locale, but not if LANG is already set.  */
>       setenv("LANG", [localeID UTF8String], 0);
>     }
> 
> And...  it's a Macos bug?  Googling a bit seems to say that this does
> indeed return invalid locale identifiers -- just language glued together
> with the country, resulting in identifiers that don't match any
> locales the OS knows about.
> 
> So...  I don't know what to do about that.  Is there a way to check that
> the identifier is valid?

I asked once why we push LANG into the environment, instead of calling
setlocale, which would only affect Emacs.  I don't think I saw an
answer to that question, or did I miss it?





* bug#51832: Piping unicode text in `shell-command'
  2021-11-14 10:31               ` Eli Zaretskii
@ 2021-11-14 10:41                 ` Philipp
  2021-11-14 10:56                   ` Eli Zaretskii
  0 siblings, 1 reply; 24+ messages in thread
From: Philipp @ 2021-11-14 10:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 51832, Lars Ingebrigtsen, alan, tor.a.s.kringeland



> Am 14.11.2021 um 11:31 schrieb Eli Zaretskii <eliz@gnu.org>:
> 
>> From: Lars Ingebrigtsen <larsi@gnus.org>
>> Cc: tor.a.s.kringeland@ntnu.no,  51832@debbugs.gnu.org,  alan@idiocy.org
>> Date: Sun, 14 Nov 2021 10:46:05 +0100
>> 
>>  NSLocale *locale = [NSLocale currentLocale];
>> 
>>  NSTRACE ("ns_init_locale");
>> 
>>  @try
>>    {
>>      /* It seems macOS should probably use UTF-8 everywhere.
>>         'localeIdentifier' does not specify the encoding, and I can't
>>         find any way to get the OS to tell us which encoding to use,
>>         so hard-code '.UTF-8'.  */
>>      NSString *localeID = [NSString stringWithFormat:@"%@.UTF-8",
>>                                     [locale localeIdentifier]];
>> 
>>      /* Set LANG to locale, but not if LANG is already set.  */
>>      setenv("LANG", [localeID UTF8String], 0);
>>    }
>> 
>> And...  it's a Macos bug?  Googling a bit seems to say that this does
>> indeed return invalid locale identifiers -- just language glued together
>> with the country, resulting in identifiers that don't match any
>> locales the OS knows about.
>> 
>> So...  I don't know what to do about that.  Is there a way to check that
>> the identifier is valid?
> 
> I asked once why we push LANG into the environment, instead of calling
> setlocale, which would only affect Emacs.  I don't think I saw an
> answer to that question, or did I miss it?
> 

AIUI the intention is that this should affect subprocesses started from Emacs.  At least that's how I interpret the comment

/* macOS doesn't set any environment variables for the locale when run
   from the GUI. Get the locale from the OS and set LANG.  */
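
Here is the effect Philipp describes, in miniature. Whatever a process puts into its own environment with `setenv` is inherited by every child it spawns afterwards. That is how a LANG chosen in ns_init_locale would reach the shell run by shell-command and, through it, pbcopy. This is a POSIX sketch in C; `run_shell` is a hypothetical helper that runs "/bin/sh -c SCRIPT" with the caller's current environment and returns the exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;  /* the parent's environment, passed to the child */

/* Run "/bin/sh -c SCRIPT" with the current environment and return
   its exit status, or -1 if it could not be run or did not exit.  */
int run_shell(const char *script)
{
    char *argv[] = { "sh", "-c", (char *) script, NULL };
    pid_t pid;
    int status;

    if (posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

After setenv ("LANG", "en_NO.UTF-8", 1), run_shell ("test \"$LANG\" = en_NO.UTF-8") returns 0. Passing 0 as the third argument instead leaves an already-set LANG untouched, which is what the "but not if LANG is already set" comment in ns_init_locale relies on.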






* bug#51832: Piping unicode text in `shell-command'
  2021-11-14 10:41                 ` Philipp
@ 2021-11-14 10:56                   ` Eli Zaretskii
  2021-11-14 11:20                     ` Lars Ingebrigtsen
  2021-11-14 12:31                     ` Alan Third
  0 siblings, 2 replies; 24+ messages in thread
From: Eli Zaretskii @ 2021-11-14 10:56 UTC (permalink / raw)
  To: Philipp; +Cc: 51832, larsi, alan, tor.a.s.kringeland

> From: Philipp <p.stephani2@gmail.com>
> Date: Sun, 14 Nov 2021 11:41:38 +0100
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>  tor.a.s.kringeland@ntnu.no,
>  51832@debbugs.gnu.org,
>  alan@idiocy.org
> 
> > I asked once why we push LANG into the environment, instead of calling
> > setlocale, which would only affect Emacs.  I don't think I saw an
> > answer to that question, or did I miss it?
> > 
> 
> AIUI the intention is that this should affect subprocesses started from Emacs.  At least that's how I interpret the comment
> 
> /* macOS doesn't set any environment variables for the locale when run
>    from the GUI. Get the locale from the OS and set LANG.  */

Why is that needed?

And if it is needed, how come we are setting LANG to an invalid locale
and the system somehow sets it to the correct locale?





* bug#51832: Piping unicode text in `shell-command'
  2021-11-14 10:56                   ` Eli Zaretskii
@ 2021-11-14 11:20                     ` Lars Ingebrigtsen
  2021-11-14 11:48                       ` Philipp
  2021-11-14 12:16                       ` Eli Zaretskii
  2021-11-14 12:31                     ` Alan Third
  1 sibling, 2 replies; 24+ messages in thread
From: Lars Ingebrigtsen @ 2021-11-14 11:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 51832, Philipp, alan, tor.a.s.kringeland

Eli Zaretskii <eliz@gnu.org> writes:

> And if it is needed, how come we are setting LANG to an invalid locale
> and the system somehow sets it to the correct locale?

LANG outside of Emacs is "" for me on Macos.

* bug#51832: Piping unicode text in `shell-command'
  2021-11-14 11:20                     ` Lars Ingebrigtsen
@ 2021-11-14 11:48                       ` Philipp
  2021-11-14 12:16                       ` Eli Zaretskii
  1 sibling, 0 replies; 24+ messages in thread
From: Philipp @ 2021-11-14 11:48 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 51832, alan, tor.a.s.kringeland



> Am 14.11.2021 um 12:20 schrieb Lars Ingebrigtsen <larsi@gnus.org>:
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
>> And if it is needed, how come we are setting LANG to an invalid locale
>> and the system somehow sets it to the correct locale?
> 
> LANG outside of Emacs is "" for me on Macos.

For reference, on my Monterey system, only the following variables are initially set when launching Emacs from Finder:

__CF_USER_TEXT_ENCODING=0x1F5:0x0:0x3
__CFBundleIdentifier=org.gnu.Emacs
COMMAND_MODE=unix2003
DISPLAY=/private/tmp/com.apple.launchd.[...]/org.macosforge.xquartz:0
HOME=/Users/p
LOGNAME=p
PATH=/usr/bin:/bin:/usr/sbin:/sbin
SHELL=/opt/homebrew/bin/bash
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.[...]/Listeners
TMPDIR=/var/folders/hw/[...]/T/
USER=p
XPC_FLAGS=0x0
XPC_SERVICE_NAME=application.org.gnu.Emacs.[...]






* bug#51832: Piping unicode text in `shell-command'
  2021-11-14 11:20                     ` Lars Ingebrigtsen
  2021-11-14 11:48                       ` Philipp
@ 2021-11-14 12:16                       ` Eli Zaretskii
  1 sibling, 0 replies; 24+ messages in thread
From: Eli Zaretskii @ 2021-11-14 12:16 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 51832, p.stephani2, alan, tor.a.s.kringeland

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Philipp <p.stephani2@gmail.com>,  tor.a.s.kringeland@ntnu.no,
>   51832@debbugs.gnu.org,  alan@idiocy.org
> Date: Sun, 14 Nov 2021 12:20:03 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > And if it is needed, how come we are setting LANG to an invalid locale
> > and the system somehow sets it to the correct locale?
> 
> LANG outside of Emacs is "" for me on Macos.

And that doesn't work when running applications from inside Emacs?  If
it does work, why do we set LANG in Emacs?





* bug#51832: Piping unicode text in `shell-command'
  2021-11-14 10:56                   ` Eli Zaretskii
  2021-11-14 11:20                     ` Lars Ingebrigtsen
@ 2021-11-14 12:31                     ` Alan Third
  2021-11-14 13:41                       ` Lars Ingebrigtsen
  1 sibling, 1 reply; 24+ messages in thread
From: Alan Third @ 2021-11-14 12:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, 51832, Philipp, tor.a.s.kringeland

[-- Attachment #1: Type: text/plain, Size: 1923 bytes --]

On Sun, Nov 14, 2021 at 12:56:14PM +0200, Eli Zaretskii wrote:
> > From: Philipp <p.stephani2@gmail.com>
> > Date: Sun, 14 Nov 2021 11:41:38 +0100
> > Cc: Lars Ingebrigtsen <larsi@gnus.org>,
> >  tor.a.s.kringeland@ntnu.no,
> >  51832@debbugs.gnu.org,
> >  alan@idiocy.org
> > 
> > > I asked once why we push LANG into the environment, instead of calling
> > > setlocale, which would only affect Emacs.  I don't think I saw an
> > > answer to that question, or did I miss it?
> > > 
> > 
> > AIUI the intention is that this should affect subprocesses started from Emacs.  At least that's how I interpret the comment
> > 
> > /* macOS doesn't set any environment variables for the locale when run
> >    from the GUI. Get the locale from the OS and set LANG.  */
> 
> Why is that needed?
> 
> And if it is needed, how come we are setting LANG to an invalid locale
> and the system somehow sets it to the correct locale?

macOS itself doesn't set any locale-related environment variables; any
application that runs UNIX-style commands is expected to set them
itself. The UNIX commands don't pick up the locale from the system
themselves; they rely on the environment variables.

In other words, as with anything UNIXy on macOS, it's a badly thought
out mess.
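
Alan's point can be made concrete. A typical command-line program calls setlocale (LC_ALL, "") at startup. That call takes the locale from LC_ALL, then the per-category LC_* variables, then LANG. When the resulting name matches no installed locale, setlocale returns NULL and the program stays in the "C" locale. The C sketch below emulates that startup step for a given LANG value. `lang_is_accepted` is a hypothetical helper that clears only the standard POSIX LC_* variables, and this sketch does not establish whether pbcopy itself follows this pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>

/* Emulate a command-line tool's startup locale selection with only
   LANG set.  Returns true if setlocale (LC_ALL, "") accepted the
   locale named by LANG.  Side effects: modifies this process's
   environment and resets its locale to "C" afterwards.  */
bool lang_is_accepted(const char *lang)
{
    /* Clear the higher-priority variables so that only LANG matters.  */
    static const char *const overrides[] = {
        "LC_ALL", "LC_CTYPE", "LC_COLLATE", "LC_MESSAGES",
        "LC_MONETARY", "LC_NUMERIC", "LC_TIME",
    };
    for (size_t i = 0; i < sizeof overrides / sizeof *overrides; i++)
        unsetenv(overrides[i]);
    setenv("LANG", lang, 1);

    bool ok = setlocale(LC_ALL, "") != NULL;  /* NULL: name not usable */
    setlocale(LC_ALL, "C");
    return ok;
}
```

lang_is_accepted ("C") is true everywhere. A name such as en_NO.UTF-8, which Lars found does not exist, would be expected to return false on systems that lack that locale.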

It seems suspicious to me that we've had this code since Emacs 26, but
only in the last few weeks we've had two complaints about it. Having
dug out my Mac I can't convince it to show any of the errors that have
been reported, so I suspect either the latest version of macOS has
made the locale handling much more strict or has removed a lot of
locales.

I've attached a patch that may do something towards preventing this
problem but ultimately this is a convenience to give a best guess at
choosing the correct dictionary, date format, etc. If we can't easily
fix it then we can drop it and tell people to set it in their init.el
themselves.

-- 
Alan Third

[-- Attachment #2: 0001-Only-set-LANG-if-the-ID-is-valid.patch --]
[-- Type: text/x-diff, Size: 2059 bytes --]

From ff67f1cbee3c0b1fd5b1a0d725e40158190cfe55 Mon Sep 17 00:00:00 2001
From: Alan Third <alan@idiocy.org>
Date: Sun, 14 Nov 2021 11:32:54 +0000
Subject: [PATCH] Only set LANG if the ID is valid

* src/nsterm.m (ns_init_locale): Check the provided locale identifier
is available before trying to use it.
---
 src/nsterm.m | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/src/nsterm.m b/src/nsterm.m
index 1f17a30272..566537e8a1 100644
--- a/src/nsterm.m
+++ b/src/nsterm.m
@@ -535,21 +535,25 @@ - (NSColor *)colorUsingDefaultColorSpace
 
   NSTRACE ("ns_init_locale");
 
-  @try
+  if ([[NSLocale availableLocaleIdentifiers]
+        containsObject:[locale localeIdentifier]])
     {
-      /* It seems macOS should probably use UTF-8 everywhere.
-         'localeIdentifier' does not specify the encoding, and I can't
-         find any way to get the OS to tell us which encoding to use,
-         so hard-code '.UTF-8'.  */
-      NSString *localeID = [NSString stringWithFormat:@"%@.UTF-8",
-                                     [locale localeIdentifier]];
-
-      /* Set LANG to locale, but not if LANG is already set.  */
-      setenv("LANG", [localeID UTF8String], 0);
-    }
-  @catch (NSException *e)
-    {
-      NSLog (@"Locale detection failed: %@: %@", [e name], [e reason]);
+      @try
+        {
+          /* It seems macOS should probably use UTF-8 everywhere.
+             'localeIdentifier' does not specify the encoding, and I can't
+             find any way to get the OS to tell us which encoding to use,
+             so hard-code '.UTF-8'.  */
+          NSString *localeID = [NSString stringWithFormat:@"%@.UTF-8",
+                                         [locale localeIdentifier]];
+
+          /* Set LANG to locale, but not if LANG is already set.  */
+          setenv("LANG", [localeID UTF8String], 0);
+        }
+      @catch (NSException *e)
+        {
+          NSLog (@"Locale detection failed: %@: %@", [e name], [e reason]);
+        }
     }
 }
 
-- 
2.33.0


* bug#51832: Piping unicode text in `shell-command'
  2021-11-14 12:31                     ` Alan Third
@ 2021-11-14 13:41                       ` Lars Ingebrigtsen
  2021-11-14 14:23                         ` Philipp
  0 siblings, 1 reply; 24+ messages in thread
From: Lars Ingebrigtsen @ 2021-11-14 13:41 UTC (permalink / raw)
  To: Alan Third; +Cc: Philipp, 51832, tor.a.s.kringeland

Alan Third <alan@idiocy.org> writes:

> I've attached a patch that may do something towards preventing this
> problem but ultimately this is a convenience to give a best guess at
> choosing the correct dictionary, date format, etc. If we can't easily
> fix it then we can drop it and tell people to set it in their init.el
> themselves.

That didn't fix the issue for me, I'm afraid -- with that patch, LANG is
still the invalid en_NO.UTF-8 for me.

* bug#51832: Piping unicode text in `shell-command'
  2021-11-14 13:41                       ` Lars Ingebrigtsen
@ 2021-11-14 14:23                         ` Philipp
  2021-11-14 14:28                           ` Lars Ingebrigtsen
  2021-11-14 15:01                           ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 24+ messages in thread
From: Philipp @ 2021-11-14 14:23 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 51832, Alan Third, tor.a.s.kringeland



> Am 14.11.2021 um 14:41 schrieb Lars Ingebrigtsen <larsi@gnus.org>:
> 
> Alan Third <alan@idiocy.org> writes:
> 
>> I've attached a patch that may do something towards preventing this
>> problem but ultimately this is a convenience to give a best guess at
>> choosing the correct dictionary, date format, etc. If we can't easily
>> fix it then we can drop it and tell people to set it in their init.el
>> themselves.
> 
> That didn't fix the issue for me, I'm afraid -- with that patch, LANG is
> still the invalid en_NO.UTF-8 for me.

Maybe we should add similar logic as iTerm2 (https://github.com/gnachman/iTerm2/blob/79aff4d59fd591e7628649bcabe5f27541740bf6/sources/PTYSession.m#L7107): create the locale identifier from language code and country code instead of the current locale identifier, and use setlocale (or better, newlocale) to check whether it's valid, and fall back to en_US.UTF-8 otherwise?




* bug#51832: Piping unicode text in `shell-command'
  2021-11-14 14:23                         ` Philipp
@ 2021-11-14 14:28                           ` Lars Ingebrigtsen
  2021-11-14 15:20                             ` Alan Third
  2021-11-14 15:01                           ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 24+ messages in thread
From: Lars Ingebrigtsen @ 2021-11-14 14:28 UTC (permalink / raw)
  To: Philipp; +Cc: 51832, Alan Third, tor.a.s.kringeland

Philipp <p.stephani2@gmail.com> writes:

>> That didn't fix the issue for me, I'm afraid -- with that patch, LANG is
>> still the invalid en_NO.UTF-8 for me.
>
> Maybe we should add similar logic as iTerm2
> (https://github.com/gnachman/iTerm2/blob/79aff4d59fd591e7628649bcabe5f27541740bf6/sources/PTYSession.m#L7107):
> create the locale identifier from language code and country code
> instead of the current locale identifier,

I think that's what Macos is returning -- it's just concatenating
those two codes to get a locale identifier.  (Which is wrong, of
course.)

> and use setlocale (or better, newlocale) to check whether it's valid,

Yes, that sounds good.

> and fall back to en_US.UTF-8 otherwise?

Hm...  I'd rather just leave LANG unset in that case -- it'll probably
lead to fewer glitches, I think.

* bug#51832: Piping unicode text in `shell-command'
  2021-11-14 14:23                         ` Philipp
  2021-11-14 14:28                           ` Lars Ingebrigtsen
@ 2021-11-14 15:01                           ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 24+ messages in thread
From: Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-11-14 15:01 UTC (permalink / raw)
  To: Philipp; +Cc: Lars Ingebrigtsen, 51832, tor.a.s.kringeland, Alan Third

Philipp <p.stephani2@gmail.com> writes:

>> Am 14.11.2021 um 14:41 schrieb Lars Ingebrigtsen <larsi@gnus.org>:
>> 
>> Alan Third <alan@idiocy.org> writes:
>> 
>>> I've attached a patch that may do something towards preventing this
>>> problem but ultimately this is a convenience to give a best guess at
>>> choosing the correct dictionary, date format, etc. If we can't easily
>>> fix it then we can drop it and tell people to set it in their init.el
>>> themselves.
>> 
>> That didn't fix the issue for me, I'm afraid -- with that patch, LANG is
>> still the invalid en_NO.UTF-8 for me.
>
> Maybe we should add similar logic as iTerm2
> (https://github.com/gnachman/iTerm2/blob/79aff4d59fd591e7628649bcabe5f27541740bf6/sources/PTYSession.m#L7107):
> create the locale identifier from language code and country code
> instead of the current locale identifier, and use setlocale (or
> better, newlocale) to check whether it's valid, and fall back to
> en_US.UTF-8 otherwise?

Native macOS Terminal also has similar logic that calls setlocale.  It
tries to setlocale on LC_ALL (first argument 0) with these locale
identifiers in turn, until one of them succeeds:

- "localeIdentifier.UTF-8"
- "languageCode_countryCode.UTF-8"
- "languageCode_countryCode"

So they seem to give preference to [[NSLocale currentLocale]
localeIdentifier] and only use "languageCode_countryCode" as fallback.
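
Combining the candidate order Daniel describes with Lars's preference for leaving LANG unset when nothing fits gives roughly the selection logic below. This is a sketch, not Terminal's or Emacs's actual code. `pick_locale` and `demo_only_en_us` are hypothetical. The validity test is passed in as a function pointer; in practice it might wrap a `newlocale` or `setlocale` check. `demo_only_en_us` is a stand-in that pretends en_US.UTF-8 is the only installed locale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Decides whether a locale name is usable on this system.  */
typedef bool (*locale_check_fn)(const char *name);

/* Stand-in predicate for illustration only: pretends that
   en_US.UTF-8 is the one installed locale.  */
bool demo_only_en_us(const char *name)
{
    return strcmp(name, "en_US.UTF-8") == 0;
}

/* Try "ID.UTF-8", then "LANG_COUNTRY.UTF-8", then "LANG_COUNTRY".
   Copy the first name OK accepts into OUT and return true; return
   false if none is accepted, in which case the caller would leave
   LANG unset.  */
bool pick_locale(const char *id, const char *lang, const char *country,
                 locale_check_fn ok, char *out, size_t outlen)
{
    char cand[3][128];
    snprintf(cand[0], sizeof cand[0], "%s.UTF-8", id);
    snprintf(cand[1], sizeof cand[1], "%s_%s.UTF-8", lang, country);
    snprintf(cand[2], sizeof cand[2], "%s_%s", lang, country);

    for (int i = 0; i < 3; i++) {
        size_t n = strlen(cand[i]);
        if (ok(cand[i]) && n < outlen) {
            memcpy(out, cand[i], n + 1);  /* include the terminating NUL */
            return true;
        }
    }
    return false;
}
```

With that stand-in, pick_locale ("en_NO", "en", "NO", ...) tries en_NO.UTF-8, then en_NO.UTF-8 again, then en_NO. It rejects all three and returns false, so the caller would leave LANG alone. pick_locale ("en_US", "en", "US", ...) returns en_US.UTF-8 on the first try.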





* bug#51832: Piping unicode text in `shell-command'
  2021-11-14 14:28                           ` Lars Ingebrigtsen
@ 2021-11-14 15:20                             ` Alan Third
  2021-11-14 15:29                               ` Lars Ingebrigtsen
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Third @ 2021-11-14 15:20 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 51832, Philipp, tor.a.s.kringeland

[-- Attachment #1: Type: text/plain, Size: 1283 bytes --]

On Sun, Nov 14, 2021 at 03:28:02PM +0100, Lars Ingebrigtsen wrote:
> Philipp <p.stephani2@gmail.com> writes:
> 
> >> That didn't fix the issue for me, I'm afraid -- with that patch, LANG is
> >> still the invalid en_NO.UTF-8 for me.
> >
> > Maybe we should add similar logic as iTerm2

I tried to find how iTerm2 does it. Your search-fu is better than
mine, apparently. :)

> > (https://github.com/gnachman/iTerm2/blob/79aff4d59fd591e7628649bcabe5f27541740bf6/sources/PTYSession.m#L7107):
> > create the locale identifier from language code and country code
> > instead of the current locale identifier,
> 
> I think that's what Macos is returning -- it's just concatenating
> those two codes to get a locale identifier.  (Which is wrong, of
> course.)

Yeah, I don't think there's any advantage to building them up
manually.

> > and use setlocale (or better, newlocale) to check whether it's valid,
> 
> Yes, that sounds good.
> 
> > and fall back to en_US.UTF-8 otherwise?
> 
> Hm...  I'd rather just leave LANG unset in that case -- it'll probably
> lead to fewer glitches, I think.
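A minimal sketch of the check discussed here, assuming a POSIX.1-2008 C library.  `newlocale` tries to load a locale without changing the process-wide locale (presumably why it is suggested as the better option), and `setenv` with a zero overwrite flag keeps any LANG the user already set.  If the name is not valid, LANG is simply left unset, as suggested above.  The helper names are hypothetical, and some platforms (older BSD and macOS SDKs) declare `newlocale` in `<xlocale.h>` rather than `<locale.h>`.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>

/* Return true if the C library can load NAME as a locale for all
   categories.  The temporary locale object is freed again, and the
   global locale is never touched.  */
static bool
locale_is_valid (const char *name)
{
  locale_t loc = newlocale (LC_ALL_MASK, name, (locale_t) 0);
  if (loc == (locale_t) 0)
    return false;
  freelocale (loc);
  return true;
}

/* Hypothetical helper: export LANG=NAME only when NAME is a usable
   locale.  The overwrite flag 0 means an existing LANG is kept.  */
static void
maybe_set_lang (const char *name)
{
  if (locale_is_valid (name))
    setenv ("LANG", name, 0);
}
```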

I proposed something similar before:

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=51321#90

but it didn't look like we needed it then. We know better now.

New patch attached.
-- 
Alan Third

[-- Attachment #2: v2-0001-Only-set-LANG-if-the-ID-is-valid.patch --]
[-- Type: text/plain, Size: 1590 bytes --]

From 3a2e20c659d8732b11d30cdb27e36610e87a0315 Mon Sep 17 00:00:00 2001
From: Alan Third <alan@idiocy.org>
Date: Sun, 14 Nov 2021 15:09:43 +0000
Subject: [PATCH v2] Only set LANG if the ID is valid

* src/nsterm.m (ns_init_locale): Check the provided locale identifier
is available before trying to use it.
---
 src/nsterm.m | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/src/nsterm.m b/src/nsterm.m
index 1f17a30272..983e5eb8ac 100644
--- a/src/nsterm.m
+++ b/src/nsterm.m
@@ -535,21 +535,18 @@ - (NSColor *)colorUsingDefaultColorSpace
 
   NSTRACE ("ns_init_locale");
 
-  @try
+  if (!isatty (STDIN_FILENO))
     {
-      /* It seems macOS should probably use UTF-8 everywhere.
-         'localeIdentifier' does not specify the encoding, and I can't
-         find any way to get the OS to tell us which encoding to use,
-         so hard-code '.UTF-8'.  */
+      char *oldLocale = setlocale (LC_ALL, NULL);
       NSString *localeID = [NSString stringWithFormat:@"%@.UTF-8",
                                      [locale localeIdentifier]];
 
-      /* Set LANG to locale, but not if LANG is already set.  */
-      setenv("LANG", [localeID UTF8String], 0);
-    }
-  @catch (NSException *e)
-    {
-      NSLog (@"Locale detection failed: %@: %@", [e name], [e reason]);
+      /* Check the locale ID is valid and if so set LANG, but not if
+         it is already set.  */
+      if (setlocale (LC_ALL, [localeID UTF8String]))
+        setenv("LANG", [localeID UTF8String], 0);
+
+      setlocale (LC_ALL, oldLocale);
     }
 }
 
-- 
2.32.0

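The patch probes the identifier with `setlocale` and then restores the previous locale.  One detail is worth keeping in mind if this pattern is reused elsewhere: both the C standard and POSIX allow the string returned by `setlocale (LC_ALL, NULL)` to be overwritten by a later `setlocale` call, so a standalone version would copy it before probing.  A minimal sketch, with a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Return true if setlocale accepts NAME, then put the previous global
   locale back.  The current locale name is copied first because the
   buffer setlocale returns may be reused by the probing call.  */
static bool
probe_locale_with_setlocale (const char *name)
{
  const char *cur = setlocale (LC_ALL, NULL);
  char *saved = cur ? strdup (cur) : NULL;   /* NULL if strdup fails */
  bool ok = setlocale (LC_ALL, name) != NULL;
  if (saved)
    {
      setlocale (LC_ALL, saved);             /* restore the old locale */
      free (saved);
    }
  return ok;
}
```

In the patch, a successful probe is followed by `setenv("LANG", ..., 0)`, so an existing LANG is never overridden.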

* bug#51832: Piping unicode text in `shell-command'
  2021-11-14 15:20                             ` Alan Third
@ 2021-11-14 15:29                               ` Lars Ingebrigtsen
  2021-11-16 20:52                                 ` Alan Third
  0 siblings, 1 reply; 24+ messages in thread
From: Lars Ingebrigtsen @ 2021-11-14 15:29 UTC (permalink / raw)
  To: Alan Third; +Cc: 51832, Philipp, tor.a.s.kringeland

Alan Third <alan@idiocy.org> writes:

> New patch attached.

Yup; that fixes the issue here -- LANG is unset in Emacs, and I can now
pipe in non-ASCII into pbcopy successfully.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

* bug#51832: Piping unicode text in `shell-command'
  2021-11-14 15:29                               ` Lars Ingebrigtsen
@ 2021-11-16 20:52                                 ` Alan Third
  2022-09-20 13:24                                   ` Lars Ingebrigtsen
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Third @ 2021-11-16 20:52 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 51832, Philipp, tor.a.s.kringeland

On Sun, Nov 14, 2021 at 04:29:09PM +0100, Lars Ingebrigtsen wrote:
> Alan Third <alan@idiocy.org> writes:
> 
> > New patch attached.
> 
> Yup; that fixes the issue here -- LANG is unset in Emacs, and I can now
> pipe in non-ASCII into pbcopy successfully.

Thanks. I've pushed to master.
-- 
Alan Third

* bug#51832: Piping unicode text in `shell-command'
  2021-11-16 20:52                                 ` Alan Third
@ 2022-09-20 13:24                                   ` Lars Ingebrigtsen
  0 siblings, 0 replies; 24+ messages in thread
From: Lars Ingebrigtsen @ 2022-09-20 13:24 UTC (permalink / raw)
  To: Alan Third; +Cc: 51832, Eli Zaretskii, Philipp, tor.a.s.kringeland

Alan Third <alan@idiocy.org> writes:

>> Yup; that fixes the issue here -- LANG is unset in Emacs, and I can now
>> pipe in non-ASCII into pbcopy successfully.
>
> Thanks. I've pushed to master.

The bug report was left open, so I'm closing it now.  (I only lightly
skimmed this long bug report thread -- if there were other issues here
that need fixing, please respond to the debbugs address, and we'll
reopen.  Or even better -- open a new bug report.)

end of thread

Thread overview: 24+ messages
2021-11-14  3:10 bug#51832: Piping unicode text in `shell-command' Tor Kringeland
2021-11-14  7:26 ` Eli Zaretskii
2021-11-14  7:53   ` Lars Ingebrigtsen
2021-11-14  8:13     ` Eli Zaretskii
2021-11-14  8:18       ` Lars Ingebrigtsen
2021-11-14  8:25         ` Eli Zaretskii
2021-11-14  9:19           ` Lars Ingebrigtsen
2021-11-14  9:32           ` Lars Ingebrigtsen
2021-11-14  9:46             ` Lars Ingebrigtsen
2021-11-14 10:31               ` Eli Zaretskii
2021-11-14 10:41                 ` Philipp
2021-11-14 10:56                   ` Eli Zaretskii
2021-11-14 11:20                     ` Lars Ingebrigtsen
2021-11-14 11:48                       ` Philipp
2021-11-14 12:16                       ` Eli Zaretskii
2021-11-14 12:31                     ` Alan Third
2021-11-14 13:41                       ` Lars Ingebrigtsen
2021-11-14 14:23                         ` Philipp
2021-11-14 14:28                           ` Lars Ingebrigtsen
2021-11-14 15:20                             ` Alan Third
2021-11-14 15:29                               ` Lars Ingebrigtsen
2021-11-16 20:52                                 ` Alan Third
2022-09-20 13:24                                   ` Lars Ingebrigtsen
2021-11-14 15:01                           ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git