* CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-02 18:54 LC_NUMERIC formatting [FEATURE REQUEST] Boruch Baum
@ 2021-06-03 14:44 ` Maxim Nikulin
2021-06-03 15:01 ` Eli Zaretskii
0 siblings, 1 reply; 33+ messages in thread
From: Maxim Nikulin @ 2021-06-03 14:44 UTC (permalink / raw)
To: emacs-devel; +Cc: Utkarsh Singh
On 03/06/2021 01:54, Boruch Baum wrote:
> Please consider having the elisp 'format' function adopt the
> single-quote and 'I' flags. Each is already implemented in both the GNU
> C printf command and the linux printf command. The single-quote flag is
> part of the 'Single UNIX Specification' and the 'I' flag has been part
> of glibc since version 2.2 [ref: man(3) printf].
>
> If function 'format' uses 'printf' as its backend, this would seem to be
> a matter of exposing an existing feature.
I do not know the story of why Emacs does not support locale-aware number
formats, but I suspect that relying on libc would open a can of worms.
Once setlocale(LC_NUMERIC, "") is invoked, one is never sure whether printf-
and scanf-like functions deal with the default "C" representation or with
numbers formatted according to the current locale. Some numbers related to
communication protocols must always be formatted using the "C" locale. I do
not remember whether it happened with XFree86 or with Xorg, but at a certain
moment users experienced problems: X11 could not start at all due to
invalid configs. The source of the problem was "," as the decimal separator
in some locales and wrong expectations concerning numbers in config files.
Recently I found the following fixup_locale function:
http://git.savannah.gnu.org/cgit/emacs.git/tree/src/emacs.c#n2861
setlocale (LC_NUMERIC, "C");
I was surprised that it is impossible to determine the current decimal
separator from elisp. At the same time, e.g., `string-collate-lessp' has
a LOCALE argument.
A month ago some patches were submitted to Org mode with the intention of
improving the import of tables, see https://debbugs.gnu.org/47885
Part of the discussion is missing from the bug tracker:
https://lists.gnu.org/archive/html/emacs-orgmode/2021-05/msg00693.html
Org mode has a piece of code that tries to guess if the file has commas
or tabs as field separator (CSV or TSV format). The suggested change
adds e.g. semicolon. (Sidenote: probably csv-mode is a better place than
org-mode for such code.)
The problem is that office software uses semicolon for locales where
comma serves as decimal separator for floating-point numbers (e.g.
de_DE, es_ES, fr_FR, ru_RU, etc.):
A;1,2;3,4
So the semicolon should be tried with higher priority than the comma if
numbers are represented as e.g. 1,2 in the current locale. Unfortunately,
the only way to get such information from Emacs is to call some external
application. Maintaining our own mapping of locale to separator is an
unnecessary burden.
Besides office software, there is some equipment that always uses "C"
number formatting, so a user can have a mix of files with various
dialects of CSV. Thus locale info is not enough; some heuristics are
required anyway.
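A minimal Python sketch of the priority rule described above. It is not
the code from the Org mode patch; the function name `guess_delimiter' and
the candidate order are assumptions made for illustration, and the
decimal point is passed in explicitly instead of being read from a locale.

```python
def guess_delimiter(sample: str, decimal_point: str = ".") -> str:
    """Guess the field separator of a CSV-like sample (hypothetical helper).

    Candidates are tried in priority order. When the decimal point is a
    comma, the semicolon is tried before the comma.
    """
    if decimal_point == ",":
        candidates = [";", "\t", ","]
    else:
        candidates = [",", "\t", ";"]
    lines = [line for line in sample.splitlines() if line]
    for delim in candidates:
        counts = {line.count(delim) for line in lines}
        # Accept a delimiter that occurs the same non-zero number of
        # times on every non-empty line.
        if len(counts) == 1 and counts != {0}:
            return delim
    # Nothing matched: fall back to the highest-priority candidate.
    return candidates[0]


sample = "A;1,2;3,4\nB;5,6;7,8"
print(repr(guess_delimiter(sample, decimal_point=",")))
print(repr(guess_delimiter(sample, decimal_point=".")))
```

The same sample is split on ";" when the decimal point is "," and on ","
when it is ".", which is exactly the ambiguity that makes the decimal
separator relevant for CSV import.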
More subtle questions arise at the next step. Org allows performing
calculations on table cells (and there is calc). Should numbers be
converted to the "C" locale representation during import? Should the
conversion happen when passing cell content as an argument, with the
result converted back to the current locale? I anticipate that a
buffer-local setting will be requested. There was even a discussion of
mixed-language documents on the emacs-orgmode mailing list, however
numbers were not mentioned.
So locale-aware number formatting would be a great improvement for
Emacs. On the other hand, it should be implemented with great care to
avoid localized numbers in some cases. Maybe a locale argument should be
passed to functions that deal with numbers. Formatting of integer
numbers is not enough; floating-point numbers should be handled as well.
Parsing numbers formatted according to locale rules should be
addressed too. A function similar to `locale-info' is highly desirable to
get properties of a locale (e.g. decimal_point from the result of
localeconv). Some decision is required on whether calc & Co should
operate with localized numbers.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-03 14:44 ` CSV parsing and other issues (Re: LC_NUMERIC) Maxim Nikulin
@ 2021-06-03 15:01 ` Eli Zaretskii
2021-06-04 16:31 ` Maxim Nikulin
0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2021-06-03 15:01 UTC (permalink / raw)
To: Maxim Nikulin; +Cc: utkarsh190601, emacs-devel
> From: Maxim Nikulin <manikulin@gmail.com>
> Date: Thu, 3 Jun 2021 21:44:08 +0700
> Cc: Utkarsh Singh <utkarsh190601@gmail.com>
>
> So locale-aware number formatting would be a great improvement for
> Emacs. On the other hand, it should be implemented with great care to
> avoid localized numbers in some cases. Maybe locale argument should be
> passed to functions that deal with numbers. Formatting of integer
> numbers is not enough, floating point numbers should be handled as well.
> Parsing numbers formatted accordingly to locale rules should be
> addressed too. A function similar to `locale-info' is highly desired to
> get properties of locale (e.g. decimal_point from result of localeconv).
> Some decision is required whether calc & Co should operate with
> localized numbers.
Setting a locale globally in Emacs is a non-starter, for the reasons
that you point out and others. Text processing in Emacs is generally
separate from the current locale's rules, mainly to have Emacs work
the same in any locale. So passing a locale argument to functions
that produce output, with the intent to request some behavior to be
tailored to that locale, is the only reasonable way to have this kind
of functionalities in Emacs. The problem with that, of course, is
that not every supported platform can dynamically change the locale,
let alone do that efficiently.
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-03 15:01 ` Eli Zaretskii
@ 2021-06-04 16:31 ` Maxim Nikulin
2021-06-04 19:17 ` Eli Zaretskii
0 siblings, 1 reply; 33+ messages in thread
From: Maxim Nikulin @ 2021-06-04 16:31 UTC (permalink / raw)
To: emacs-devel; +Cc: utkarsh190601
On 03/06/2021 22:01, Eli Zaretskii wrote:
>> From: Maxim Nikulin
>> Date: Thu, 3 Jun 2021 21:44:08 +0700
>>
>> So locale-aware number formatting would be a great improvement for
>> Emacs. On the other hand, it should be implemented with great care to
>> avoid localized numbers in some cases. Maybe locale argument should be
>> passed to functions that deal with numbers. Formatting of integer
>> numbers is not enough, floating point numbers should be handled as well.
>> Parsing numbers formatted accordingly to locale rules should be
>> addressed too. A function similar to `locale-info' is highly desired to
>> get properties of locale (e.g. decimal_point from result of localeconv).
>> Some decision is required whether calc & Co should operate with
>> localized numbers.
>
> Setting a locale globally in Emacs is a non-starter, for the reasons
> that you point out and others. Text processing in Emacs is generally
> separate from the current locale's rules, mainly to have Emacs work
> the same in any locale. So passing a locale argument to functions
> that produce output, with the intent to request some behavior to be
> tailored to that locale, is the only reasonable way to have this kind
> of functionalities in Emacs. The problem with that, of course, is
> that not every supported platform can dynamically change the locale,
> let alone do that efficiently.
I do not think it is efficient to require users to fight with
number formatting themselves. Some links from my browser history from
when I was trying to figure out how to get the locale-specific decimal
separator in elisp:
https://stackoverflow.com/questions/35661173/how-to-format-table-fields-as-currency-in-org-mode
https://www.emacswiki.org/emacs/AddCommasToNumbers
https://www.reddit.com/r/emacs/comments/61mhyx/creating_a_function_to_add_commasseparators_to/
Do you mean that it is necessary to create new implementation of number
formatter specially for Emacs? Something like
https://unicode.org/reports/tr35/tr35-numbers.html
Unicode Locale Data Markup Language (LDML) Part 3: Numbers
Actually it is an almost random link. I do not know which source is
currently considered the best collection of wisdom related to number
formatting. Outside of the Emacs world, the last time I needed numbers
formatted according to various locales, I was lucky enough to use
code similar to the following and did not have to care about the details:
#include <cstdio>
#include <QLocale>
#include <QTextStream>

void test(QTextStream& stream, const char *loc_name) {
    QLocale loc(QString::fromLocal8Bit(loc_name));
    stream << "point: " << loc.decimalPoint()
           << " " << loc.toString(12345.67)
           << " " << loc.toString(1234567890) << "\n";
}

int main(int argc, char *argv[]) {
    QTextStream stream(stdout);
    for (int i = 1; i < argc; ++i) {
        test(stream, argv[i]);
    }
    return 0;
}
./qtloc de_DE en_GB fa_IR
point: , 12.345,7 1.234.567.890
point: . 12,345.7 1,234,567,890
point: ٫ ۱۲٬۳۴۵٫۷ ۱٬۲۳۴٬۵۶۷٬۸۹۰
Surprisingly, it works even though I have not generated the de and fa locales.
On linux I see that Emacs is linked with ICU
ldd /usr/bin/emacs | grep -i icu
libicuuc.so.66 => /usr/lib/x86_64-linux-gnu/libicuuc.so.66 (0x00007f457c799000)
libicudata.so.66 => /usr/lib/x86_64-linux-gnu/libicudata.so.66 (0x00007f457a61c000)
I am not familiar with the ICU API, but I expect that it could be used:
https://github.com/unicode-org/icu/blob/main/icu4c/source/samples/numfmt/capi.c
Do you have a bright idea concerning implementation of parser-formatter
for numbers with reasonable efforts?
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-04 16:31 ` Maxim Nikulin
@ 2021-06-04 19:17 ` Eli Zaretskii
0 siblings, 0 replies; 33+ messages in thread
From: Eli Zaretskii @ 2021-06-04 19:17 UTC (permalink / raw)
To: Maxim Nikulin; +Cc: utkarsh190601, emacs-devel
> Cc: utkarsh190601@gmail.com
> From: Maxim Nikulin <manikulin@gmail.com>
> Date: Fri, 4 Jun 2021 23:31:13 +0700
>
> > Setting a locale globally in Emacs is a non-starter, for the reasons
> > that you point out and others. Text processing in Emacs is generally
> > separate from the current locale's rules, mainly to have Emacs work
> > the same in any locale. So passing a locale argument to functions
> > that produce output, with the intent to request some behavior to be
> > tailored to that locale, is the only reasonable way to have this kind
> > of functionalities in Emacs. The problem with that, of course, is
> > that not every supported platform can dynamically change the locale,
> > let alone do that efficiently.
>
> I do not think it is efficient to require from users to fight with
> number formatting themselves.
I didn't suggest that. I was talking about the design of the APIs
that need to be able to provide locale-specific formatting. The
implementation should be provided by Emacs core, of course.
> Do you mean that it is necessary to create new implementation of number
> formatter specially for Emacs?
Either that, or use the underlying C library if it can accept a locale
specifier, or if it supports efficient dynamic change of the locale,
like we do in some of the implementations of string-collate-lessp.
> On linux I see that Emacs is linked with ICU
It isn't. It's either HarfBuzz or maybe libc that pulls in the ICU
library. Emacs doesn't use it directly.
> Do you have a bright idea concerning implementation of parser-formatter
> for numbers with reasonable efforts?
See above.
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
@ 2021-06-06 23:36 Boruch Baum
2021-06-07 12:28 ` Eli Zaretskii
0 siblings, 1 reply; 33+ messages in thread
From: Boruch Baum @ 2021-06-06 23:36 UTC (permalink / raw)
To: Emacs-Devel List; +Cc: Maxim Nikulin, Eli Zaretskii
I wasn't cc'ed (and I don't subscribe to the list), so I only now saw
the continuation of my post.
1] @Maxim: You seemed to indicate that the default emacs locale is 'C'.
That may be true, and I may be mixing up two separate things, but my
observation is that I get 'nil' when I check for any related
environment variable using function `getenv', and in practice I need
to temporarily manually use function setenv to set LC_COLLATE=C in
order to offer several sorting options in package diredc. Note though
that feature isn't performing the sort within emacs; it's temporarily
setting a shell environment and having the external ls program
perform the sort for emacs-core dired. Thus, my experience has been
that the default has been something other than C, at least for
LC_COLLATE. I suspect that's true for ALL emacs users.
2] @Eli: You wrote
> > The problem with that, of course, is that not every supported
> > platform can dynamically change the locale, let alone do that
> > efficiently.
I have no idea to what actual supported platform you're referring.
3] @Eli: You wrote
> > Text processing in Emacs is generally separate from the current
> > locale's rules,
> > ...
> > So passing a locale argument to functions that produce output,
> > with the intent to request some behavior to be tailored to that
> > locale, is the only reasonable way to have this kind
Agreed. My input here is that there should be clear documentation of
how to retrieve a value for that argument from a buffer's context,
(maybe the same way that flyspell does?).
I see also that I created room for confusion in asking actually for TWO
features (single-quote and upper-case I) because the two will behave
differently in an expected default condition. The single quote format
(for the thousands separator) can be expected to produce a result always
for all conditions of locale, while I expect most locale cases won't
produce any special output for the upper-case I format option.
--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-06 23:36 CSV parsing and other issues (Re: LC_NUMERIC) Boruch Baum
@ 2021-06-07 12:28 ` Eli Zaretskii
2021-06-08 0:45 ` Boruch Baum
0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2021-06-07 12:28 UTC (permalink / raw)
To: Boruch Baum; +Cc: manikulin, emacs-devel
> Date: Sun, 6 Jun 2021 19:36:38 -0400
> From: Boruch Baum <boruch_baum@gmx.com>
> Cc: Maxim Nikulin <manikulin@gmail.com>, Eli Zaretskii <eliz@gnu.org>
>
> 1] @Maxim: You seemed to indicate that the default emacs locale is 'C'.
> That may be true
That's only true for LC_NUMERIC category.
> 2] @Eli: You wrote
>
> > > The problem with that, of course, is that not every supported
> > > platform can dynamically change the locale, let alone do that
> > > efficiently.
>
> I have no idea to what actual supported platform you're referring.
GNU/Linux is the only one I know of that can efficiently switch
locales dynamically (and even that in recent versions of libc, AFAIR).
> > > Text processing in Emacs is generally separate from the current
> > > locale's rules,
> > > ...
> > > So passing a locale argument to functions that produce output,
> > > with the intent to request some behavior to be tailored to that
> > > locale, is the only reasonable way to have this kind
>
> Agreed. My input here is that there should be clear documentation of
> how to retrieve a value for that argument from a buffer's context,
> (maybe the same way that flyspell does?).
Sorry, I don't see the relevance. I was talking about calling
functions, so how does some buffer enter this picture? Buffers don't
have anything to do with the locale used by library functions called
by Emacs.
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-07 12:28 ` Eli Zaretskii
@ 2021-06-08 0:45 ` Boruch Baum
2021-06-08 2:35 ` Eli Zaretskii
0 siblings, 1 reply; 33+ messages in thread
From: Boruch Baum @ 2021-06-08 0:45 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: manikulin, emacs-devel
On 2021-06-07 15:28, Eli Zaretskii wrote:
> Sorry, I don't see the relevance. I was talking about calling
> functions, so how does some buffer enter this picture? Buffers don't
> have anything to do with the locale used by library functions called
> by Emacs.
No? If an Emacs user has two buffers in two separate languages, the
buffer-local settings aren't / won't be respected?
--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-08 0:45 ` Boruch Baum
@ 2021-06-08 2:35 ` Eli Zaretskii
2021-06-08 15:35 ` Stefan Monnier
2021-06-08 16:35 ` Maxim Nikulin
0 siblings, 2 replies; 33+ messages in thread
From: Eli Zaretskii @ 2021-06-08 2:35 UTC (permalink / raw)
To: Boruch Baum; +Cc: manikulin, emacs-devel
> Date: Mon, 7 Jun 2021 20:45:10 -0400
> From: Boruch Baum <boruch_baum@gmx.com>
> Cc: emacs-devel@gnu.org, manikulin@gmail.com
>
> On 2021-06-07 15:28, Eli Zaretskii wrote:
> > Sorry, I don't see the relevance. I was talking about calling
> > functions, so how does some buffer enter this picture? Buffers don't
> > have anything to do with the locale used by library functions called
> > by Emacs.
>
> No? If an Emacs user has two buffers in two separate languages, the
> buffer-local settings aren't / won't be respected?
First, language is different from locale. And second, we don't even
have a buffer-local notion of language yet. What we can support (but
seldom if ever do) is to have buffer-local case-conversion table,
which is a very small part of language- or locale-dependent settings.
So no, buffer-local aspects in general don't affect what you have in
mind, not yet anyway.
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-08 2:35 ` Eli Zaretskii
@ 2021-06-08 15:35 ` Stefan Monnier
2021-06-08 16:35 ` Maxim Nikulin
1 sibling, 0 replies; 33+ messages in thread
From: Stefan Monnier @ 2021-06-08 15:35 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Boruch Baum, manikulin, emacs-devel
> First, language is different from locale. And second, we don't even
> have a buffer-local notion of language yet. What we can support (but
> seldom if ever do) is to have buffer-local case-conversion table,
> which is a very small part of language- or locale-dependent settings.
>
> So no, buffer-local aspects in general don't affect what you have in
> mind, not yet anyway.
Worse: it's not uncommon to run code which doesn't really care about its
current-buffer, so it's not always correct to presume that the settings
of the current-buffer should be used. We already suffer from such
problems in some corner cases with code that uses `\<` or `\_<` in
regexps matching on strings (rather than buffer content) where the
result can unexpectedly depend on the buffer which happens to
be current.
Stefan
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-08 2:35 ` Eli Zaretskii
2021-06-08 15:35 ` Stefan Monnier
@ 2021-06-08 16:35 ` Maxim Nikulin
2021-06-08 18:52 ` Eli Zaretskii
1 sibling, 1 reply; 33+ messages in thread
From: Maxim Nikulin @ 2021-06-08 16:35 UTC (permalink / raw)
To: emacs-devel; +Cc: emacs-devel
On 08/06/2021 09:35, Eli Zaretskii wrote:
> From: Boruch Baum
>> No? If an Emacs user has two buffers in two separate languages, the
>> buffer-local settings aren't / won't be respected?
>
> First, language is different from locale. And second, we don't even
> have a buffer-local notion of language yet.
Certainly locale is more precise than just language since it includes
region and other variants, moreover it can be granularly tuned (date,
numbers, sorting can be adjusted independently), but I still think that
all these properties can be sometimes broadly referred to as language.
Are we not discussing a feature request? Low-level functions can accept
an explicit locale. A higher-level API can obtain it implicitly from
buffer-local variables and the global locale. For example, the LOCALE
argument of `string-collate-lessp' is optional. I can even
anticipate that the locale may sometimes be stored in text properties. A
random message from the recent "About multilingual documents" thread on
the emacs-orgmode mailing list:
https://lists.gnu.org/archive/html/emacs-orgmode/2021-05/msg00252.html
At first basic functionality may be implemented. The problem is to
choose extensible API.
On 07/06/2021 06:36, Boruch Baum wrote:
> I get 'nil' when I check for any related
> environment variable using function `getenv'
Do not confuse setlocale and setenv. setenv affects
later calls to setlocale (with "" as the locale argument;
NULL only queries the current setting) and child processes.
setlocale deals with the current process; it can take into
account or override the values of environment variables.
setlocale is not exposed to elisp.
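The difference can be observed from Python, whose `locale' and `os'
modules wrap the same libc calls. This is only a sketch for illustration:
the locale name "de_DE.UTF-8" is an arbitrary example value that is
merely stored in the environment and never activated, so it does not need
to be generated on the system.

```python
import locale
import os

# Start from a known state: numbers use the "C" locale in this process.
locale.setlocale(locale.LC_NUMERIC, "C")
before = locale.setlocale(locale.LC_NUMERIC)   # no second argument: query only

# Like setenv(3): changes the environment, not the active locale.
os.environ["LC_NUMERIC"] = "de_DE.UTF-8"
after = locale.setlocale(locale.LC_NUMERIC)    # still the old value

print(before, after)

# Only an explicit call such as
#     locale.setlocale(locale.LC_NUMERIC, "")
# would consult the environment (and it fails if the named locale is
# not available). Child processes started later inherit the new value.
```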
> The single quote format (for the thousands separator) can be expected
> to produce a result always for all conditions of locale, while I
> expect most locale cases won't produce any special output for the
> upper-case I format option.
I still think that "'" and "I" formats are tightly bound. Grouping style
is locale-dependent. So representation of digits is just another
property of locale.
LC_NUMERIC=C.UTF-8 /usr/bin/printf "%'d\n" 1234567890
1234567890
LC_NUMERIC=en_US.UTF-8 /usr/bin/printf "%'d\n" 1234567890
1,234,567,890
LC_NUMERIC=es_ES.UTF-8 /usr/bin/printf "%'d\n" 1234567890
1.234.567.890
LC_NUMERIC=ru_RU.UTF-8 /usr/bin/printf "%'d\n" 1234567890
1 234 567 890
Even group size is not always 3
LC_NUMERIC=en_IN.UTF-8 /usr/bin/printf "%'d\n" 1234567890
1,23,45,67,890
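The grouping rule behind these outputs can be sketched in Python. This is
an illustration only, not how printf or Emacs implements it:
`group_digits' is a hypothetical helper, and the `grouping' lists in the
usage lines are assumed values written in the style of the `grouping'
field of localeconv(3), not values read from the system.

```python
import locale


def group_digits(digits: str, grouping: list, sep: str) -> str:
    """Insert `sep` into a string of digits (no sign, no fraction).

    `grouping` lists group sizes starting from the right. A trailing 0
    repeats the previous size, CHAR_MAX stops grouping, and an empty
    list means no grouping at all.
    """
    if not grouping or not sep:
        return digits
    groups = []
    rest = digits
    size = 0
    for g in grouping:
        if g == locale.CHAR_MAX:
            break  # no further grouping
        if g == 0:
            # Repeat the previous size for the remaining digits.
            while size and len(rest) > size:
                groups.append(rest[-size:])
                rest = rest[:-size]
            break
        size = g
        if len(rest) <= size:
            break
        groups.append(rest[-size:])
        rest = rest[:-size]
    groups.append(rest)
    return sep.join(reversed(groups))


print(group_digits("1234567890", [3, 3, 0], ","))  # groups of three
print(group_digits("1234567890", [3, 2, 0], ","))  # three, then twos
```

With the assumed lists, the first call gives 1,234,567,890 and the second
gives 1,23,45,67,890, so the group size has to come from the locale data
rather than being hard-coded as 3.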
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/NumberFormat
"India uses thousands/lakh/crore separators"
I just have realized that nl_langinfo(3) (and nl_langinfo_l(3) as well)
from libc accepts RADIXCHAR (decimal dot) and THOUSEP (group separator)
arguments. They are good candidates for `locale-info' extension.
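For illustration, the same two items can be queried through Python's
`locale' module, which wraps nl_langinfo(3) on Unix-like systems. This is
a sketch, not a proposal for the elisp interface: `numeric_separators' is
a hypothetical helper, and it switches LC_NUMERIC globally for the
duration of the call, which is the very thing that is problematic inside
Emacs.

```python
import locale


def numeric_separators(name: str = "C") -> tuple:
    """Return (decimal point, group separator) for the locale `name`.

    Temporarily switches LC_NUMERIC for the whole process and restores
    the previous setting afterwards. Raises locale.Error if the locale
    is not available on the system.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)
    try:
        locale.setlocale(locale.LC_NUMERIC, name)
        return (locale.nl_langinfo(locale.RADIXCHAR),
                locale.nl_langinfo(locale.THOUSEP))
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


# The "C" locale is always available, so this runs everywhere.
print(numeric_separators("C"))
```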
On 05/06/2021 02:17, Eli Zaretskii wrote:
>> From: Maxim Nikulin Date: Fri, 4 Jun 2021 23:31:13 +0700
>> On linux I see that Emacs is linked with ICU
>
> It isn't. It's either HarfBuzz or maybe libc that pulls in the ICU
> library. Emacs doesn't use it directly.
Actually Qt links my example with other libraries from ICU. My point was
that since Emacs anyway (indirectly) links with this library, the
dependency may be not so heavy. My personal requirements for number
formatting were quite modest so far, I expect that other languages (CJK,
right-to-left scripts, etc.) may require quite special treatment, so
implementation in Emacs (and further maintenance) may require a lot of
work. At least API of ICU should be studied to get some inspiration what
features will be necessary for users from other regions.
E.g. I was completely unaware that negative sign may be represented by
parenthesis (JavaScript, may be executed in browser developer tools)
new Intl.NumberFormat('en-GB', {
style: 'currency',
currency: 'USD',
currencySign: 'accounting',
signDisplay: 'always'
}).format(-3500);
"(US$3,500.00)"
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/NumberFormat/NumberFormat
I do not know if Intl API is really convenient. I see that there is no
direct way to get decimal separator. However it can serve as
another source for inspiration.
I expect enough surprises and unexpected "discoveries" during
implementation of better locale support. That is why I would consider
adapting some more or less established API for this purpose.
P.S.
On 07/06/2021 06:36, Boruch Baum wrote:
> and in practice I need to temporarily manually use function setenv to
> set LC_COLLATE=C in order to offer several sorting options in package
> diredc.
Ideally you should avoid this and use the envp argument of the execve(2)
system call. Otherwise it could interfere with other packages, especially
if threads are involved. I am unsure whether Emacs currently provides such
an API option.
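As an illustration of the idea outside of Emacs, here is how a modified
environment can be handed to a single child process in Python without
touching the parent's own environment. The helper name
`run_with_collate_c' and the probe command are assumptions made for this
sketch.

```python
import os
import subprocess
import sys


def run_with_collate_c(argv: list) -> str:
    """Run `argv` with LC_COLLATE=C in the child's environment only."""
    child_env = dict(os.environ, LC_COLLATE="C")
    # LC_ALL, if set, would override LC_COLLATE in the child.
    child_env.pop("LC_ALL", None)
    result = subprocess.run(argv, env=child_env, capture_output=True,
                            text=True, check=True)
    return result.stdout


# Probe: a child that just reports the value it received.
probe = [sys.executable, "-c",
         "import os; print(os.environ['LC_COLLATE'])"]
print(run_with_collate_c(probe).strip())
```

The parent's environment is never modified, so nothing else running in
the same process can observe a temporary setting.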
> it's temporarily setting a shell environment and having the external
> ls program perform the sort for emacs-core dired.
I am unsure whether "ls" can be used reliably at all. File names may
contain e.g. newlines, various control characters, or parts that look
rather similar to ls output. I am not familiar with dired internals. At
first my intention was to create an issue for diredc, but skimming through
its code I did not find a direct "ls" invocation. Some problems with ls:
https://mywiki.wooledge.org/BashPitfalls?highlight=%28%5CbCategoryShell%5Cb%29#for_f_in_.24.28ls_.2A.mp3.29
Bash Pitfalls: item #1
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-08 16:35 ` Maxim Nikulin
@ 2021-06-08 18:52 ` Eli Zaretskii
2021-06-10 16:28 ` Maxim Nikulin
0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2021-06-08 18:52 UTC (permalink / raw)
To: Maxim Nikulin; +Cc: boruch_baum, emacs-devel
> Cc: emacs-devel@gnu.org
> From: Maxim Nikulin <manikulin@gmail.com>
> Date: Tue, 8 Jun 2021 23:35:51 +0700
>
> On 08/06/2021 09:35, Eli Zaretskii wrote:
> > From: Boruch Baum
> >> No? If an Emacs user has two buffers in two separate languages, the
> >> buffer-local settings aren't / won't be respected?
> >
> > First, language is different from locale. And second, we don't even
> > have a buffer-local notion of language yet.
>
> Certainly locale is more precise than just language since it includes
> region and other variants, moreover it can be granularly tuned (date,
> numbers, sorting can be adjusted independently), but I still think that
> all these properties can be sometimes broadly referred to as language.
No, they cannot, not in general. A locale comes with a whole database
of different settings: language, encoding (a.k.a. "codeset"), formats
of date and time, names of days of the week and of the months, rules
for collation and capitalization, etc. etc. You can easily find
several locales whose language is English, but some/many/all of the
other locale-dependent settings are different. It isn't a coincidence
that a locale's name includes more than just the language part.
> Low level functions can accept explicit locale.
Which ones? Most libc routines don't, they use the locale as a global
identifier. And many libc's (with the prominent exception of glibc)
don't support efficient change of a locale in the middle of a program,
they assume that the program's locale is set once at program startup.
> Higher level API can obtain it implicitly from
> buffer-local variables and global locale. For example the LOCALE
> argument of `string-collate-lessp' is optional one. I can even
> anticipate that locale may be stored in text properties some times. A
> random message from recent "About multilingual documents" thread at
> emacs-orgmode mail list:
> https://lists.gnu.org/archive/html/emacs-orgmode/2021-05/msg00252.html
That's mostly about input methods and org-export, I don't see how it's
relevant to what Boruch asked.
> At first basic functionality may be implemented. The problem is to
> choose extensible API.
No, the problem is to have a design that would allow an efficient
implementation. Given what the underlying libc does, it isn't easy.
And then we have conceptual problems. For example, in a multilingual
editor such as Emacs, the notion of a "buffer language" does not always
make sense; you'd need to support portions of text that have
different language properties. Imagine switching locales as Emacs
processes adjacent stretches of text and other complications. For
example, changing letter-case for a stretch of Turkish text is
supposed to be different from the English or German text. I'm all
ears for ideas how to design such "language support". It definitely
isn't easy, so if you have ideas, please voice them!
> I just have realized that nl_langinfo(3) (and nl_langinfo_l(3) as well)
> from libc accepts RADIXCHAR (decimal dot) and THOUSEP (group separator)
> arguments. They are good candidates for `locale-info' extension.
We already use nl_langinfo in locale-info, so what exactly are you
suggesting here? adding more items? You don't really expect Lisp
programs to format numbers such as 123,456 by hand after learning from
locale-info that the thousands separator is a comma, do you?
> Actually Qt links my example with other libraries from ICU. My point was
> that since Emacs anyway (indirectly) links with this library, the
> dependency may be not so heavy.
If you are suggesting that we introduce ICU as a dependency, we could
discuss the pros and cons. It isn't a simple decision, because ICU
comes with a lot of baggage that we already have implemented in Emacs,
so whether we throw away what we have and use ICU instead, or just add
what we miss without depending on ICU, requires good thought and good
acquaintance with the ICU internals (to make sure it does what we want
in Emacs, and doesn't break existing features).
> My personal requirements for number
> formatting were quite modest so far, I expect that other languages (CJK,
> right-to-left scripts, etc.) may require quite special treatment, so
> implementation in Emacs (and further maintenance) may require a lot of
> work. At least API of ICU should be studied to get some inspiration what
> features will be necessary for users from other regions.
I don't think the problem is the API.
> E.g. I was completely unaware that negative sign may be represented by
> parenthesis
Really? it's standard in financial applications.
> I expect enough surprises and unexpected "discoveries" during
> implementation of better locale support. That is why I would consider
> adapting some more or less established API for this purpose.
I don't think "consider" cuts it. We have already a lot of stuff in
Emacs; what we don't have needs serious design and comparison of
available implementation options. Emacs's needs are quite special and
unlike those of most other programs.
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-08 18:52 ` Eli Zaretskii
@ 2021-06-10 16:28 ` Maxim Nikulin
2021-06-10 16:57 ` Eli Zaretskii
2021-06-10 21:10 ` Stefan Monnier
0 siblings, 2 replies; 33+ messages in thread
From: Maxim Nikulin @ 2021-06-10 16:28 UTC (permalink / raw)
To: emacs-devel; +Cc: boruch_baum
On 09/06/2021 01:52, Eli Zaretskii wrote:
> From: Maxim Nikulin Date: Tue, 8 Jun 2021 23:35:51 +0700
I have reordered some parts of discussion.
>> I just have realized that nl_langinfo(3) (and
>> nl_langinfo_l(3) as well) from libc accepts RADIXCHAR
>> (decimal dot) and THOUSEP (group separator)
>> arguments. They are good candidates for `locale-info'
>> extension.
>
> We already use nl_langinfo in locale-info, so what exactly
> are you suggesting here? adding more items? You don't
> really expect Lisp programs to format numbers such as
> 123,456 by hand after learning from locale-info that the
> thousands separator is a comma, do you?
I have hijacked Boruch's thread and changed the subject to "CSV
parsing". There are plenty of CSV dialects. If the decimal separator is
",", office software uses ";" instead of a comma as the cell (field)
separator. So to parse a CSV file it is necessary to know the decimal
separator of the specified locale. RADIXCHAR as an argument of
nl_langinfo(3) is a first step towards a better user experience with CSV
files. Unfortunately, it only makes a reasonable visual representation
possible. Taking advantage of Org spreadsheet calculations requires
parsing cell contents, and thus parsing numbers (and maybe dates).
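A rough Python sketch of what parsing a cell could look like once the
separators are known. It is deliberately simplistic (no currency signs,
no non-ASCII digits, no validation of group positions), and the name
`parse_localized_number' and its parameters are hypothetical.

```python
def parse_localized_number(text: str, decimal_point: str = ".",
                           thousands_sep: str = "") -> float:
    """Convert a localized number such as "1.234,5" to a float."""
    cleaned = text.strip()
    if thousands_sep:
        cleaned = cleaned.replace(thousands_sep, "")
    if decimal_point != ".":
        if "." in cleaned:
            # A "C"-style dot where another decimal point is expected:
            # refuse to guess which dialect the cell uses.
            raise ValueError(f"unexpected '.' in {text!r}")
        cleaned = cleaned.replace(decimal_point, ".")
    return float(cleaned)


print(parse_localized_number("1.234,5", decimal_point=",", thousands_sep="."))
print(parse_localized_number("12,345.7", decimal_point=".", thousands_sep=","))
```

Raising an error for a stray "." is a design choice of this sketch: with
a mix of CSV dialects, silently accepting both forms would hide exactly
the kind of mismatch discussed here.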
I mentioned earlier https://debbugs.gnu.org/47885 and a part of the
discussion that is missing from the bug tracker:
https://lists.gnu.org/archive/html/emacs-orgmode/2021-05/msg00693.html
I have seen nl_langinfo without RADIXCHAR in emacs sources
http://git.savannah.gnu.org/cgit/emacs.git/tree/src/w32proc.c#n3258
http://git.savannah.gnu.org/cgit/emacs.git/tree/lib-src/ntlib.c#n520
Originally, during the discussion on emacs-orgmode, I did not plan to
raise the question of number formatting and parsing, since I had no
hope for any positive outcome without a consistent proposal. I
accidentally noticed Boruch's message and decided to add another use
case.
>> On 08/06/2021 09:35, Eli Zaretskii wrote:
>> > From: Boruch Baum
>> >> No? If an Emacs user has two buffers in two separate languages, the
>> >> buffer-local settings aren't / won't be respected?
>> >
>> > First, language is different from locale. And second, we don't even
>> > have a buffer-local notion of language yet.
>>
>> Certainly locale is more precise than just language since it includes
>> region and other variants, moreover it can be granularly tuned (date,
>> numbers, sorting can be adjusted independently), but I still think that
>> all these properties can be sometimes broadly referred to as language.
>
> No, they cannot, not in general. A locale comes with a whole database
> of different settings: language, encoding (a.k.a. "codeset"), formats
> of date and time, names of days of the week and of the months, rules
> for collation and capitalization, etc. etc. You can easily find
> several locales whose language is English, but some/many/all of the
> other locale-dependent settings are different. It isn't a coincidence
> that a locale's name includes more than just the language part.
I wrote almost the same concerning locale variants and components, so I
feel some sort of confusion and cannot tell where it comes from. I was
trying to support Boruch's point that buffer-local variables may be an
important part of the locale context, more precise than global settings,
and a fallback if the locale is not specified for a particular span of
text. With respect to such a hierarchy, the difference between language
and locale does not matter.
>> Low level functions can accept explicit locale.
>
> Which ones? Most libc routines don't, they use the locale
> as a global identifier. And many libc's (with the prominent
> exception of glibc) don't support efficient change of a
> locale in the middle of a program, they assume that the
> program's locale is set once at program startup.
Hypothetical functions in a new elisp API, maybe relying on some
external libraries. I believed you agreed that the global LC_NUMERIC
must be "C" to avoid various sorts of problems with data exchange. I am
not aware of libc functions for number formatting or parsing that can
take an explicit locale (I have seen such a feature in the C++ standard
library, Qt, and other languages). The totalitarian approach of libc,
with a single locale and a single timezone, imposes limitations too
severe to consider some libc functions useful and reliable in a more or
less complex application. Its API is suitable for simple tools that can
quickly do their work and do not assume any conversion. A more flexible
base layer is required when a mix of environments is expected. Full
support of locale features requires a lot of work, which is why I am
asking whether some external library can be used instead.
>> Higher level API can obtain it implicitly from
>> buffer-local variables and global locale. For example the
>> LOCALE argument of `string-collate-lessp' is optional
>> one. I can even anticipate that locale may be stored in
>> text properties some times. A random message from recent
>> "About multilingual documents" thread at emacs-orgmode
>> mail list:
>> https://lists.gnu.org/archive/html/emacs-orgmode/2021-05/msg00252.html
>
> That's mostly about input methods and org-export, I don't
> see how it's relevant to what Boruch asked.
I added this link to show you that the demand for multilingual documents
is real. Notice that problems with spell checking were mentioned in that
discussion. Earlier I saw suggestions to switch the ispell language
together with the input method. In my opinion it is ridiculous.
Personally, I would rather have a combined dictionary than explicitly
marked text regions.
I expect that new features will be used more widely once it becomes
possible to use them.
>> At first basic functionality may be implemented. The
>> problem is to choose extensible API.
>
> No, the problem is to have a design that would allow an
> efficient implementation. Given what the underlying libc
> does, it isn't easy.
That is why I am looking for an alternative to libc. Previously you
wrote "locale switching". I would rather say constructing and
destroying locales on demand. Switching may not behave so well when
threads are involved.
> And then we have conceptual problems. For example, in a
> multilingual editor such as Emacs, the notion of a "buffer
> language" not always makes sense, you'd need to support
> portions of text that have different language properties.
> Imagine switching locales as Emacs processes adjacent
> stretches of text and other complications. For example,
> changing letter-case for a stretch of Turkish text is
> supposed to be different from the English or German text.
> I'm all ears for ideas how to design such "language
> support". It definitely isn't easy, so if you have ideas,
> please voice them!
I neither have a consistent vision nor see a conceptual problem.
Buffer-local settings are just more specific than global ones. That is
why I mentioned text properties as even more precise in my previous
message. Maybe even the current mode can help to build a proper
hierarchy of locale contexts. HTML has the "lang" attribute, there is
"\foreignlanguage" in LaTeX, etc.
I have heard that special case exists in Turkish, but I was not curious
enough to find details and rules when and how it should be applied.
> If you are suggesting that we introduce ICU as a dependency,
> we could discuss the pros and cons.
I consider it as the most complete available implementation. Do you
know a comparable alternative?
I have realized that since Emacs has support of dynamic modules, it is
possible to create a prototype with bindings to external library without
rebuilding of Emacs.
> I don't think the problem is the API.
I think introducing features gradually will be more of a headache for
developers of external packages than the absence of support at all. The
API determines the scope of such features.
>> E.g. I was completely unaware that negative sign may be
>> represented by parenthesis
>
> Really? it's standard in financial applications.
Is it really so standard? Maybe I have seen such a format and even
guessed from context that, e.g., a table column with such numbers
should hold negative values, or, e.g., in a discount entry. At least I
did not recognize such a format as a general rule.

    new Intl.NumberFormat('de-DE', {style: 'currency', currency: 'USD',
        currencySign: 'accounting', signDisplay: 'always'}).format(-3500);
    "-3.500,00 $"

    new Intl.NumberFormat('es-ES', {style: 'currency', currency: 'USD',
        currencySign: 'accounting', signDisplay: 'always'}).format(-3500);
    "-3500,00 US$"

    new Intl.NumberFormat('fr-FR', {style: 'currency', currency: 'USD',
        currencySign: 'accounting', signDisplay: 'always'}).format(-3500);
    "(3 500,00 $US)"

    new Intl.NumberFormat('ru-RU', {style: 'currency', currency: 'USD',
        currencySign: 'accounting', signDisplay: 'always'}).format(-3500);
    "-3 500,00 $"

>> I expect enough surprises and unexpected "discoveries"
>> during implementation of better locale support. That is
>> why I would consider adapting some more or less
>> established API for this purpose.
>
> I don't think "consider" cuts it. We have already a lot of
> stuff in Emacs; what we don't have needs serious design and
> comparison of available implementation options. Emacs's
> needs are quite special and unlike those of most other
> programs.
I still think that the expectations of users around the globe are more
special than Emacs's needs, at least with respect to the format of
numbers.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-10 16:28 ` Maxim Nikulin
@ 2021-06-10 16:57 ` Eli Zaretskii
2021-06-10 18:01 ` Boruch Baum
2021-06-11 16:58 ` Maxim Nikulin
2021-06-10 21:10 ` Stefan Monnier
1 sibling, 2 replies; 33+ messages in thread
From: Eli Zaretskii @ 2021-06-10 16:57 UTC (permalink / raw)
To: Maxim Nikulin; +Cc: boruch_baum, emacs-devel
> From: Maxim Nikulin <manikulin@gmail.com>
> Date: Thu, 10 Jun 2021 23:28:59 +0700
> Cc: boruch_baum@gmx.com
>
> > We already use nl_langinfo in locale-info, so what exactly
> > are you suggesting here? adding more items? You don't
> > really expect Lisp programs to format numbers such as
> > 123,456 by hand after learning from locale-info that the
> > thousands separator is a comma, do you?
>
> I have hijacked Boruch's thread and changed the subject to "CSV
> parsing".
That explains part of my confusion. Please try not to hijack
discussions; instead, start a separate thread, to avoid such
confusion.
For processing CSV, if there's a need to know whether the locale uses
the comma as a decimal separator, we could indeed extend locale-info.
But such an extension is almost trivial and doesn't even touch on the
significant problems in the rest of the discussion.
> I was trying to support Boruch that buffer-local variables may be
> important part of locale context, more precise than global settings,
They are more precise, but they don't support mixed languages in the
same buffer, something that happens in Emacs very frequently. Which
means they are not precise enough. So my POV is that we should look
for a way to be able to specify the language of some span of text, in
which case buffers that use a single language will be a special case.
> > And then we have conceptual problems. For example, in a
> > multilingual editor such as Emacs, the notion of a "buffer
> > language" not always makes sense, you'd need to support
> > portions of text that have different language properties.
> > Imagine switching locales as Emacs processes adjacent
> > stretches of text and other complications. For example,
> > changing letter-case for a stretch of Turkish text is
> > supposed to be different from the English or German text.
> > I'm all ears for ideas how to design such "language
> > support". It definitely isn't easy, so if you have ideas,
> > please voice them!
>
> I neither have a consistent vision nor see a conceptual problem.
Here's a trivial example:
(insert (downcase (buffer-substring POS1 POS2)))
Contrast with
(insert (downcase "FOO"))
The function 'downcase' gets a Lisp string, but it has no way of
knowing whether the string is actually a portion of current buffer's
text. So how can it apply the correct letter-case conversions, even
if some buffer-local setting specifies that this should be done using
some specific language's rules?
IOW, one of the non-trivial problems is how to process Lisp strings
correctly for these purposes. Buffers can have local variables, but
what about strings?
> > If you are suggesting that we introduce ICU as a dependency,
> > we could discuss the pros and cons.
>
> I consider it as the most complete available implementation. Do you
> know a comparable alternative?
Yes: what we have already in Emacs. That covers a lot of the same
Unicode turf that ICU handles, because we import and use the same
Unicode files and tables. The question is: what is best for the
future development of Emacs in this area: depend on ICU (which would
mean we need to rewrite lots of code that is working well), or extend
what we have to support more Unicode features? One not-so-trivial
aspect of this is efficiency of fetching character properties (Emacs
has char-tables for that, which are efficient both CPU- and
memory-wise). Another aspect is support for raw bytes in buffers and
strings. And there are probably some others.
It is not a simple decision.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-10 16:57 ` Eli Zaretskii
@ 2021-06-10 18:01 ` Boruch Baum
2021-06-10 18:50 ` Eli Zaretskii
2021-06-11 16:58 ` Maxim Nikulin
1 sibling, 1 reply; 33+ messages in thread
From: Boruch Baum @ 2021-06-10 18:01 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Maxim Nikulin, emacs-devel
On 2021-06-10 19:57, Eli Zaretskii wrote:
>
> It is not a simple decision.
My request at the beginning of the (original) thread was much more
limited in scope and still seems to me in fact to be a simple decision,
and with no side effects. Paraphrased:
Please consider exposing to the elisp `format' function the
single-quote and upper-case 'I' format specifiers of the libc (or
other) `printf' command.
Doing this will just offer an elisp programmer a new option.
--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-10 18:01 ` Boruch Baum
@ 2021-06-10 18:50 ` Eli Zaretskii
2021-06-10 19:04 ` Boruch Baum
0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2021-06-10 18:50 UTC (permalink / raw)
To: Boruch Baum; +Cc: manikulin, emacs-devel
> Date: Thu, 10 Jun 2021 14:01:45 -0400
> From: Boruch Baum <boruch_baum@gmx.com>
> Cc: Maxim Nikulin <manikulin@gmail.com>, emacs-devel@gnu.org
>
> Please consider exposing to the elisp `format' function the
> single-quote and upper-case 'I' format specifiers of the libc (or
> other) `printf' command.
>
> Doing this will just offer an elisp programmer a new option.
That would make the output of 'format' dependent on the current
locale, unless we do something else to allow Lisp programs to take
control of what those specifiers produce. That "something else" is
what I was talking about. It is true that I was talking about a larger
range of issues, but still, even this small feature touches on some of
them. And I don't think you had any ideas for how to resolve those
issues, or did I miss something?
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-10 18:50 ` Eli Zaretskii
@ 2021-06-10 19:04 ` Boruch Baum
2021-06-10 19:23 ` Eli Zaretskii
0 siblings, 1 reply; 33+ messages in thread
From: Boruch Baum @ 2021-06-10 19:04 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: manikulin, emacs-devel
On 2021-06-10 21:50, Eli Zaretskii wrote:
> > Date: Thu, 10 Jun 2021 14:01:45 -0400
> > From: Boruch Baum <boruch_baum@gmx.com>
> > Cc: Maxim Nikulin <manikulin@gmail.com>, emacs-devel@gnu.org
> >
> > Please consider exposing to the elisp `format' function the
> > single-quote and upper-case 'I' format specifiers of the libc (or
> > other) `printf' command.
> >
> > Doing this will just offer an elisp programmer a new option.
>
> That would make the output of 'format' dependent on the current
> locale
That's the elisp programmer's business, not your responsibility.
> ...
> And I don't think you had any ideas for how to resolve those issues,
> or did I miss something?
Yes, that I haven't invested in responding about those issues because I
don't see any of them as relevant.
+ Elisp function `format' exists.
+ Elisp function `format' uses `printf' format specifiers.
+ Elisp function `format' doesn't expose two of them.
--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-10 19:04 ` Boruch Baum
@ 2021-06-10 19:23 ` Eli Zaretskii
2021-06-10 20:20 ` Boruch Baum
2021-06-11 13:56 ` Filipp Gunbin
0 siblings, 2 replies; 33+ messages in thread
From: Eli Zaretskii @ 2021-06-10 19:23 UTC (permalink / raw)
To: Boruch Baum; +Cc: manikulin, emacs-devel
> Date: Thu, 10 Jun 2021 15:04:53 -0400
> From: Boruch Baum <boruch_baum@gmx.com>
> Cc: manikulin@gmail.com, emacs-devel@gnu.org
>
> > That would make the output of 'format' dependent on the current
> > locale
>
> That's the elisp programmer's business, not your responsibility.
What could the Lisp programmer do in this situation?
> + Elisp function `format' uses `printf' format specifiers.
Only for some of the 'format's capabilities, not for all of them.
> + Elisp function `format' doesn't expose two of them.
I don't think it's TRT for Emacs to expose locale-dependent features
that cannot be controlled from Lisp, sorry. We need to find a better
way. For example, there could be a Lisp variable that specifies the
group separator character, and then 'format' could use that character
when the format spec includes %'. Which means we'd need to implement
that in our own code; patches welcome.
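
A minimal sketch of the variable-based idea, written in Lisp rather than
inside 'format' itself. The names `my-number-group-separator' and
`my-format-grouped-integer' are hypothetical, not existing Emacs API,
and groups of three digits are an assumption that does not cover
locales such as en_IN:

```elisp
;; Hypothetical variable: the string inserted between digit groups.
(defvar my-number-group-separator ","
  "String placed between groups of three digits (illustrative only).")

(defun my-format-grouped-integer (n)
  "Return integer N as a decimal string with digits grouped by three.
Groups are joined with `my-number-group-separator'."
  (let ((digits (number-to-string (abs n)))
        (groups nil))
    ;; Peel off three digits at a time, starting from the right.
    (while (> (length digits) 3)
      (push (substring digits -3) groups)
      (setq digits (substring digits 0 -3)))
    (push digits groups)
    (concat (if (< n 0) "-" "")
            (mapconcat #'identity groups my-number-group-separator))))

(my-format-grouped-integer 1234567890)   ; => "1,234,567,890"
```

Because the separator is an ordinary special variable, a caller could
let-bind it around a single call, e.g.
(let ((my-number-group-separator " ")) (my-format-grouped-integer 1234567)),
without touching any process-wide locale state.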
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-10 19:23 ` Eli Zaretskii
@ 2021-06-10 20:20 ` Boruch Baum
2021-06-11 6:19 ` Eli Zaretskii
2021-06-11 13:56 ` Filipp Gunbin
1 sibling, 1 reply; 33+ messages in thread
From: Boruch Baum @ 2021-06-10 20:20 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: manikulin, emacs-devel
On 2021-06-10 22:23, Eli Zaretskii wrote:
> > Date: Thu, 10 Jun 2021 15:04:53 -0400
> > From: Boruch Baum <boruch_baum@gmx.com>
> > Cc: manikulin@gmail.com, emacs-devel@gnu.org
> >
> > > That would make the output of 'format' dependent on the current
> > > locale
> >
> > That's the elisp programmer's business, not your responsibility.
>
> What could the Lisp programmer do in this situation?
It's not your responsibility.
I can say that in the use-case that prompted my request, I'm confident
it will *never* be an issue. I ask format to give me a string and I
display it. End of story. Whether just 99% or 99.99%, the overwhelming
majority of cases will be the same. Your concerns are total non-issues.
> > + Elisp function `format' uses `printf' format specifiers.
>
> Only for some of the 'format's capabilities, not for all of them.
[Commentary: 'Some' isn't a number or a percentage.]
[Commentary: I see all format specifiers supported but the two
requested.]
> > + Elisp function `format' doesn't expose two of them.
>
> I don't think it's TRT for Emacs to expose locale-dependent features
> that cannot be controlled from Lisp
Then don't make them locale specific. Implement the single-quote
specifier the same way you currently handle the floating-point specifier
'%f', a locale-specific format that has existed in emacs without
complaint since ...
--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-10 16:28 ` Maxim Nikulin
2021-06-10 16:57 ` Eli Zaretskii
@ 2021-06-10 21:10 ` Stefan Monnier
2021-06-12 14:41 ` Maxim Nikulin
1 sibling, 1 reply; 33+ messages in thread
From: Stefan Monnier @ 2021-06-10 21:10 UTC (permalink / raw)
To: Maxim Nikulin; +Cc: emacs-devel, boruch_baum
> There are plenty of CSV dialects. If decimal separator is "," then office
> software uses ";" instead of comma as cell (field) separator.
But there's no reason to presume that a given CSV file was generated in
the same locale as the one we're currently using.
So the locale could be one ingredient in the machinery used to guess
which separator was used, but I'm not sure it would be of much help.
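
One way such guessing could look, as a rough sketch only: evidence from
the file is checked first and the locale's decimal separator serves
merely as a fallback. The function name and the priority order (tab,
then semicolon, then comma) are assumptions made for illustration, and
quoted fields are ignored entirely:

```elisp
(defun my-csv-guess-field-separator (line &optional decimal-separator)
  "Guess the field separator used in the sample LINE of a CSV file.
DECIMAL-SEPARATOR is an optional hint taken from some locale; it is
consulted only when LINE contains neither a tab nor a semicolon."
  (cond
   ;; A tab or a semicolon rarely occurs inside numeric data, so its
   ;; presence is taken as evidence regardless of the locale.
   ((string-match-p "\t" line) "\t")
   ((string-match-p ";" line) ";")
   ;; With a decimal comma, commas in LINE are not evidence of
   ;; anything, so assume the semicolon dialect.
   ((equal decimal-separator ",") ";")
   (t ",")))

(my-csv-guess-field-separator "1,5;2,5;3,5")   ; => ";"
```

A real importer would probably sample several lines and let the user
override the result.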
[ BTW, I'll take the opportunity to advocate for the use of TSV
instead, which is slightly less ill-defined. ]
Stefan
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-10 20:20 ` Boruch Baum
@ 2021-06-11 6:19 ` Eli Zaretskii
2021-06-11 8:18 ` Boruch Baum
2021-06-11 16:51 ` Maxim Nikulin
0 siblings, 2 replies; 33+ messages in thread
From: Eli Zaretskii @ 2021-06-11 6:19 UTC (permalink / raw)
To: Boruch Baum; +Cc: manikulin, emacs-devel
> Date: Thu, 10 Jun 2021 16:20:45 -0400
> From: Boruch Baum <boruch_baum@gmx.com>
> Cc: manikulin@gmail.com, emacs-devel@gnu.org
>
> On 2021-06-10 22:23, Eli Zaretskii wrote:
> > > Date: Thu, 10 Jun 2021 15:04:53 -0400
> > > From: Boruch Baum <boruch_baum@gmx.com>
> > > Cc: manikulin@gmail.com, emacs-devel@gnu.org
> > >
> > > > That would make the output of 'format' dependent on the current
> > > > locale
> > >
> > > > That's the elisp programmer's business, not your responsibility.
> >
> > What could the Lisp programmer do in this situation?
>
> It's not your responsibility.
It is my responsibility to make sure we don't add to Emacs features
that are not very useful, or are against the Emacs philosophy and/or
design principles.
> I can say that in the use-case that prompted my request, I'm confident
> it will *never* be an issue. I ask format to give me a string and I
> display it. End of story. Whether just 99% or 99.99%, the overwhelming
> majority of cases will be the same. Your concerns are total non-issues.
You can always write a module to implement this feature, if you want
it for your own purposes. Or you could change Emacs to support that
directly and maintain that change locally. There's no need to
introduce into Emacs features that are useful for a few people.
> [Commentary: I see all format specifiers supported but the two
> requested.]
You are overlooking some aspects of the code if that is your
conclusion.
> > I don't think it's TRT for Emacs to expose locale-dependent features
> > that cannot be controlled from Lisp
>
> Then don't make them locale specific. Implement the single-quote
> specifier the same way you currently handle the floating-point specifier
> '%f', a locale-specific format that has existed in emacs without
> complaint since ...
That was my suggestion, more or less. Patches are welcome to
implement that.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-11 6:19 ` Eli Zaretskii
@ 2021-06-11 8:18 ` Boruch Baum
2021-06-11 16:51 ` Maxim Nikulin
1 sibling, 0 replies; 33+ messages in thread
From: Boruch Baum @ 2021-06-11 8:18 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: manikulin, emacs-devel
On 2021-06-11 09:19, Eli Zaretskii wrote:
> You can always write a module to implement this feature, if you want
> it for your own purposes.
Done and published and on MELPA before my first post here. And I wasn't
the first; there are other code examples available elsewhere.
> There's no need to introduce into Emacs features that are useful for a
> few people.
??? But it's clear that you're set in your decision. I think I've done
more than enough to try and benefit others on this one.
--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-10 19:23 ` Eli Zaretskii
2021-06-10 20:20 ` Boruch Baum
@ 2021-06-11 13:56 ` Filipp Gunbin
2021-06-11 14:10 ` Eli Zaretskii
1 sibling, 1 reply; 33+ messages in thread
From: Filipp Gunbin @ 2021-06-11 13:56 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: manikulin, Boruch Baum, emacs-devel
On 10/06/2021 22:23 +0300, Eli Zaretskii wrote:
> I don't think it's TRT for Emacs to expose locale-dependent features
> that cannot be controlled from Lisp, sorry. We need to find a better
> way. For example, there could be a Lisp variable that specifies the
> group separator character, and then 'format' could use that character
> when the format spec includes %'. Which means we'd need to implement
> that in our own code; patches welcome.
Maybe an alternative set of specifiers which output data in a
locale-specific format, plus a single variable to let-bind around
format which tells it what locale to use. Very simple...
Filipp
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-11 13:56 ` Filipp Gunbin
@ 2021-06-11 14:10 ` Eli Zaretskii
2021-06-11 18:52 ` Filipp Gunbin
0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2021-06-11 14:10 UTC (permalink / raw)
To: Filipp Gunbin; +Cc: manikulin, boruch_baum, emacs-devel
> From: Filipp Gunbin <fgunbin@fastmail.fm>
> Cc: Boruch Baum <boruch_baum@gmx.com>, manikulin@gmail.com,
> emacs-devel@gnu.org
> Date: Fri, 11 Jun 2021 16:56:34 +0300
>
> On 10/06/2021 22:23 +0300, Eli Zaretskii wrote:
>
> > I don't think it's TRT for Emacs to expose locale-dependent features
> > that cannot be controlled from Lisp, sorry. We need to find a better
> > way. For example, there could be a Lisp variable that specifies the
> > group separator character, and then 'format' could use that character
> > when the format spec includes %'. Which means we'd need to implement
> > that in our own code; patches welcome.
>
> Maybe an alternative set of specifiers, which output data in
> locale-specific format. Then a single variable to let-bound around
> format, which instructs what locale to use. Very simple...
Sorry, I don't think I understand what you propose. Please elaborate
on the "alternative set of specifiers, which output data in
locale-specific format".
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-11 6:19 ` Eli Zaretskii
2021-06-11 8:18 ` Boruch Baum
@ 2021-06-11 16:51 ` Maxim Nikulin
1 sibling, 0 replies; 33+ messages in thread
From: Maxim Nikulin @ 2021-06-11 16:51 UTC (permalink / raw)
To: emacs-devel
Eli, Boruch, you are overreacting (both).
On 11/06/2021 13:19, Eli Zaretskii wrote:
> There's no need to
> introduce into Emacs features that are useful for a few people.
I think that the expectations of users and developers with respect to
locale support evolve over time. Proper formatting of numbers is useful
to many more than a few people.
Boruch, till your last messages, I believed that you were convinced that
adding support of "'" and "I" is not so easy.
Support of locale-dependent format specifiers through printf looks
attractive but it can not be directly used by `format' or other elisp
functions in a safe way.
Some code calling `format' implicitly expects that it generates
locale-independent numbers, so changing its behavior is not backward
compatible.
libc can only work with a single global locale at any moment. I expect
that attempts to "temporarily" call setlocale(LC_NUMERIC, "") will be a
permanent source of bugs: a forgotten reverting call, a call of a
function that needs the universal format in a locale-specific context,
threads started at an inappropriate moment, etc.
Another implementation of locale functions is necessary, with the
ability to perform parsing and formatting without touching global
variables.
Personally I expect basic level functions with explicit locale context
(random names):
    (locale-format-number-with-ctx
     (locale-get-current-context :group-separator 'suppress)
     1234567890)
or with explicit locale instead of `locale-get-current-context'. It is
better to add some convenience helpers that inspect text properties,
buffer-local and global settings to determine context:
(locale-format-number 1234567890)
and maybe even `locale-format[-with-ctx]' that accepts printf-like
format string.
On 11/06/2021 03:20, Boruch Baum wrote:
> Then don't make them locale specific. Implement the
> single-quote specifier the same way you currently handle the
> floating-point specifier '%f', a locale-specific format that
> has existed in emacs without complaint since ...
You are confusing something. "%f" is not locale-specific inside Emacs,
it uses "universal" format with dot "." as decimal separator even in
locales with "," in this role. At the same time "'" is highly
locale-dependent in libc. Group sizes and group separator widely
vary. I posted this example earlier:
LC_NUMERIC=C.UTF-8 /usr/bin/printf "%'d\n" 1234567890
1234567890
LC_NUMERIC=en_US.UTF-8 /usr/bin/printf "%'d\n" 1234567890
1,234,567,890
LC_NUMERIC=es_ES.UTF-8 /usr/bin/printf "%'d\n" 1234567890
1.234.567.890
LC_NUMERIC=ru_RU.UTF-8 /usr/bin/printf "%'d\n" 1234567890
1 234 567 890
LC_NUMERIC=en_IN.UTF-8 /usr/bin/printf "%'d\n" 1234567890
1,23,45,67,890
> It's not your responsibilty.
>
> I can say that in the use-case that prompted my request, I'm
> confident it will *never* be an issue. I ask format to give
> me a string and I display it. End of story. Whether just 99%
> or 99.99%, the overwhelming majority of cases will be the
> same. Your concerns are total non-issues.
I would prefer to avoid the inconsistency where "%'d" is
locale-dependent but "%f" is not.
P.S.
With some limitations (the printf binary must be available and you do
not need to work with floating point numbers), you can leverage libc
formatting facilities with the following crutch:
    (shell-command-to-string (format "/usr/bin/printf \"%%'d\" %d"
                                     1234567890))
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-10 16:57 ` Eli Zaretskii
2021-06-10 18:01 ` Boruch Baum
@ 2021-06-11 16:58 ` Maxim Nikulin
2021-06-11 18:04 ` Eli Zaretskii
1 sibling, 1 reply; 33+ messages in thread
From: Maxim Nikulin @ 2021-06-11 16:58 UTC (permalink / raw)
To: emacs-devel; +Cc: boruch_baum
On 10/06/2021 23:57, Eli Zaretskii wrote:
>> From: Maxim Nikulin Date: Thu, 10 Jun 2021 23:28:59 +0700
>
> For processing CSV, if there's a need to know whether the
> locale uses the comma as a decimal separator, we could
> indeed extend locale-info. But such an extension is almost
> trivial and doesn't even touch on the significant problems
> in the rest of the discussion.
>
You forgot `setlocale(LC_NUMERIC, "C")', didn't you?
    #include <langinfo.h>
    #include <locale.h>
    #include <stdio.h>

    int main() {
        setlocale(LC_ALL, "");
        printf("%c", *nl_langinfo(RADIXCHAR));
        setlocale(LC_NUMERIC, "C");
        printf("%c\n", *nl_langinfo(RADIXCHAR));
        return 0;
    }
In a locale with a decimal comma the output is ",.". There is
nl_langinfo_l(3), but it requires more work.
After splitting rows into cells, it may be necessary to parse numbers
("2,34" to 2.34). That is why the quality of CSV file import is tightly
related to the handling of number formats.
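
To illustrate the parsing step, here is a small sketch that takes the
separators as explicit arguments instead of consulting any global
locale. The function name is hypothetical, and it inherits the
leniency of `string-to-number', which silently returns 0 for text that
is not a number:

```elisp
(defun my-parse-localized-number (string decimal-separator
                                         &optional group-separator)
  "Parse STRING as a number written with DECIMAL-SEPARATOR.
If GROUP-SEPARATOR is a non-empty string, it is removed first."
  (let ((s string))
    ;; Drop group separators before touching the decimal one, since
    ;; e.g. \".\" may be a group separator where \",\" is the radix.
    (when (and group-separator (not (equal group-separator "")))
      (setq s (replace-regexp-in-string
               (regexp-quote group-separator) "" s t t)))
    ;; Normalize the radix character to the \"C\" form Emacs expects.
    (setq s (replace-regexp-in-string
             (regexp-quote decimal-separator) "." s t t))
    (string-to-number s)))

(my-parse-localized-number "2,34" ",")              ; => 2.34
(my-parse-localized-number "1.234.567,5" "," ".")   ; => 1234567.5
```

The separators themselves would still have to come from somewhere, for
instance from an extended `locale-info' or from user configuration.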
>> I was trying to support Boruch that buffer-local variables
>> may be important part of locale context, more precise than
>> global settings,
>
> They are more precise, but they don't support mixed
> languages in the same buffer, something that happens in
> Emacs very frequently.
In some cases I would prefer to have uniform format of numbers and dates
despite alternating language in the buffer, e.g. for my private notes.
> Here's a trivial example:
>
> (insert (downcase (buffer-substring POS1 POS2)))
>
> Contrast with
>
> (insert (downcase "FOO"))
Either `set-text-properties' should be called on "FOO" before passing it
to `downcase', or `locale-downcase' with LOCALE as its first argument
should be added. Moreover, such a `locale-downcase' function may be
used to implement higher-level functions working with implicit locales.
LOCALE may follow a hierarchy: user overrides for a particular call,
text properties, buffer variables, global settings.
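
The lookup order of such a hierarchy could be sketched as follows. All
names here (the `my-locale' text property and both variables) are
hypothetical; Emacs defines no such property or variables, and this
only shows the fallback chain, not how a locale would then be applied:

```elisp
(defvar my-default-locale "C"
  "Hypothetical global fallback locale name.")

(defvar-local my-buffer-locale nil
  "Hypothetical buffer-local locale name, or nil if unset.")

(defun my-locale-at (pos &optional override)
  "Return the locale name in effect at buffer position POS.
Precedence: OVERRIDE for this call, then the `my-locale' text
property at POS, then `my-buffer-locale', then `my-default-locale'."
  (or override
      (get-text-property pos 'my-locale)
      my-buffer-locale
      my-default-locale))
```

A string extracted with `buffer-substring' keeps its text properties,
so the property level of this chain would survive the trip through a
Lisp string, while the buffer-local level would not.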
> Yes: what we have already in Emacs. That covers a lot of
> the same Unicode turf that ICU handles, because we import
> and use the same Unicode files and tables.
There are plenty of XML files in cldr-common-39.0.zip
(common/main/*.xml) https://www.unicode.org/Public/cldr/39/ in addition
to the Unicode data in the Emacs sources. They include rules for number
formatting https://unicode.org/reports/tr35/tr35-numbers.html
Of course, human-style number formatting, currencies, financial style,
etc. may be discarded, and the implementation may be limited to grouping
and decimal separators (leaving other features to further requests).
There is the newlocale(3) function in glibc to obtain a minimal subset
of properties. I am not familiar with other platforms.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-11 16:58 ` Maxim Nikulin
@ 2021-06-11 18:04 ` Eli Zaretskii
2021-06-14 16:38 ` Maxim Nikulin
0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2021-06-11 18:04 UTC (permalink / raw)
To: Maxim Nikulin; +Cc: boruch_baum, emacs-devel
> From: Maxim Nikulin <manikulin@gmail.com>
> Date: Fri, 11 Jun 2021 23:58:24 +0700
> Cc: boruch_baum@gmx.com
>
> On 10/06/2021 23:57, Eli Zaretskii wrote:
> >> From: Maxim Nikulin Date: Thu, 10 Jun 2021 23:28:59 +0700
> >
> > For processing CSV, if there's a need to know whether the
> > locale uses the comma as a decimal separator, we could
> > indeed extend locale-info. But such an extension is almost
> > trivial and doesn't even touch on the significant problems
> > in the rest of the discussion.
> >
>
> You forgot `setlocale(LC_NUMERIC, "C")', didn't you?
No, I didn't. Adding a call to setlocale to locale-info, even if we
want to add an argument for the caller to control the locale, is
trivial.
> > Here's a trivial example:
> >
> > (insert (downcase (buffer-substring POS1 POS2)))
> >
> > Contrast with
> >
> > (insert (downcase "FOO"))
>
> Either `set-text-properties' should be called on "FOO" before passing it
> to `downcase'
Which property will help here? We don't have such properties. They
need to be designed and implemented.
> or `locale-downcase' with LOCALE first argument should be
> added.
How would you implement locale-downcase? Are you familiar with how
Emacs case tables work?
And even if we had locale-downcase, which locale would you pass to it
in any given use case?
Please note that I'm not saying these issues cannot be solved -- they
can. I'm saying that designing them requires non-trivial thought,
something we didn't yet do.
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-11 14:10 ` Eli Zaretskii
@ 2021-06-11 18:52 ` Filipp Gunbin
2021-06-11 19:34 ` Eli Zaretskii
0 siblings, 1 reply; 33+ messages in thread
From: Filipp Gunbin @ 2021-06-11 18:52 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: manikulin, boruch_baum, emacs-devel
On 11/06/2021 17:10 +0300, Eli Zaretskii wrote:
>> From: Filipp Gunbin <fgunbin@fastmail.fm>
>> Cc: Boruch Baum <boruch_baum@gmx.com>, manikulin@gmail.com,
>> emacs-devel@gnu.org
>> Date: Fri, 11 Jun 2021 16:56:34 +0300
>>
>> On 10/06/2021 22:23 +0300, Eli Zaretskii wrote:
>>
>> > I don't think it's TRT for Emacs to expose locale-dependent features
>> > that cannot be controlled from Lisp, sorry. We need to find a better
>> > way. For example, there could be a Lisp variable that specifies the
>> > group separator character, and then 'format' could use that character
>> > when the format spec includes %'. Which means we'd need to implement
>> > that in our own code; patches welcome.
>>
>> Maybe an alternative set of specifiers, which output data in
>> locale-specific format. Then a single variable to let-bound around
>> format, which instructs what locale to use. Very simple...
>
> Sorry, I don't think I understand what you propose. Please elaborate
> on the "alternative set of specifiers, which output data in
> locale-specific format".
I mean that for every specifier which could be affected by locale (but
isn't), there could be an additional specifier which takes locale into
account. Less awkwardly, there could be an explicit modifier which says
"use locale for this specifier in format", something like the `O' or `E'
modifier in "format-time-string".
This way only the given format call is affected, without surprises somewhere
below in the call stack.
Then the locale to use could be let-bound around this format call, thus
overriding the default which came from env vars or from somewhere else.
Filipp
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-11 18:52 ` Filipp Gunbin
@ 2021-06-11 19:34 ` Eli Zaretskii
0 siblings, 0 replies; 33+ messages in thread
From: Eli Zaretskii @ 2021-06-11 19:34 UTC (permalink / raw)
To: Filipp Gunbin; +Cc: manikulin, boruch_baum, emacs-devel
> From: Filipp Gunbin <fgunbin@fastmail.fm>
> Cc: boruch_baum@gmx.com, manikulin@gmail.com, emacs-devel@gnu.org
> Date: Fri, 11 Jun 2021 21:52:57 +0300
>
> I mean that for every specifier which could be affected by locale (but
> isn't), there could be additional specifier, which takes locale into
> account. Less awkward, there could be an explicit modifier which says
> "use locale for this specifier in format". Something like `O' or `E'
> modifier in "format-time-string".
That could work, but if we rely on libc functions for the
locale-dependent behavior, it could be slow, because switching a
locale could be expensive.
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-10 21:10 ` Stefan Monnier
@ 2021-06-12 14:41 ` Maxim Nikulin
0 siblings, 0 replies; 33+ messages in thread
From: Maxim Nikulin @ 2021-06-12 14:41 UTC (permalink / raw)
To: emacs-devel
On 11/06/2021 04:10, Stefan Monnier wrote:
>> There are plenty of CSV dialects. If decimal separator is
>> "," then office software uses ";" instead of comma as cell
>> (field) separator.
>
> But there's no reason to presume that a given CSV file was
> generated in the same locale as the one we're currently
> using.
>
> So the locale could be one ingredient in the machinery used
> to guess which separator was used, but I'm not sure it would
> be of much help.
You are right. My expectation is still that ";" is mostly used in
locales with comma as the decimal separator, and in such cases it must be
tried with higher priority, because records may contain plenty of both
characters:
1,2;3,45;56,789
Originally the question was raised exactly in the context of an attempt to
improve separator guessing:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=47885 The patches,
however, have other problems. Advanced options for table import are likely
more suitable e.g. for csv-mode and may become an unnecessary burden in
org-mode (especially if kill-yank worked well in both directions).
Certainly users should have the opportunity to explicitly specify the
dialect of the files they are going to import.
> [ BTW, I'll take the opportunity to advocate for the use of
> TSV instead, which is slightly less ill-defined. ]
In the real world one often does not have full control of the file formats
one has to deal with. In simple cases I can use space-separated,
fixed-width columns of numbers. On the other hand, downloaded bank
statements are exactly CSV with ";" as delimiter and in a legacy Windows
8-bit encoding (and such files have a kind of header whose number of
columns differs from that of the following table).
So the ability to get the decimal separator for the current locale may
slightly improve user experience with import of CSV files, at least in Org
mode. However, it is just one aspect of support for locale-aware number
formats in Emacs.
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-11 18:04 ` Eli Zaretskii
@ 2021-06-14 16:38 ` Maxim Nikulin
2021-06-14 17:19 ` Eli Zaretskii
0 siblings, 1 reply; 33+ messages in thread
From: Maxim Nikulin @ 2021-06-14 16:38 UTC (permalink / raw)
To: emacs-devel
On 12/06/2021 01:04, Eli Zaretskii wrote:
>> From: Maxim Nikulin Date: Fri, 11 Jun 2021 23:58:24 +0700
>> On 10/06/2021 23:57, Eli Zaretskii wrote:
>> >> From: Maxim Nikulin Date: Thu, 10 Jun 2021 23:28:59 +0700
>> >
>> > For processing CSV, if there's a need to know whether the
>> > locale uses the comma as a decimal separator, we could
>> > indeed extend locale-info. But such an extension is almost
>> > trivial and doesn't even touch on the significant problems
>> > in the rest of the discussion.
>>
>> You forgot `setlocale(LC_NUMERIC, "C")', didn't you?
>
> No, I didn't. Adding a call to setlocale to locale-info, even if we
> want to add an argument for the caller to control the locale, is
> trivial.
I would avoid such manipulations, and the reason is not the efficiency of
a particular implementation. Locale is not thread-local, so changing it in
a *getter* is a source of rare but really obscure, hardly reproducible
problems. I do not like such output
1234.567890
1234,567890
1234.567890
of the following program changing locale in a parallel thread
#include <locale.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define DELAY_NS 40000000

/* Prints the same number three times while main() switches the locale. */
void* other_thread(void *arg) {
    struct timespec delay = { 0, DELAY_NS/2 };
    nanosleep(&delay, NULL);
    printf("%f\n", 1234.56789);  /* LC_NUMERIC is still "C" */
    delay.tv_nsec = DELAY_NS;
    nanosleep(&delay, NULL);
    printf("%f\n", 1234.56789);  /* main() has switched to the user's locale */
    nanosleep(&delay, NULL);
    printf("%f\n", 1234.56789);  /* main() has restored "C" */
    return NULL;
}

int main() {
    setlocale(LC_NUMERIC, "C");
    pthread_t thread_id;
    pthread_create(&thread_id, NULL, &other_thread, NULL);
    struct timespec delay = { 0, DELAY_NS };
    nanosleep(&delay, NULL);
    setlocale(LC_NUMERIC, "");   /* process-wide: affects every thread */
    nanosleep(&delay, NULL);
    setlocale(LC_NUMERIC, "C");
    void *res;
    pthread_join(thread_id, &res);
    return 0;
}
Explicit locale objects decoupled from application-wide global
preferences are safer and more flexible.
>> > Here's a trivial example:
>> >
>> > (insert (downcase (buffer-substring POS1 POS2)))
>> >
>> > Contrast with
>> >
>> > (insert (downcase "FOO"))
>>
>> Either `set-text-properties' should be called on "FOO" before passing it
>> to `downcase'
>
> Which property will help here? We don't have such properties. They
> need to be designed and implemented.
Let's name it "locale". Its value is some object that represents either
a "solid" locale such as de_DE or combined LC_NUMERIC=en_GB +
LC_TIME=de_DE + default fr_FR. Data required for particular operations
may be loaded on demand.
>> or `locale-downcase' with LOCALE first argument should be
>> added.
>
> How would you implement locale-downcase? Are you familiar with how
> Emacs case tables work?
No, I am not familiar with Emacs internals dealing with case conversion.
I already wrote I am even unaware how to properly handle Turkish. For
the scripts I am familiar with, it is enough to have default table for
normalizing and conversion. I can admit that sometimes conversion may
depend on language and the language can not be determined from code
point. In such cases I expect additional override table that has higher
priority than the default one.
> And even if we had locale-downcase, which locale would you
> pass to it in any given use case?
I already mentioned responsibility chain: explicit value or set of
overrides passed by user, text property for particular span of
characters, buffer-local variables, global environment variables. Locale
may be instantiated from its name "it_IT". Convenience functions to
obtain locale at point likely will be useful as well. (Actually I am
assuming number parsing-formatting rather than case conversion.)
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-14 16:38 ` Maxim Nikulin
@ 2021-06-14 17:19 ` Eli Zaretskii
2021-06-16 17:27 ` Maxim Nikulin
0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2021-06-14 17:19 UTC (permalink / raw)
To: Maxim Nikulin; +Cc: emacs-devel
> From: Maxim Nikulin <manikulin@gmail.com>
> Date: Mon, 14 Jun 2021 23:38:19 +0700
>
> >> You forgot `setlocale(LC_NUMERIC, "C")', didn't you?
> >
> > No, I didn't. Adding a call to setlocale to locale-info, even if we
> > want to add an argument for the caller to control the locale, is
> > trivial.
>
> I would avoid such manipulations and the reason is not efficiency of
> particular implementation.
But we already do that in locale-info, for locale categories other
than LC_NUMERIC.
> >> > Here's a trivial example:
> >> >
> >> > (insert (downcase (buffer-substring POS1 POS2)))
> >> >
> >> > Contrast with
> >> >
> >> > (insert (downcase "FOO"))
> >>
> >> Either `set-text-properties' should be called on "FOO" before passing it
> >> to `downcase'
> >
> > Which property will help here? We don't have such properties. They
> > need to be designed and implemented.
> Let's name it "locale". Its value is some object that represents either
> a "solid" locale such as de_DE or combined LC_NUMERIC=en_GB +
> LC_TIME=de_DE + default fr_FR. Data required for particular operations
> may be loaded on demand.
How do you associate such an object with text of a buffer or a string
such that different parts of the text could have different "locales"
(as required for a multi-lingual editor such as Emacs)?
> > How would you implement locale-downcase? Are you familiar with how
> > Emacs case tables work?
>
> No, I am not familiar with Emacs internals dealing with case conversion.
> I already wrote I am even unaware how to properly handle Turkish. For
> the scripts I am familiar with, it is enough to have default table for
> normalizing and conversion. I can admit that sometimes conversion may
> depend on language and the language can not be determined from code
> point. In such cases I expect additional override table that has higher
> priority than the default one.
>
> > And even if we had locale-downcase, which locale would you
> > pass to it in any given use case?
>
> I already mentioned responsibility chain: explicit value or set of
> overrides passed by user, text property for particular span of
> characters, buffer-local variables, global environment variables. Locale
> may be instantiated from its name "it_IT". Convenience functions to
> obtain locale at point likely will be useful as well. (Actually I am
> assuming number parsing-formatting rather than case conversion.)
What you describe doesn't exist, not even in its design stage. We are
back where we started: I said at the very beginning that this
infrastructure is missing. It is futile to discuss solutions which
rely on infrastructure that doesn't exist.
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-14 17:19 ` Eli Zaretskii
@ 2021-06-16 17:27 ` Maxim Nikulin
2021-06-16 17:36 ` Eli Zaretskii
0 siblings, 1 reply; 33+ messages in thread
From: Maxim Nikulin @ 2021-06-16 17:27 UTC (permalink / raw)
To: emacs-devel
On 15/06/2021 00:19, Eli Zaretskii wrote:
>> From: Maxim Nikulin Date: Mon, 14 Jun 2021 23:38:19 +0700
>>>> You forgot `setlocale(LC_NUMERIC, "C")', didn't you?
>>>
>>> No, I didn't. Adding a call to setlocale to locale-info, even if we
>>> want to add an argument for the caller to control the locale, is
>>> trivial.
>>
>> I would avoid such manipulations and the reason is not efficiency of
>> particular implementation.
>
> But we already do that in locale-info, for locale categories other
> than LC_NUMERIC.
I have seen such a call for collation. It may have been reasonable in the
past (e.g. as quick plumbing), but I think such things should be avoided
for the sake of thread safety. Moreover, you are complaining that
implementations other than glibc are inefficient.
Proper instruments for concurrency and parallel execution may alleviate
issues like the following:
https://lists.gnu.org/archive/html/emacs-devel/2021-05/msg01297.html
> I hear quite a few people run at least two instances of
> Emacs, for example if they don't want Gnus fetching new
> articles and email to freeze the interactive session for
> prolonged times.
>>> Which property will help here? We don't have such properties. They
>>> need to be designed and implemented.
>> Let's name it "locale". Its value is some object that represents either
>> a "solid" locale such as de_DE or combined LC_NUMERIC=en_GB +
>> LC_TIME=de_DE + default fr_FR. Data required for particular operations
>> may be loaded on demand.
>
> How do you associate such an object with text of a buffer or a string
> such that different parts of the text could have different "locales"
> (as required for a multi-lingual editor such as Emacs)?
I already suggested some variants and you did not argue.
Technically it can be done through `set-text-properties'. If there are
no such text properties, then it may be assumed that no fine-grained tuning
is required, so buffer-local variables or the global environment are used.
Language may be guessed from code points of characters. Particular modes
may either inhibit localization for program code or extract necessary
information from HTML lang attributes, arguments of LaTeX
\foreignlanguage macro, etc.
In my opinion, Emacs is not really multi-lingual yet due to limitations
and inconveniences. Some other software has demonstrated significantly
greater progress during the last decade. Maybe achieving the current level
was so painful that you prefer to avoid touching the related code for any
reason, not to speak of various improvements.
> > And even if we had locale-downcase, which locale would you
> > pass to it in any given use case?
>
> I already mentioned responsibility chain: explicit value or set of
> overrides passed by user, text property for particular span of
> characters, buffer-local variables, global environment variables. Locale
> may be instantiated from its name "it_IT". Convenience functions to
> obtain locale at point likely will be useful as well. (Actually I am
> assuming number parsing-formatting rather than case conversion.)
I am aware that such features do not exist yet. Only libc is available,
but we both consider it inappropriate (you due to performance issues, me
due to thread safety and possible bugs caused by missed calls restoring the
old state). You are against using detailed CLDR locale info through ICU
because an alternative implementation of Unicode character tables
(another part of ICU) already exists in Emacs. At the same time you are
refusing any attempt to discuss possible extensions from either side:
low-level base functions taking locale as an explicit argument, or
high-level requirements on what interface could be useful to "implicitly"
derive the locale of a particular part of text (actually text prepared for
intelligent handling of locales).
Certainly with the position "locale-aware formatting can not be implemented
because Emacs has no necessary infrastructure and such a feature is needed
by only a handful of users" there is no way to improve anything.
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
2021-06-16 17:27 ` Maxim Nikulin
@ 2021-06-16 17:36 ` Eli Zaretskii
0 siblings, 0 replies; 33+ messages in thread
From: Eli Zaretskii @ 2021-06-16 17:36 UTC (permalink / raw)
To: Maxim Nikulin; +Cc: emacs-devel
> From: Maxim Nikulin <manikulin@gmail.com>
> Date: Thu, 17 Jun 2021 00:27:49 +0700
>
> Certainly with the position "locale-aware formatting can not be implemented
> because Emacs has no necessary infrastructure and such a feature is needed
> by only a handful of users" there is no way to improve anything.
Please see how many changes I committed over the years to Emacs, some
of them quite revolutionary (bidirectional editing comes to mind), and
I'm sure you will realize that the above is a gross misunderstanding
of what I meant.
end of thread, other threads:[~2021-06-16 17:36 UTC | newest]
Thread overview: 33+ messages
2021-06-06 23:36 CSV parsing and other issues (Re: LC_NUMERIC) Boruch Baum
2021-06-07 12:28 ` Eli Zaretskii
2021-06-08 0:45 ` Boruch Baum
2021-06-08 2:35 ` Eli Zaretskii
2021-06-08 15:35 ` Stefan Monnier
2021-06-08 16:35 ` Maxim Nikulin
2021-06-08 18:52 ` Eli Zaretskii
2021-06-10 16:28 ` Maxim Nikulin
2021-06-10 16:57 ` Eli Zaretskii
2021-06-10 18:01 ` Boruch Baum
2021-06-10 18:50 ` Eli Zaretskii
2021-06-10 19:04 ` Boruch Baum
2021-06-10 19:23 ` Eli Zaretskii
2021-06-10 20:20 ` Boruch Baum
2021-06-11 6:19 ` Eli Zaretskii
2021-06-11 8:18 ` Boruch Baum
2021-06-11 16:51 ` Maxim Nikulin
2021-06-11 13:56 ` Filipp Gunbin
2021-06-11 14:10 ` Eli Zaretskii
2021-06-11 18:52 ` Filipp Gunbin
2021-06-11 19:34 ` Eli Zaretskii
2021-06-11 16:58 ` Maxim Nikulin
2021-06-11 18:04 ` Eli Zaretskii
2021-06-14 16:38 ` Maxim Nikulin
2021-06-14 17:19 ` Eli Zaretskii
2021-06-16 17:27 ` Maxim Nikulin
2021-06-16 17:36 ` Eli Zaretskii
2021-06-10 21:10 ` Stefan Monnier
2021-06-12 14:41 ` Maxim Nikulin
-- strict thread matches above, loose matches on Subject: below --
2021-06-02 18:54 LC_NUMERIC formatting [FEATURE REQUEST] Boruch Baum
2021-06-03 14:44 ` CSV parsing and other issues (Re: LC_NUMERIC) Maxim Nikulin
2021-06-03 15:01 ` Eli Zaretskii
2021-06-04 16:31 ` Maxim Nikulin
2021-06-04 19:17 ` Eli Zaretskii