* LC_NUMERIC formatting [FEATURE REQUEST]
From: Boruch Baum @ 2021-06-02 18:54 UTC
To: Emacs-Devel List

Please consider having the elisp 'format' function adopt the
single-quote and 'I' flags. Each is already implemented in both the GNU
C printf function and the Linux printf command. The single-quote flag is
part of the 'Single UNIX Specification' and the 'I' flag has been part
of glibc since version 2.2 [ref: man(3) printf].

If function 'format' uses 'printf' as its backend, this would seem to be
a matter of exposing an existing feature.

The single-quote flag applies a locale's thousands' grouping characters
if appropriate. I have only found this functionality implemented as
user-defined functions, i.e. outside of Emacs core.

    $ printf "%'d\n" 1234
    1,234

The 'I' flag [from man(3) printf]:

    uses the locale's alternative output digits, if any. For example,
    since glibc 2.2.3 this will give Arabic-Indic digits in the Persian
    ("fa_IR") locale.

    $ LC_ALL=fa_IR.utf8 /usr/bin/printf "%Id\n" 1234
    ۱۲۳۴
    $ LC_ALL=fa_IR.utf8 /usr/bin/printf "%'Id\n" 1234
    ۱٬۲۳۴

-- 
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0
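To make the two flags concrete without depending on which locales happen to be installed, here is a minimal C++ sketch that emulates them by hand. `group_thousands` and `to_persian_digits` are hypothetical helpers, not part of Emacs or glibc. The sketch hard-codes 3-digit grouping and the Extended Arabic-Indic digits (U+06F0..U+06F9) used for Persian, whereas the real printf flags take the grouping rule, separator and digits from the locale.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper emulating printf's ' flag for a locale that groups
// digits in threes: insert `sep` between groups, counting from the right.
std::string group_thousands(long long n, const std::string& sep) {
    const bool negative = n < 0;
    // Take the magnitude as unsigned so that LLONG_MIN does not overflow.
    const unsigned long long magnitude =
        negative ? 0ULL - static_cast<unsigned long long>(n)
                 : static_cast<unsigned long long>(n);
    const std::string digits = std::to_string(magnitude);

    std::string out;
    int count = 0;
    for (auto it = digits.rbegin(); it != digits.rend(); ++it, ++count) {
        if (count > 0 && count % 3 == 0) out.insert(0, sep);
        out.insert(out.begin(), *it);
    }
    if (negative) out.insert(out.begin(), '-');
    return out;
}

// Hypothetical helper emulating the I flag for Persian: replace ASCII digits
// with Extended Arabic-Indic digits U+06F0..U+06F9 (UTF-8: 0xDB 0xB0..0xB9).
std::string to_persian_digits(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c >= '0' && c <= '9') {
            out += static_cast<char>(0xDB);
            out += static_cast<char>(0xB0 + (c - '0'));
        } else {
            out += c;  // other bytes (including UTF-8 sequences) pass through
        }
    }
    return out;
}
```

For example, `group_thousands(1234, ",")` gives `1,234`, and grouping with U+066C ARABIC THOUSANDS SEPARATOR (`"\xD9\xAC"` in UTF-8) before calling `to_persian_digits` gives `۱٬۲۳۴`, the same strings as the two printf runs above.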
* CSV parsing and other issues (Re: LC_NUMERIC)
From: Maxim Nikulin @ 2021-06-03 14:44 UTC
To: emacs-devel; +Cc: Utkarsh Singh

On 03/06/2021 01:54, Boruch Baum wrote:
> Please consider having the elisp 'format' function adopt the
> single-quote and 'I' flags. Each is already implemented in both the GNU
> C printf function and the Linux printf command. The single-quote flag is
> part of the 'Single UNIX Specification' and the 'I' flag has been part
> of glibc since version 2.2 [ref: man(3) printf].
>
> If function 'format' uses 'printf' as its backend, this would seem to be
> a matter of exposing an existing feature.

I do not know why Emacs does not support locale-aware number formats,
but I suspect that relying on libc would open a can of worms. Once
setlocale(LC_NUMERIC, "") is invoked, one can never be sure whether
printf- and scanf-like functions use the default "C" representation or
numbers formatted according to the current locale. Some numbers, e.g.
those related to communication protocols, must always be formatted using
the "C" locale.

I do not remember whether it happened with XFree86 or with Xorg, but at
some point users experienced problems: X11 could not start at all due to
invalid configs. The source of the problem was "," as the decimal
separator in some locales and wrong expectations concerning numbers in
config files.

Recently I found the following in the fixup_locale function:
http://git.savannah.gnu.org/cgit/emacs.git/tree/src/emacs.c#n2861

    setlocale (LC_NUMERIC, "C");

I was surprised that it is impossible to determine the current decimal
separator from elisp. At the same time, e.g. `string-collate-lessp' has
a LOCALE argument.
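Because fixup_locale pins LC_NUMERIC to "C", a C-level helper could still report the user's decimal separator by briefly selecting the environment's LC_NUMERIC, reading localeconv(), and restoring the previous setting. The following C++ sketch illustrates the idea; `user_decimal_point` is a hypothetical name, and since setlocale modifies process-wide state, this approach is only safe where no other thread formats or parses numbers at the same time.

```cpp
#include <cassert>
#include <clocale>
#include <string>

// Hypothetical helper: report the decimal separator of the locale named by
// the environment (LC_ALL / LC_NUMERIC / LANG) while leaving LC_NUMERIC as
// it was, e.g. "C" as established by Emacs's fixup_locale.
// Note: setlocale changes process-global state and is not thread-safe.
std::string user_decimal_point() {
    // Copy the current LC_NUMERIC name; the returned buffer may be
    // overwritten by later setlocale calls.
    const std::string saved = std::setlocale(LC_NUMERIC, nullptr);

    std::string point = ".";  // fallback if the environment locale is unusable
    if (std::setlocale(LC_NUMERIC, "") != nullptr) {
        // localeconv() reflects the LC_NUMERIC category just selected.
        point = std::localeconv()->decimal_point;
    }
    std::setlocale(LC_NUMERIC, saved.c_str());  // restore, e.g. back to "C"
    return point;
}
```

Under, say, a de_DE environment this would be expected to return ",", while printf/strtod in the rest of the program keep using the "C" conventions once the function returns.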
A month ago some patches were submitted to Org mode with the intention
of improving table import, see https://debbugs.gnu.org/47885
Part of the discussion is missing from the bug tracker:
https://lists.gnu.org/archive/html/emacs-orgmode/2021-05/msg00693.html

Org mode has a piece of code that tries to guess whether the file uses
commas or tabs as the field separator (CSV or TSV format). The suggested
change adds e.g. semicolon. (Sidenote: csv-mode is probably a better
place than org-mode for such code.) The problem is that office software
uses semicolon as the field separator in locales where comma serves as
the decimal separator for floating-point numbers (e.g. de_DE, es_ES,
fr_FR, ru_RU, etc.):

    A;1,2;3,4

So semicolon should be tried with higher priority than comma if the
current locale represents numbers as e.g. 1,2. Unfortunately, the only
way to get such information in Emacs is to call some external
application. Maintaining our own mapping from locale to separator is an
unnecessary burden.

Besides office software, there is some equipment that always uses "C"
number formatting, so a user can have a mix of files in various CSV
dialects. Thus locale info is not enough; some heuristics are required
anyway.

More subtle questions arise at the next step. Org allows performing
calculations on table cells (and there is calc). Should numbers be
converted to the "C" locale representation during import? Or should
conversion happen when cell content is passed as an argument, with the
result converted back to the current locale? I anticipate that a
buffer-local setting will be requested. There was even a discussion of
mixed-language documents on the emacs-orgmode mailing list; however,
numbers were not mentioned.

So locale-aware number formatting would be a great improvement for
Emacs. On the other hand, it should be implemented with great care to
avoid localized numbers in some cases. Maybe a locale argument should be
passed to functions that deal with numbers.
Formatting integers is not enough;
floating-point numbers should be handled as well. Parsing numbers
formatted according to locale rules should be addressed too. A function
similar to `locale-info' is highly desirable to get properties of a
locale (e.g. decimal_point from the result of localeconv). A decision is
also required on whether calc & Co should operate on localized numbers.
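Two pieces discussed in this message can be sketched in C++ (the language of the Qt example later in the thread): guessing the CSV field separator with a locale-dependent priority, and parsing a number written with locale-specific separators into a plain double. This is a minimal sketch under stated assumptions, not Org mode's actual code: `guess_separator` and `parse_localized` are hypothetical names, the separators are passed in explicitly rather than queried from a locale, quoted CSV fields are ignored, and `std::strtod` is assumed to run under the "C" LC_NUMERIC locale (as Emacs's fixup_locale arranges).

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// Hypothetical heuristic: return the first candidate separator that occurs
// the same non-zero number of times on every non-empty line. Semicolon is
// tried before comma when the locale's decimal point is ',', because then
// commas also appear inside numbers. Quoted fields are not handled.
// Returns '\0' if no candidate fits.
char guess_separator(const std::string& text, char decimal_point) {
    const std::vector<char> candidates =
        decimal_point == ',' ? std::vector<char>{'\t', ';', ','}
                             : std::vector<char>{'\t', ',', ';'};

    // Split the text into non-empty lines.
    std::vector<std::string> lines;
    std::string current;
    for (char c : text) {
        if (c == '\n') {
            if (!current.empty()) lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) lines.push_back(current);
    if (lines.empty()) return '\0';

    for (char sep : candidates) {
        long expected = -1;  // separator count seen on the first line
        bool consistent = true;
        for (const std::string& line : lines) {
            long n = 0;
            for (char c : line)
                if (c == sep) ++n;
            if (n == 0 || (expected != -1 && n != expected)) {
                consistent = false;
                break;
            }
            expected = n;
        }
        if (consistent) return sep;
    }
    return '\0';  // no candidate fits; the caller must decide
}

// Hypothetical helper: convert e.g. "1.234,5" (de_DE style) into 1234.5 by
// dropping grouping characters and normalizing the decimal point to '.'.
// Single-byte separators only. Returns std::nullopt on trailing garbage.
std::optional<double> parse_localized(const std::string& s,
                                      char decimal_point, char group_sep) {
    assert(decimal_point != group_sep);
    std::string c_form;  // the same number in "C" locale notation
    for (char c : s) {
        if (c == group_sep) continue;
        c_form += (c == decimal_point) ? '.' : c;
    }
    if (c_form.empty()) return std::nullopt;
    char* end = nullptr;
    const double value = std::strtod(c_form.c_str(), &end);
    if (end == c_form.c_str() || *end != '\0') return std::nullopt;
    return value;
}
```

For the sample line `A;1,2;3,4`, both `;` and `,` occur twice, so this sketch returns `;` when the decimal point is ',' and `,` otherwise. That tie is exactly why locale information, or further heuristics, is needed. Multi-byte group separators (such as U+066C or a narrow no-break space) would require working on UTF-8 strings instead of single chars.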
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
From: Eli Zaretskii @ 2021-06-03 15:01 UTC
To: Maxim Nikulin; +Cc: utkarsh190601, emacs-devel

> From: Maxim Nikulin <manikulin@gmail.com>
> Date: Thu, 3 Jun 2021 21:44:08 +0700
> Cc: Utkarsh Singh <utkarsh190601@gmail.com>
>
> So locale-aware number formatting would be a great improvement for
> Emacs. On the other hand, it should be implemented with great care to
> avoid localized numbers in some cases. Maybe a locale argument should be
> passed to functions that deal with numbers. Formatting integers is not
> enough; floating-point numbers should be handled as well. Parsing
> numbers formatted according to locale rules should be addressed too. A
> function similar to `locale-info' is highly desirable to get properties
> of a locale (e.g. decimal_point from the result of localeconv). A
> decision is also required on whether calc & Co should operate on
> localized numbers.

Setting a locale globally in Emacs is a non-starter, for the reasons
that you point out and others. Text processing in Emacs is generally
separate from the current locale's rules, mainly to have Emacs work
the same in any locale. So passing a locale argument to functions
that produce output, with the intent to request some behavior to be
tailored to that locale, is the only reasonable way to have this kind
of functionality in Emacs. The problem with that, of course, is
that not every supported platform can dynamically change the locale,
let alone do that efficiently.
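The design described here, where the locale is an argument of the formatting call rather than process-global state, has a close analogue in standard C++: a `std::locale` object can be attached to a single stream without touching the global locale. The sketch below only illustrates that API shape and is not a proposal for Emacs internals; `comma_decimal` and `format_in` are hypothetical names, and the facet is defined by hand so the example does not depend on which system locales are installed.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet with de_DE-style punctuation: ',' as decimal point,
// '.' as thousands separator, digits grouped in threes.
class comma_decimal : public std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};

// Format VALUE using the locale passed as an argument. Only this stream is
// imbued; the global locale (std::locale::global / setlocale) is unchanged.
template <typename T>
std::string format_in(const std::locale& loc, T value) {
    std::ostringstream out;
    out.imbue(loc);
    out << value;
    return out.str();
}
```

For instance, `format_in(std::locale(std::locale::classic(), new comma_decimal), 1234567)` returns `"1.234.567"`, while the same value formatted with `std::locale::classic()` stays `"1234567"`. The locale object takes ownership of the facet, so no explicit delete is needed.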
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
From: Maxim Nikulin @ 2021-06-04 16:31 UTC
To: emacs-devel; +Cc: utkarsh190601

On 03/06/2021 22:01, Eli Zaretskii wrote:
>> From: Maxim Nikulin
>> Date: Thu, 3 Jun 2021 21:44:08 +0700
>>
>> So locale-aware number formatting would be a great improvement for
>> Emacs. On the other hand, it should be implemented with great care to
>> avoid localized numbers in some cases. Maybe a locale argument should be
>> passed to functions that deal with numbers. Formatting integers is not
>> enough; floating-point numbers should be handled as well. Parsing
>> numbers formatted according to locale rules should be addressed too. A
>> function similar to `locale-info' is highly desirable to get properties
>> of a locale (e.g. decimal_point from the result of localeconv). A
>> decision is also required on whether calc & Co should operate on
>> localized numbers.
>
> Setting a locale globally in Emacs is a non-starter, for the reasons
> that you point out and others. Text processing in Emacs is generally
> separate from the current locale's rules, mainly to have Emacs work
> the same in any locale. So passing a locale argument to functions
> that produce output, with the intent to request some behavior to be
> tailored to that locale, is the only reasonable way to have this kind
> of functionality in Emacs. The problem with that, of course, is
> that not every supported platform can dynamically change the locale,
> let alone do that efficiently.

I do not think it is efficient to require users to fight with number
formatting themselves.
Some links from my browser history, from when I was trying to figure out
how to get the locale-specific decimal separator in elisp:
https://stackoverflow.com/questions/35661173/how-to-format-table-fields-as-currency-in-org-mode
https://www.emacswiki.org/emacs/AddCommasToNumbers
https://www.reddit.com/r/emacs/comments/61mhyx/creating_a_function_to_add_commasseparators_to/

Do you mean that it is necessary to create a new number formatter
implementation specifically for Emacs? Something like
https://unicode.org/reports/tr35/tr35-numbers.html
Unicode Locale Data Markup Language (LDML) Part 3: Numbers
Actually it is an almost random link. I do not know which source is
currently considered the best collection of wisdom related to number
formatting.

Outside of the Emacs world, the last time I needed numbers formatted
according to various locales, I was lucky enough to use code similar to
the following and did not have to care about the details:

    #include <cstdio>
    #include <QLocale>
    #include <QTextStream>

    // Print the decimal point and two sample numbers formatted for LOC_NAME.
    void test(QTextStream& stream, const char *loc_name)
    {
        QLocale loc(QString::fromLocal8Bit(loc_name));
        stream << "point: " << loc.decimalPoint()
               << " " << loc.toString(12345.67)
               << " " << loc.toString(1234567890) << "\n";
    }

    int main(int argc, char *argv[])
    {
        QTextStream stream(stdout);
        // Each command-line argument is a locale name such as de_DE.
        for (int i = 1; i < argc; ++i) {
            test(stream, argv[i]);
        }
        return 0;
    }

    ./qtloc de_DE en_GB fa_IR
    point: , 12.345,7 1.234.567.890
    point: . 12,345.7 1,234,567,890
    point: ٫ ۱۲٬۳۴۵٫۷ ۱٬۲۳۴٬۵۶۷٬۸۹۰

Surprisingly, it works even though I have not generated the de and fa
locales.
On Linux I see that Emacs is linked with ICU:

    ldd /usr/bin/emacs | grep -i icu
    libicuuc.so.66 => /usr/lib/x86_64-linux-gnu/libicuuc.so.66 (0x00007f457c799000)
    libicudata.so.66 => /usr/lib/x86_64-linux-gnu/libicudata.so.66 (0x00007f457a61c000)

I am not familiar with the ICU API, but I expect that it could be used:
https://github.com/unicode-org/icu/blob/main/icu4c/source/samples/numfmt/capi.c

Do you have a bright idea for implementing a number parser/formatter
with reasonable effort?
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
From: Eli Zaretskii @ 2021-06-04 19:17 UTC
To: Maxim Nikulin; +Cc: utkarsh190601, emacs-devel

> Cc: utkarsh190601@gmail.com
> From: Maxim Nikulin <manikulin@gmail.com>
> Date: Fri, 4 Jun 2021 23:31:13 +0700
>
> > Setting a locale globally in Emacs is a non-starter, for the reasons
> > that you point out and others. Text processing in Emacs is generally
> > separate from the current locale's rules, mainly to have Emacs work
> > the same in any locale. So passing a locale argument to functions
> > that produce output, with the intent to request some behavior to be
> > tailored to that locale, is the only reasonable way to have this kind
> > of functionality in Emacs. The problem with that, of course, is
> > that not every supported platform can dynamically change the locale,
> > let alone do that efficiently.
>
> I do not think it is efficient to require users to fight with
> number formatting themselves.

I didn't suggest that. I was talking about the design of the APIs
that need to be able to provide locale-specific formatting. The
implementation should be provided by Emacs core, of course.

> Do you mean that it is necessary to create a new number formatter
> implementation specifically for Emacs?

Either that, or use the underlying C library if it can accept a locale
specifier, or if it supports efficient dynamic change of the locale,
like we do in some of the implementations of string-collate-lessp.

> On Linux I see that Emacs is linked with ICU

It isn't. It's either HarfBuzz or maybe libc that pulls in the ICU
library. Emacs doesn't use it directly.

> Do you have a bright idea for implementing a number parser/formatter
> with reasonable effort?

See above.
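The second option mentioned above, a C library that supports efficient dynamic locale changes, exists on POSIX.1-2008 systems (including glibc) as newlocale/uselocale, which switch the locale of the calling thread only. Below is a minimal C++ sketch under that assumption. `format_with_locale` is a hypothetical helper, not taken from the Emacs sources, and it returns an empty string when the requested locale is not installed.

```cpp
#include <cassert>
#include <cstdio>
#include <locale.h>  // newlocale, uselocale, freelocale, LC_NUMERIC_MASK (POSIX.1-2008)
#include <string>

// Hypothetical helper: format VALUE with printf-style FORMAT using the numeric
// conventions of LOCALE_NAME, switching only the calling thread's locale.
std::string format_with_locale(const char* locale_name, const char* format,
                               double value) {
    // Locale object carrying only the requested LC_NUMERIC category; the
    // other categories come from the POSIX ("C") locale.
    locale_t loc = newlocale(LC_NUMERIC_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0) return "";  // locale not available

    locale_t previous = uselocale(loc);  // affects this thread only
    char buf[128];
    std::snprintf(buf, sizeof buf, format, value);
    uselocale(previous);  // restore the thread's previous locale
    freelocale(loc);
    return std::string(buf);
}
```

With a locale such as de_DE.UTF-8 installed, calling this with `"%'.1f"` would be expected to apply that locale's decimal point and grouping, while every other thread, and this thread after the call, keeps the "C" conventions. Platforms without newlocale/uselocale would need a different backend, which is the portability problem raised earlier in the thread.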