* LC_NUMERIC formatting [FEATURE REQUEST]
From: Boruch Baum @ 2021-06-02 18:54 UTC
To: Emacs-Devel List

Please consider having the elisp 'format' function adopt the
single-quote and 'I' flags. Each is already implemented in both the GNU
C printf command and the linux printf command. The single-quote flag is
part of the 'Single UNIX Specification' and the 'I' flag has been part
of glibc since version 2.2 [ref: man(3) printf].

If function 'format' uses 'printf' as its backend, this would seem to be
a matter of exposing an existing feature.

The single-quote flag applies a locale's thousands' grouping characters
if appropriate, which I only find currently implemented as user-defined
functions, ie. outside of emacs-core.

    $ printf "%'d\n" 1234
    1,234

The 'I' flag [from man(3) printf]: uses the locale's alternative output
digits, if any. For example, since glibc 2.2.3 this will give
Arabic-Indic digits in the Persian ("fa_IR") locale.

    $ LC_ALL=fa_IR.utf8 /usr/bin/printf "%Id\n" 1234
    ۱۲۳۴

    $ LC_ALL=fa_IR.utf8 /usr/bin/printf "%'Id\n" 1234
    ۱٬۲۳۴

--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0
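To make the request concrete, here is a minimal Python sketch of what the single-quote flag does: it inserts a separator between digit groups, counting from the right. It is an illustration only, not how glibc or Emacs implements anything. The function name `group_digits` and its `sep` and `sizes` parameters are hypothetical; in a real implementation the separator and group sizes would come from the locale rather than from the caller.

```python
def group_digits(n: int, sep: str = ",", sizes=(3,)) -> str:
    """Insert `sep` between digit groups of `n`, counting from the right.

    `sizes` lists group widths from right to left; the last width repeats.
    Both parameters are stand-ins for values a locale would supply.
    """
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    groups = []
    i = 0
    while digits:
        # Use the i-th width, or the last one once the list is exhausted.
        width = sizes[min(i, len(sizes) - 1)]
        groups.append(digits[-width:])
        digits = digits[:-width]
        i += 1
    return sign + sep.join(reversed(groups))


print(group_digits(1234))             # comma separator, groups of three
print(group_digits(1234567890, "."))  # same grouping, different separator
```

Passing the separator explicitly keeps the sketch independent of which locales happen to be installed on the machine.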
* CSV parsing and other issues (Re: LC_NUMERIC)
From: Maxim Nikulin @ 2021-06-03 14:44 UTC
To: emacs-devel; +Cc: Utkarsh Singh

On 03/06/2021 01:54, Boruch Baum wrote:
> Please consider having the elisp 'format' function adopt the
> single-quote and 'I' flags. Each is already implemented in both the GNU
> C printf command and the linux printf command. The single-quote flag is
> part of the 'Single UNIX Specification' and the 'I' flag has been part
> of glibc since version 2.2 [ref: man(3) printf].
>
> If function 'format' uses 'printf' as its backend, this would seem to be
> a matter of exposing an existing feature.

I do not know the history of why Emacs does not support locale-aware
number formats, but I suspect that relying on libc would open a can of
worms. Once setlocale(LC_NUMERIC, "") is invoked, one is never sure
whether printf- and scanf-like functions deal with the default "C"
representation or with numbers formatted according to the current
locale. Some numbers related to communication protocols must always be
formatted using the "C" locale. I do not remember if it happened with
XFree86 or with Xorg, but at a certain moment users experienced
problems: X11 could not start at all due to invalid configs. The source
of the problem was "," as the decimal separator in some locales and
wrong expectations concerning numbers in config files.

Recently I found the following in the fixup_locale function:
http://git.savannah.gnu.org/cgit/emacs.git/tree/src/emacs.c#n2861

    setlocale (LC_NUMERIC, "C");

I was surprised that it is impossible to determine the current decimal
separator from elisp. At the same time, e.g. `string-collate-lessp' has
a LOCALE argument.

A month ago some patches were submitted to Org mode with the intention
of improving import of tables, see https://debbugs.gnu.org/47885
A part of the discussion is missing from the bug tracker:
https://lists.gnu.org/archive/html/emacs-orgmode/2021-05/msg00693.html

Org mode has a piece of code that tries to guess whether a file uses
commas or tabs as the field separator (CSV or TSV format). The suggested
change adds e.g. semicolon. (Sidenote: probably csv-mode is a better
place than org-mode for such code.) The problem is that office software
uses semicolon for locales where comma serves as decimal separator for
floating-point numbers (e.g. de_DE, es_ES, fr_FR, ru_RU, etc.):

    A;1,2;3,4

So semicolon should be tried with higher priority than comma if in the
current locale numbers are represented as e.g. 1,2. Unfortunately the
only way to get such information from Emacs is to call some external
application. Maintaining our own mapping of locale to separator is an
unnecessary burden.

Besides office software, there is some equipment that always uses "C"
number formatting, so a user can have a mix of files with various
dialects of CSV. Thus locale info is not enough; some heuristics are
required anyway.

More subtle questions arise at the next step. Org allows performing
calculations on table cells (and there is calc). Should numbers be
converted to "C" locale representation during import? Should conversion
happen when passing cell content as an argument, with the result
converted back to the current locale? I anticipate that a buffer-local
setting will be requested. There was even a discussion of mixed-language
documents on the emacs-orgmode mailing list, however numbers were not
mentioned.

So locale-aware number formatting would be a great improvement for
Emacs. On the other hand, it should be implemented with great care to
avoid localized numbers in some cases. Maybe locale argument should be
passed to functions that deal with numbers.
Formatting of integer numbers is not enough, floating point numbers
should be handled as well. Parsing numbers formatted accordingly to
locale rules should be addressed too. A function similar to
`locale-info' is highly desired to get properties of locale (e.g.
decimal_point from result of localeconv). Some decision is required
whether calc & Co should operate with localized numbers.
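The separator-guessing problem described in this message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Org mode code or the submitted patch: `guess_delimiter` and its `decimal_point` parameter are hypothetical names, and the decimal separator is passed in by the caller because, as noted above, Emacs offers no direct way to query it.

```python
import csv
import io


def guess_delimiter(sample: str, decimal_point: str = ".") -> str:
    """Guess the field delimiter of `sample`.

    Candidates are tried in an order that depends on the (assumed known)
    decimal separator: when it is a comma, semicolon is tried before comma.
    A candidate is accepted if every non-empty row has the same number of
    fields and that number is greater than one.
    """
    if decimal_point == ",":
        candidates = [";", "\t", ","]
    else:
        candidates = [",", "\t", ";"]
    for delim in candidates:
        rows = [r for r in csv.reader(io.StringIO(sample), delimiter=delim) if r]
        widths = {len(r) for r in rows}
        if rows and len(widths) == 1 and widths.pop() > 1:
            return delim
    return candidates[0]  # nothing matched: fall back to the preferred one


sample = "A;1,2;3,4\nB;5,6;7,8\n"
print(repr(guess_delimiter(sample, decimal_point=",")))
print(repr(guess_delimiter(sample, decimal_point=".")))
```

With the sample above, the two calls pick different delimiters for the same file, which is the ambiguity the message describes: counting fields alone cannot tell a decimal comma from a field separator.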
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
From: Eli Zaretskii @ 2021-06-03 15:01 UTC
To: Maxim Nikulin; +Cc: utkarsh190601, emacs-devel

> From: Maxim Nikulin <manikulin@gmail.com>
> Date: Thu, 3 Jun 2021 21:44:08 +0700
> Cc: Utkarsh Singh <utkarsh190601@gmail.com>
>
> So locale-aware number formatting would be a great improvement for
> Emacs. On the other hand, it should be implemented with great care to
> avoid localized numbers in some cases. Maybe locale argument should be
> passed to functions that deal with numbers. Formatting of integer
> numbers is not enough, floating point numbers should be handled as well.
> Parsing numbers formatted accordingly to locale rules should be
> addressed too. A function similar to `locale-info' is highly desired to
> get properties of locale (e.g. decimal_point from result of localeconv).
> Some decision is required whether calc & Co should operate with
> localized numbers.

Setting a locale globally in Emacs is a non-starter, for the reasons
that you point out and others. Text processing in Emacs is generally
separate from the current locale's rules, mainly to have Emacs work
the same in any locale. So passing a locale argument to functions
that produce output, with the intent to request some behavior to be
tailored to that locale, is the only reasonable way to have this kind
of functionalities in Emacs. The problem with that, of course, is
that not every supported platform can dynamically change the locale,
let alone do that efficiently.
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
From: Maxim Nikulin @ 2021-06-04 16:31 UTC
To: emacs-devel; +Cc: utkarsh190601

On 03/06/2021 22:01, Eli Zaretskii wrote:
>> From: Maxim Nikulin
>> Date: Thu, 3 Jun 2021 21:44:08 +0700
>>
>> So locale-aware number formatting would be a great improvement for
>> Emacs. On the other hand, it should be implemented with great care to
>> avoid localized numbers in some cases. Maybe locale argument should be
>> passed to functions that deal with numbers. Formatting of integer
>> numbers is not enough, floating point numbers should be handled as well.
>> Parsing numbers formatted accordingly to locale rules should be
>> addressed too. A function similar to `locale-info' is highly desired to
>> get properties of locale (e.g. decimal_point from result of localeconv).
>> Some decision is required whether calc & Co should operate with
>> localized numbers.
>
> Setting a locale globally in Emacs is a non-starter, for the reasons
> that you point out and others. Text processing in Emacs is generally
> separate from the current locale's rules, mainly to have Emacs work
> the same in any locale. So passing a locale argument to functions
> that produce output, with the intent to request some behavior to be
> tailored to that locale, is the only reasonable way to have this kind
> of functionalities in Emacs. The problem with that, of course, is
> that not every supported platform can dynamically change the locale,
> let alone do that efficiently.

I do not think it is efficient to require from users to fight with
number formatting themselves.

Some links from my browser history when I was trying to figure out how
to get the locale-specific decimal separator in elisp:
https://stackoverflow.com/questions/35661173/how-to-format-table-fields-as-currency-in-org-mode
https://www.emacswiki.org/emacs/AddCommasToNumbers
https://www.reddit.com/r/emacs/comments/61mhyx/creating_a_function_to_add_commasseparators_to/

Do you mean that it is necessary to create a new implementation of a
number formatter specially for Emacs? Something like
https://unicode.org/reports/tr35/tr35-numbers.html
Unicode Locale Data Markup Language (LDML) Part 3: Numbers

Actually it is an almost random link. I do not know which source is
currently considered the best collection of wisdom related to number
formatting.

Outside of the Emacs world, the last time I needed numbers formatted
according to various locales, I was lucky enough to use code similar to
the following and did not have to care about the details:

    #include <cstdio>
    #include <QLocale>
    #include <QTextStream>

    // Print the decimal point, a float and an integer as formatted
    // by the locale named loc_name.
    void test(QTextStream& stream, const char *loc_name) {
        QLocale loc(QString::fromLocal8Bit(loc_name));
        stream << "point: " << loc.decimalPoint()
               << " " << loc.toString(12345.67)
               << " " << loc.toString(1234567890) << "\n";
    }

    int main(int argc, char *argv[]) {
        QTextStream stream(stdout);
        // Each command-line argument is a locale name.
        for (int i = 1; i < argc; ++i) {
            test(stream, argv[i]);
        }
        return 0;
    }

    ./qtloc de_DE en_GB fa_IR
    point: , 12.345,7 1.234.567.890
    point: . 12,345.7 1,234,567,890
    point: ٫ ۱۲٬۳۴۵٫۷ ۱٬۲۳۴٬۵۶۷٬۸۹۰

Surprisingly it works even though I have not generated the de and fa
locales.

On linux I see that Emacs is linked with ICU

    ldd /usr/bin/emacs | grep -i icu
        libicuuc.so.66 => /usr/lib/x86_64-linux-gnu/libicuuc.so.66 (0x00007f457c799000)
        libicudata.so.66 => /usr/lib/x86_64-linux-gnu/libicudata.so.66 (0x00007f457a61c000)

I am not familiar with ICU API but I expect that it may be utilized
https://github.com/unicode-org/icu/blob/main/icu4c/source/samples/numfmt/capi.c

Do you have a bright idea concerning implementation of parser-formatter
for numbers with reasonable efforts?
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
From: Eli Zaretskii @ 2021-06-04 19:17 UTC
To: Maxim Nikulin; +Cc: utkarsh190601, emacs-devel

> Cc: utkarsh190601@gmail.com
> From: Maxim Nikulin <manikulin@gmail.com>
> Date: Fri, 4 Jun 2021 23:31:13 +0700
>
> > Setting a locale globally in Emacs is a non-starter, for the reasons
> > that you point out and others. Text processing in Emacs is generally
> > separate from the current locale's rules, mainly to have Emacs work
> > the same in any locale. So passing a locale argument to functions
> > that produce output, with the intent to request some behavior to be
> > tailored to that locale, is the only reasonable way to have this kind
> > of functionalities in Emacs. The problem with that, of course, is
> > that not every supported platform can dynamically change the locale,
> > let alone do that efficiently.
>
> I do not think it is efficient to require from users to fight with
> number formatting themselves.

I didn't suggest that. I was talking about the design of the APIs
that need to be able to provide locale-specific formatting. The
implementation should be provided by Emacs core, of course.

> Do you mean that it is necessary to create new implementation of number
> formatter specially for Emacs?

Either that, or use the underlying C library if it can accept a locale
specifier, or if it supports efficient dynamic change of the locale,
like we do in some of the implementations of string-collate-lessp.

> On linux I see that Emacs is linked with ICU

It isn't. It's either HarfBuzz or maybe libc that pulls in the ICU
library. Emacs doesn't use it directly.

> Do you have a bright idea concerning implementation of parser-formatter
> for numbers with reasonable efforts?

See above.
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
From: Boruch Baum @ 2021-06-06 23:36 UTC
To: Emacs-Devel List; +Cc: Maxim Nikulin, Eli Zaretskii

I wasn't cc'ed (and I don't subscribe to the list), so I only now saw
the continuation of my post.

1] @Maxim: You seemed to indicate that the default emacs locale is 'C'.
   That may be true, and I may be mixing up two separate things, but my
   observation is that I get 'nil' when I check for any related
   environment variable using function `getenv', and in practice I need
   to temporarily manually use function setenv to set LC_COLLATE=C in
   order to offer several sorting options in package diredc. Note
   though that the feature isn't performing the sort within emacs; it's
   temporarily setting a shell environment and having the external ls
   program perform the sort for emacs-core dired. Thus, my experience
   has been that the default has been something other than C, at least
   for LC_COLLATE. I suspect that's true for ALL emacs users.

2] @Eli: You wrote

   > > The problem with that, of course, is that not every supported
   > > platform can dynamically change the locale, let alone do that
   > > efficiently.

   I have no idea to what actual supported platform you're referring.

3] @Eli: You wrote

   > > Text processing in Emacs is generally separate from the current
   > > locale's rules,
   > > ...
   > > So passing a locale argument to functions that produce output,
   > > with the intent to request some behavior to be tailored to that
   > > locale, is the only reasonable way to have this kind

   Agreed. My input here is that there should be clear documentation of
   how to retrieve a value for that argument from a buffer's context,
   (maybe the same way that flyspell does?).

I see also that I created room for confusion in asking actually for TWO
features (single-quote and upper-case I) because the two will behave
differently in an expected default condition. The single quote format
(for the thousands separator) can be expected to produce a result always
for all conditions of locale, while I expect most locale cases won't
produce any special output for the upper-case I format option.

--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
From: Eli Zaretskii @ 2021-06-07 12:28 UTC
To: Boruch Baum; +Cc: manikulin, emacs-devel

> Date: Sun, 6 Jun 2021 19:36:38 -0400
> From: Boruch Baum <boruch_baum@gmx.com>
> Cc: Maxim Nikulin <manikulin@gmail.com>, Eli Zaretskii <eliz@gnu.org>
>
> 1] @Maxim: You seemed to indicate that the default emacs locale is 'C'.
> That may be true

That's only true for LC_NUMERIC category.

> 2] @Eli: You wrote
>
> > > The problem with that, of course, is that not every supported
> > > platform can dynamically change the locale, let alone do that
> > > efficiently.
>
> I have no idea to what actual supported platform you're referring.

GNU/Linux is the only one I know of that can efficiently switch
locales dynamically (and even that in recent versions of libc, AFAIR).

> > > Text processing in Emacs is generally separate from the current
> > > locale's rules,
> > > ...
> > > So passing a locale argument to functions that produce output,
> > > with the intent to request some behavior to be tailored to that
> > > locale, is the only reasonable way to have this kind
>
> Agreed. My input here is that there should be clear documentation of
> how to retrieve a value for that argument from a buffer's context,
> (maybe the same way that flyspell does?).

Sorry, I don't see the relevance. I was talking about calling
functions, so how does some buffer enter this picture? Buffers don't
have anything to do with the locale used by library functions called
by Emacs.
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
From: Boruch Baum @ 2021-06-08 0:45 UTC
To: Eli Zaretskii; +Cc: manikulin, emacs-devel

On 2021-06-07 15:28, Eli Zaretskii wrote:
> Sorry, I don't see the relevance. I was talking about calling
> functions, so how does some buffer enter this picture? Buffers don't
> have anything to do with the locale used by library functions called
> by Emacs.

No? If an Emacs user has two buffers in two separate languages, the
buffer-local settings aren't / won't be respected?

--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
From: Eli Zaretskii @ 2021-06-08 2:35 UTC
To: Boruch Baum; +Cc: manikulin, emacs-devel

> Date: Mon, 7 Jun 2021 20:45:10 -0400
> From: Boruch Baum <boruch_baum@gmx.com>
> Cc: emacs-devel@gnu.org, manikulin@gmail.com
>
> On 2021-06-07 15:28, Eli Zaretskii wrote:
> > Sorry, I don't see the relevance. I was talking about calling
> > functions, so how does some buffer enter this picture? Buffers don't
> > have anything to do with the locale used by library functions called
> > by Emacs.
>
> No? If an Emacs user has two buffers in two separate languages, the
> buffer-local settings aren't / won't be respected?

First, language is different from locale. And second, we don't even
have a buffer-local notion of language yet. What we can support (but
seldom if ever do) is to have buffer-local case-conversion table,
which is a very small part of language- or locale-dependent settings.

So no, buffer-local aspects in general don't affect what you have in
mind, not yet anyway.
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
From: Stefan Monnier @ 2021-06-08 15:35 UTC
To: Eli Zaretskii; +Cc: Boruch Baum, manikulin, emacs-devel

> First, language is different from locale. And second, we don't even
> have a buffer-local notion of language yet. What we can support (but
> seldom if ever do) is to have buffer-local case-conversion table,
> which is a very small part of language- or locale-dependent settings.
>
> So no, buffer-local aspects in general don't affect what you have in
> mind, not yet anyway.

Worse: it's not uncommon to run code which doesn't really care about
its current-buffer, so it's not always correct to presume that the
settings of the current-buffer should be used.

We already suffer from such problems in some corner cases with code that
uses `\<` or `\_<` in regexps matching on strings (rather than buffer
content) where the result can unexpectedly depend on the buffer which
happens to be current.

        Stefan
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
From: Maxim Nikulin @ 2021-06-08 16:35 UTC
To: emacs-devel; +Cc: emacs-devel

On 08/06/2021 09:35, Eli Zaretskii wrote:
> From: Boruch Baum
>> No? If an Emacs user has two buffers in two separate languages, the
>> buffer-local settings aren't / won't be respected?
>
> First, language is different from locale. And second, we don't even
> have a buffer-local notion of language yet.

Certainly locale is more precise than just language since it includes
region and other variants, moreover it can be granularly tuned (date,
numbers, sorting can be adjusted independently), but I still think that
all these properties can be sometimes broadly referred to as language.

Aren't we discussing a feature request? Low level functions can accept
explicit locale. Higher level API can obtain it implicitly from
buffer-local variables and global locale. For example the LOCALE
argument of `string-collate-lessp' is optional one. I can even
anticipate that locale may be stored in text properties some times. A
random message from recent "About multilingual documents" thread at
emacs-orgmode mail list:
https://lists.gnu.org/archive/html/emacs-orgmode/2021-05/msg00252.html

At first basic functionality may be implemented. The problem is to
choose extensible API.

On 07/06/2021 06:36, Boruch Baum wrote:
> I get 'nil' when I check for any related
> environment variable using function `getenv'

Do not confuse setlocale and setenv. setenv affects later calls to
setlocale (with an empty string as the locale argument) and child
processes. setlocale deals with the current process; it can take into
account or override values of environment variables. setlocale is not
exposed to elisp.
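
The distinction between setenv and setlocale can be observed from Python, whose `locale.setlocale` and `os.environ` wrap the same libc facilities. This is a minimal sketch, not a statement about Emacs internals; the helper name `numeric_locale_around_setenv` is hypothetical, and the locale name passed to it is never activated, so it does not need to be installed.

```python
import locale
import os


def numeric_locale_around_setenv(value: str):
    """Return the LC_NUMERIC setting before and after changing the environment.

    Calling locale.setlocale() with only a category queries the current
    setting without changing it.
    """
    before = locale.setlocale(locale.LC_NUMERIC)
    saved = os.environ.get("LC_NUMERIC")
    # Changing the environment affects child processes and any later
    # setlocale(category, "") call, but not the already active locale.
    os.environ["LC_NUMERIC"] = value
    try:
        after = locale.setlocale(locale.LC_NUMERIC)
    finally:
        # Restore the environment so the sketch has no lasting side effects.
        if saved is None:
            del os.environ["LC_NUMERIC"]
        else:
            os.environ["LC_NUMERIC"] = saved
    return before, after


before, after = numeric_locale_around_setenv("de_DE.UTF-8")
print(before == after)
```

The active locale would only change if the program then called `locale.setlocale(locale.LC_NUMERIC, "")`, which is omitted here because it fails when the named locale is not generated on the system.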

> The single quote format (for the thousands separator) can be expected
> to produce a result always for all conditions of locale, while I
> expect most locale cases won't produce any special output for the
> upper-case I format option.

I still think that "'" and "I" formats are tightly bound. Grouping style
is locale-dependent. So representation of digits is just another
property of locale.

    LC_NUMERIC=C.UTF-8 /usr/bin/printf "%'d\n" 1234567890
    1234567890
    LC_NUMERIC=en_US.UTF-8 /usr/bin/printf "%'d\n" 1234567890
    1,234,567,890
    LC_NUMERIC=es_ES.UTF-8 /usr/bin/printf "%'d\n" 1234567890
    1.234.567.890
    LC_NUMERIC=ru_RU.UTF-8 /usr/bin/printf "%'d\n" 1234567890
    1 234 567 890

Even group size is not always 3:

    LC_NUMERIC=en_IN.UTF-8 /usr/bin/printf "%'d\n" 1234567890
    1,23,45,67,890

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/NumberFormat
"India uses thousands/lakh/crore separators"

I just have realized that nl_langinfo(3) (and nl_langinfo_l(3) as well)
from libc accepts RADIXCHAR (decimal dot) and THOUSEP (group separator)
arguments. They are good candidates for `locale-info' extension.

On 05/06/2021 02:17, Eli Zaretskii wrote:
>> From: Maxim Nikulin Date: Fri, 4 Jun 2021 23:31:13 +0700
>> On linux I see that Emacs is linked with ICU
>
> It isn't. It's either HarfBuzz or maybe libc that pulls in the ICU
> library. Emacs doesn't use it directly.

Actually Qt links my example with other libraries from ICU. My point was
that since Emacs anyway (indirectly) links with this library, the
dependency may be not so heavy. My personal requirements for number
formatting were quite modest so far, I expect that other languages (CJK,
right-to-left scripts, etc.) may require quite special treatment, so
implementation in Emacs (and further maintenance) may require a lot of
work. At least API of ICU should be studied to get some inspiration what
features will be necessary for users from other regions.

E.g. I was completely unaware that negative sign may be represented by
parenthesis (JavaScript, may be executed in browser developer tools)

    new Intl.NumberFormat('en-GB', {
      style: 'currency',
      currency: 'USD',
      currencySign: 'accounting',
      signDisplay: 'always'
    }).format(-3500);
    "(US$3,500.00)"

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/NumberFormat/NumberFormat

I do not know if Intl API is really convenient. I see that there is no
direct way to get decimal separator. However it can serve as another
source for inspiration.

I expect enough surprises and unexpected "discoveries" during
implementation of better locale support. That is why I would consider
adapting some more or less established API for this purpose.

P.S.

On 07/06/2021 06:36, Boruch Baum wrote:
> and in practice I need to temporarily manually use function setenv to
> set LC_COLLATE=C in order to offer several sorting options in package
> diredc.

Ideally you should avoid this and use the envp argument of the
execve(2) system call. Otherwise it could interfere with other packages,
especially if threads are involved. I am unsure whether Emacs currently
provides such an API option.

> it's temporarily setting a shell environment and having the external
> ls program perform the sort for emacs-core dired.

I am unsure if "ls" may be reliably used at all. File names may contain
e.g. newlines, various control characters, or parts that look rather
similar to ls output. I am not familiar with dired internals. At first
my intention was to create an issue for diredc, but skimming through
its code I did not find a direct "ls" invocation.

Some problems with ls:
https://mywiki.wooledge.org/BashPitfalls?highlight=%28%5CbCategoryShell%5Cb%29#for_f_in_.24.28ls_.2A.mp3.29
Bash Pitfalls: item #1
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
From: Eli Zaretskii @ 2021-06-08 18:52 UTC
To: Maxim Nikulin; +Cc: boruch_baum, emacs-devel

> Cc: emacs-devel@gnu.org
> From: Maxim Nikulin <manikulin@gmail.com>
> Date: Tue, 8 Jun 2021 23:35:51 +0700
>
> On 08/06/2021 09:35, Eli Zaretskii wrote:
> > From: Boruch Baum
> >> No? If an Emacs user has two buffers in two separate languages, the
> >> buffer-local settings aren't / won't be respected?
> >
> > First, language is different from locale. And second, we don't even
> > have a buffer-local notion of language yet.
>
> Certainly locale is more precise than just language since it includes
> region and other variants, moreover it can be granularly tuned (date,
> numbers, sorting can be adjusted independently), but I still think that
> all these properties can be sometimes broadly referred to as language.

No, they cannot, not in general. A locale comes with a whole database
of different settings: language, encoding (a.k.a. "codeset"), formats
of date and time, names of days of the week and of the months, rules
for collation and capitalization, etc. etc. You can easily find
several locales whose language is English, but some/many/all of the
other locale-dependent settings are different. It isn't a coincidence
that a locale's name includes more than just the language part.

> Low level functions can accept explicit locale.

Which ones? Most libc routines don't, they use the locale as a global
identifier. And many libc's (with the prominent exception of glibc)
don't support efficient change of a locale in the middle of a program,
they assume that the program's locale is set once at program startup.

> Higher level API can obtain it implicitly from
> buffer-local variables and global locale.
> For example the LOCALE
> argument of `string-collate-lessp' is optional one. I can even
> anticipate that locale may be stored in text properties some times. A
> random message from recent "About multilingual documents" thread at
> emacs-orgmode mail list:
> https://lists.gnu.org/archive/html/emacs-orgmode/2021-05/msg00252.html

That's mostly about input methods and org-export, I don't see how it's
relevant to what Boruch asked.

> At first basic functionality may be implemented. The problem is to
> choose extensible API.

No, the problem is to have a design that would allow an efficient
implementation. Given what the underlying libc does, it isn't easy.

And then we have conceptual problems. For example, in a multilingual
editor such as Emacs, the notion of a "buffer language" does not always
make sense, you'd need to support portions of text that have different
language properties. Imagine switching locales as Emacs processes
adjacent stretches of text and other complications. For example,
changing letter-case for a stretch of Turkish text is supposed to be
different from the English or German text.

I'm all ears for ideas how to design such "language support". It
definitely isn't easy, so if you have ideas, please voice them!

> I just have realized that nl_langinfo(3) (and nl_langinfo_l(3) as well)
> from libc accepts RADIXCHAR (decimal dot) and THOUSEP (group separator)
> arguments. They are good candidates for `locale-info' extension.

We already use nl_langinfo in locale-info, so what exactly are you
suggesting here? adding more items? You don't really expect Lisp
programs to format numbers such as 123,456 by hand after learning from
locale-info that the thousands separator is a comma, do you?

> Actually Qt links my example with other libraries from ICU. My point was
> that since Emacs anyway (indirectly) links with this library, the
> dependency may be not so heavy.

If you are suggesting that we introduce ICU as a dependency, we could
discuss the pros and cons.
It isn't a simple decision, because ICU comes with
a lot of baggage that we already have implemented in Emacs, so whether
we throw away what we have and use ICU instead, or just add what we
miss without depending on ICU, requires good thought and good
acquaintance with the ICU internals (to make sure it does what we want
in Emacs, and doesn't break existing features).

> My personal requirements for number
> formatting were quite modest so far, I expect that other languages (CJK,
> right-to-left scripts, etc.) may require quite special treatment, so
> implementation in Emacs (and further maintenance) may require a lot of
> work. At least API of ICU should be studied to get some inspiration what
> features will be necessary for users from other regions.

I don't think the problem is the API.

> E.g. I was completely unaware that negative sign may be represented by
> parenthesis

Really? it's standard in financial applications.

> I expect enough surprises and unexpected "discoveries" during
> implementation of better locale support. That is why I would consider
> adapting some more or less established API for this purpose.

I don't think "consider" cuts it. We have already a lot of stuff in
Emacs; what we don't have needs serious design and comparison of
available implementation options. Emacs's needs are quite special and
unlike those of most other programs.
* Re: CSV parsing and other issues (Re: LC_NUMERIC)
From: Maxim Nikulin @ 2021-06-10 16:28 UTC
To: emacs-devel; +Cc: boruch_baum

On 09/06/2021 01:52, Eli Zaretskii wrote:
> From: Maxim Nikulin Date: Tue, 8 Jun 2021 23:35:51 +0700

I have reordered some parts of the discussion.

>> I just have realized that nl_langinfo(3) (and
>> nl_langinfo_l(3) as well) from libc accepts RADIXCHAR
>> (decimal dot) and THOUSEP (group separator)
>> arguments. They are good candidates for `locale-info'
>> extension.
>
> We already use nl_langinfo in locale-info, so what exactly
> are you suggesting here? adding more items? You don't
> really expect Lisp programs to format numbers such as
> 123,456 by hand after learning from locale-info that the
> thousands separator is a comma, do you?

I have hijacked Boruch's thread and changed the subject to "CSV
parsing". There are plenty of CSV dialects. If the decimal separator is
"," then office software uses ";" instead of comma as the cell (field)
separator. So to parse a CSV file it is necessary to know the decimal
separator of the specified locale. RADIXCHAR as an argument of
nl_langinfo(3) is a first step towards a better user experience with
CSV files. Unfortunately it only allows getting a reasonable visual
representation. Taking advantage of Org spreadsheet calculations
requires parsing cell contents and thus parsing of numbers (and maybe
dates).
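
The parsing step mentioned above can be illustrated with a short Python sketch that converts a localized cell value to the "C" representation before turning it into a number. It is only a sketch under stated assumptions: `parse_localized` is a hypothetical helper, the separators are passed in explicitly (standing in for what RADIXCHAR and THOUSEP would report), and no validation of group positions is attempted.

```python
def parse_localized(text: str, decimal_point: str = ",", thousands_sep: str = ".") -> float:
    """Parse a number written with the given separators.

    The separators are assumptions supplied by the caller; a real
    implementation would obtain them from the locale.
    """
    cleaned = text.strip()
    if thousands_sep:
        # Drop group separators first so they cannot be mistaken
        # for the decimal point after the next replacement.
        cleaned = cleaned.replace(thousands_sep, "")
    if decimal_point != ".":
        cleaned = cleaned.replace(decimal_point, ".")
    return float(cleaned)


print(parse_localized("12.345,7"))                                     # separators as in the de_DE output above
print(parse_localized("12,345.7", decimal_point=".", thousands_sep=","))
```

The reverse direction, writing a computed result back with the table's separators, would need the same two parameters, which is one reason to pass them (or a locale) explicitly rather than rely on global state.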
I mentioned earlier https://debbugs.gnu.org/47885 and a part of the discussion that is missing from the bug tracker: https://lists.gnu.org/archive/html/emacs-orgmode/2021-05/msg00693.html I have seen nl_langinfo without RADIXCHAR in emacs sources http://git.savannah.gnu.org/cgit/emacs.git/tree/src/w32proc.c#n3258 http://git.savannah.gnu.org/cgit/emacs.git/tree/lib-src/ntlib.c#n520 Originally during discussion in emacs-orgmode I did not plan to raise the question concerning number formatting and parsing since I had no hope for any positive outcome without a consistent proposal. By chance I noticed Boruch's message and decided to add another use case. >> On 08/06/2021 09:35, Eli Zaretskii wrote: >> > From: Boruch Baum >> >> No? If an Emacs user has two buffers in two separate languages, the >> >> buffer-local settings aren't / won't be respected? >> > >> > First, language is different from locale. And second, we don't even >> > have a buffer-local notion of language yet. >> >> Certainly locale is more precise than just language since it includes >> region and other variants, moreover it can be granularly tuned (date, >> numbers, sorting can be adjusted independently), but I still think that >> all these properties can be sometimes broadly referred to as language. > > No, they cannot, not in general. A locale comes with a whole database > of different settings: language, encoding (a.k.a. "codeset"), formats > of date and time, names of days of the week and of the months, rules > for collation and capitalization, etc. etc. You can easily find > several locales whose language is English, but some/many/all of the > other locale-dependent settings are different. It isn't a coincidence > that a locale's name includes more than just the language part. I wrote almost the same concerning locale variants and components, so I feel some sort of confusion and cannot find its origin.
I was trying to support Boruch that buffer-local variables may be important part of locale context, more precise than global settings, and a fallback if locale is not specified for particular span of text. In respect to such hierarchy language vs. locale difference does not matter. >> Low level functions can accept explicit locale. > > Which ones? Most libc routines don't, they use the locale > as a global identifier. And many libc's (with the prominent > exception of glibc) don't support efficient change of a > locale in the middle of a program, they assume that the > program's locale is set once at program startup. Hypothetical functions in new elisp API, maybe relying on some external libraries. I believed, you agreed that global LC_NUMERIC must be "C" to avoid various sort of problems with data exchange. I am not aware of libc functions for number formatting or parsing that can take explicit locale (I have seen such feature in C++ standard library, Qt, other languages). Totalitarian approach of libc with the only locale facet, the only timezone imposes too hard limitations to consider some libc functions as useful and reliable in more or less complex application. Its API is suitable for simple tools that can quickly do their work and do not assume any conversion. More flexible base layer is required when mix of environments is expected. Full support of locale features requires a lot of work, that is why I am asking if some external library can be used instead. >> Higher level API can obtain it implicitly from >> buffer-local variables and global locale. For example the >> LOCALE argument of `string-collate-lessp' is optional >> one. I can even anticipate that locale may be stored in >> text properties some times. 
A random message from recent >> "About multilingual documents" thread at emacs-orgmode >> mail list: >> https://lists.gnu.org/archive/html/emacs-orgmode/2021-05/msg00252.html > > That's mostly about input methods and org-export, I don't > see how it's relevant to what Boruch asked. I added this link to show you that demand for multilanguage documents is real. Notice that problems with spell checking were mentioned in that discussion. Earlier I saw suggestions to switch ispell language with input method. In my opinion it is ridiculous. Personally I would rather have a combined dictionary than explicitly marked text regions. I expect that new features will be used more widely once it becomes possible to use them. >> At first basic functionality may be implemented. The >> problem is to choose extensible API. > > No, the problem is to have a design that would allow an > efficient implementation. Given what the underlying libc > does, it isn't easy. That is why I am looking for an alternative to libc. Previously you wrote "locale switching". I would rather say constructing and destroying locales on demand. Switching may not behave well when threads are involved. > And then we have conceptual problems. For example, in a > multilingual editor such as Emacs, the notion of a "buffer > language" not always makes sense, you'd need to support > portions of text that have different language properties. > Imagine switching locales as Emacs processes adjacent > stretches of text and other complications. For example, > changing letter-case for a stretch of Turkish text is > supposed to be different from the English or German text. > I'm all ears for ideas how to design such "language > support". It definitely isn't easy, so if you have ideas, > please voice them! I never have a consistent vision nor see a conceptual problem. Buffer-local settings are just more specific than global ones. That is why I mentioned text properties as even more precise in my previous message.
Maybe even current mode can help to build proper hierarchy of locale contexts. HTML has "lang" attribute, there is "\foreignlanguage" in LaTeX, etc. I have heard that special case exists in Turkish, but I was not curious enough to find details and rules when and how it should be applied. > If you are suggesting that we introduce ICU as a dependency, > we could discuss the pros and cons. I consider it as the most complete available implementation. Do you know a comparable alternative? I have realized that since Emacs has support of dynamic modules, it is possible to create a prototype with bindings to external library without rebuilding of Emacs. > I don't think the problem is the API. I think, introducing features gradually will be more headache for developers of external packages than absence of support at all. API determines the scope of such features. >> E.g. I was completely unaware that negative sign may be >> represented by parenthesis > > Really? it's standard in financial applications. Is it really so standard? Maybe I have seen such format, even guessed from some context that e.g. table column with such numbers should assume negative values, or e.g. in discount entry. At least I did not recognize such format as some general rule. new Intl.NumberFormat('de-DE', {style: 'currency', currency: 'USD', currencySign: 'accounting', signDisplay: 'always'}).format(-3500); "-3.500,00 $" new Intl.NumberFormat('es-ES', {style: 'currency', currency: 'USD', currencySign: 'accounting', signDisplay: 'always'}).format(-3500); "-3500,00 US$" new Intl.NumberFormat('fr-FR', {style: 'currency', currency: 'USD', currencySign: 'accounting', signDisplay: 'always'}).format(-3500); "(3 500,00 $US)" new Intl.NumberFormat('ru-RU', {style: 'currency', currency: 'USD', currencySign: 'accounting', signDisplay: 'always'}).format(-3500); "-3 500,00 $" >> I expect enough surprises and unexpected "discoveries" >> during implementation of better locale support. 
That is >> why I would consider adapting some more or less >> established API for this purpose. > > I don't think "consider" cuts it. We have already a lot of > stuff in Emacs; what we don't have needs serious design and > comparison of available implementation options. Emacs's > needs are quite special and unlike those of most other > programs. I still think that expectation of users around the globe are more special than Emacs' needs at least in respect to format of numbers. ^ permalink raw reply [flat|nested] 34+ messages in thread
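The CSV dialect rule described in the message above (a decimal comma usually implies ';' between fields) can be sketched in C as follows. This is only an illustration under stated assumptions: `csv_field_separator' and `csv_split' are hypothetical names, not Emacs or libc functions; the ';' convention is the one described in the message, not a standard; and quoted fields are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pick the field separator implied by the decimal
   separator.  Assumption taken from the message above: a decimal comma
   means fields are separated by ';', otherwise by ','. */
char csv_field_separator(char decimal_separator)
{
    return decimal_separator == ',' ? ';' : ',';
}

/* Hypothetical helper: split LINE in place at SEP and store pointers to
   at most MAX_CELLS cells in CELLS.  Quoting is not supported, and cells
   beyond MAX_CELLS are dropped.  Returns the number of cells stored. */
size_t csv_split(char *line, char sep, char **cells, size_t max_cells)
{
    size_t n = 0;

    assert(line != NULL && cells != NULL);
    while (n < max_cells) {
        char *next;

        cells[n++] = line;
        next = strchr(line, sep);
        if (next == NULL)
            break;
        *next = '\0';        /* terminate the current cell */
        line = next + 1;     /* continue after the separator */
    }
    return n;
}
```

With the decimal separator taken from the locale (for example through nl_langinfo(RADIXCHAR)), a line such as "1,5;2,25;text" splits into three cells. The numeric cells still need locale-aware parsing afterwards, which is the harder part of the problem.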
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-10 16:28 ` Maxim Nikulin @ 2021-06-10 16:57 ` Eli Zaretskii 2021-06-10 18:01 ` Boruch Baum 2021-06-11 16:58 ` Maxim Nikulin 2021-06-10 21:10 ` Stefan Monnier 1 sibling, 2 replies; 34+ messages in thread From: Eli Zaretskii @ 2021-06-10 16:57 UTC (permalink / raw) To: Maxim Nikulin; +Cc: boruch_baum, emacs-devel > From: Maxim Nikulin <manikulin@gmail.com> > Date: Thu, 10 Jun 2021 23:28:59 +0700 > Cc: boruch_baum@gmx.com > > > We already use nl_langinfo in locale-info, so what exactly > > are you suggesting here? adding more items? You don't > > really expect Lisp programs to format numbers such as > > 123,456 by hand after learning from locale-info that the > > thousands separator is a comma, do you? > > I have hijacked Boruch's thread and changed the subject to "CSV > parsing". That explains part of my confusion. Please try not to hijack discussions; instead, start a separate thread, to avoid such confusion. For processing CSV, if there's a need to know whether the locale uses the comma as a decimal separator, we could indeed extend locale-info. But such an extension is almost trivial and doesn't even touch on the significant problems in the rest of the discussion. > I was trying to support Boruch that buffer-local variables may be > important part of locale context, more precise than global settings, They are more precise, but they don't support mixed languages in the same buffer, something that happens in Emacs very frequently. Which means they are not precise enough. So my POV is that we should look for a way to be able to specify the language of some span of text, in which case buffers that use a single language will be a special case. > > And then we have conceptual problems. For example, in a > > multilingual editor such as Emacs, the notion of a "buffer > > language" not always makes sense, you'd need to support > > portions of text that have different language properties. 
> > Imagine switching locales as Emacs processes adjacent > > stretches of text and other complications. For example, > > changing letter-case for a stretch of Turkish text is > > supposed to be different from the English or German text. > > I'm all ears for ideas how to design such "language > > support". It definitely isn't easy, so if you have ideas, > > please voice them! > > I never have a consistent vision nor see a conceptual problem. Here's a trivial example:

    (insert (downcase (buffer-substring POS1 POS2)))

Contrast with

    (insert (downcase "FOO"))

The function 'downcase' gets a Lisp string, but it has no way of knowing whether the string is actually a portion of current buffer's text. So how can it apply the correct letter-case conversions, even if some buffer-local setting specifies that this should be done using some specific language's rules? IOW, one of the non-trivial problems is how to process Lisp strings correctly for these purposes. Buffers can have local variables, but what about strings? > > If you are suggesting that we introduce ICU as a dependency, > > we could discuss the pros and cons. > > I consider it as the most complete available implementation. Do you > know a comparable alternative? Yes: what we have already in Emacs. That covers a lot of the same Unicode turf that ICU handles, because we import and use the same Unicode files and tables. The question is: what is best for the future development of Emacs in this area: depend on ICU (which would mean we need to rewrite lots of code that is working well), or extend what we have to support more Unicode features? One not-so-trivial aspect of this is efficiency of fetching character properties (Emacs has char-tables for that, which are efficient both CPU- and memory-wise). Another aspect is support for raw bytes in buffers and strings. And there are probably some others. It is not a simple decision. ^ permalink raw reply [flat|nested] 34+ messages in thread
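One way to read the question in the message above is that the language has to travel with the call, because a bare string carries none. A minimal sketch in C of such a per-call language argument follows. `downcase_for_language' and its language-tag argument are hypothetical; the sketch covers only ASCII letters plus the Turkish and Azerbaijani dotted and dotless I (simple case mappings), and it says nothing about how Emacs case tables would provide this.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: lowercase one code point under the rules of LANG,
   a language tag such as "en" or "tr".  Only ASCII letters and the two
   Turkic I characters are handled in this sketch. */
wchar_t downcase_for_language(wchar_t c, const char *lang)
{
    int turkic;

    assert(lang != NULL);
    turkic = strncmp(lang, "tr", 2) == 0 || strncmp(lang, "az", 2) == 0;

    if (turkic && c == L'I')
        return (wchar_t)0x0131;   /* LATIN SMALL LETTER DOTLESS I */
    if (c == (wchar_t)0x0130)     /* LATIN CAPITAL LETTER I WITH DOT ABOVE */
        return L'i';
    if (c >= L'A' && c <= L'Z')
        return (wchar_t)(c - L'A' + L'a');
    return c;                     /* everything else is left unchanged */
}
```

The same input character 'I' gives 'i' for "en" and dotless 'ı' for "tr", so the result cannot be computed from the string alone. Where the language comes from (an explicit argument, a text property, a buffer-local setting) is exactly the open design question.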
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-10 16:57 ` Eli Zaretskii @ 2021-06-10 18:01 ` Boruch Baum 2021-06-10 18:50 ` Eli Zaretskii 2021-06-11 16:58 ` Maxim Nikulin 1 sibling, 1 reply; 34+ messages in thread From: Boruch Baum @ 2021-06-10 18:01 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Maxim Nikulin, emacs-devel On 2021-06-10 19:57, Eli Zaretskii wrote: > > It is not a simple decision. My request at the beginning of the (original) thread was much more limited in scope and still seems to me in fact to be a simple decision, and with no side effects. Paraphrased: Please consider exposing to the elisp `format' function the single-quote and upper-case 'I' format specifiers of the libc (or other) `printf' command. Doing this will just offer an elisp programmer a new option. -- hkp://keys.gnupg.net CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-10 18:01 ` Boruch Baum @ 2021-06-10 18:50 ` Eli Zaretskii 2021-06-10 19:04 ` Boruch Baum 0 siblings, 1 reply; 34+ messages in thread From: Eli Zaretskii @ 2021-06-10 18:50 UTC (permalink / raw) To: Boruch Baum; +Cc: manikulin, emacs-devel > Date: Thu, 10 Jun 2021 14:01:45 -0400 > From: Boruch Baum <boruch_baum@gmx.com> > Cc: Maxim Nikulin <manikulin@gmail.com>, emacs-devel@gnu.org > > Please consider exposing to the elisp `format' function the > single-quote and upper-case 'I' format specifiers of the libc (or > other) `printf' command. > > Doing this will just offer an elisp programmer a new option. That would make the output of 'format' dependent on the current locale, unless we do something else to allow Lisp programs to take control on what those specifiers produce. That "something else" is what I was talking about. It is true that I was talking about larger range of issues, but still, even this small feature touches on some of them. And I don't think you had any ideas for how to resolve those issues, or did I miss something? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-10 18:50 ` Eli Zaretskii @ 2021-06-10 19:04 ` Boruch Baum 2021-06-10 19:23 ` Eli Zaretskii 0 siblings, 1 reply; 34+ messages in thread From: Boruch Baum @ 2021-06-10 19:04 UTC (permalink / raw) To: Eli Zaretskii; +Cc: manikulin, emacs-devel On 2021-06-10 21:50, Eli Zaretskii wrote: > > Date: Thu, 10 Jun 2021 14:01:45 -0400 > > From: Boruch Baum <boruch_baum@gmx.com> > > Cc: Maxim Nikulin <manikulin@gmail.com>, emacs-devel@gnu.org > > > > Please consider exposing to the elisp `format' function the > > single-quote and upper-case 'I' format specifiers of the libc (or > > other) `printf' command. > > > > Doing this will just offer an elisp programmer a new option. > > That would make the output of 'format' dependent on the current > locale That's the elisp programmer's business, not your responsibility. > ... > And I don't think you had any ideas for how to resolve those issues, > or did I miss something? Yes, that I haven't invested in responding about those issues because I don't see any of them as relevant. + Elisp function `format' exists. + Elisp function `format' uses `printf' format specifiers. + Elisp function `format' doesn't expose two of them. -- hkp://keys.gnupg.net CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-10 19:04 ` Boruch Baum @ 2021-06-10 19:23 ` Eli Zaretskii 2021-06-10 20:20 ` Boruch Baum 2021-06-11 13:56 ` Filipp Gunbin 0 siblings, 2 replies; 34+ messages in thread From: Eli Zaretskii @ 2021-06-10 19:23 UTC (permalink / raw) To: Boruch Baum; +Cc: manikulin, emacs-devel > Date: Thu, 10 Jun 2021 15:04:53 -0400 > From: Boruch Baum <boruch_baum@gmx.com> > Cc: manikulin@gmail.com, emacs-devel@gnu.org > > > That would make the output of 'format' dependent on the current > > locale > > That's the elisp programmer's business, not your responsibility. What could the Lisp programmer do in this situation? > + Elisp function `format' uses `printf' format specifiers. Only for some of the 'format's capabilities, not for all of them. > + Elisp function `format' doesn't expose two of them. I don't think it's TRT for Emacs to expose locale-dependent features that cannot be controlled from Lisp, sorry. We need to find a better way. For example, there could be a Lisp variable that specifies the group separator character, and then 'format' could use that character when the format spec includes %'. Which means we'd need to implement that in our own code; patches welcome. ^ permalink raw reply [flat|nested] 34+ messages in thread
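The suggestion in the message above, a caller-controlled group separator instead of one taken from the process locale, might look like this in C. `format_grouped' is a hypothetical helper, not part of Emacs or libc, and it assumes groups of three digits, which does not hold for every locale.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write VALUE into OUT (SIZE bytes) with SEP between
   groups of three digits.  SEP is a string so that multi-byte separators
   work.  Returns 0 on success, -1 if OUT is too small. */
int format_grouped(long long value, const char *sep, char *out, size_t size)
{
    char digits[32];                 /* enough for a 64-bit value */
    int len, start, i;
    size_t seplen, pos = 0;

    assert(sep != NULL && out != NULL);
    len = snprintf(digits, sizeof digits, "%lld", value);
    start = (digits[0] == '-') ? 1 : 0;   /* index of the first digit */
    seplen = strlen(sep);

    for (i = 0; i < len; i++) {
        int remaining = len - i;     /* characters left, including this one */

        /* A separator goes before digit i when a whole number of
           three-digit groups remains and i is not the first digit. */
        if (i > start && remaining % 3 == 0) {
            if (pos + seplen >= size)
                return -1;
            memcpy(out + pos, sep, seplen);
            pos += seplen;
        }
        if (pos + 1 >= size)
            return -1;
        out[pos++] = digits[i];
    }
    out[pos] = '\0';
    return 0;
}
```

Because the separator is an argument, the result does not depend on LC_NUMERIC, so code that expects locale-independent output is not affected by a user's environment.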
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-10 19:23 ` Eli Zaretskii @ 2021-06-10 20:20 ` Boruch Baum 2021-06-11 6:19 ` Eli Zaretskii 2021-06-11 13:56 ` Filipp Gunbin 1 sibling, 1 reply; 34+ messages in thread From: Boruch Baum @ 2021-06-10 20:20 UTC (permalink / raw) To: Eli Zaretskii; +Cc: manikulin, emacs-devel On 2021-06-10 22:23, Eli Zaretskii wrote: > > Date: Thu, 10 Jun 2021 15:04:53 -0400 > > From: Boruch Baum <boruch_baum@gmx.com> > > Cc: manikulin@gmail.com, emacs-devel@gnu.org > > > > > That would make the output of 'format' dependent on the current > > > locale > > > > That's the elisp programmer's business, not your responsibility. > > What could the Lisp programmer do in this situation? It's not your responsibility. I can say that in the use-case that prompted my request, I'm confident it will *never* be an issue. I ask format to give me a string and I display it. End of story. Whether just 99% or 99.99%, the overwhelming majority of cases will be the same. Your concerns are total non-issues. > > + Elisp function `format' uses `printf' format specifiers. > > Only for some of the 'format's capabilities, not for all of them. [Commentary: 'Some' isn't a number or a percentage.] [Commentary: I see all format specifiers supported but the two requested.] > > + Elisp function `format' doesn't expose two of them. > > I don't think it's TRT for Emacs to expose locale-dependent features > that cannot be controlled from Lisp Then don't make them locale specific. Implement the single-quote specifier the same way you currently handle the floating-point specifier '%f', a locale-specific format that has existed in emacs without complaint since ... -- hkp://keys.gnupg.net CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-10 20:20 ` Boruch Baum @ 2021-06-11 6:19 ` Eli Zaretskii 2021-06-11 8:18 ` Boruch Baum 2021-06-11 16:51 ` Maxim Nikulin 0 siblings, 2 replies; 34+ messages in thread From: Eli Zaretskii @ 2021-06-11 6:19 UTC (permalink / raw) To: Boruch Baum; +Cc: manikulin, emacs-devel > Date: Thu, 10 Jun 2021 16:20:45 -0400 > From: Boruch Baum <boruch_baum@gmx.com> > Cc: manikulin@gmail.com, emacs-devel@gnu.org > > On 2021-06-10 22:23, Eli Zaretskii wrote: > > > Date: Thu, 10 Jun 2021 15:04:53 -0400 > > > From: Boruch Baum <boruch_baum@gmx.com> > > > Cc: manikulin@gmail.com, emacs-devel@gnu.org > > > > > > > That would make the output of 'format' dependent on the current > > > > locale > > > > > > That's the elisp programmer's business, not your responsibility. > > > > What could the Lisp programmer do in this situation? > > It's not your responsibility. It is my responsibility to make sure we don't add to Emacs features that are not very useful, or are against the Emacs philosophy and/or design principles. > I can say that in the use-case that prompted my request, I'm confident > it will *never* be an issue. I ask format to give me a string and I > display it. End of story. Whether just 99% or 99.99%, the overwhelming > majority of cases will be the same. Your concerns are total non-issues. You can always write a module to implement this feature, if you want it for your own purposes. Or you could change Emacs to support that directly and maintain that change locally. There's no need to introduce into Emacs features that are useful for a few people. > [Commentary: I see all format specifiers supported but the two > requested.] You are overlooking some aspects of the code if that is your conclusion. > > I don't think it's TRT for Emacs to expose locale-dependent features > > that cannot be controlled from Lisp > > Then don't make them locale specific.
Implement the single-quote > specifier the same way you currently handle the floating-point specifier > '%f', a locale-specific format that has existed in emacs without > complaint since ... That was my suggestion, more or less. Patches are welcome to implement that. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-11 6:19 ` Eli Zaretskii @ 2021-06-11 8:18 ` Boruch Baum 2021-06-11 16:51 ` Maxim Nikulin 1 sibling, 0 replies; 34+ messages in thread From: Boruch Baum @ 2021-06-11 8:18 UTC (permalink / raw) To: Eli Zaretskii; +Cc: manikulin, emacs-devel On 2021-06-11 09:19, Eli Zaretskii wrote: > You can always write a module to implement this feature, if you want > it for your own purposes. Done and published and on MELPA before my first post here. And I wasn't the first; there are other code examples available elsewhere. > There's no need to introduce into Emacs features that are useful for a > few people. ??? But it's clear that you're set in your decision. I think I've done more than enough to try and benefit others on this one. -- hkp://keys.gnupg.net CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-11 6:19 ` Eli Zaretskii 2021-06-11 8:18 ` Boruch Baum @ 2021-06-11 16:51 ` Maxim Nikulin 1 sibling, 0 replies; 34+ messages in thread From: Maxim Nikulin @ 2021-06-11 16:51 UTC (permalink / raw) To: emacs-devel Eli, Boruch, you are overreacting (both). On 11/06/2021 13:19, Eli Zaretskii wrote: > There's no need to > introduce into Emacs features that are useful for a few people. I think that expectation of users and developers in respect to support of locales evolves in time. Proper formatting of numbers is useful more widely than for a few people. Boruch, till your last messages, I believed that you were convinced that adding support of "'" and "I" is not so easy. Support of locale-dependent format specifiers through printf looks attractive but it can not be directly used by `format' or other elisp functions in a safe way. Some code calling `format' implicitly expects that it generates locale-independent numbers, so changing its behavior is not backward compatible. libc can only work with single global locale at any moment. I expect that an attempt to "temporarily" call setlocale(LC_NUMERIC, "") will be a permanent source of bugs: forgotten reverting call, call of a function that needs universal format in locale-specific context, threads started at inappropriate moment, etc. Another implementation of locale functions is necessary with ability to perform parsing and formatting without touching global variables. Personally I expect basic level functions with explicit locale context (random names):

    (locale-format-number-with-ctx
      (locale-get-current-context :group-separator 'suppress)
      1234567890)

or with explicit locale instead of `locale-get-current-context'. It is better to add some convenience helpers that inspect text properties, buffer-local and global settings to determine context:

    (locale-format-number 1234567890)

and maybe even `locale-format[-with-ctx]' that accepts printf-like format string.
On 11/06/2021 03:20, Boruch Baum wrote: > Then don't make them locale specific. Implement the > single-quote specifier the same way you currently handle the > floating-point specifier '%f', a locale-specific format that > has existed in emacs without complaint since ... You are confusing something. "%f" is not locale-specific inside Emacs, it uses "universal" format with dot "." as decimal separator even in locales with "," in this role. At the same time "'" is highly locale-dependent in libc. Group sizes and group separator widely vary. I posted this example earlier:

    LC_NUMERIC=C.UTF-8 /usr/bin/printf "%'d\n" 1234567890
    1234567890
    LC_NUMERIC=en_US.UTF-8 /usr/bin/printf "%'d\n" 1234567890
    1,234,567,890
    LC_NUMERIC=es_ES.UTF-8 /usr/bin/printf "%'d\n" 1234567890
    1.234.567.890
    LC_NUMERIC=ru_RU.UTF-8 /usr/bin/printf "%'d\n" 1234567890
    1 234 567 890
    LC_NUMERIC=en_IN.UTF-8 /usr/bin/printf "%'d\n" 1234567890
    1,23,45,67,890

> It's not your responsibility. > > I can say that in the use-case that prompted my request, I'm > confident it will *never* be an issue. I ask format to give > me a string and I display it. End of story. Whether just 99% > or 99.99%, the overwhelming majority of cases will be the > same. Your concerns are total non-issues. I would prefer to avoid idiosyncrasy when "%'d" is locale-dependent but "%f" is not. P.S. With some limitation (printf binary is available and you do not need to work with floating point numbers), you can leverage libc formatting facilities with the following crutch:

    (shell-command-to-string (format "/usr/bin/printf \"%%'d\" %d" 1234567890))

^ permalink raw reply [flat|nested] 34+ messages in thread
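For reference, POSIX.1-2008 added per-thread locales (newlocale, uselocale, freelocale), which allow "%'d" to be used without calling setlocale on the process-wide locale. A minimal sketch in C, assuming a libc that provides these functions (glibc does) and that the requested locale is installed; `format_in_locale' is a hypothetical name. Whether this is fast or portable enough for Emacs is not something the sketch answers.

```c
#define _POSIX_C_SOURCE 200809L   /* for newlocale, uselocale, freelocale */
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: format VALUE with the %' flag under the LC_NUMERIC
   rules of LOCALE_NAME, leaving the process-wide locale untouched.
   Returns 0 on success, -1 if the locale cannot be created or OUT is
   too small. */
int format_in_locale(const char *locale_name, long value,
                     char *out, size_t size)
{
    locale_t loc, old;
    int n;

    assert(locale_name != NULL && out != NULL);
    loc = newlocale(LC_NUMERIC_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0)
        return -1;

    old = uselocale(loc);            /* affects the calling thread only */
    n = snprintf(out, size, "%'ld", value);
    uselocale(old);                  /* restore the previous thread locale */
    freelocale(loc);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A call such as format_in_locale("en_IN.UTF-8", 1234567890L, buf, sizeof buf) would use that locale's grouping if it is installed, while other threads and later calls to `format'-like code keep seeing the "C" numeric locale. The locale object is created and destroyed on every call here; a real implementation would probably cache it.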
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-10 19:23 ` Eli Zaretskii 2021-06-10 20:20 ` Boruch Baum @ 2021-06-11 13:56 ` Filipp Gunbin 2021-06-11 14:10 ` Eli Zaretskii 1 sibling, 1 reply; 34+ messages in thread From: Filipp Gunbin @ 2021-06-11 13:56 UTC (permalink / raw) To: Eli Zaretskii; +Cc: manikulin, Boruch Baum, emacs-devel On 10/06/2021 22:23 +0300, Eli Zaretskii wrote: > I don't think it's TRT for Emacs to expose locale-dependent features > that cannot be controlled from Lisp, sorry. We need to find a better > way. For example, there could be a Lisp variable that specifies the > group separator character, and then 'format' could use that character > when the format spec includes %'. Which means we'd need to implement > that in our own code; patches welcome. Maybe an alternative set of specifiers, which output data in locale-specific format. Then a single variable to let-bound around format, which instructs what locale to use. Very simple... Filipp ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-11 13:56 ` Filipp Gunbin @ 2021-06-11 14:10 ` Eli Zaretskii 2021-06-11 18:52 ` Filipp Gunbin 0 siblings, 1 reply; 34+ messages in thread From: Eli Zaretskii @ 2021-06-11 14:10 UTC (permalink / raw) To: Filipp Gunbin; +Cc: manikulin, boruch_baum, emacs-devel > From: Filipp Gunbin <fgunbin@fastmail.fm> > Cc: Boruch Baum <boruch_baum@gmx.com>, manikulin@gmail.com, > emacs-devel@gnu.org > Date: Fri, 11 Jun 2021 16:56:34 +0300 > > On 10/06/2021 22:23 +0300, Eli Zaretskii wrote: > > > I don't think it's TRT for Emacs to expose locale-dependent features > > that cannot be controlled from Lisp, sorry. We need to find a better > > way. For example, there could be a Lisp variable that specifies the > > group separator character, and then 'format' could use that character > > when the format spec includes %'. Which means we'd need to implement > > that in our own code; patches welcome. > > Maybe an alternative set of specifiers, which output data in > locale-specific format. Then a single variable to let-bound around > format, which instructs what locale to use. Very simple... Sorry, I don't think I understand what you propose. Please elaborate on the "alternative set of specifiers, which output data in locale-specific format". ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-11 14:10 ` Eli Zaretskii @ 2021-06-11 18:52 ` Filipp Gunbin 2021-06-11 19:34 ` Eli Zaretskii 0 siblings, 1 reply; 34+ messages in thread From: Filipp Gunbin @ 2021-06-11 18:52 UTC (permalink / raw) To: Eli Zaretskii; +Cc: manikulin, boruch_baum, emacs-devel On 11/06/2021 17:10 +0300, Eli Zaretskii wrote: >> From: Filipp Gunbin <fgunbin@fastmail.fm> >> Cc: Boruch Baum <boruch_baum@gmx.com>, manikulin@gmail.com, >> emacs-devel@gnu.org >> Date: Fri, 11 Jun 2021 16:56:34 +0300 >> >> On 10/06/2021 22:23 +0300, Eli Zaretskii wrote: >> >> > I don't think it's TRT for Emacs to expose locale-dependent features >> > that cannot be controlled from Lisp, sorry. We need to find a better >> > way. For example, there could be a Lisp variable that specifies the >> > group separator character, and then 'format' could use that character >> > when the format spec includes %'. Which means we'd need to implement >> > that in our own code; patches welcome. >> >> Maybe an alternative set of specifiers, which output data in >> locale-specific format. Then a single variable to let-bound around >> format, which instructs what locale to use. Very simple... > > Sorry, I don't think I understand what you propose. Please elaborate > on the "alternative set of specifiers, which output data in > locale-specific format". I mean that for every specifier which could be affected by locale (but isn't), there could be additional specifier, which takes locale into account. Less awkward, there could be an explicit modifier which says "use locale for this specifier in format". Something like `O' or `E' modifier in "format-time-string". This way only given format call is affected, without surprises somewhere below in the call stack. Then, a locale to use could be let-bound around this format call, thus overriding the default which came from env vars or from somewhere else. Filipp ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-11 18:52 ` Filipp Gunbin @ 2021-06-11 19:34 ` Eli Zaretskii 0 siblings, 0 replies; 34+ messages in thread From: Eli Zaretskii @ 2021-06-11 19:34 UTC (permalink / raw) To: Filipp Gunbin; +Cc: manikulin, boruch_baum, emacs-devel > From: Filipp Gunbin <fgunbin@fastmail.fm> > Cc: boruch_baum@gmx.com, manikulin@gmail.com, emacs-devel@gnu.org > Date: Fri, 11 Jun 2021 21:52:57 +0300 > > I mean that for every specifier which could be affected by locale (but > isn't), there could be additional specifier, which takes locale into > account. Less awkward, there could be an explicit modifier which says > "use locale for this specifier in format". Something like `O' or `E' > modifier in "format-time-string". That could work, but if we rely on libc functions for the locale-dependent behavior, it could be slow, because switching a locale could be expensive. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-10 16:57 ` Eli Zaretskii 2021-06-10 18:01 ` Boruch Baum @ 2021-06-11 16:58 ` Maxim Nikulin 2021-06-11 18:04 ` Eli Zaretskii 1 sibling, 1 reply; 34+ messages in thread From: Maxim Nikulin @ 2021-06-11 16:58 UTC (permalink / raw) To: emacs-devel; +Cc: boruch_baum On 10/06/2021 23:57, Eli Zaretskii wrote: >> From: Maxim Nikulin Date: Thu, 10 Jun 2021 23:28:59 +0700 > > For processing CSV, if there's a need to know whether the > locale uses the comma as a decimal separator, we could > indeed extend locale-info. But such an extension is almost > trivial and doesn't even touch on the significant problems > in the rest of the discussion. > You forgot `setlocale(LC_NUMERIC, "C")', didn't you?

    #include <langinfo.h>
    #include <locale.h>
    #include <stdio.h>

    int main()
    {
      setlocale(LC_ALL, "");
      printf("%c", *nl_langinfo(RADIXCHAR));
      setlocale(LC_NUMERIC, "C");
      printf("%c\n", *nl_langinfo(RADIXCHAR));
      return 0;
    }

Output is ",." (when run in a locale with "," as decimal separator). There is nl_langinfo_l(3), but it requires more work. After parsing of rows to cells, it may be necessary to parse numbers ("2,34" to 2.34). That is why quality of CSV file import is tightly related to handling of number formats. >> I was trying to support Boruch that buffer-local variables >> may be important part of locale context, more precise than >> global settings, > > They are more precise, but they don't support mixed > languages in the same buffer, something that happens in > Emacs very frequently. In some cases I would prefer to have uniform format of numbers and dates despite alternating language in the buffer, e.g. for my private notes. > Here's a trivial example: > > (insert (downcase (buffer-substring POS1 POS2))) > > Contrast with > > (insert (downcase "FOO")) Either `set-text-properties' should be called on "FOO" before passing it to `downcase' or `locale-downcase' with LOCALE first argument should be added.
Moreover, such `locale-downcase' function may be used to implement higher level functions working with implicit locales. LOCALE may assume some hierarchy with user overrides for particular call, text properties, buffer variables, global settings. > Yes: what we have already in Emacs. That covers a lot of > the same Unicode turf that ICU handles, because we import > and use the same Unicode files and tables. There are plenty of xml files in cldr-common-39.0.zip (common/main/*.xml) https://www.unicode.org/Public/cldr/39/ in addition to Unicode data in Emacs sources. They include rules for number formatting https://unicode.org/reports/tr35/tr35-numbers.html Of course, human-style number formatting, currencies, financial style, etc. may be discarded and implementation may be limited to grouping and decimal separators (leaving other features to further requests). There is newlocale(3) function in glibc to obtain minimal subset of properties. I am not familiar with other platforms. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-11 16:58 ` Maxim Nikulin @ 2021-06-11 18:04 ` Eli Zaretskii 2021-06-14 16:38 ` Maxim Nikulin 0 siblings, 1 reply; 34+ messages in thread From: Eli Zaretskii @ 2021-06-11 18:04 UTC (permalink / raw) To: Maxim Nikulin; +Cc: boruch_baum, emacs-devel > From: Maxim Nikulin <manikulin@gmail.com> > Date: Fri, 11 Jun 2021 23:58:24 +0700 > Cc: boruch_baum@gmx.com > > On 10/06/2021 23:57, Eli Zaretskii wrote: > >> From: Maxim Nikulin Date: Thu, 10 Jun 2021 23:28:59 +0700 > > > > For processing CSV, if there's a need to know whether the > > locale uses the comma as a decimal separator, we could > > indeed extend locale-info. But such an extension is almost > > trivial and doesn't even touch on the significant problems > > in the rest of the discussion. > > > > You forgot `setlocale(LC_NUMERIC, "C")', didn't you? No, I didn't. Adding a call to setlocale to locale-info, even if we want to add an argument for the caller to control the locale, is trivial. > > Here's a trivial example: > > > > (insert (downcase (buffer-substring POS1 POS2))) > > > > Contrast with > > > > (insert (downcase "FOO")) > > Either `set-text-properties' should be called on "FOO" before passing it > to `downcase' Which property will help here? we don't have such properties. they need to be designed and implemented. > or `locale-downcase' with LOCALE first argument should be > added. How would you implement locale-downcase? Are you familiar with how Emacs case tables work? And even if we had locale-downcase, which locale would you pass to it in any given use case? Please note that I'm not saying these issues cannot be solved -- they can. I'm saying that designing them requires non-trivial thought, something we didn't yet do. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-11 18:04 ` Eli Zaretskii @ 2021-06-14 16:38 ` Maxim Nikulin 2021-06-14 17:19 ` Eli Zaretskii 0 siblings, 1 reply; 34+ messages in thread From: Maxim Nikulin @ 2021-06-14 16:38 UTC (permalink / raw) To: emacs-devel On 12/06/2021 01:04, Eli Zaretskii wrote: >> From: Maxim Nikulin Date: Fri, 11 Jun 2021 23:58:24 +0700 >> On 10/06/2021 23:57, Eli Zaretskii wrote: >> >> From: Maxim Nikulin Date: Thu, 10 Jun 2021 23:28:59 +0700 >> > >> > For processing CSV, if there's a need to know whether the >> > locale uses the comma as a decimal separator, we could >> > indeed extend locale-info. But such an extension is almost >> > trivial and doesn't even touch on the significant problems >> > in the rest of the discussion. >> >> You forgot `setlocale(LC_NUMERIC, "C")', didn't you? > > No, I didn't. Adding a call to setlocale to locale-info, even if we > want to add an argument for the caller to control the locale, is > trivial. I would avoid such manipulations and the reason is not efficiency of particular implementation. Locale is not thread-local, so changing it in a *getter* is a source of rare but really obscure, hardly reproducible problems.
I do not like such output

    1234.567890
    1234,567890
    1234.567890

of the following program changing locale in a parallel thread

    #include <locale.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    #define DELAY_NS 40000000

    /* Prints the same number three times while main() flips the locale. */
    void* other_thread(void *arg)
    {
        struct timespec delay = { 0, DELAY_NS/2 };
        nanosleep(&delay, NULL);
        printf("%f\n", 1234.56789);   /* LC_NUMERIC is still "C" */
        delay.tv_nsec = DELAY_NS;
        nanosleep(&delay, NULL);
        printf("%f\n", 1234.56789);   /* main() has switched to "" */
        nanosleep(&delay, NULL);
        printf("%f\n", 1234.56789);   /* main() has switched back to "C" */
        return NULL;
    }

    int main(void)
    {
        setlocale(LC_NUMERIC, "C");
        pthread_t thread_id;
        pthread_create(&thread_id, NULL, &other_thread, NULL);
        struct timespec delay = { 0, DELAY_NS };
        nanosleep(&delay, NULL);
        /* The change is process-wide, so the other thread sees it too. */
        setlocale(LC_NUMERIC, "");
        nanosleep(&delay, NULL);
        setlocale(LC_NUMERIC, "C");
        void *res;
        pthread_join(thread_id, &res);
        return 0;
    }

Explicit locale objects decoupled from application-wide global preferences are safer and more flexible. >> > Here's a trivial example: >> > >> > (insert (downcase (buffer-substring POS1 POS2))) >> > >> > Contrast with >> > >> > (insert (downcase "FOO")) >> >> Either `set-text-properties' should be called on "FOO" before passing it >> to `downcase' > > Which property will help here? we don't have such properties. they > need to be designed and implemented. Let's name it "locale". Its value is some object that represents either a "solid" locale such as de_DE or combined LC_NUMERIC=en_GB + LC_TIME=de_DE + default fr_FR. Data required for particular operations may be loaded on demand. >> or `locale-downcase' with LOCALE first argument should be >> added. > > How would you implement locale-downcase? Are you familiar with how > Emacs case tables work? No, I am not familiar with Emacs internals dealing with case conversion. I already wrote I am even unaware how to properly handle Turkish. For the scripts I am familiar with, it is enough to have default table for normalizing and conversion.
I can admit that sometimes conversion may depend on language and the language can not be determined from code point. In such cases I expect additional override table that has higher priority than the default one. > And even if we had locale-downcase, which locale would you > pass to it in any given use case? I already mentioned responsibility chain: explicit value or set of overrides passed by user, text property for particular span of characters, buffer-local variables, global environment variables. Locale may be instantiated from its name "it_IT". Convenience functions to obtain locale at point likely will be useful as well. (Actually I am assuming number parsing-formatting rather than case conversion.) ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-14 16:38 ` Maxim Nikulin @ 2021-06-14 17:19 ` Eli Zaretskii 2021-06-16 17:27 ` Maxim Nikulin 0 siblings, 1 reply; 34+ messages in thread From: Eli Zaretskii @ 2021-06-14 17:19 UTC (permalink / raw) To: Maxim Nikulin; +Cc: emacs-devel > From: Maxim Nikulin <manikulin@gmail.com> > Date: Mon, 14 Jun 2021 23:38:19 +0700 > > >> You forgot `setlocale(LC_NUMERIC, "C")', didn't you? > > > > No, I didn't. Adding a call to setlocale to locale-info, even if we > > want to add an argument for the caller to control the locale, is > > trivial. > > I would avoid such manipulations and the reason is not efficiency of > particular implementation. But we already do that in locale-info, for locale categories other than LC_NUMERIC. > >> > Here's a trivial example: > >> > > >> > (insert (downcase (buffer-substring POS1 POS2))) > >> > > >> > Contrast with > >> > > >> > (insert (downcase "FOO")) > >> > >> Either `set-text-properties' should be called on "FOO" before passing it > >> to `downcase' > > > > Which property will help here? we don't have such properties. they > > need to be designed and implemented. > Let's name it "locale". Its value is some object that represents either > a "solid" locale such as de_DE or combined LC_NUMERIC=en_GB + > LC_TIME=de_DE + default fr_FR. Data required for particular operations > may be loaded on demand. How do you associate such an object with text of a buffer or a string such that different parts of the text could have different "locales" (as required for a multi-lingual editor such as Emacs)? > > How would you implement locale-downcase? Are you familiar with how > > Emacs case tables work? > > No, I am not familiar with Emacs internals dealing with case conversion. > I already wrote I am even unaware how to properly handle Turkish. For > the scripts I am familiar with, it is enough to have default table for > normalizing and conversion. 
I can admit that sometimes conversion may > depend on language and the language can not be determined from code > point. In such cases I expect additional override table that has higher > priority than the default one. > > > And even if we had locale-downcase, which locale would you > > pass to it in any given use case? > > I already mentioned responsibility chain: explicit value or set of > overrides passed by user, text property for particular span of > characters, buffer-local variables, global environment variables. Locale > may be instantiated from its name "it_IT". Convenience functions to > obtain locale at point likely will be useful as well. (Actually I am > assuming number parsing-formatting rather than case conversion.) What you describe doesn't exist, not even in its design stage. We are back where we started: I said at the very beginning that this infrastructure is missing. It is futile to discuss solutions which rely on infrastructure that doesn't exist. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-14 17:19 ` Eli Zaretskii @ 2021-06-16 17:27 ` Maxim Nikulin 2021-06-16 17:36 ` Eli Zaretskii 0 siblings, 1 reply; 34+ messages in thread From: Maxim Nikulin @ 2021-06-16 17:27 UTC (permalink / raw) To: emacs-devel On 15/06/2021 00:19, Eli Zaretskii wrote: >> From: Maxim Nikulin Date: Mon, 14 Jun 2021 23:38:19 +0700 >>>> You forgot `setlocale(LC_NUMERIC, "C")', didn't you? >>> >>> No, I didn't. Adding a call to setlocale to locale-info, even if we >>> want to add an argument for the caller to control the locale, is >>> trivial. >> >> I would avoid such manipulations and the reason is not efficiency of >> particular implementation. > > But we already do that in locale-info, for locale categories other > than LC_NUMERIC. I have seen it called for collation. It may have been reasonable in the past (e.g. as quick plumbing), but I think such things should be avoided for the sake of thread safety. Moreover, you are crying that implementations other than glibc are inefficient. Proper instruments for concurrency and parallel execution may alleviate issues like the following: https://lists.gnu.org/archive/html/emacs-devel/2021-05/msg01297.html > I hear quite a few people run at least two instances of > Emacs, for example if they don't want Gnus fetching new > articles and email to freeze the interactive session for > prolonged times. >>> Which property will help here? we don't have such properties. they >>> need to be designed and implemented. >> Let's name it "locale". Its value is some object that represents either >> a "solid" locale such as de_DE or combined LC_NUMERIC=en_GB + >> LC_TIME=de_DE + default fr_FR. Data required for particular operations >> may be loaded on demand. > > How do you associate such an object with text of a buffer or a string > such that different parts of the text could have different "locales" > (as required for a multi-lingual editor such as Emacs)?
I already suggested some variants and you did not argue. Technically it can be done through `set-text-properties'. If there are no such text properties then it may be assumed that no fine-grained tuning is required, so buffer-local variables or global environment are used. Language may be guessed from code points of characters. Particular modes may either inhibit localization for program code or extract necessary information from HTML lang attributes, arguments of LaTeX \foreignlanguage macro, etc. In my opinion, Emacs is not really multi-lingual yet due to limitations and inconveniences. Some other software demonstrated significantly greater progress during the last decade. Maybe achieving the current level was so painful that you prefer to avoid touching related code for any reason, not to speak of various improvements. > > And even if we had locale-downcase, which locale would you > > pass to it in any given use case? > > I already mentioned responsibility chain: explicit value or set of > overrides passed by user, text property for particular span of > characters, buffer-local variables, global environment variables. Locale > may be instantiated from its name "it_IT". Convenience functions to > obtain locale at point likely will be useful as well. (Actually I am > assuming number parsing-formatting rather than case conversion.) I am aware that such features do not exist yet. Only libc is available, but we consider it inappropriate (you due to performance issues, me due to thread safety and possible bugs due to missed calls restoring old state). You are against using detailed CLDR locale info through ICU because an alternative implementation of Unicode character tables (another part of ICU) already exists in Emacs.
At the same time you are refusing any attempts to discuss possible extensions from any side: low level base functions taking locale as explicit argument or high level requirements what interface can be useful to "implicitly" derive locale of particular part of text (actually text prepared for intelligent handling of locales). Certainly with position "locale-aware formatting can not be implemented because Emacs has no necessary infrastructure and such feature is needed by only a handful of user" there is no way to improve anything. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-16 17:27 ` Maxim Nikulin @ 2021-06-16 17:36 ` Eli Zaretskii 0 siblings, 0 replies; 34+ messages in thread From: Eli Zaretskii @ 2021-06-16 17:36 UTC (permalink / raw) To: Maxim Nikulin; +Cc: emacs-devel > From: Maxim Nikulin <manikulin@gmail.com> > Date: Thu, 17 Jun 2021 00:27:49 +0700 > > Certainly with position "locale-aware formatting can not be implemented > because Emacs has no necessary infrastructure and such feature is needed > by only a handful of user" there is no way to improve anything. Please see how many changes I committed over the years to Emacs, some of them quite revolutionary (bidirectional editing comes to mind), and I'm sure you will realize that the above is a gross misunderstanding of what I meant. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-10 16:28 ` Maxim Nikulin 2021-06-10 16:57 ` Eli Zaretskii @ 2021-06-10 21:10 ` Stefan Monnier 2021-06-12 14:41 ` Maxim Nikulin 1 sibling, 1 reply; 34+ messages in thread From: Stefan Monnier @ 2021-06-10 21:10 UTC (permalink / raw) To: Maxim Nikulin; +Cc: emacs-devel, boruch_baum > There are plenty of CSV dialects. If decimal separator is "," then office > software uses ";" instead of comma as cell (field) separator. But there's no reason to presume that a given CSV file was generated in the same locale as the one we're currently using. So the locale could be one ingredient in the machinery used to guess which separator was used, but I'm not sure it would be of much help. [ BTW, I'll take the opportunity to advocate for the use of TSV instead, which is slightly less ill-defined. ] Stefan ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CSV parsing and other issues (Re: LC_NUMERIC) 2021-06-10 21:10 ` Stefan Monnier @ 2021-06-12 14:41 ` Maxim Nikulin 0 siblings, 0 replies; 34+ messages in thread From: Maxim Nikulin @ 2021-06-12 14:41 UTC (permalink / raw) To: emacs-devel On 11/06/2021 04:10, Stefan Monnier wrote: >> There are plenty of CSV dialects. If decimal separator is >> "," then office software uses ";" instead of comma as cell >> (field) separator. > > But there's no reason to presume that a given CSV file was > generated in the same locale as the one we're currently > using. > > So the locale could be one ingredient in the machinery used > to guess which separator was used, but I'm not sure it would > be of much help. You are right. My expectation is still that ";" is mostly used for locales with comma as decimal separator, and in such cases it must be tried with higher priority because of records that contain plenty of both characters: 1,2;3,45;56,789 Originally the question was raised exactly in the context of an attempt to improve guessing of the separator: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=47885 The patches, however, have other problems. Advanced options for table import are likely more suitable e.g. for csv-mode and may become an unnecessary burden in org-mode (especially if kill-yank would work well in both directions). Certainly users should have the opportunity to explicitly specify the dialect of the files they are going to import. > [ BTW, I'll take the opportunity to advocate for the use of > TSV instead, which is slightly less ill-defined. ] In the real world one often does not have full control of the file formats one has to deal with. In simple cases I can use space separated columns of numbers having fixed width. On the other hand downloaded bank statements are CSV with ";" as delimiter and in a legacy Windows 8-bit encoding (and such files have a kind of header with varying column number distinct from the following table).
So ability to get decimal separator for current locale may slightly improve user experience with import of CSV files at least in Org mode. However it is just an aspect of support of locale-aware number formats in Emacs. ^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, newest message 2021-06-16 17:36 UTC

Thread overview: 34+ messages
2021-06-02 18:54 LC_NUMERIC formatting [FEATURE REQUEST] Boruch Baum
2021-06-03 14:44 ` CSV parsing and other issues (Re: LC_NUMERIC) Maxim Nikulin
2021-06-03 15:01 ` Eli Zaretskii
2021-06-04 16:31 ` Maxim Nikulin
2021-06-04 19:17 ` Eli Zaretskii
-- strict thread matches above, loose matches on Subject: below --
2021-06-06 23:36 Boruch Baum
2021-06-07 12:28 ` Eli Zaretskii
2021-06-08 0:45 ` Boruch Baum
2021-06-08 2:35 ` Eli Zaretskii
2021-06-08 15:35 ` Stefan Monnier
2021-06-08 16:35 ` Maxim Nikulin
2021-06-08 18:52 ` Eli Zaretskii
2021-06-10 16:28 ` Maxim Nikulin
2021-06-10 16:57 ` Eli Zaretskii
2021-06-10 18:01 ` Boruch Baum
2021-06-10 18:50 ` Eli Zaretskii
2021-06-10 19:04 ` Boruch Baum
2021-06-10 19:23 ` Eli Zaretskii
2021-06-10 20:20 ` Boruch Baum
2021-06-11 6:19 ` Eli Zaretskii
2021-06-11 8:18 ` Boruch Baum
2021-06-11 16:51 ` Maxim Nikulin
2021-06-11 13:56 ` Filipp Gunbin
2021-06-11 14:10 ` Eli Zaretskii
2021-06-11 18:52 ` Filipp Gunbin
2021-06-11 19:34 ` Eli Zaretskii
2021-06-11 16:58 ` Maxim Nikulin
2021-06-11 18:04 ` Eli Zaretskii
2021-06-14 16:38 ` Maxim Nikulin
2021-06-14 17:19 ` Eli Zaretskii
2021-06-16 17:27 ` Maxim Nikulin
2021-06-16 17:36 ` Eli Zaretskii
2021-06-10 21:10 ` Stefan Monnier
2021-06-12 14:41 ` Maxim Nikulin