From: Maxim Nikulin <manikulin@gmail.com>
To: emacs-devel@gnu.org
Cc: boruch_baum@gmx.com
Subject: Re: CSV parsing and other issues (Re: LC_NUMERIC)
Date: Fri, 11 Jun 2021 23:58:24 +0700
Message-ID: <ce36cd3b-dd9b-9219-afbe-84a9bdd8f2b8@gmail.com>
In-Reply-To: <83lf7hbqte.fsf@gnu.org>
On 10/06/2021 23:57, Eli Zaretskii wrote:
>> From: Maxim Nikulin Date: Thu, 10 Jun 2021 23:28:59 +0700
>
> For processing CSV, if there's a need to know whether the
> locale uses the comma as a decimal separator, we could
> indeed extend locale-info. But such an extension is almost
> trivial and doesn't even touch on the significant problems
> in the rest of the discussion.
>
You forgot `setlocale(LC_NUMERIC, "C")', didn't you?
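#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

int main(void) {
    /* Adopt the locale from the environment for every category. */
    setlocale(LC_ALL, "");
    printf("%c", *nl_langinfo(RADIXCHAR));
    /* Reset only the numeric category to "C". */
    setlocale(LC_NUMERIC, "C");
    printf("%c\n", *nl_langinfo(RADIXCHAR));
    return 0;
}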
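Output is ",." when the environment locale uses a comma as the decimal
separator: once LC_NUMERIC is reset to "C", nl_langinfo reports ".".
There is nl_langinfo_l(3), which queries a given locale object
directly, but it requires more work.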
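A minimal sketch of that extra work, assuming a POSIX.1-2008 system. The helper name radix_char_for is hypothetical. It builds a temporary locale object with newlocale(3) and queries it with nl_langinfo_l(3), so the process-wide LC_NUMERIC can stay "C":

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: return the decimal separator of the named
   locale, or '\0' if the locale object cannot be created.
   The global locale is not modified. */
static char radix_char_for(const char *locale_name)
{
    locale_t loc = newlocale(LC_NUMERIC_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0)
        return '\0';
    char radix = *nl_langinfo_l(RADIXCHAR, loc);
    freelocale(loc);
    return radix;
}
```

Passing "" as the name should select the locale from the environment, as setlocale(LC_ALL, "") does in the program above.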
After splitting rows into cells, it may be necessary to parse numbers
("2,34" to 2.34). That is why the quality of CSV file import is tightly
coupled to the handling of number formats.
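To illustrate the second step, here is a rough sketch of parsing one cell with an explicit decimal separator. The name parse_cell_number is hypothetical, grouping separators are not handled, and the code assumes the process keeps LC_NUMERIC set to "C" so that strtod expects ".":

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a CSV cell such as "2,34" using the
   given decimal separator.  Assumes LC_NUMERIC is "C".
   Returns true and stores the value in *out on success. */
static bool parse_cell_number(const char *cell, char decimal_sep, double *out)
{
    char buf[64];
    size_t len = strlen(cell);
    if (len == 0 || len >= sizeof buf)
        return false;
    /* Copy the cell including its terminating NUL (i == len). */
    for (size_t i = 0; i <= len; i++) {
        char c = cell[i];
        if (c == decimal_sep)
            buf[i] = '.';
        else if (c == '.')
            return false; /* '.' is not the separator here; do not guess */
        else
            buf[i] = c;
    }
    char *end;
    *out = strtod(buf, &end);
    /* Succeed only if the whole cell was consumed. */
    return end != buf && *end == '\0';
}
```

With ',' as the separator, "2,34" is accepted while "2.34" and "1,2,3" are rejected rather than silently misread.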
>> I was trying to support Boruch that buffer-local variables
>> may be important part of locale context, more precise than
>> global settings,
>
> They are more precise, but they don't support mixed
> languages in the same buffer, something that happens in
> Emacs very frequently.
In some cases I would prefer a uniform format for numbers and dates
even when the language alternates within the buffer, e.g. in my private
notes.
> Here's a trivial example:
>
> (insert (downcase (buffer-substring POS1 POS2)))
>
> Contrast with
>
> (insert (downcase "FOO"))
Either `set-text-properties' should be called on "FOO" before passing it
to `downcase', or a `locale-downcase' function taking LOCALE as its first
argument should be added. Moreover, such a `locale-downcase' function
could be used to implement higher-level functions that work with implicit
locales. LOCALE may follow some hierarchy: user overrides for a
particular call, text properties, buffer variables, global settings.
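One possible reading of that hierarchy, sketched in C for consistency with the program above. The struct, its field names, and the most-specific-first order are all assumptions, not an existing Emacs interface:

```c
#include <stddef.h>

/* Hypothetical sources of a locale name, most specific first.
   A NULL field means "not set at this level". */
struct locale_sources {
    const char *call_override;  /* explicit LOCALE argument */
    const char *text_property;  /* locale attached to the text */
    const char *buffer_local;   /* buffer-wide setting */
    const char *global_default; /* global setting, assumed non-NULL */
};

/* Return the first locale name that is set, falling back to the
   global default. */
static const char *resolve_locale(const struct locale_sources *s)
{
    if (s->call_override)
        return s->call_override;
    if (s->text_property)
        return s->text_property;
    if (s->buffer_local)
        return s->buffer_local;
    return s->global_default;
}
```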
> Yes: what we have already in Emacs. That covers a lot of
> the same Unicode turf that ICU handles, because we import
> and use the same Unicode files and tables.
There are plenty of XML files in cldr-common-39.0.zip
(common/main/*.xml) https://www.unicode.org/Public/cldr/39/ in addition
to the Unicode data in the Emacs sources. They include rules for number
formatting https://unicode.org/reports/tr35/tr35-numbers.html
Of course, human-style number formatting, currencies, financial style,
etc. may be left out, and the implementation may be limited to grouping
and decimal separators (leaving other features to later requests). There
is the newlocale(3) function in glibc for obtaining a minimal subset of
properties. I am not familiar with other platforms.
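As a rough idea of how small the grouping-only subset could be, here is a sketch that inserts a separator between groups of three digits. The name format_grouped is hypothetical, and the fixed group size of three is a simplifying assumption: some locales group differently, which is what the CLDR rules describe.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write VALUE into OUT with GROUP_SEP between
   groups of three digits.  Returns the length written, or -1 if OUT
   is too small. */
static int format_grouped(unsigned long value, char group_sep,
                          char *out, size_t out_size)
{
    char digits[32];
    int n = snprintf(digits, sizeof digits, "%lu", value);
    if (n < 0)
        return -1;
    /* digits + separators + terminating NUL */
    size_t needed = (size_t)n + (size_t)(n - 1) / 3 + 1;
    if (needed > out_size)
        return -1;
    size_t j = 0;
    for (int i = 0; i < n; i++) {
        /* A separator precedes every digit that starts a full group. */
        if (i > 0 && (n - i) % 3 == 0)
            out[j++] = group_sep;
        out[j++] = digits[i];
    }
    out[j] = '\0';
    return (int)j;
}
```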