unofficial mirror of emacs-devel@gnu.org 
From: Maxim Nikulin <manikulin@gmail.com>
To: emacs-devel@gnu.org
Cc: boruch_baum@gmx.com
Subject: Re: CSV parsing and other issues (Re: LC_NUMERIC)
Date: Fri, 11 Jun 2021 23:58:24 +0700
Message-ID: <ce36cd3b-dd9b-9219-afbe-84a9bdd8f2b8@gmail.com>
In-Reply-To: <83lf7hbqte.fsf@gnu.org>

On 10/06/2021 23:57, Eli Zaretskii wrote:
 >> From: Maxim Nikulin Date: Thu, 10 Jun 2021 23:28:59 +0700
 >
 > For processing CSV, if there's a need to know whether the
 > locale uses the comma as a decimal separator, we could
 > indeed extend locale-info.  But such an extension is almost
 > trivial and doesn't even touch on the significant problems
 > in the rest of the discussion.
 >

You forgot `setlocale(LC_NUMERIC, "C")', didn't you?

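#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

int main(void) {
	/* Take every locale category from the environment. */
	setlocale(LC_ALL, "");
	printf("%c", *nl_langinfo(RADIXCHAR));
	/* Force LC_NUMERIC back to "C"; RADIXCHAR now reports '.'. */
	setlocale(LC_NUMERIC, "C");
	printf("%c\n", *nl_langinfo(RADIXCHAR));
	return 0;
}

In a locale that uses a decimal comma, the output is ",.": once LC_NUMERIC
is reset to "C", nl_langinfo no longer reports the user's separator.
There is nl_langinfo_l(3), but it requires more work.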
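To make the "more work" concrete, here is a rough sketch of the
nl_langinfo_l(3) route. It assumes a POSIX.1-2008 libc such as glibc;
the helper name radix_of_locale is made up for illustration. The global
locale is never touched, so LC_NUMERIC may stay "C".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: copy the radix string of the locale NAME
   ("" means "from the environment") into BUF without changing the
   global locale.  Returns 0 on success, -1 if NAME is unavailable. */
int radix_of_locale(const char *name, char *buf, size_t size)
{
	assert(name != NULL && buf != NULL && size > 0);

	/* Build a locale object that carries only LC_NUMERIC from NAME. */
	locale_t loc = newlocale(LC_NUMERIC_MASK, name, (locale_t)0);
	if (loc == (locale_t)0)
		return -1;

	/* The returned string belongs to LOC, so copy it before freeing. */
	const char *radix = nl_langinfo_l(RADIXCHAR, loc);
	strncpy(buf, radix, size - 1);
	buf[size - 1] = '\0';

	freelocale(loc);
	return 0;
}
```

Calling radix_of_locale("", buf, sizeof buf) should give the user's
separator even after setlocale(LC_NUMERIC, "C").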

After rows are split into cells, it may still be necessary to parse numbers
("2,34" to 2.34). That is why the quality of CSV file import is tightly
related to the handling of number formats.
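As an illustration of that second step, a minimal sketch of a cell
parser that takes the decimal separator as an explicit argument instead
of reading it from the global locale. The name parse_decimal is
hypothetical, and the sketch deliberately ignores grouping separators
and exponents.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: parse TEXT such as "2,34" using DECIMAL_SEP as
   the decimal separator.  Stores the value in *OUT and returns true
   only if the whole string is a valid number. */
bool parse_decimal(const char *text, char decimal_sep, double *out)
{
	assert(text != NULL && out != NULL);

	const char *p = text;
	bool negative = false;
	bool seen_digit = false;
	double mantissa = 0.0;	/* all digits read so far, as an integer */
	double divisor = 1.0;	/* 10^(number of fraction digits) */

	if (*p == '+' || *p == '-')
		negative = (*p++ == '-');

	for (; isdigit((unsigned char)*p); p++) {
		mantissa = mantissa * 10.0 + (*p - '0');
		seen_digit = true;
	}
	if (*p == decimal_sep) {
		for (p++; isdigit((unsigned char)*p); p++) {
			mantissa = mantissa * 10.0 + (*p - '0');
			divisor *= 10.0;
			seen_digit = true;
		}
	}
	/* Reject empty input and trailing garbage. */
	if (!seen_digit || *p != '\0')
		return false;

	*out = (negative ? -mantissa : mantissa) / divisor;
	return true;
}
```

With ',' as the separator, "2,34" is accepted and "2.34" is rejected,
so the caller has to know the convention of the file in advance.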

 >> I was trying to support Boruch that buffer-local variables
 >> may be important part of locale context, more precise than
 >> global settings,
 >
 > They are more precise, but they don't support mixed
 > languages in the same buffer, something that happens in
 > Emacs very frequently.

In some cases I would prefer a uniform format for numbers and dates
despite alternating languages in the buffer, e.g. in my private notes.

 > Here's a trivial example:
 >
 >     (insert (downcase (buffer-substring POS1 POS2)))
 >
 > Contrast with
 >
 >     (insert (downcase "FOO"))

Either `set-text-properties' should be called on "FOO" before passing it
to `downcase', or a `locale-downcase' function taking LOCALE as its first
argument should be added. Moreover, such a `locale-downcase' function could
be used to implement higher-level functions that work with implicit locales.
LOCALE could be resolved through a hierarchy: a user override for the
particular call, text properties, buffer variables, global settings.
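The hierarchy could be as simple as "first source that is set wins".
A sketch in C rather than Lisp, only to show the lookup order; the
function name and the use of plain strings for locales are assumptions
of mine, not an existing Emacs interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lookup: return the most specific locale that is set.
   NULL means "not specified at this level".  GLOBAL_DEFAULT must
   always be given, so the result is never NULL. */
const char *resolve_locale(const char *call_override,
			   const char *text_property,
			   const char *buffer_local,
			   const char *global_default)
{
	assert(global_default != NULL);

	if (call_override != NULL)
		return call_override;	/* explicit argument of this call */
	if (text_property != NULL)
		return text_property;	/* property on the text itself */
	if (buffer_local != NULL)
		return buffer_local;	/* buffer-local variable */
	return global_default;		/* global setting */
}
```

A string literal such as "FOO" carries no text property, so it would
fall through to the buffer or global level unless the caller passes an
override.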

 > Yes: what we have already in Emacs.  That covers a lot of
 > the same Unicode turf that ICU handles, because we import
 > and use the same Unicode files and tables.

There are plenty of XML files in cldr-common-39.0.zip
(common/main/*.xml) https://www.unicode.org/Public/cldr/39/ in addition
to the Unicode data in the Emacs sources.  They include rules for number
formatting: https://unicode.org/reports/tr35/tr35-numbers.html
Of course, human-style number formatting, currencies, financial style,
etc. may be discarded, and the implementation may be limited to grouping and
decimal separators (leaving other features to further requests).  There
is the newlocale(3) function in glibc to obtain a minimal subset of
properties. I am not familiar with other platforms.
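A sketch of what I mean by a minimal subset, assuming glibc (or another
libc where localeconv(3) honors the per-thread locale installed by
uselocale(3)). The struct and function names are invented for the
example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical container for the two separators discussed above. */
struct number_format {
	char decimal_point[8];
	char thousands_sep[8];
};

/* Fill *OUT with the separators of locale NAME ("" means "from the
   environment").  Returns 0 on success, -1 if NAME is unavailable.
   Only the calling thread's locale is switched, and only briefly. */
int query_number_format(const char *name, struct number_format *out)
{
	assert(name != NULL && out != NULL);

	locale_t loc = newlocale(LC_NUMERIC_MASK, name, (locale_t)0);
	if (loc == (locale_t)0)
		return -1;

	locale_t previous = uselocale(loc);
	const struct lconv *lc = localeconv();
	/* Copy now: the lconv strings may be overwritten later. */
	snprintf(out->decimal_point, sizeof out->decimal_point,
		 "%s", lc->decimal_point);
	snprintf(out->thousands_sep, sizeof out->thousands_sep,
		 "%s", lc->thousands_sep);
	uselocale(previous);

	freelocale(loc);
	return 0;
}
```

The same struct lconv also has a `grouping' field describing group
sizes; I left it out to keep the sketch short.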




