From: Maxim Nikulin <manikulin@gmail.com>
To: emacs-devel@gnu.org
Cc: Utkarsh Singh <utkarsh190601@gmail.com>
Subject: CSV parsing and other issues (Re: LC_NUMERIC)
Date: Thu, 3 Jun 2021 21:44:08 +0700
Message-ID: <921965d7-af86-6d2e-8b48-3d0b9b51998e@gmail.com>
In-Reply-To: <20210602185441.nhvhirdffamahgfy@E15-2016.optimum.net>

On 03/06/2021 01:54, Boruch Baum wrote:
> Please consider having the elisp 'format' function adopt the
> single-quote and 'I' flags. Each is already implemented in both the GNU
> C printf command and the linux printf command. The single-quote flag is
> part of the 'Single UNIX Specification' and the 'I' flag has been part
> of glibc since version 2.2 [ref: man(3) printf].
>
> If function 'format' uses 'printf' as its backend, this would seem to be
> a matter of exposing an existing feature.

I do not know the history of why Emacs does not support locale-aware number
formats, but I suspect that relying on libc opens a can of worms.
Once setlocale(LC_NUMERIC, "") is invoked, one can never be sure whether
printf- and scanf-like functions deal with the default "C" representation or
with numbers formatted according to the current locale. Some numbers related
to communication protocols must always be formatted using the "C" locale. I do
not remember whether it happened with XFree86 or with Xorg, but at a certain
moment users experienced problems: X11 could not start at all due to
invalid configs. The source of the problem was "," as the decimal separator in
some locales and wrong expectations concerning numbers in config files.
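
To illustrate the hazard described above, here is a minimal C sketch (not
taken from Emacs or X11 sources) of a formatter whose output always uses "."
as the decimal separator, whatever LC_NUMERIC currently is. The name
format_double_c and the "%.6g" precision are assumptions made for this
example only.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write VALUE into BUF (size N) with '.' as the
   decimal separator, even if LC_NUMERIC selects another separator.
   Returns the resulting length, or -1 if BUF is too small. */
int format_double_c(double value, char *buf, size_t n)
{
    /* snprintf honours the current LC_NUMERIC decimal separator. */
    int len = snprintf(buf, n, "%.6g", value);
    if (len < 0 || (size_t)len >= n)
        return -1;

    const char *dp = localeconv()->decimal_point;
    size_t dplen = strlen(dp);
    if (dplen == 0 || strcmp(dp, ".") == 0)
        return len;                 /* already in "C" form */

    /* Replace the (possibly multi-byte) locale separator with '.'. */
    char *hit = strstr(buf, dp);
    if (hit != NULL) {
        *hit = '.';
        memmove(hit + 1, hit + dplen, strlen(hit + dplen) + 1);
        len -= (int)(dplen - 1);
    }
    return len;
}
```

Code that writes config files or protocol messages would call such a helper
instead of a bare printf. It does not cover the %' grouping flag, which would
need separate handling.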

Recently I found the following in the fixup_locale function:
http://git.savannah.gnu.org/cgit/emacs.git/tree/src/emacs.c#n2861
    setlocale (LC_NUMERIC, "C");

I was surprised that it is impossible to determine the current decimal
separator from elisp. At the same time, e.g. `string-collate-lessp' has a
LOCALE argument.
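
At the libc level the separator is available through localeconv(). The sketch
below shows one way a C helper could look it up for a named locale and then
restore the previous LC_NUMERIC setting. The function name decimal_point_of is
hypothetical and nothing like it is claimed to exist in Emacs. Note that
setlocale changes process-global state, so this is not safe to call while
other threads format numbers.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the decimal separator of LOCALE_NAME into
   BUF (size N).  Returns 0 on success, -1 if the locale cannot be
   selected or BUF is too small.  LC_NUMERIC is restored afterwards. */
int decimal_point_of(const char *locale_name, char *buf, size_t n)
{
    /* The string returned by setlocale may be overwritten by later
       calls, so keep a private copy of the current setting. */
    const char *cur = setlocale(LC_NUMERIC, NULL);
    if (cur == NULL)
        return -1;
    char *saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, cur);

    int rc = -1;
    if (setlocale(LC_NUMERIC, locale_name) != NULL) {
        const char *dp = localeconv()->decimal_point;
        if (strlen(dp) < n) {
            strcpy(buf, dp);
            rc = 0;
        }
    }

    setlocale(LC_NUMERIC, saved);   /* back to the previous setting */
    free(saved);
    return rc;
}
```

Passing "" as LOCALE_NAME asks for the locale from the user's environment,
which is the value a CSV importer would be interested in.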

A month ago some patches were submitted to Org mode with the intention of
improving import of tables (see https://debbugs.gnu.org/47885). Part of the
discussion is missing from the bug tracker:
https://lists.gnu.org/archive/html/emacs-orgmode/2021-05/msg00693.html

Org mode has a piece of code that tries to guess whether a file uses commas
or tabs as the field separator (CSV or TSV format). The suggested change
adds, e.g., the semicolon. (Sidenote: csv-mode is probably a better place
than org-mode for such code.)

The problem is that office software uses the semicolon in locales where the
comma serves as the decimal separator for floating-point numbers (e.g.
de_DE, es_ES, fr_FR, ru_RU, etc.):
    A;1,2;3,4

So the semicolon should be tried with higher priority than the comma if
numbers in the current locale are represented as e.g. 1,2. Unfortunately, the
only way to get such information from Emacs is to call some external
application. Maintaining our own mapping from locale to separator is an
unnecessary burden.

Besides office software, there is some equipment that always uses "C"
number formatting, so a user can have a mix of files with various
dialects of CSV. Thus locale info is not enough; some heuristics are
required anyway.
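
The priority rule discussed above can be sketched as follows. This is only an
illustration in C, not the Org mode code and not the submitted patch; the name
guess_csv_separator and the exact candidate order are assumptions. It looks at
a single line and prefers the semicolon when the decimal separator is a comma.

```c
#include <stddef.h>

/* Count occurrences of SEP in LINE, ignoring double-quoted fields. */
static size_t count_unquoted(const char *line, char sep)
{
    size_t count = 0;
    int in_quotes = 0;
    for (const char *p = line; *p != '\0'; p++) {
        if (*p == '"')
            in_quotes = !in_quotes;
        else if (*p == sep && !in_quotes)
            count++;
    }
    return count;
}

/* Hypothetical heuristic: return the first candidate separator that
   occurs in LINE, trying ';' before ',' when DECIMAL_POINT is ','.
   Returns '\0' if no candidate occurs. */
char guess_csv_separator(const char *line, char decimal_point)
{
    const char *order = (decimal_point == ',') ? ";\t," : ",\t;";
    for (const char *c = order; *c != '\0'; c++)
        if (count_unquoted(line, *c) > 0)
            return *c;
    return '\0';
}
```

For the line A;1,2;3,4 this returns ';' when the decimal separator is ',' and
',' when it is '.', which is exactly the ambiguity in question. A real
importer would examine several lines and check that the field count is
consistent, since files from "C"-formatting equipment defeat any rule based
on locale alone.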

More subtle questions arise at the next step. Org allows calculations to be
performed on table cells (and there is calc). Should numbers be
converted to the "C" locale representation during import? Should the
conversion happen when cell content is passed as an argument, with the result
converted back to the current locale? I anticipate that a buffer-local
setting will be requested. There was even a discussion of mixed-language
documents on the emacs-orgmode mailing list; however, numbers were not
mentioned.

So locale-aware number formatting would be a great improvement for
Emacs. On the other hand, it should be implemented with great care to
avoid localized numbers in some cases. Maybe a locale argument should be
passed to functions that deal with numbers. Formatting of integer
numbers is not enough; floating-point numbers should be handled as well.
Parsing numbers formatted according to locale rules should be
addressed too. A function similar to `locale-info' is highly desirable to
get properties of a locale (e.g. decimal_point from the result of
localeconv). Some decision is required on whether calc & Co should operate
on localized numbers.
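
As a rough illustration of the parsing side, the C sketch below converts a
number written with an explicit decimal separator into a double. It takes the
separator as an argument rather than reading the global locale, in the spirit
of the locale argument suggested above. The name parse_localized_double is
hypothetical, and the code assumes the process runs with LC_NUMERIC set to
"C" (as Emacs arranges in fixup_locale), so that strtod expects ".".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse TEXT, whose decimal separator is
   DECIMAL_POINT, and store the value in *OUT.
   Assumes LC_NUMERIC is "C".  Returns 0 on success, -1 on error. */
int parse_localized_double(const char *text, char decimal_point, double *out)
{
    char buf[64];
    size_t len = strlen(text);
    if (len == 0 || len >= sizeof buf)
        return -1;

    /* Copy TEXT (including the terminating NUL), mapping the given
       separator to '.' and rejecting a stray '.' in other locales. */
    for (size_t i = 0; i <= len; i++) {
        char c = text[i];
        if (c == decimal_point)
            c = '.';
        else if (c == '.')
            return -1;
        buf[i] = c;
    }

    char *end;
    double value = strtod(buf, &end);
    if (end == buf || *end != '\0')
        return -1;              /* nothing parsed, or trailing junk */
    *out = value;
    return 0;
}
```

With ',' as the separator, "3,25" parses to 3.25 while "3.25" is rejected.
Thousands separators are deliberately left out of this sketch.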