From: Maxim Nikulin
Newsgroups: gmane.emacs.devel
Subject: Re: CSV parsing and other issues (Re: LC_NUMERIC)
Date: Fri, 4 Jun 2021 23:31:13 +0700
References: <20210602185441.nhvhirdffamahgfy@E15-2016.optimum.net> <921965d7-af86-6d2e-8b48-3d0b9b51998e@gmail.com> <83eedjvvps.fsf@gnu.org>
In-Reply-To: <83eedjvvps.fsf@gnu.org>
To: emacs-devel@gnu.org
Cc: utkarsh190601@gmail.com

On 03/06/2021 22:01, Eli Zaretskii wrote:
>> From: Maxim Nikulin
>> Date: Thu, 3 Jun 2021 21:44:08 +0700
>>
>> So locale-aware number formatting would be a great improvement for
>> Emacs. On the other hand, it should be implemented with great care to
>> avoid localized numbers in some cases. Maybe locale argument should be
>> passed to functions that deal with numbers. Formatting of integer
>> numbers is not enough, floating point numbers should be handled as well.
>> Parsing numbers formatted accordingly to locale rules should be
>> addressed too. A function similar to `locale-info' is highly desired to
>> get properties of locale (e.g. decimal_point from result of localeconv).
>> Some decision is required whether calc & Co should operate with
>> localized numbers.
>
> Setting a locale globally in Emacs is a non-starter, for the reasons
> that you point out and others.
> Text processing in Emacs is generally
> separate from the current locale's rules, mainly to have Emacs work
> the same in any locale. So passing a locale argument to functions
> that produce output, with the intent to request some behavior to be
> tailored to that locale, is the only reasonable way to have this kind
> of functionalities in Emacs. The problem with that, of course, is
> that not every supported platform can dynamically change the locale,
> let alone do that efficiently.

I do not think it is efficient to require users to fight with number
formatting themselves. Some links from my browser history, from when I
was trying to figure out how to get the locale-specific decimal
separator in elisp:

https://stackoverflow.com/questions/35661173/how-to-format-table-fields-as-currency-in-org-mode
https://www.emacswiki.org/emacs/AddCommasToNumbers
https://www.reddit.com/r/emacs/comments/61mhyx/creating_a_function_to_add_commasseparators_to/

Do you mean that it is necessary to create a new implementation of a
number formatter specifically for Emacs? Something like

https://unicode.org/reports/tr35/tr35-numbers.html
Unicode Locale Data Markup Language (LDML) Part 3: Numbers

Actually, it is an almost random link. I do not know which source is
currently considered the best collection of wisdom related to number
formatting.

Outside of the Emacs world, the last time I needed numbers formatted
according to various locales, I was lucky enough to be able to use code
similar to the following and did not have to care about the details:

    #include <QLocale>
    #include <QString>
    #include <QTextStream>

    // Print the decimal point and two sample numbers for one locale.
    void test(QTextStream& stream, const char *loc_name) {
        QLocale loc(QString::fromLocal8Bit(loc_name));
        stream << "point: " << loc.decimalPoint()
               << " " << loc.toString(12345.67)
               << " " << loc.toString(1234567890) << "\n";
    }

    int main(int argc, char *argv[]) {
        QTextStream stream(stdout);
        // Each command-line argument is a locale name such as de_DE.
        for (int i = 1; i < argc; ++i) {
            test(stream, argv[i]);
        }
        return 0;
    }

./qtloc de_DE en_GB fa_IR
point: , 12.345,7 1.234.567.890
point: . 12,345.7 1,234,567,890
point: ٫ ۱۲٬۳۴۵٫۷ ۱٬۲۳۴٬۵۶۷٬۸۹۰

Surprisingly, it works even though I have not generated the de and fa
locales.

On Linux I see that Emacs is linked with ICU:

ldd /usr/bin/emacs | grep -i icu
        libicuuc.so.66 => /usr/lib/x86_64-linux-gnu/libicuuc.so.66 (0x00007f457c799000)
        libicudata.so.66 => /usr/lib/x86_64-linux-gnu/libicudata.so.66 (0x00007f457a61c000)

I am not familiar with the ICU API, but I expect that it could be used:

https://github.com/unicode-org/icu/blob/main/icu4c/source/samples/numfmt/capi.c

Do you have a bright idea for implementing a number parser-formatter
with reasonable effort?
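
As an illustration of the "locale passed as an argument" approach discussed above, here is a minimal sketch in standard C++ (neither Qt nor ICU). It formats numbers with a per-call locale object and never touches the global locale. The names GermanLikePunct, german_like_locale and format_with are hypothetical, and the hard-coded separators only imitate de_DE; a real implementation would presumably take them from CLDR or ICU data.

```cpp
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration only: hard-codes de_DE-like
// punctuation instead of reading it from real locale data.
struct GermanLikePunct : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }  // groups of 3
};

// Build a locale that is the classic "C" locale except for punctuation.
// The locale object takes ownership of the facet, so this does not leak.
inline std::locale german_like_locale() {
    return std::locale(std::locale::classic(), new GermanLikePunct);
}

// Format a value according to the locale given as an argument.
// Only the local stream is imbued; the global locale stays unchanged.
// The precision applies to floating-point values; integers ignore it.
template <typename T>
std::string format_with(const std::locale& loc, T value, int precision = 2) {
    std::ostringstream out;
    out.imbue(loc);
    out << std::fixed << std::setprecision(precision) << value;
    return out.str();
}
```

With this sketch, format_with(german_like_locale(), 12345.67) should give 12.345,67, while format_with(std::locale::classic(), 12345.67) stays 12345.67, so both conventions can coexist in one process without any global setlocale call.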