From: Maxim Nikulin
Newsgroups: gmane.emacs.devel
Subject: Re: CSV parsing and other issues (Re: LC_NUMERIC)
Date: Fri, 11 Jun 2021 23:58:24 +0700
To: emacs-devel@gnu.org
Cc: boruch_baum@gmx.com
In-Reply-To: <83lf7hbqte.fsf@gnu.org>

On 10/06/2021 23:57, Eli Zaretskii wrote:
>> From: Maxim Nikulin
>> Date: Thu, 10 Jun 2021 23:28:59 +0700
>
> For processing CSV, if there's a need to know whether the
> locale uses the comma as a decimal separator, we could
> indeed extend locale-info. But such an extension is almost
> trivial and doesn't even touch on the significant problems
> in the rest of the discussion.
>

You forgot `setlocale(LC_NUMERIC, "C")', didn't you?

    #include <stdio.h>
    #include <locale.h>
    #include <langinfo.h>

    int main(void)
    {
      /* Take all categories from the environment.  */
      setlocale(LC_ALL, "");
      printf("%c", *nl_langinfo(RADIXCHAR));
      /* Reset only LC_NUMERIC to "C".  */
      setlocale(LC_NUMERIC, "C");
      printf("%c\n", *nl_langinfo(RADIXCHAR));
      return 0;
    }

In a locale that uses the comma as the decimal separator, the output
is ",.": once LC_NUMERIC is reset to "C", nl_langinfo no longer
reports the user's separator. There is nl_langinfo_l(3), but it
requires more work.

After rows are parsed into cells, it may be necessary to parse numbers
("2,34" to 2.34). That is why the quality of CSV file import is tightly
related to the handling of number formats.

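To make the two points above a bit more concrete (asking for a locale's decimal separator without disturbing the global LC_NUMERIC, and turning a cell such as "2,34" into 2.34), here is a rough sketch. It assumes a POSIX.1-2008 system such as glibc and a process whose LC_NUMERIC is "C", so that strtod expects '.'. The names `radix_char_of' and `parse_cell_number' are hypothetical helpers invented for this illustration, not existing functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the decimal separator of the named
   locale without touching the global locale, or '\0' if the locale
   is not available.  Only the first byte is returned, so a multibyte
   separator is not handled.  */
static char
radix_char_of (const char *locale_name)
{
  assert (locale_name != NULL);
  locale_t loc = newlocale (LC_NUMERIC_MASK, locale_name, (locale_t) 0);
  if (loc == (locale_t) 0)
    return '\0';
  char radix = *nl_langinfo_l (RADIXCHAR, loc);
  freelocale (loc);
  return radix;
}

/* Hypothetical helper: parse TEXT (e.g. "2,34") using RADIX as the
   decimal separator.  Assumes LC_NUMERIC is "C" for the process.
   Returns 0 on success, -1 on malformed or overlong input.  */
static int
parse_cell_number (const char *text, char radix, double *result)
{
  assert (text != NULL && result != NULL);
  char buf[64];
  size_t len = strlen (text);
  if (radix == '\0' || len == 0 || len >= sizeof buf)
    return -1;

  for (size_t i = 0; i < len; i++)
    {
      if (text[i] == radix)
        buf[i] = '.';           /* normalize to the "C" separator */
      else if (text[i] == '.')
        return -1;              /* '.' is not the separator here */
      else
        buf[i] = text[i];
    }
  buf[len] = '\0';

  char *end;
  *result = strtod (buf, &end);
  /* Succeed only if the whole cell was consumed.  */
  return (end != buf && *end == '\0') ? 0 : -1;
}
```

With a locale such as de_DE.UTF-8 installed, I would expect `radix_char_of' to return ',', but that depends on which locales are available on the system. Grouping separators are rejected rather than interpreted here; handling them is a separate decision.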
>> I was trying to support Boruch that buffer-local variables
>> may be important part of locale context, more precise than
>> global settings,
>
> They are more precise, but they don't support mixed
> languages in the same buffer, something that happens in
> Emacs very frequently.

In some cases I would prefer to have a uniform format for numbers and
dates despite alternating languages in the buffer, e.g. for my private
notes.

> Here's a trivial example:
>
>   (insert (downcase (buffer-substring POS1 POS2)))
>
> Contrast with
>
>   (insert (downcase "FOO"))

Either `set-text-properties' should be called on "FOO" before passing
it to `downcase', or a `locale-downcase' function taking LOCALE as its
first argument should be added. Moreover, such a `locale-downcase'
function could be used to implement higher-level functions that work
with implicit locales. LOCALE may be resolved through a hierarchy:
user overrides for a particular call, text properties, buffer
variables, global settings.

> Yes: what we have already in Emacs. That covers a lot of
> the same Unicode turf that ICU handles, because we import
> and use the same Unicode files and tables.

There are plenty of XML files in cldr-common-39.0.zip
(common/main/*.xml)
https://www.unicode.org/Public/cldr/39/
in addition to the Unicode data in the Emacs sources. They include
rules for number formatting
https://unicode.org/reports/tr35/tr35-numbers.html
Of course, human-style number formatting, currencies, financial style,
etc. may be discarded, and the implementation may be limited to
grouping and decimal separators (leaving other features to further
requests).

There is the newlocale(3) function in glibc for obtaining a minimal
subset of properties. I am not familiar with other platforms.
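
As an illustration of how small an implementation limited to grouping and decimal separators could be, here is a rough sketch. It is only an assumption of what such a subset might look like: `struct number_symbols', `append' and `format_number' are hypothetical names, the separator values would have to come from CLDR or from the C library, and secondary grouping sizes are ignored. It also assumes that LC_NUMERIC is "C" for the process, so that snprintf emits '.'.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal subset of locale data; not an Emacs or CLDR type.  */
struct number_symbols
{
  const char *decimal;  /* decimal separator, e.g. "," */
  const char *group;    /* grouping separator, may be multibyte UTF-8 */
  int group_size;       /* digits per group, e.g. 3 */
};

/* Append LEN bytes of S to OUT, keeping OUT NUL-terminated.  */
static int
append (char *out, size_t out_size, size_t *used, const char *s, size_t len)
{
  if (*used + len + 1 > out_size)
    return -1;
  memcpy (out + *used, s, len);
  *used += len;
  out[*used] = '\0';
  return 0;
}

/* Format VALUE with FRAC_DIGITS fractional digits using SYM.
   Returns 0 on success, -1 on bad arguments or too small a buffer.  */
static int
format_number (double value, int frac_digits,
               const struct number_symbols *sym, char *out, size_t out_size)
{
  assert (sym != NULL && out != NULL);
  if (frac_digits < 0 || sym->group_size <= 0)
    return -1;

  /* Start from the plain "C" representation, e.g. "-1234.50".  */
  char plain[64];
  int n = snprintf (plain, sizeof plain, "%.*f", frac_digits, value);
  if (n < 0 || (size_t) n >= sizeof plain)
    return -1;

  const char *p = plain;
  size_t used = 0;
  if (*p == '-')
    {
      if (append (out, out_size, &used, "-", 1) != 0)
        return -1;
      p++;
    }

  size_t int_len = strcspn (p, ".");
  size_t group = (size_t) sym->group_size;
  for (size_t i = 0; i < int_len; i++)
    {
      /* Insert a separator when a whole number of groups remains.  */
      if (i > 0 && (int_len - i) % group == 0
          && append (out, out_size, &used,
                     sym->group, strlen (sym->group)) != 0)
        return -1;
      if (append (out, out_size, &used, p + i, 1) != 0)
        return -1;
    }

  if (p[int_len] == '.')
    {
      const char *frac = p + int_len + 1;
      if (append (out, out_size, &used,
                  sym->decimal, strlen (sym->decimal)) != 0
          || append (out, out_size, &used, frac, strlen (frac)) != 0)
        return -1;
    }
  return 0;
}
```

For example, with decimal ",", group "." and group size 3, formatting 1234567.891 with two fractional digits should give "1.234.567,89". The separators are strings rather than single characters because some locales use a multibyte grouping separator such as a no-break space.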