From: Maxim Nikulin
Newsgroups: gmane.emacs.devel
Subject: Re: CSV parsing and other issues (Re: LC_NUMERIC)
Date: Fri, 11 Jun 2021 23:51:40 +0700
Message-ID: <4ad5bbe4-a5f0-37f8-2642-0cbdfcdfacc2@gmail.com>
In-Reply-To: <83v96kapog.fsf@gnu.org>
To: emacs-devel@gnu.org

Eli, Boruch, you are both overreacting.

On 11/06/2021 13:19, Eli Zaretskii wrote:
> There's no need to
> introduce into Emacs features that are useful for a few people.

I think that the expectations of users and developers regarding
locale support evolve over time. Proper formatting of numbers is
useful to far more than a few people.

Boruch, until your last messages, I believed you were convinced that
adding support for the "'" and "I" flags is not so easy. Supporting
locale-dependent format specifiers through printf looks attractive,
but it cannot be used directly by `format' or other elisp functions
in a safe way.
Some code calling `format' implicitly expects it to generate
locale-independent numbers, so changing its behavior is not backward
compatible. libc can work with only a single global locale at any
moment. I expect that attempts to "temporarily" call
setlocale(LC_NUMERIC, "") will be a permanent source of bugs: a
forgotten reverting call, a call to a function that needs the
universal format from a locale-specific context, threads started at
an inappropriate moment, etc. Another implementation of the locale
functions is necessary, one able to parse and format without touching
global variables.

Personally, I expect basic-level functions with an explicit locale
context (the names are arbitrary):

    (locale-format-number-with-ctx
     (locale-get-current-context :group-separator 'suppress)
     1234567890)

or with an explicit locale instead of `locale-get-current-context'.
It would be better to also add some convenience helpers that inspect
text properties, buffer-local settings, and global settings to
determine the context:

    (locale-format-number 1234567890)

and maybe even `locale-format[-with-ctx]', which accepts a
printf-like format string.

On 11/06/2021 03:20, Boruch Baum wrote:
> Then don't make them locale specific. Implement the
> single-quote specifier the same way you currently handle the
> floating-point specifier '%f', a locale-specific format that
> has existed in emacs without complaint since ...

You are confusing something. "%f" is not locale-specific inside
Emacs; it uses a "universal" format with a dot "." as the decimal
separator, even in locales that use "," in this role. At the same
time, "'" is highly locale-dependent in libc: group sizes and group
separators vary widely.
I posted this example earlier:

    LC_NUMERIC=C.UTF-8 /usr/bin/printf "%'d\n" 1234567890
    1234567890
    LC_NUMERIC=en_US.UTF-8 /usr/bin/printf "%'d\n" 1234567890
    1,234,567,890
    LC_NUMERIC=es_ES.UTF-8 /usr/bin/printf "%'d\n" 1234567890
    1.234.567.890
    LC_NUMERIC=ru_RU.UTF-8 /usr/bin/printf "%'d\n" 1234567890
    1 234 567 890
    LC_NUMERIC=en_IN.UTF-8 /usr/bin/printf "%'d\n" 1234567890
    1,23,45,67,890

> It's not your responsibilty.
>
> I can say that in the use-case that prompted my request, I'm
> confident it will *never* be an issue. I ask format to give
> me a string and I display it. End of story. Whether just 99%
> or 99.99%, the overwhelming majority of cases will be the
> same. Your concerns are total non-issues.

I would prefer to avoid the inconsistency of "%'d" being
locale-dependent while "%f" is not.

P.S. With some limitations (the printf binary must be available, and
you must not need floating-point numbers), you can leverage libc's
formatting facilities with the following crutch:

    (shell-command-to-string
     (format "/usr/bin/printf \"%%'d\" %d" 1234567890))
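
To illustrate the explicit-context idea proposed above, here is a minimal sketch in pure Lisp that groups digits without calling setlocale or touching any global state. Everything in it is an assumption made for illustration: the names `my-locale-group-digits' and `my-locale-format-integer', and the plist keys `:group-separator' and `:group-sizes', are hypothetical and not an existing Emacs API. It handles integers only.

```elisp
;; Hypothetical sketch: none of these names exist in Emacs.

(defun my-locale-group-digits (digits separator sizes)
  "Insert SEPARATOR into the string DIGITS, grouping from the right.
SIZES is a non-empty list of positive group sizes, rightmost group
first; the last element repeats for all remaining groups."
  (let ((groups '())
        (end (length digits)))
    (while (> end 0)
      (let ((start (max 0 (- end (car sizes)))))
        ;; Pushing right-to-left leaves GROUPS in left-to-right order.
        (push (substring digits start end) groups)
        (setq end start)
        ;; Move to the next size if there is one; otherwise reuse the last.
        (when (cdr sizes)
          (setq sizes (cdr sizes)))))
    (mapconcat #'identity groups separator)))

(defun my-locale-format-integer (ctx n)
  "Format integer N according to the plist CTX.
CTX may contain :group-separator (a string, or nil for no grouping)
and :group-sizes (a list of positive integers, defaulting to (3))."
  (let* ((separator (plist-get ctx :group-separator))
         (sizes (or (plist-get ctx :group-sizes) '(3)))
         (digits (number-to-string (abs n)))
         (grouped (if separator
                      (my-locale-group-digits digits separator sizes)
                    digits)))
    (concat (if (< n 0) "-" "") grouped)))

;; Usage, mirroring the printf outputs quoted above:
;; (my-locale-format-integer '(:group-separator ",") 1234567890)
;;   => "1,234,567,890"
;; (my-locale-format-integer '(:group-separator "," :group-sizes (3 2)) 1234567890)
;;   => "1,23,45,67,890"
;; (my-locale-format-integer nil 1234567890)
;;   => "1234567890"
```

Because the context is an ordinary argument, a caller that needs the universal format simply passes nil, and nothing has to be reverted afterwards. How such a context would be derived from the user's locale is left open here.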