From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: CSV parsing and other issues (Re: LC_NUMERIC)
Date: Tue, 08 Jun 2021 21:52:59 +0300
Message-ID: <83eedcdw8k.fsf@gnu.org>
References: <20210606233638.v7b7rwbufay5ltn7@E15-2016.optimum.net>
 <83a6o1hn9l.fsf@gnu.org>
 <20210608004510.usj7rw2i6tmx6qnw@E15-2016.optimum.net>
 <83h7i9f5ij.fsf@gnu.org>
 <73df2202-081b-5e50-677d-e4498b6782d4@gmail.com>
In-Reply-To: <73df2202-081b-5e50-677d-e4498b6782d4@gmail.com> (message from
 Maxim Nikulin on Tue, 8 Jun 2021 23:35:51 +0700)
To: Maxim Nikulin
Cc: boruch_baum@gmx.com, emacs-devel@gnu.org

> Cc: emacs-devel@gnu.org
> From: Maxim Nikulin
> Date: Tue, 8 Jun 2021 23:35:51 +0700
>
> On 08/06/2021 09:35, Eli Zaretskii wrote:
> > From: Boruch Baum
> >> No? If an Emacs user has two buffers in two separate languages, the
> >> buffer-local settings aren't / won't be respected?
> >
> > First, language is different from locale. And second, we don't even
> > have a buffer-local notion of language yet.
>
> Certainly locale is more precise than just language since it includes
> region and other variants, moreover it can be granularly tuned (date,
> numbers, sorting can be adjusted independently), but I still think that
> all these properties can be sometimes broadly referred to as language.

No, they cannot, not in general. A locale comes with a whole database
of different settings: language, encoding (a.k.a. "codeset"), formats
of date and time, names of days of the week and of the months, rules
for collation and capitalization, etc. etc. You can easily find
several locales whose language is English, but some/many/all of the
other locale-dependent settings are different. It isn't a coincidence
that a locale's name includes more than just the language part.

> Low level functions can accept explicit locale.

Which ones? Most libc routines don't, they use the locale as a global
identifier. And many libc's (with the prominent exception of glibc)
don't support efficient change of a locale in the middle of a program,
they assume that the program's locale is set once at program startup.

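To make the global-locale model above concrete, here is a minimal sketch in Python, whose `locale` module is a thin wrapper over libc's `setlocale` and `localeconv`. The helper name `decimal_point_under` is hypothetical, and `de_DE.UTF-8` is only an example locale that may not be installed on a given system. The sketch shows that a single query under another locale has to change and then restore process-wide state.

```python
import locale

def decimal_point_under(locale_name):
    """Return libc's decimal point under locale_name, or None if the
    locale cannot be selected on this system.

    setlocale() changes state for the whole process, so the previous
    LC_NUMERIC setting is saved first and restored afterwards.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)  # query only, no change
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
    except locale.Error:
        return None  # locale not available; nothing was changed
    try:
        return locale.localeconv()["decimal_point"]
    finally:
        # Undo the process-wide switch made above.
        locale.setlocale(locale.LC_NUMERIC, saved)

print(decimal_point_under("C"))
print(decimal_point_under("de_DE.UTF-8"))  # hypothetical example locale
```

Every call pays for two locale switches, and any other code running between the switch and the restore would see the temporary setting. That is the cost being referred to when a libc assumes the locale is set once at startup.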
> Higher level API can obtain it implicitly from
> buffer-local variables and global locale. For example the LOCALE
> argument of `string-collate-lessp' is optional one. I can even
> anticipate that locale may be stored in text properties some times. A
> random message from recent "About multilingual documents" thread at
> emacs-orgmode mail list:
> https://lists.gnu.org/archive/html/emacs-orgmode/2021-05/msg00252.html

That's mostly about input methods and org-export, I don't see how it's
relevant to what Boruch asked.

> At first basic functionality may be implemented. The problem is to
> choose extensible API.

No, the problem is to have a design that would allow an efficient
implementation. Given what the underlying libc does, it isn't easy.

And then we have conceptual problems. For example, in a multilingual
editor such as Emacs, the notion of a "buffer language" does not
always make sense; you'd need to support portions of text that have
different language properties. Imagine switching locales as Emacs
processes adjacent stretches of text, and other complications. For
example, changing letter-case for a stretch of Turkish text is
supposed to work differently than for English or German text.

I'm all ears for ideas how to design such "language support". It
definitely isn't easy, so if you have ideas, please voice them!

> I just have realized that nl_langinfo(3) (and nl_langinfo_l(3) as well)
> from libc accepts RADIXCHAR (decimal dot) and THOUSEP (group separator)
> arguments. They are good candidates for `locale-info' extension.

We already use nl_langinfo in locale-info, so what exactly are you
suggesting here? Adding more items? You don't really expect Lisp
programs to format numbers such as 123,456 by hand after learning from
locale-info that the thousands separator is a comma, do you?

> Actually Qt links my example with other libraries from ICU. My point was
> that since Emacs anyway (indirectly) links with this library, the
> dependency may be not so heavy.

If you are suggesting that we introduce ICU as a dependency, we could
discuss the pros and cons. It isn't a simple decision, because ICU
comes with a lot of baggage that we already have implemented in Emacs,
so whether we throw away what we have and use ICU instead, or just add
what we miss without depending on ICU, requires good thought and good
acquaintance with the ICU internals (to make sure it does what we want
in Emacs, and doesn't break existing features).

> My personal requirements for number
> formatting were quite modest so far, I expect that other languages (CJK,
> right-to-left scripts, etc.) may require quite special treatment, so
> implementation in Emacs (and further maintenance) may require a lot of
> work. At least API of ICU should be studied to get some inspiration what
> features will be necessary for users from other regions.

I don't think the problem is the API.

> E.g. I was completely unaware that negative sign may be represented by
> parenthesis

Really? It's standard in financial applications.

> I expect enough surprises and unexpected "discoveries" during
> implementation of better locale support. That is why I would consider
> adapting some more or less established API for this purpose.

I don't think "consider" cuts it. We already have a lot of stuff in
Emacs; what we don't have needs serious design and comparison of
available implementation options. Emacs's needs are quite special and
unlike those of most other programs.
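
As a rough illustration of the number-formatting points raised in this message, the Python sketch below reads the locale database through `localeconv` (Python's `locale` module wraps the libc call). The helper name `numeric_conventions` and its dictionary keys are hypothetical. It uses only the "C" locale, which is always available. The separator characters are just part of the picture: grouping sizes and the negative-sign convention are separate fields, which is one reason to let the library do the formatting instead of assembling digits by hand.

```python
import locale

def numeric_conventions():
    """Summarize what libc reports for the current process locale."""
    conv = locale.localeconv()
    return {
        # Same data that nl_langinfo exposes as RADIXCHAR and THOUSEP.
        "radix": conv["decimal_point"],
        "thousep": conv["thousands_sep"],
        # Group sizes are a separate field; not every locale groups by three.
        "grouping": conv["grouping"],
        # In struct lconv, n_sign_posn == 0 means negative monetary
        # amounts are enclosed in parentheses.
        "negative_in_parens": conv["n_sign_posn"] == 0,
    }

locale.setlocale(locale.LC_ALL, "C")  # portable baseline for the sketch
print(numeric_conventions())

# Let the library combine separator and grouping rules, instead of
# inserting the separator manually.
print(locale.format_string("%d", 123456, grouping=True))
```

Under the "C" locale there is no grouping and no thousands separator, so the number is printed without separators; selecting an installed locale with `setlocale` before the calls would change the result.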