From: Eli Zaretskii <eliz@gnu.org>
To: Maxim Nikulin <manikulin@gmail.com>
Cc: boruch_baum@gmx.com, emacs-devel@gnu.org
Subject: Re: CSV parsing and other issues (Re: LC_NUMERIC)
Date: Tue, 08 Jun 2021 21:52:59 +0300
Message-ID: <83eedcdw8k.fsf@gnu.org>
In-Reply-To: <73df2202-081b-5e50-677d-e4498b6782d4@gmail.com> (message from Maxim Nikulin on Tue, 8 Jun 2021 23:35:51 +0700)

> Cc: emacs-devel@gnu.org
> From: Maxim Nikulin <manikulin@gmail.com>
> Date: Tue, 8 Jun 2021 23:35:51 +0700
> 
> On 08/06/2021 09:35, Eli Zaretskii wrote:
>  > From: Boruch Baum
>  >> No? If an Emacs user has two buffers in two separate languages, the
>  >> buffer-local settings aren't / won't be respected?
>  >
>  > First, language is different from locale.  And second, we don't even
>  > have a buffer-local notion of language yet.
> 
> Certainly locale is more precise than just language since it includes 
> region and other variants, moreover it can be granularly tuned (date, 
> numbers, sorting can be adjusted independently), but I still think that 
> all these properties can be sometimes broadly referred to as language.

No, they cannot, not in general.  A locale comes with a whole database
of different settings: language, encoding (a.k.a. "codeset"), formats
of date and time, names of days of the week and of the months, rules
for collation and capitalization, etc. etc.  You can easily find
several locales whose language is English, but some/many/all of the
other locale-dependent settings are different.  It isn't a coincidence
that a locale's name includes more than just the language part.
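
To illustrate the point about locale names, here is a minimal sketch that
splits a POSIX-style name of the form language[_territory][.codeset][@modifier]
into its parts.  The helper `parse_locale_name' and the `LocaleName' type are
hypothetical names made up for this example, and the regular expression is
an approximation, not a full validator.

```python
import re
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    """Components of a POSIX-style locale name (hypothetical helper type)."""
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# Approximate pattern for language[_territory][.codeset][@modifier].
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def parse_locale_name(name: str) -> LocaleName:
    """Split a locale name into its parts; missing parts come back as None."""
    match = _LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized locale name: {name!r}")
    return LocaleName(**match.groupdict())


# Three names that share the language "en" but differ in everything else.
for name in ("en_US.UTF-8", "en_GB.ISO-8859-1", "en_IE@euro"):
    print(name, "->", parse_locale_name(name))
```

All three names parse to the same language, yet the territory, codeset and
modifier differ, which is why the language alone does not identify a locale.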

> Low level functions can accept explicit locale.

Which ones?  Most libc routines don't; they consult the locale as
process-wide global state.  And many libc's (with the prominent
exception of glibc) don't support changing the locale efficiently in
the middle of a program; they assume that the program's locale is set
once at program startup.
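
The global nature of the libc locale can be seen from any language that
wraps setlocale(3).  Below is a rough sketch using Python's standard
`locale' module, whose setlocale changes the setting for the whole process.
The helper names `temporary_locale' and `decimal_point' are made up for
this example.  It only uses the "C" locale, which is always available;
whether any other locale name works depends on what is installed.

```python
import locale
from contextlib import contextmanager


@contextmanager
def temporary_locale(category, name):
    """Switch one locale category for the duration of a block, then restore it.

    The change is process-wide: every thread sees it while the block runs,
    which is why this pattern is unsafe in multi-threaded programs.
    """
    saved = locale.setlocale(category)  # one-argument form only queries
    try:
        locale.setlocale(category, name)
        yield
    finally:
        locale.setlocale(category, saved)


def decimal_point():
    """Return the radix character of the *current global* numeric locale."""
    return locale.localeconv()["decimal_point"]


with temporary_locale(locale.LC_NUMERIC, "C"):
    print(repr(decimal_point()))
```

Note that `decimal_point' takes no locale argument at all: like most libc
routines, it can only answer for whatever the global setting happens to be
at the time of the call.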

> Higher level API can obtain it implicitly from 
> buffer-local variables and global locale. For example the LOCALE 
> argument of `string-collate-lessp' is optional one. I can even 
> anticipate that locale may be stored in text properties some times. A 
> random message from recent "About multilingual documents" thread at 
> emacs-orgmode mail list:
> https://lists.gnu.org/archive/html/emacs-orgmode/2021-05/msg00252.html

That's mostly about input methods and org-export, I don't see how it's
relevant to what Boruch asked.

> At first basic functionality may be implemented. The problem is to 
> choose extensible API.

No, the problem is to have a design that would allow an efficient
implementation.  Given what the underlying libc does, it isn't easy.

And then we have conceptual problems.  For example, in a multilingual
editor such as Emacs, the notion of a "buffer language" does not always
make sense: you'd need to support portions of text that have
different language properties.  Imagine switching locales as Emacs
processes adjacent stretches of text, and other complications.  For
example, changing letter-case for a stretch of Turkish text is
supposed to be different from English or German text.  I'm all
ears for ideas how to design such "language support".  It definitely
isn't easy, so if you have ideas, please voice them!
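
The Turkish case-conversion issue can be made concrete with a small sketch.
It assumes text is already split into (language, text) stretches, which is
exactly the part that has no design yet; `upcase' and `upcase_segments' are
hypothetical names, and only the dotted/dotless "i" rule is handled.

```python
def upcase(text: str, lang: str) -> str:
    """Upper-case TEXT using the rules of language LANG (a two-letter code).

    Only the Turkish/Azerbaijani dotted-i rule is special-cased here;
    everything else falls through to the default Unicode mapping.
    """
    if lang in ("tr", "az"):
        # In these languages "i" upper-cases to U+0130 (capital I with dot).
        # Dotless U+0131 already maps to plain "I" by default.
        text = text.replace("i", "\u0130")
    return text.upper()


def upcase_segments(segments) -> str:
    """Upper-case adjacent stretches of text, each with its own language."""
    return "".join(upcase(text, lang) for lang, text in segments)


# The same word comes out differently depending on the stretch's language.
print(upcase_segments([("en", "title "), ("tr", "title")]))
```

The language has to travel with each stretch of text; a single per-buffer
or per-process setting cannot express the mixed case shown here.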

> I just have realized that nl_langinfo(3) (and nl_langinfo_l(3) as well) 
> from libc accepts RADIXCHAR (decimal dot) and THOUSEP (group separator) 
> arguments. They are good candidates for `locale-info' extension.

We already use nl_langinfo in locale-info, so what exactly are you
suggesting here?  Adding more items?  You don't really expect Lisp
programs to format numbers such as 123,456 by hand after learning from
locale-info that the thousands separator is a comma, do you?
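
For a sense of what "by hand" would mean, here is a sketch of the grouping
logic a program would need once it knows the separator.  `group_digits' is
a hypothetical helper, and it assumes a single fixed group size, which is a
simplification: localeconv(3) reports grouping as a list, and some locales
use unequal groups (e.g. Indian-style 12,34,567).

```python
def group_digits(n: int, thousands_sep: str = ",", group_size: int = 3) -> str:
    """Insert THOUSANDS_SEP between fixed-size digit groups of integer N."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    groups = []
    # Peel groups off the right-hand end until no digits remain.
    while digits:
        groups.append(digits[-group_size:])
        digits = digits[:-group_size]
    return sign + thousands_sep.join(reversed(groups))


print(group_digits(123456))        # separator as learned from the locale
print(group_digits(1234567, "."))  # same digits logic, different separator
```

Even this simplified version has to handle signs and short leading groups,
and every caller would have to repeat it.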

> Actually Qt links my example with other libraries from ICU. My point was 
> that since Emacs anyway (indirectly) links with this library, the 
> dependency may be not so heavy.

If you are suggesting that we introduce ICU as a dependency, we could
discuss the pros and cons.  It isn't a simple decision, because ICU
comes with a lot of baggage that we already have implemented in Emacs,
so whether we throw away what we have and use ICU instead, or just add
what we miss without depending on ICU, requires good thought and good
acquaintance with the ICU internals (to make sure it does what we want
in Emacs, and doesn't break existing features).

> My personal requirements for number 
> formatting were quite modest so far, I expect that other languages (CJK, 
> right-to-left scripts, etc.) may require quite special treatment, so 
> implementation in Emacs (and further maintenance) may require a lot of 
> work. At least API of ICU should be studied to get some inspiration what 
> features will be necessary for users from other regions.

I don't think the problem is the API.

> E.g. I was completely unaware that negative sign may be represented by 
> parenthesis

Really?  It's standard in financial applications.
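
In C, localeconv(3) exposes this convention for monetary values through
n_sign_posn, where the value 0 means parentheses surround the quantity.
The sketch below hard-codes that convention rather than reading it from a
locale; `format_amount' is a hypothetical helper, and the comma grouping
and two decimals are assumptions made for the example.

```python
def format_amount(value: float, parenthesize_negative: bool = True) -> str:
    """Format VALUE with two decimals; show negatives accounting-style."""
    body = f"{abs(value):,.2f}"  # magnitude only, comma-grouped
    if value < 0:
        # Accounting convention: (1,234.50) instead of -1,234.50.
        return f"({body})" if parenthesize_negative else f"-{body}"
    return body


print(format_amount(-1234.5))
print(format_amount(-1234.5, parenthesize_negative=False))
```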

> I expect enough surprises and unexpected "discoveries" during 
> implementation of better locale support. That is why I would consider 
> adapting some more or less established API for this purpose.

I don't think "consider" cuts it.  We already have a lot of stuff in
Emacs; what we don't have needs serious design and comparison of
available implementation options.  Emacs's needs are quite special and
unlike those of most other programs.


