unofficial mirror of emacs-devel@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: Maxim Nikulin <manikulin@gmail.com>
Cc: boruch_baum@gmx.com, emacs-devel@gnu.org
Subject: Re: CSV parsing and other issues (Re: LC_NUMERIC)
Date: Thu, 10 Jun 2021 19:57:33 +0300
Message-ID: <83lf7hbqte.fsf@gnu.org>
In-Reply-To: <ea575f9a-5d90-d13f-4d8c-58552541034d@gmail.com> (message from Maxim Nikulin on Thu, 10 Jun 2021 23:28:59 +0700)

> From: Maxim Nikulin <manikulin@gmail.com>
> Date: Thu, 10 Jun 2021 23:28:59 +0700
> Cc: boruch_baum@gmx.com
> 
> > We already use nl_langinfo in locale-info, so what exactly
> > are you suggesting here?  Adding more items?  You don't
> > really expect Lisp programs to format numbers such as
> > 123,456 by hand after learning from locale-info that the
> > thousands separator is a comma, do you?
> 
> I have hijacked Boruch's thread and changed the subject to "CSV 
> parsing".

That explains part of my confusion.  Please try not to hijack
discussions; instead, start a separate thread, to avoid such
confusion.

For processing CSV, if there's a need to know whether the locale uses
the comma as a decimal separator, we could indeed extend locale-info.
But such an extension is almost trivial and doesn't even touch on the
significant problems in the rest of the discussion.
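
As a rough illustration of how small such an extension would look from
the Lisp side, here is a minimal sketch.  It assumes a hypothetical
`decimal-point' item, which `locale-info' does not provide today
(unknown items return nil), and the `sketch-' function names are made
up for this example.

```elisp
;; HYPOTHETICAL: `decimal-point' is not an item that `locale-info'
;; supports at the time of writing; unknown items simply return nil,
;; so this sketch falls back to "." on current Emacs.
(defun sketch-locale-decimal-separator ()
  "Return the locale's decimal separator, or \".\" when it is unknown."
  (let ((sep (locale-info 'decimal-point)))
    (if (and (stringp sep) (> (length sep) 0))
        sep
      ".")))

(defun sketch-csv-field-separator ()
  "Guess a CSV field separator that cannot clash with decimal commas.
Assumption: files from comma-decimal locales use \";\" between fields."
  (if (equal (sketch-locale-decimal-separator) ",") ";" ","))
```

A CSV reader could call `sketch-csv-field-separator' once per file, but
that only covers the separator question, not number formatting.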

> I was trying to support Boruch's point that buffer-local variables
> may be an important part of the locale context, more precise than
> global settings,

They are more precise, but they don't support mixed languages in the
same buffer, something that happens in Emacs very frequently.  Which
means they are not precise enough.  So my POV is that we should look
for a way to be able to specify the language of some span of text, in
which case buffers that use a single language will be a special case.
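
To make "the language of some span of text" concrete, here is a minimal
sketch that tags part of a buffer with a text property.  The property
name `sketch-language' and the function name are hypothetical; nothing
in Emacs interprets this property, and this is only one possible
direction, not a settled design.

```elisp
(defun sketch-demo-span-languages ()
  "Return the `sketch-language' property at two positions of a demo buffer.
`sketch-language' is a made-up property used only for illustration."
  (with-temp-buffer
    (insert "Hello ")                 ; untagged text, positions 1-6
    (let ((start (point)))
      (insert "DIYARBAKIR")           ; text we want to mark as Turkish
      (put-text-property start (point) 'sketch-language 'tr))
    (list (get-text-property 1 'sketch-language)
          (get-text-property 8 'sketch-language))))
```

A buffer that uses a single language would then just be the case where
one property value covers the whole text.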

> > And then we have conceptual problems.  For example, in a
> > multilingual editor such as Emacs, the notion of a "buffer
> > language" does not always make sense; you'd need to support
> > portions of text that have different language properties.
> > Imagine switching locales as Emacs processes adjacent
> > stretches of text and other complications.  For example,
> > changing letter-case for a stretch of Turkish text is
> > supposed to be different from the English or German text.
> > I'm all ears for ideas how to design such "language
> > support".  It definitely isn't easy, so if you have ideas,
> > please voice them!
> 
> I neither have a consistent vision nor see a conceptual problem.

Here's a trivial example:

  (insert (downcase (buffer-substring POS1 POS2)))

Contrast with

  (insert (downcase "FOO"))

The function 'downcase' gets a Lisp string, but it has no way of
knowing whether the string is actually a portion of the current buffer's
text.  So how can it apply the correct letter-case conversions, even
if some buffer-local setting specifies that this should be done using
some specific language's rules?

IOW, one of the non-trivial problems is how to process Lisp strings
correctly for these purposes.  Buffers can have local variables, but
what about strings?
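
The dependence on context can be seen in a minimal sketch.  It assumes
a throwaway case table that maps I to dotless ı (U+0131) as a stand-in
for Turkish rules; the function name is hypothetical and this is not
how Emacs's own Turkish support is set up.

```elisp
(defun sketch-downcase-both-ways (string)
  "Return (DEFAULT . TURKISH-LIKE) lowercase forms of STRING.
The second form is computed under a temporary case table that maps
I to dotless i (U+0131), an assumption made only for illustration."
  (let ((table (copy-case-table (standard-case-table))))
    ;; Make I/dotless-i a case pair in the copy only.
    (set-case-syntax-pair ?I ?\u0131 table)
    (cons (downcase string)                            ; current case table
          (with-case-table table (downcase string))))) ; temporary table

(sketch-downcase-both-ways "I")
```

The same string gives different results depending on which case table
is in effect when `downcase' runs; nothing in the string itself says
which one is right.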

> > If you are suggesting that we introduce ICU as a dependency,
> > we could discuss the pros and cons.
> 
> I consider it as the most complete available implementation.  Do you 
> know a comparable alternative?

Yes: what we have already in Emacs.  That covers a lot of the same
Unicode turf that ICU handles, because we import and use the same
Unicode files and tables.  The question is what is best for the
future development of Emacs in this area: depend on ICU (which would
mean we need to rewrite lots of code that is working well), or extend
what we have to support more Unicode features?  One not-so-trivial
aspect of this is the efficiency of fetching character properties
(Emacs has char-tables for that, which are efficient both CPU- and
memory-wise).  Another aspect is support for raw bytes in buffers and
strings.  And there are probably some others.
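
For readers unfamiliar with what Emacs already has, here is a minimal
sketch of fetching Unicode character properties with the existing
`get-char-code-property' function.  The wrapper name
`sketch-char-summary' is made up for this example.

```elisp
(defun sketch-char-summary (char)
  "Return (GENERAL-CATEGORY . DECIMAL-DIGIT-VALUE) for CHAR.
Both values come from the Unicode data shipped with Emacs."
  (cons (get-char-code-property char 'general-category)
        (get-char-code-property char 'decimal-digit-value)))

(sketch-char-summary ?A)        ; LATIN CAPITAL LETTER A
(sketch-char-summary ?\u0661)   ; ARABIC-INDIC DIGIT ONE
```

Any replacement based on an external library would have to match this
kind of lookup in both speed and memory use.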

It is not a simple decision.


