From: Maxim Nikulin
Newsgroups: gmane.emacs.devel
Subject: Re: CSV parsing and other issues (Re: LC_NUMERIC)
Date: Sat, 12 Jun 2021 21:41:48 +0700
To: emacs-devel@gnu.org

On 11/06/2021 04:10, Stefan Monnier wrote:
>> There are plenty of CSV dialects. If decimal separator is
>> "," then office software uses ";" instead of comma as cell
>> (field) separator.
>
> But there's no reason to presume that a given CSV file was
> generated in the same locale as the one we're currently
> using.
>
> So the locale could be one ingredient in the machinery used
> to guess which separator was used, but I'm not sure it would
> be of much help.

You are right. My expectation is still that ";" is mostly used in locales with comma as the decimal separator, and in such cases it must be tried with higher priority, because a record may contain plenty of both characters:

    1,2;3,45;56,789

Originally the question arose exactly in the context of an attempt to improve separator guessing:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=47885
The patches, however, have other problems.
Advanced options for table import are likely more suitable for, e.g., csv-mode and may become an unnecessary burden in org-mode (especially if kill-yank worked well in both directions). Certainly, users should have the opportunity to explicitly specify the dialect of the files they are going to import.

> [ BTW, I'll take the opportunity to advocate for the use of
> TSV instead, which is slightly less ill-defined. ]

In the real world one often does not have full control over the file formats one has to deal with. In simple cases I can use space-separated, fixed-width columns of numbers. On the other hand, downloaded bank statements are precisely CSV with ";" as delimiter and in a legacy Windows 8-bit encoding (and such files have a kind of header whose number of columns differs from that of the table that follows).

So the ability to get the decimal separator for the current locale may slightly improve the user experience of importing CSV files, at least in Org mode. However, it is just one aspect of supporting locale-aware number formats in Emacs.
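
To make the heuristic above more concrete, here is a rough sketch of how the locale's decimal separator could be used only to order the candidate field separators. It is written in Python purely for illustration; it is not Emacs or Org code and has nothing to do with the patches in the bug report. The function names (`current_decimal_point`, `guess_separator`, `parse_record`) and the rule "a separator must occur the same non-zero number of times in every line" are assumptions of mine, and real files (quoted fields, header blocks with a different column count) would need more care.

```python
import locale


def current_decimal_point():
    """Return the decimal separator of the user's locale, e.g. "." or ","."""
    try:
        # Adopt LC_NUMERIC from the environment. This changes
        # process-wide state, so a real application may not want it.
        locale.setlocale(locale.LC_NUMERIC, "")
    except locale.Error:
        # Unsupported locale: keep whatever is currently in effect.
        pass
    return locale.localeconv()["decimal_point"]


def guess_separator(lines, decimal_point="."):
    """Guess the field separator of a few sample lines.

    The decimal separator only decides the order in which candidates
    are tried: with a decimal comma, ";" is tried before ",".
    Returns None when no candidate fits.
    """
    if decimal_point == ",":
        candidates = [";", "\t", ","]
    else:
        candidates = [",", ";", "\t"]
    samples = [line for line in lines if line.strip()]
    for sep in candidates:
        # Assumed rule: same non-zero count of the separator in every line.
        counts = {line.count(sep) for line in samples}
        if len(counts) == 1 and counts != {0}:
            return sep
    return None


def parse_record(line, separator, decimal_point):
    """Split one record and convert every field to a float."""
    return [float(field.replace(decimal_point, "."))
            for field in line.split(separator)]


if __name__ == "__main__":
    record = "1,2;3,45;56,789"
    print(guess_separator([record], decimal_point=","))  # ;
    print(guess_separator([record], decimal_point="."))  # ,
    print(parse_record(record, ";", ","))  # [1.2, 3.45, 56.789]
```

For the sample record both "," and ";" occur consistently, so only the priority order decides the result. That is why the locale is at best a hint, and why an explicit way for the user to specify the dialect is still needed.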