From: Philip Kaludercic
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] csv-mode.el: Add function for reading a CSV line
Date: Wed, 22 May 2024 16:14:31 +0000
Message-ID: <87r0dthli0.fsf@posteo.net>
References: <86wmnmixmf.fsf@fastmail.fm> <87le42v093.fsf@posteo.net> <86ttiqjppp.fsf@fastmail.fm>
In-Reply-To: <86ttiqjppp.fsf@fastmail.fm> (Joost Kremers's message of "Wed, 22 May 2024 09:00:34 +0200")
To: Joost Kremers
Cc: Emacs Devel
List-Id: "Emacs development discussions."

Joost Kremers writes:

> On Wed, May 22 2024, Philip Kaludercic wrote:
>> Joost Kremers writes:
>>> +(defun csv--unquote-value (value)
>>> +  "Remove quotes around VALUE.
>>> +If VALUE contains escaped quote characters, un-escape them. If
>>> +VALUE is not quoted, return it unchanged."
>>> +  (save-match-data
>>> +    (let ((quote-regexp (apply #'concat `("[" ,@csv-field-quotes "]"))))
>>> +      (string-match (concat "^\\(" quote-regexp "\\)\\(.*\\)\\(" quote-regexp "\\)$") value)
>>
>> Shouldn't this `string-match' be in the if-let?
>
> I considered that, but in this particular case, `(match-string 1 value)` returns
> nil if the first character of `value` isn't in `csv-field-quotes`, so it seems
> to be OK.
>
> Emphasis on "seems" though... Plus, there's no need to call `match-string` at
> all if `string-match` failed, of course. So new patch attached.
>
>> Take this example,
>>
>> (let ((str "1 2 3"))
>>   (list (string-match "2" str)
>>         (match-string 0 str)
>>         (string-match "4" str)
>>         (match-string 0 str)))
>> ;;=> (2 "2" nil "2")
>>
>> even though string-match failed, the match data remains and match-string
>> returns non-nil values.
>
> Oh... I kinda assumed that `string-match` would always reset all of the match
> data, but apparently not. Good to know!
>
>
> Thanks,
>
> Joost
>
>
> --
> Joost Kremers
> Life has its moments
>
> From bb582c8e413451f59db1d26d4c0208348370283b Mon Sep 17 00:00:00 2001
> From: Joost Kremers
> Date: Wed, 22 May 2024 00:07:34 +0200
> Subject: [PATCH] Add function for reading a CSV line and return its values as
>  a list.
>
> * (csv-parse-current-row): New function; unlike csv--collect-fields,
> unquotes the field values.
> * (csv--unquote-value): New function.
> ---
>  csv-mode-tests.el | 23 +++++++++++++++++++++++
>  csv-mode.el       | 26 +++++++++++++++++++++++++-
>  2 files changed, 48 insertions(+), 1 deletion(-)
>
> diff --git a/csv-mode-tests.el b/csv-mode-tests.el
> index 0caeab7..12d0417 100644
> --- a/csv-mode-tests.el
> +++ b/csv-mode-tests.el
> @@ -144,5 +144,28 @@
>                     (csv--separator-score ?\; csv-tests--data
>                                           (length csv-tests--data)))))
>
> +(ert-deftest csv-tests-unquote-value ()
> +  (should (equal (csv--unquote-value "Hello, World")
> +                 "Hello, World"))
> +  (should (equal (csv--unquote-value "\"Hello, World\"")
> +                 "Hello, World"))
> +  (should (equal (csv--unquote-value "Hello, \"\"World")
> +                 "Hello, \"\"World"))
> +  (should (equal (csv--unquote-value "\"Hello, \"\"World\"\"\"")
> +                 "Hello, \"World\""))
> +  (should (equal (csv--unquote-value "'Hello, World'")
> +                 "'Hello, World'"))
> +  (should (equal (let ((csv-field-quotes '("\"" "'")))
> +                   (csv--unquote-value "\"Hello, World'"))
> +                 "\"Hello, World'"))
> +  (should (equal (let ((csv-field-quotes '("\"" "'")))
> +                   (csv--unquote-value "'Hello, World'"))
> +                 "Hello, World"))
> +  (should (equal (let ((csv-field-quotes '("\"" "'")))
> +                   (csv--unquote-value "'Hello, ''World'''"))
> +                 "Hello, 'World'"))
> +  (should (equal (csv--unquote-value "|Hello, World|")
> +                 "|Hello, World|")))
> +
>  (provide 'csv-mode-tests)
>  ;;; csv-mode-tests.el ends here
> diff --git a/csv-mode.el b/csv-mode.el
> index f639dcf..ebcd9da 100644
> --- a/csv-mode.el
> +++ b/csv-mode.el
> @@ -4,7 +4,7 @@
>
>  ;; Author: "Francis J. Wright"
>  ;; Maintainer: emacs-devel@gnu.org
> -;; Version: 1.23
> +;; Version: 1.24
>  ;; Package-Requires: ((emacs "27.1") (cl-lib "0.5"))
>  ;; Keywords: convenience
>
> @@ -107,6 +107,10 @@
>
>  ;;; News:
>
> +;; Since 1.24
> +;; - New function `csv--unquote-value'.
> +;; - New function `csv-parse-current-row'.
> +
>  ;; Since 1.21:
>  ;; - New command `csv-insert-column'.
>  ;; - New config var `csv-align-min-width' for `csv-align-mode'.
> @@ -1400,6 +1404,26 @@ point is assumed to be at the beginning of the line."
>              (forward-char)))
>          (nreverse fields)))))
>
> +(defun csv--unquote-value (value)
> +  "Remove quotes around VALUE.
> +If VALUE contains escaped quote characters, un-escape them. If
> +VALUE is not quoted, return it unchanged."
> +  (save-match-data
> +    (let ((quote-regexp (apply #'concat `("[" ,@csv-field-quotes "]"))))
> +      (if-let (((string-match (concat "^\\(" quote-regexp "\\)\\(.*\\)\\(" quote-regexp "\\)$") value))
> +               (quote-char (match-string 1 value))
> +               ((equal quote-char (match-string 3 value)))
> +               (unquoted (match-string 2 value)))
> +          (replace-regexp-in-string (concat quote-char quote-char) quote-char unquoted)
> +        value))))
> +
> +(defun csv-parse-current-row ()
> +  "Parse the current CSV line.
> +Return the field values as a list."
> +  (save-mark-and-excursion
> +    (goto-char (line-beginning-position))
> +    (mapcar #'csv--unquote-value (csv--collect-fields (line-end-position)))))
> +
>  (defvar-local csv--header-line nil)
>  (defvar-local csv--header-hscroll nil)
>  (defvar-local csv--header-string nil)

Seems fine to me. I'd apply it if there are no objections. Until then,
you can prepare to modify your package to use this change.

--
Philip Kaludercic on peregrine
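
For reference, the match-data point raised in this thread can be reduced to
a small guard pattern: read `match-string' only when `string-match' has
returned non-nil. The sketch below is illustrative only; the helper name
`my-first-number' is hypothetical and not part of csv-mode or the patch.

```elisp
;; Hypothetical helper for illustration; not part of csv-mode.
(defun my-first-number (str)
  "Return the first run of digits in STR, or nil if STR has none."
  ;; `save-match-data' keeps the caller's match data intact.
  (save-match-data
    ;; A failed `string-match' does not clear earlier match data, so
    ;; only consult `match-string' when the search succeeded.
    (when (string-match "[0-9]+" str)
      (match-string 0 str))))
```

With this guard, a string without digits yields nil instead of whatever an
earlier, unrelated search happened to leave behind. The revised patch gets
the same effect by making the `string-match' call the first clause of the
`if-let'.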