From: "Pascal J. Bourguignon"
Newsgroups: gmane.emacs.help
Subject: Re: simple first emacs script
Date: Wed, 15 Dec 2010 19:37:00 +0100
Organization: Informatimago
Message-ID: <87oc8myjar.fsf@kuiper.lan.informatimago.com>
References: <2V2Oo.37337$hW6.28446@newsfe08.ams2> <87hbefytr8.fsf@kuiper.lan.informatimago.com>
To: help-gnu-emacs@gnu.org

Tom writes:

> Wow Pascal that is quite an amazing response thanks.
>
> You introduced several new things I don't know about so I can't
> comment on them until I go away and learn them but in response to what
> I do understand.
>
> Yes the indentation was destroyed by newsreader, but thanks for
> pointing me to paredit as I was finding managing parenthesis a pain.
>
>> The (require 'csv-mode) form would be better placed on the toplevel
>> (ie. above the defun form).
>
> I don't get this. If I understand you correctly you are suggesting
> something like this:
> (require 'csv-mode)
> (defun ...
> )
>
> If I do this then wont the require mode cease to be part of the
> functions definition. Normally it would not be required to set the
> mode the csv as the file extension would be .csv and csv-mode is
> called automatically, but the raw files I receive have random
> extensions - I suppose I could rename them all to overcome this but it
> seemed simpler to tell the function to go into csv mode otherwise it
> tries to process the file in fundamental mode.

require only loads a library (if it is not already loaded). Setting a
mode for a buffer is done by calling the specific mode command, which
may be defined in a library:

    (csv-mode)

(By the way, these commands, as well as a lot of rarely used commands,
may not be loaded initially, but defined as autoloaded functions, which
load the library that contains their real definition automatically the
first time they're called.
For example, if you start emacs again, before loading csv-mode.el, the
documentation of the command csv-mode (C-h f csv-mode RET) is:

    csv-mode is an interactive autoloaded Lisp function in
    `csv-mode.el'.

    [Arg list not available until function definition is loaded.]

    Major mode for editing comma-separated value files.
)

Loading csv-mode with (require 'csv-mode) doesn't change the mode of
the buffer, and this doesn't prevent csv-kill-fields from working, even
if the mode of the buffer is not csv-mode.

In general, modes only establish the key bindings and font-lock
keywords to help editing a specific kind of text, but all the commands
are applicable in all the modes. There may be some special modes that
do some behind-the-scenes processing (eg. building data structures in
parallel to the buffer, defining buffer-local variables) that would be
required by some of their commands for them to work, but it's rather
rare.

But I would advise against changing the mode of the buffer in a command
such as nirs-data-clean. If you want to open the .R14 files in the CSV
mode, you can do it by adding an entry to the auto-mode-alist variable
(C-h v auto-mode-alist RET) in ~/.emacs:

    (push '("\\.R14$" . csv-mode) auto-mode-alist)

However, the R14 example file you gave is not a CSV formatted file. See
below.

>> Instead of push-mark (I don't see the matching pop-mark), you might use
>> the save-excursion macro.
>
> I did actually start with save-excursion but I have no interest in
> saving the point the mark, the whole point of pushing the mark and
> moving the point to the start of the buffer was to specify the region
> arguments in csv-kill-fields, i.e.
> (csv-kill-fields '(4 ...) (point) (mark))
>
> I guess this might be more logically done with
> (csv-kill-fields '(4 ...) (point-min) (point-max))
> would that be considered better form?

Yes, I noticed later your use of the (point) and (mark).
But indeed, if you follow the documentation of push-mark, you'll see in
the documentation of set-mark that this mechanism is reserved for
interactive use by the user, and that commands should avoid tampering
with it.

At first, I kept (point-min) and (point-max) in local variables start
and end, but since I added the replacement of spaces by commas, this
changed the size of the buffer, and therefore the value of (point-max).
Therefore I called these functions every time.

If you want to remember a buffer position while editing operations
(insertions and deletions) may change its absolute position, you can
use markers (see the functions make-marker and set-marker). But markers
need to be 'freed' explicitly by resetting them to nil, so they're less
convenient than just calling (point-max) again (though to keep a
position in the middle of the buffer they'd be the right mechanism to
use).

>> Ah, if you read the documentation of csv-kill-fields, you will see that
>> it depends on the right setting of the variable csv-separators to know
>> what separator to use. By default I have it set to a comma. So you
>> want to bind this variable in your function:
>>
>> (let ((csv-separators '(" ")))
>>   (csv-kill-fields ...))
>> Note however that it is a single character string, and that your fields
>> are separated by several. When I try it, it fails with csv-kill-fields
>> complaining about the number of columns. It is probably better to use
>> commas to separate the fields, ...
>
> I have read the documentation (that doesn't mean I understood it
> though). I have the csv separators specified in my .emacs file (it
> seem it will accept both " " and "," so csv-modes seems to read my
> files correctly.

Well, I'm not sure if it's a good idea to have both in the
csv-separators list. One would have to check how csv functions deal
with csv-separators.
For example, I wonder what would happen if a record contained both
separators:

    data item,other data, item

Is this ("data" "item,other" "data," "item")
or      ("data item" "other data" " item")
or      ("data" "item" "other" "data" "item") ?

> But I guess it is logical to specify these in the
> function in case I run it on a computer without these specified. I
> don't seem to get problems with csv-kill-fields complaining about
> number of columns but maybe I have just worked through it with trial
> and error and no real understanding.

I was surprised by this result too, given my reading of the
documentation of csv-kill-fields.

> Your final script seems more solid than mine, as in it behaves in the
> same way on either iteration even after undoing. It doesn't seem to
> work perfectly with across all channel options in some of the files I
> ran it one, but there are lots of ideas in there (such as temporarily
> using commas) for me to incorporate into my script so thanks again.

Perhaps the problem comes from the fact that the files don't really
look like csv files. They seem to have fixed-width columns, padded with
spaces.

In a csv file, if the separator character appears twice in a row, that
means there is an empty field in between. Aligning data with a variable
number of spaces is therefore incompatible with csv.

Perhaps some of your files have fields with spaces in the middle, or
empty fields. Then simply replacing sequences of spaces by a comma to
make it csv (or having the csv functions interpret the space as a field
separator) will make the csv functions misinterpret the fields.

I would advise checking the specification of the file format, and
perhaps using different code to convert it to csv.

For example, assuming we have just records of fixed-width fields:

    ;; loop, setf, find, subseq, defmacro* and destructuring-bind below
    ;; come from the cl package (on Emacs 24.3 and later the same
    ;; facilities live in cl-lib under a cl- prefix).
    (require 'cl)

    (defun spacep (ch) (= ch ?\ )) ; one space character.

    (let ((one-record "08.11.10 14:57:17  67    0   4      -2.9254      -2.3866  0   0  72    0   4      -3.3003      -2.7971  0   0  63    0   4      -2.8989      -2.2108  0   0  75    0   4      -3.6963      -3.3294  0   0  AB0912040885-0  AB0912040757-0  AB0912040628-0  AB0912040780-0"))
      (loop ; let's detect a data -> space transition
         with data = nil
         with fields = '()
         for pos from 0
         for ch across one-record
         do (if data
                (when (spacep ch)
                  (setf data nil)
                  (push pos fields))
                (unless (spacep ch)
                  (setf data t)))
         finally (return (cons 0 (reverse (cons (length one-record) fields))))))
    --> (0 8 17 21 26 30 43 56 59 63 67 72 76 89 102 105 109 113 118 122
         135 148 151 155 159 164 168 181 194 197 201 217 233 249 265)

So you could now split the record into fields, remove the spaces, and
concatenate it back into a csv record:

    (defvar *r14-field-positions*
      '(0 8 17 21 26 30 43 56 59 63 67 72 76 89 102 105 109 113 118 122
        135 148 151 155 159 164 168 181 194 197 201 217 233 249 265))

    (defun csvify-r14-record (record)
      (unsplit-string
       (mapcar (lambda (field)
                 ; if the field contains a comma,
                 ; it needs to be quoted.
                 ; FIXEDCASE and LITERAL are t, so that the backslash
                 ; in the replacement text is inserted literally.
                 (if (find ?, field)
                     (concat "\"" (replace-regexp-in-string "\"" "\\\"" field t t) "\"")
                     field))
               (loop
                  for (start end) on *r14-field-positions*
                  while end
                  collect (string-trim " " (subseq record start end))))
       ","))

    (let ((one-record "08.11.10 14:57:17  67    0   4      -2.9254      -2.3866  0   0  72    0   4      -3.3003      -2.7971  0   0  63    0   4      -2.8989      -2.2108  0   0  75    0   4      -3.6963      -3.3294  0   0  AB0912040885-0  AB0912040757-0  AB0912040628-0  AB0912040780-0"))
      (csvify-r14-record one-record))
    --> "08.11.10,14:57:17,67,0,4,-2.9254,-2.3866,0,0,72,0,4,-3.3003,-2.7971,0,0,63,0,4,-2.8989,-2.2108,0,0,75,0,4,-3.6963,-3.3294,0,0,AB0912040885-0,AB0912040757-0,AB0912040628-0,AB0912040780-0"

So now we only have to call this function on each line of the buffer:

    (defun csvify-r14-buffer ()
      (interactive)
      (dolines (start-line end-line)
        (let ((new-record (csvify-r14-record
                           (buffer-substring start-line end-line))))
          (delete-region start-line end-line)
          (insert new-record))))

With the following functions and macro (from my personal library):

    (defun string-trim (character-bag string-designator)
      "Common-Lisp: returns a substring of string, with all characters in
    character-bag stripped off the beginning and end."
      (unless (sequencep character-bag)
        (signal 'type-error "Expected a sequence for `character-bag'."))
      ;; `string*' is not defined in this message.
      (let* ((string (string* string-designator))
             (margin (format "[%s]*" (regexp-quote
                                      (if (stringp character-bag)
                                          character-bag
                                          (map 'string 'identity character-bag)))))
             (trimer (format "\\`%s\\(\\(.\\|\n\\)*?\\)%s\\'" margin margin)))
        (replace-regexp-in-string trimer "\\1" string)))

    (defun unsplit-string (string-list &rest separator)
      "Does the inverse of split-string. If no separator is provided
    then a simple space is used."
      (if (null separator)
          (setq separator " ")
          (if (= 1 (length separator))
              (setq separator (car separator))
              (error "unsplit-string: Too many separator arguments.")))
      (if (not (char-or-string-p separator))
          (error "unsplit-string: separator must be a string or a char."))
      ;; `list-insert-separator' is not defined in this message.
      (apply 'concat (list-insert-separator string-list separator)))

    (defmacro* with-marker ((var position) &body body)
      (let ((vposition (gensym))) ; so (eq var position) still works.
        `(let* ((,vposition ,position)
                (,var (make-marker)))
           (set-marker ,var ,vposition)
           (unwind-protect (progn ,@body)
             (set-marker ,var nil)))))

    (defmacro* dolines (start-end &body body)
      "Executes the body with start-var and end-var bound to the start
    and the end of each line of the current buffer in turn."
      (let ((vline (gensym)))
        (destructuring-bind (start-var end-var) start-end
          `(let ((sm (make-marker))
                 (em (make-marker)))
             (unwind-protect
                  (progn
                    (goto-char (point-min))
                    (while (< (point) (point-max))
                      (let ((,vline (point)))
                        (set-marker sm (point))
                        (set-marker em (progn (end-of-line) (point)))
                        (let ((,start-var (marker-position sm))
                              (,end-var (marker-position em)))
                          ,@body)
                        (goto-char ,vline)
                        (forward-line 1))))
               (set-marker sm nil)
               (set-marker em nil))
             nil))))

So instead of the replace-regexp, you can use (csvify-r14-buffer).

In the end, you didn't say what resulting file format you wanted. You
could remove the last replace-regexp and keep the result in csv format,
with fields containing commas left quoted (but you don't seem to have
such data anyway), or write a more sophisticated command to format csv
records into whatever format you want.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
A bad day in () is better than a good day in {}.
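
The conversion described in this message relies on two helpers that are
not defined in it (string* and list-insert-separator). As a rough,
self-contained sketch of the same idea, assuming Emacs 24.3 or later
with cl-lib, something like the following could be used. All the
r14-sketch-* names are hypothetical and invented for this illustration.
One deliberate difference: embedded double quotes are doubled, as most
CSV readers expect, instead of being escaped with a backslash as in the
code above.

```elisp
;; Hypothetical helper names (r14-sketch-*); they are not part of
;; csv-mode nor of the library quoted in this message.
(require 'cl-lib)

(defun r14-sketch-field-ends (record)
  "Return the positions in RECORD where each run of non-space text ends."
  (let ((ends '())
        (pos 0))
    (while (string-match "[^ ]+" record pos)
      (setq pos (match-end 0))
      (push pos ends))
    (nreverse ends)))

(defun r14-sketch-trim (field)
  "Return FIELD without its leading and trailing spaces."
  (replace-regexp-in-string "\\` +\\| +\\'" "" field))

(defun r14-sketch-quote (field)
  "Quote FIELD if it contains a comma, a double quote or a newline.
Embedded double quotes are doubled (an assumption of this sketch)."
  (if (string-match-p "[,\"\n]" field)
      (concat "\"" (replace-regexp-in-string "\"" "\"\"" field t t) "\"")
    field))

(defun r14-sketch-record-to-csv (record positions)
  "Cut RECORD at POSITIONS (a list starting with 0) and join with commas."
  (mapconcat #'r14-sketch-quote
             (cl-loop for (start end) on positions
                      while end
                      collect (r14-sketch-trim (substring record start end)))
             ","))

(defun r14-sketch-buffer-to-csv (positions)
  "Convert each sufficiently long line of the current buffer using POSITIONS."
  (save-excursion
    (goto-char (point-min))
    (while (not (eobp))
      (let* ((start (line-beginning-position))
             (end (line-end-position))
             (line (buffer-substring-no-properties start end)))
        ;; Leave short lines (headers, blank lines) untouched instead of
        ;; signalling an args-out-of-range error in `substring'.
        (when (>= (length line) (car (last positions)))
          (delete-region start end)
          (insert (r14-sketch-record-to-csv line positions))))
      (forward-line 1))))
```

With a shortened version of the record above,
(r14-sketch-field-ends "08.11.10 14:57:17  67    0   4") should give
(8 17 21 26 30), and calling r14-sketch-record-to-csv on the same
string with the positions (0 8 17 21 26 30) should give
"08.11.10,14:57:17,67,0,4". No markers are needed here, because each
line is rewritten in place before point moves on to the next one.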