file filtering

* file filtering
@ 2007-01-30 15:34 Peter Tury
  2007-01-30 16:58 ` HS
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Peter Tury @ 2007-01-30 15:34 UTC (permalink / raw)
  To: help-gnu-emacs

Hi,

I would like to write an Emacs Lisp script that filters and modifies a
file, logically in the following way:

* processes the file content line by line

* if a line matches a given regexp, replaces the line with
  something built from the matched regexp groups (\1...)

* otherwise deletes the line

I would like to use this script like grep, so Emacs would run
in the background (using the --script option at Emacs invocation).

For this I am looking for some functionality that I don't
know:

* how to read a file without loading the whole file into memory
  (i.e. without loading it into a buffer)

E.g. I thought of a solution where I would read from the file only
the strings that match a given regexp. Something like
(insert-file-contents filename regexp). (In the simplest case
regexp would be "^.*$".) Is this possible?

Then, the second step would be to replace the just-inserted text, so
something like the following would be even better:
(insert-file-contents filename regexp replace-match-first-arg). This
would find the regexp in filename, replace the found string according
to replace-match (in memory) and insert only the result into the buffer.

Then (after a while loop that processes the whole file), the third
step would be to write the result into a new file, so the best would
be something like this :-) (append-to-file to-filename from-filename
regexp-to-read replace-match-first-arg-to-append)

I think I could create these functions if I knew how to read a
portion (not a fixed number of chars!) of a file...

My problem is this: if I work on buffers (instead of files), I have to
create two buffers, one for the original file and one for the result
file. Otherwise I have to delete those portions of the first buffer
that were not matched by the regexp searches, and I don't know how to
do that simply :-( Or is using two buffers (strings??), and storing
the two files in them, not an ugly solution for such a task?

What is the simplest way to solve this task?

Thanks,
P

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: file filtering
  2007-01-30 15:34 file filtering Peter Tury
@ 2007-01-30 16:58 ` HS
  2007-01-31  8:05   ` Peter Tury
  2007-02-01  5:55 ` Kevin Rodgers
       [not found] ` <mailman.3856.1170309361.2155.help-gnu-emacs@gnu.org>
  2 siblings, 1 reply; 12+ messages in thread
From: HS @ 2007-01-30 16:58 UTC (permalink / raw)
  To: help-gnu-emacs

Excuse me for saying that here, but do you really need/want to use 
elisp?
It seems much easier and "logical" to solve this problem - since it's 
a command-line script that will do some text processing - with Ruby, 
Python or Perl.
Cheers,
HS


* Re: file filtering
  2007-01-30 16:58 ` HS
@ 2007-01-31  8:05   ` Peter Tury
  2007-01-31 12:50     ` HS
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Tury @ 2007-01-31  8:05 UTC (permalink / raw)
  To: help-gnu-emacs

"HS" <hugows@gmail.com> writes:

> Excuse me for saying that here, but do you really need/want to use 
> elisp?

You are right: I don't really need to, but I really want to... ;-) More
precisely: I have an elisp function that does what I need, but not
exactly in the way I would like. So I asked here whether it can be
done in a better way. From the (lack of) early answers it seems that
no nice way exists :-(

> It seems much easier and "logical" to solve this problem - since
> it's a command-line script that will do some text processing - with
> Ruby, Python or Perl.

Yes, usually. But there can be cases where this would be part of a
bigger "system" that is "logical to implement" in elisp... Why mix in
other languages if it isn't really needed?


* Re: file filtering
  2007-01-31  8:05   ` Peter Tury
@ 2007-01-31 12:50     ` HS
  2007-01-31 13:34       ` Peter Tury
  0 siblings, 1 reply; 12+ messages in thread
From: HS @ 2007-01-31 12:50 UTC (permalink / raw)
  To: help-gnu-emacs

Hm... I can't imagine how "not to read into a buffer"...
I just tried something like this, and it works:

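(defun my-process-file (file)
  ;; Named my-process-file because Emacs already defines `process-file'.
  (interactive "f")
  (with-temp-buffer
    (insert-file-contents file)
    ;; Drop every line that does not match
    ;; (`delete-non-matching-lines' is an alias for `keep-lines').
    (keep-lines "valid [0-9]+")
    ;; `replace-regexp' is meant for interactive use; loop in Lisp instead.
    (while (re-search-forward "valid \\([0-9]+\\)" nil t)
      (replace-match "I found a \\1" t))
    (write-file "test2.txt" nil)))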

-------- input file ----------------
blablabla
valid 39
idasuiahsduihas
valid 123
dasuiohdiuahs
dasuiohduahs
dasudhas
valid 29
dasiuhuidah
dasjddij
d
d
asijdsj

-------- output file --------
I found a 39
I found a 123
I found a 29

Don't know if you can use something from that, but anyway :)
Good luck,
HS


* Re: file filtering
  2007-01-31 12:50     ` HS
@ 2007-01-31 13:34       ` Peter Tury
  2007-01-31 14:51         ` HS
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Tury @ 2007-01-31 13:34 UTC (permalink / raw)
  To: help-gnu-emacs

"HS" <hugo...> writes:

> Hm... I can't imagine how "not to read into a buffer"...
> I just tried something like this, and it works:
>
> (defun my-process-file (file)
>   ;; Renamed: Emacs already defines `process-file'.
>   (interactive "f")
>   (with-temp-buffer
>     (insert-file-contents file)
>     (keep-lines "valid [0-9]+")
>     (while (re-search-forward "valid \\([0-9]+\\)" nil t)
>       (replace-match "I found a \\1" t))
>     (write-file "test2.txt" nil)))

Thanks! I've already learned new things from your answer:
with-temp-file, keep-lines, etc. And I like this solution. The only
drawback is that it goes through the whole buffer twice...

And why can't I find keep-lines in my Emacs 22's info? I really did
search the info for similar functions before my first question and
didn't find it :-((
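
The two passes can be folded into one. A minimal sketch, assuming the
whole file still fits in a buffer; the name my-filter-file and its
argument names are made up for illustration, and only the matched part
of a line is replaced, as in the function quoted above:

```elisp
(defun my-filter-file (in-file out-file regexp replacement)
  "Write to OUT-FILE only the lines of IN-FILE that match REGEXP.
In each kept line the first match is replaced by REPLACEMENT,
which may refer to groups of REGEXP."
  (with-temp-file out-file
    (insert-file-contents in-file)
    (goto-char (point-min))
    ;; Bound inside the temp buffer so the search is case-sensitive here.
    (let ((case-fold-search nil))
      (while (not (eobp))
        (if (re-search-forward regexp (line-end-position) t)
            ;; Matching line: rewrite the match, then go to the next line.
            (progn (replace-match replacement t)
                   (forward-line 1))
          ;; Non-matching line: delete it, including its newline.
          (delete-region (line-beginning-position)
                         (progn (forward-line 1) (point))))))))

;; Usage (hypothetical file names):
;; (my-filter-file "test.txt" "test2.txt"
;;                 "valid \\([0-9]+\\)" "I found a \\1")
```

Each line is visited once: the search is limited to the current line,
so a line is either rewritten or deleted before point moves on.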


* Re: file filtering
  2007-01-31 13:34       ` Peter Tury
@ 2007-01-31 14:51         ` HS
  2007-02-01  7:47           ` Peter Tury
  0 siblings, 1 reply; 12+ messages in thread
From: HS @ 2007-01-31 14:51 UTC (permalink / raw)
  To: help-gnu-emacs

Hmm... I can find it in mine, and I'm also using GNU Emacs 22:
"keep-lines is an interactive compiled Lisp function in `replace.el'."


* Re: file filtering
  2007-01-30 15:34 file filtering Peter Tury
  2007-01-30 16:58 ` HS
@ 2007-02-01  5:55 ` Kevin Rodgers
       [not found] ` <mailman.3856.1170309361.2155.help-gnu-emacs@gnu.org>
  2 siblings, 0 replies; 12+ messages in thread
From: Kevin Rodgers @ 2007-02-01  5:55 UTC (permalink / raw)
  To: help-gnu-emacs

Peter Tury wrote:
> I would like to write an Emacs Lisp script that filters and modifies a
> file, logically in the following way:
>
> * processes the file content line by line
>
> * if a line matches a given regexp, replaces the line with
>   something built from the matched regexp groups (\1...)
>
> * otherwise deletes the line
>
> I would like to use this script like grep, so Emacs would run
> in the background (using the --script option at Emacs invocation).

(find-file FILENAME)
(shell-command-on-region (point-min) (point-max)
			 (format "sed -n s/%s/%s/p"
				 (shell-quote-argument REGULAR_EXPRESSION)
				 (shell-quote-argument REPLACEMENT))
			 nil t)
(save-buffer) ; or (write-file NEW_FILENAME)

> For this I am looking for some functionality that I don't
> know:
>
> * how to read a file without loading the whole file into memory
>   (i.e. without loading it into a buffer)
>
> E.g. I thought of a solution where I would read from the file only
> the strings that match a given regexp. Something like
> (insert-file-contents filename regexp). (In the simplest case
> regexp would be "^.*$".) Is this possible?

;; grep expects the pattern first, then the file name.
(shell-command (format "grep %s %s"
                       (shell-quote-argument REGULAR_EXPRESSION)
                       (shell-quote-argument FILENAME))
               t)

> Then, the second step would be to replace the just-inserted text, so
> something like the following would be even better:
> (insert-file-contents filename regexp replace-match-first-arg). This
> would find the regexp in filename, replace the found string according
> to replace-match (in memory) and insert only the result into the buffer.

(replace-regexp REGEXP TO-STRING nil (point-min) (point-max))

> Then (after a while loop that processes the whole file), the third
> step would be to write the result into a new file, so the best would
> be something like this :-) (append-to-file to-filename from-filename
> regexp-to-read replace-match-first-arg-to-append)

(write-file NEW_FILENAME)

> I think I could create these functions if I knew how to read a
> portion (not a fixed number of chars!) of a file...

Use an external command like grep to select the desired lines.  But
since you need to do that, you may as well use an external command like
sed to do the whole replacement -- otherwise, you're matching the
regular expression twice, once outside emacs to select the lines to
insert into the buffer and once inside emacs to find the text to
replace.

> My problem is this: if I work on buffers (instead of files), I have to
> create two buffers, one for the original file and one for the result
> file. Otherwise I have to delete those portions of the first buffer
> that were not matched by the regexp searches, and I don't know how to
> do that simply :-( Or is using two buffers (strings??), and storing
> the two files in them, not an ugly solution for such a task?
>
> What is the simplest way to solve this task?

(with-temp-file NEW_FILENAME
   (shell-command (format "sed -n s/%s/%s/p %s"
			 (shell-quote-argument REGULAR_EXPRESSION)
			 (shell-quote-argument REPLACEMENT)
			 (shell-quote-argument FILENAME))
		 t		       ; output-buffer: (current-buffer)
		 nil))
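
The same sed call can also be made without going through a shell,
which avoids the quoting step. A minimal sketch, assuming sed is on
the PATH; my-sed-filter and its argument names are hypothetical. The
pattern is interpreted by sed, not by Emacs, so it must use sed's
regexp syntax (in POSIX basic syntax `+' is not an operator, hence
[0-9][0-9]* in the usage line), and a pattern or replacement
containing `/' would break the s/.../.../ command.

```elisp
(defun my-sed-filter (in-file out-file sed-regexp sed-replacement)
  "Write to OUT-FILE the lines of IN-FILE changed by sed's s/RE/REPL/p.
SED-REGEXP and SED-REPLACEMENT are passed to sed unchanged."
  (with-temp-file out-file
    ;; DESTINATION t sends sed's output into the temporary buffer,
    ;; which `with-temp-file' then writes to OUT-FILE.
    (let ((status (call-process "sed" nil t nil
                                "-n"
                                (format "s/%s/%s/p" sed-regexp sed-replacement)
                                (expand-file-name in-file))))
      (unless (eq status 0)
        (error "sed exited with status %s" status)))))

;; Usage (hypothetical file names):
;; (my-sed-filter "test.txt" "test2.txt"
;;                "valid \\([0-9][0-9]*\\)" "I found a \\1")
```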

-- 
Kevin Rodgers
Denver, Colorado, USA


* Re: file filtering
  2007-01-31 14:51         ` HS
@ 2007-02-01  7:47           ` Peter Tury
  2007-02-01 14:26             ` Mathias Dahl
                               ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Peter Tury @ 2007-02-01  7:47 UTC (permalink / raw)
  To: help-gnu-emacs

"HS" <hugows...> writes:

> On 31 jan, 10:34, Peter Tury <tury.pe...@gmail.com> wrote:
>> And why can't I find keep-lines in my Emacs 22's info? I really did
>> search the info for similar functions before my first question and
>> didn't find it :-((
>
> Hmm... I can find in mine and I'm also using GNU Emacs 22
> "keep-lines is an interactive compiled Lisp function in `replace.el'."

OK, I wasn't clear enough: I can find it _now_, when I know its name
(C-h f is easy enough). But I couldn't find it when I just browsed the
info (C-h i), neither in the Emacs nor in the Elisp manual. Nor is it
mentioned in their indices.

Now I looked into replace.el and saw it _is_ part of GNU Emacs. So its
documentation should be reachable from C-h i, shouldn't it? Or: how
can I find a function if I don't know its name, just the functionality
I need? (C-h a seems a bit weak to me here.)


* Re: file filtering
  2007-02-01  7:47           ` Peter Tury
@ 2007-02-01 14:26             ` Mathias Dahl
  2007-02-04 17:18             ` Kevin Rodgers
       [not found]             ` <mailman.3999.1170609530.2155.help-gnu-emacs@gnu.org>
  2 siblings, 0 replies; 12+ messages in thread
From: Mathias Dahl @ 2007-02-01 14:26 UTC (permalink / raw)
  To: help-gnu-emacs

Peter Tury <tury.peter@gmail.com> writes:

> Now I looked into replace.el and saw it _is_ part of GNU Emacs. So
> its documentation should be reachable from C-h i, shouldn't it? Or:
> how can I find a function if I don't know its name, just the
> functionality I need? (C-h a seems to be a bit weak here for me.)

In this case C-h a seems to work well if you realize that you want
to work with "lines". Of course, maybe you thought of "rows"...
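
The same lookup can be done from Lisp. A small sketch; the wrapper
name my-commands-matching is made up here, while apropos-internal and
commandp are standard Emacs functions:

```elisp
(defun my-commands-matching (regexp)
  "Return the interactive commands whose names match REGEXP."
  ;; `apropos-internal' returns all symbols whose names match REGEXP
  ;; and that satisfy the predicate given as second argument.
  (apropos-internal regexp #'commandp))

;; Usage: in a stock Emacs the result should include
;; keep-lines and flush-lines.
;; (my-commands-matching "lines")
```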


* Re: file filtering
  2007-02-01  7:47           ` Peter Tury
  2007-02-01 14:26             ` Mathias Dahl
@ 2007-02-04 17:18             ` Kevin Rodgers
       [not found]             ` <mailman.3999.1170609530.2155.help-gnu-emacs@gnu.org>
  2 siblings, 0 replies; 12+ messages in thread
From: Kevin Rodgers @ 2007-02-04 17:18 UTC (permalink / raw)
  To: help-gnu-emacs

Peter Tury wrote:
> "HS" <hugows...> writes:
> 
>> On 31 jan, 10:34, Peter Tury <tury.pe...@gmail.com> wrote:
>>> And why can't I find keep-lines in my Emacs 22's info? I really did
>>> search the info for similar functions before my first question and
>>> didn't find it :-((
>> Hmm... I can find it in mine, and I'm also using GNU Emacs 22:
>> "keep-lines is an interactive compiled Lisp function in `replace.el'."
>
> OK, I wasn't clear enough: I can find it _now_, when I know its name
> (C-h f is easy enough). But I couldn't find it when I just browsed the
> info (C-h i), neither in the Emacs nor in the Elisp manual. Nor is it
> mentioned in their indices.
>
> Now I looked into replace.el and saw it _is_ part of GNU Emacs. So its
> documentation should be reachable from C-h i, shouldn't it? Or: how
> can I find a function if I don't know its name, just the functionality
> I need? (C-h a seems a bit weak to me here.)

keep-lines is documented under the Searching and Replacement node of the
Emacs manual, in particular its Other Repeating Search subnode.  The
other commands documented in the same node are occur,
list-matching-lines, multi-occur, multi-occur-in-matching-buffers,
how-many, and flush-lines.

Perhaps a "filtering buffer contents" link in the Concept Index to the
Other Repeating Search node would be useful -- would that have helped
you find keep-lines?

-- 
Kevin Rodgers
Denver, Colorado, USA


* Re: file filtering
       [not found]             ` <mailman.3999.1170609530.2155.help-gnu-emacs@gnu.org>
@ 2007-02-14 12:19               ` Peter Tury
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Tury @ 2007-02-14 12:19 UTC (permalink / raw)
  To: help-gnu-emacs

Kevin Rodgers <kevin.d.rodgers@gmail.com> writes:

> Peter Tury wrote:
>>
>> OK, I wasn't clear enough: I can find it _now_, when I know its name
>> (C-h f is easy enough). But I couldn't find it when I just browsed the
>> info (C-h i), neither in the Emacs nor in the Elisp manual. Nor is it
>> mentioned in their indices.

Now I see the indices are built up in a somewhat different manner than
I thought...

>> Now I looked into replace.el and saw it _is_ part of GNU Emacs. So its
>> documentation should be reachable from C-h i, shouldn't it? Or: how
>> can I find a function if I don't know its name, just the functionality
>> I need? (C-h a seems to be a bit weak here for me.)
>
> keep-lines is documented under the Searching and Replacement node of the
> Emacs manual, in particular its Other Repeating Search subnode.  The
> other commands documented in the same node are occur,
> list-matching-lines, multi-occur, multi-occur-in-matching-buffers,
> how-many, and flush-lines.

Thanks!

> Perhaps a "filtering buffer contents" link in the Concept Index to the
> Other Repeating Search node would be useful -- would that have helped
> you find keep-lines?

Yes, I think so. Especially since there is nothing in the index for
"filter". But maybe that's just me...


* Re: file filtering
       [not found] ` <mailman.3856.1170309361.2155.help-gnu-emacs@gnu.org>
@ 2007-02-14 12:49   ` Peter Tury
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Tury @ 2007-02-14 12:49 UTC (permalink / raw)
  To: help-gnu-emacs

Kevin Rodgers <kevin.d....> writes:

> (find-file FILENAME)
> (shell-command-on-region (point-min) (point-max)
> 			 (format "sed -n s/%s/%s/p"
> 				 (shell-quote-argument REGULAR_EXPRESSION)
> 				 (shell-quote-argument REPLACEMENT))
> 			 nil t)
> (save-buffer) ; or (write-file NEW_FILENAME)

> (shell-command (format "grep %s %s"
>                        (shell-quote-argument REGULAR_EXPRESSION)
>                        (shell-quote-argument FILENAME))
>                t)

> Use an external command like grep to select the desired lines.  But
> since you need to do that, you may as well use an external command like
> sed to do the whole replacement -- otherwise, you're matching the
> regular expression twice, once outside emacs to select the lines to
> insert into the buffer and once inside emacs to find the text to
> replace.

> (with-temp-file NEW_FILENAME
>   (shell-command (format "sed -n s/%s/%s/p %s"
> 			 (shell-quote-argument REGULAR_EXPRESSION)
> 			 (shell-quote-argument REPLACEMENT)
> 			 (shell-quote-argument FILENAME))
> 		 t		       ; output-buffer: (current-buffer)
> 		 nil))

Thanks for your detailed answers! They are nice and I learned a lot
from them.

However, I had written something similar (though not as elegant) even
before I posted my first question in this thread. My initial problem
was that my solution used external tools (namely grep) to filter out
the unnecessary lines, and I didn't know whether it is possible to get
rid of any external tool and be efficient at the same time. Now I see
this is not really possible.
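
That said, insert-file-contents does accept optional BEG and END byte
offsets, so a file can be read piece by piece without any external
tool. A rough sketch under stated assumptions: the name
my-filter-file-chunked, its arguments and the default chunk size are
hypothetical; the file is assumed to use a single-byte encoding such
as ASCII (BEG and END count bytes, so a chunk boundary could otherwise
split a character); and no claim is made that this is as fast as grep
or sed.

```elisp
(defun my-filter-file-chunked (in-file out-file regexp replacement
                                       &optional chunk-size)
  "Filter IN-FILE into OUT-FILE, reading CHUNK-SIZE bytes at a time.
Lines matching REGEXP are kept with the match replaced by
REPLACEMENT; all other lines are dropped."
  (let ((chunk-size (or chunk-size 65536))
        (file-size (nth 7 (file-attributes in-file))) ; size in bytes
        (pos 0)
        (carry ""))                     ; partial line left from last chunk
    (write-region "" nil out-file nil 'silent) ; create or truncate OUT-FILE
    (while (< pos file-size)
      (let ((end (min file-size (+ pos chunk-size))))
        (with-temp-buffer
          (insert carry)
          (insert-file-contents in-file nil pos end)
          ;; Set aside a trailing partial line unless this is the last chunk.
          (goto-char (point-max))
          (setq carry
                (if (or (= end file-size) (bolp))
                    ""
                  (delete-and-extract-region (line-beginning-position)
                                             (point-max))))
          ;; Keep and rewrite matching lines, delete the others.
          (goto-char (point-min))
          (let ((case-fold-search nil))
            (while (not (eobp))
              (if (re-search-forward regexp (line-end-position) t)
                  (progn (replace-match replacement t)
                         (forward-line 1))
                (delete-region (line-beginning-position)
                               (progn (forward-line 1) (point))))))
          ;; Append what is left of this chunk to OUT-FILE.
          (write-region (point-min) (point-max) out-file t 'silent))
        (setq pos end)))))

;; Usage (hypothetical file names):
;; (my-filter-file-chunked "test.txt" "test2.txt"
;;                         "valid \\([0-9]+\\)" "I found a \\1")
```

Only one chunk plus the longest pending line is held in a buffer at a
time; the price is the extra bookkeeping for lines that straddle a
chunk boundary.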

Probably this is not a real problem, since Emacs is an interactive
editor, not a performance-tuned "offline" file manipulator. (But if I
think "Emacs is more than an editor and less than an OS -- or vice
versa", then I am not totally convinced ;-)

It is also true that piping the modified lines from one file (opened
for reading) directly into another file (opened for writing) also has
its drawbacks, even if it is probably the most efficient
solution. (Efficiency is not everything.)

----------

  By the way: efficiency. I've seen an article
  (http://swtch.com/~rsc/regexp/regexp1.html) that says most
  contemporary tools (especially those that originated in the "unix
  era"?) use a rather inefficient regexp handling method. Do you know
  if Emacs falls into this category?

Thanks,
P

