A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files

unofficial mirror of help-gnu-emacs@gnu.org

* A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
@ 2003-01-14  0:03 gnuist006
  2003-01-14  3:46 ` Christopher J. White
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: gnuist006 @ 2003-01-14  0:03 UTC (permalink / raw)


Here is the type of lines I have in a file:

junk  label="junk1/junk2/junk3/.../junkn/" more junk

I want to find every line that has

label="..."

pattern

and then I want to replace every / by _ inside the
quotes.

For the purposes of continuity, I want the script to look like this:

cat file |
sed commands |
awk commands |

etc.

I do not care whether it is all sed or all awk, or in what order.

Note that the junk is usually alphanumeric with dots etc. but no slashes,
so it can be represented by [^/]* if / is not special; otherwise
escape it. There may be other /'s on the line outside the double quotes
that start with label=, and they must not be changed.

This problem can be described as making changes inside the text matched by
a regexp. It is not the problem of making changes across the whole line
containing a match. That is what is making it difficult for me.
The other reason is that I do not have a definite number of slashes
inside the double quotes; otherwise I would use tagged expressions.

I hope you enjoy this problem. I put it on the net only after wrestling
with it for some time.

gnuist.

BTW, I can do this kind of operation inside emacs. However, I do not know
how to write a lisp script, or how to automatically load 100s of files
one after another in emacs, run a function on them, and store their
output to another file, then close these buffers and go to the next file.
I would like as many approaches to this problem as possible, i.e.:
sed
awk
lisp script
lisp in emacs


* Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
  2003-01-14  0:03 A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files gnuist006
@ 2003-01-14  3:46 ` Christopher J. White
  2003-01-14  7:13 ` Friedrich Dominicus
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Christopher J. White @ 2003-01-14  3:46 UTC (permalink / raw)


Here's my quick perl solution...

Save as "junk.pl", then run as "junk.pl <file1> <file2> <file3> ... ".
Each file is renamed to <file>.old, and the output is written to <file>.

Note, this assumes that there are no double quotes in the label
expression to be manipulated.  I can't see how you'd determine the
end of the expression if this isn't true (unless double quotes
might be quoted with a backslash or something).  

#!/usr/bin/perl

foreach my $outfile (@ARGV) 
{
    my $infile = $outfile . ".old";

    print "infile: $infile\n";
    print "outfile: $outfile\n";

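    # "my" needs parentheses to declare more than one variable.
    my ($line, $s1, $s2, $s3);

    rename $outfile, $infile or die "cannot rename $outfile: $!";

    open INFILE, "<$infile" or die "cannot open $infile: $!";
    open OUTFILE, ">$outfile" or die "cannot open $outfile: $!";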

    while ($line = <INFILE>) {
        if ($line =~ /^(.*)label=\"([^\"]*)\"(.*)$/)
        {
            $s1 = $1; $s2 = $2; $s3 = $3;
            $s2 =~ s/\//_/g;
            print OUTFILE $s1 . "label=\"" . $s2 . "\"" . $s3 . "\n";
        }
        else
        {
            print OUTFILE $line;
        }
    }

    close INFILE;
    close OUTFILE;
}


* Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
  2003-01-14  0:03 A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files gnuist006
  2003-01-14  3:46 ` Christopher J. White
@ 2003-01-14  7:13 ` Friedrich Dominicus
  2003-01-14  8:42 ` gnuist006
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Friedrich Dominicus @ 2003-01-14  7:13 UTC (permalink / raw)


gnuist006@hotmail.com (gnuist006) writes:

> Here is the type of lines I have in a file:
> 
> junk  label="junk1/junk2/junk3/.../junkn/" more junk
> 
> I want to find every line that has
> 
> label="..."
> 
> pattern
> 
> and then I want to replace every / by _ inside the
> quotes.
> 
> For the purposes of continuity, I want the script to look like this:
> 
> cat file |
> sed commands |
> awk commands |
You do not need cat for just one file.

> 
> etc.
> 
> I do not care if it all sed or awk or in what order.
Well, you posted to c.l.lisp, so here's a Lisp solution for one file.
Extending it to more files is left to you (which is not too difficult):
(defun q-2003-01-14 (in-file)
  (let ((out-file (concatenate 'string (subseq in-file 0 
                                               (position #\. in-file :from-end t)) ".out")))
    (with-open-file (out out-file :direction :output 
                         :if-does-not-exist :create
                         :if-exists :supersede)
      (clawk:for-file-lines (in-file)
        (when (clawk:match clawk:$0 "label=\"\(.*)\"")
          (let* ((submatch (aref clawk:*regs* 1))
                 (substr (subseq clawk:$0 (car submatch) (cdr submatch))))
            (replace clawk:$0 
                     (pregexp-replace* "/" substr "_")
                     :start1 (car submatch)
                     :end1 (cdr submatch))))
        (princ clawk:$0 out)
        (terpri out) 
        (values)))))
              

With this file 
junk label="junk1/junk2/junk3/junk4" other stuff
other junk label="j1/j2/j3" other stuff and much more
nothing
label="/t1/t2/t3"

I got this result
junk label="junk1_junk2_junk3_junk4" other stuff
other junk label="j1_j2_j3" other stuff and much more
nothing
label="_t1_t2_t3"

Quite nice scripting with Common Lisp IMHO :)

Regards
Friedrich


* Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
  2003-01-14  0:03 A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files gnuist006
  2003-01-14  3:46 ` Christopher J. White
  2003-01-14  7:13 ` Friedrich Dominicus
@ 2003-01-14  8:42 ` gnuist006
  2003-01-14 10:23   ` Friedrich Dominicus
  2003-01-14 22:10   ` Wayne Throop
  2003-01-14 18:18 ` ericjb
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 10+ messages in thread
From: gnuist006 @ 2003-01-14  8:42 UTC (permalink / raw)


The main group for followup to this is: comp.unix.shell
or gnu.emacs.help depending on the type of solution.
I hope that I have the relevant groups for cross-posting.
----

Even though the problem posed in this thread is still

*** UNSOLVED ***

let me extend my gratitude to Mr Christofer and Friedrich
for their kind attempts to answer this. I hope for some
more help tonight.

Mr Friedrich's reply uses perl. I am not familiar with
this language. Also I prefer not to generate very many 
intermediate files since I have to process a large number
of them. I also want to use bash/sed/awk instead of
perl.

I can write a bash wrapper loop but what I do not know
here is how to implement the core logic of replacing an
indefinite number of forward-slashes within a pattern
of interest.



On the other hand for a lisp based solution 
I CAN write a macro or a lisp function to do the
core logic in lisp inside emacs by myself using narrow 
and widen or transient mode. But here what I do not know 
is how to load one file after another and then save it
to a new name and close that buffer. Please just show me
how to do a bunch of files in this way. I can generate the
file names like this in bash:

for i in `du -a directory | grep file.txt | sed to remove some junk from du`; do

core logic

done
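
A wrapper of this shape can be sketched as follows. This is only an
illustration under assumptions: the files are all named file.txt somewhere
under one directory (as the du pipeline above suggests), each result is
written beside its input with a made-up .new suffix, and the core logic is
passed in as an ordinary filter command. The function name fix_tree is
hypothetical, and file names containing newlines are not handled.

```shell
# fix_tree DIR COMMAND [ARGS...]
# Run COMMAND as a stdin-to-stdout filter over every file.txt below DIR,
# writing each result to file.txt.new next to the original.
fix_tree() {
    dir=$1
    shift
    find "$dir" -type f -name 'file.txt' | while IFS= read -r f; do
        # The filter reads the file itself, not the list of names from find.
        "$@" < "$f" > "$f.new"
    done
}
```

Any sed or awk filter can be plugged in, for example
fix_tree directory sed 's|/|_|g' to replace every slash in every file.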


Thanks a lot and do not forget to have fun working on
this problem. It is a little out of the way.

Gnuist


BTW there is a disjoint thread on this subject in comp.unix.shell
and there have been no useful replies as you can see.


* Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
  2003-01-14  8:42 ` gnuist006
@ 2003-01-14 10:23   ` Friedrich Dominicus
  2003-01-14 22:10   ` Wayne Throop
  1 sibling, 0 replies; 10+ messages in thread
From: Friedrich Dominicus @ 2003-01-14 10:23 UTC (permalink / raw)


gnuist006@hotmail.com (gnuist006) writes:

> 
> Mr Friedrich's reply uses perl. 
Definitely not. It's Common Lisp.


> 
> On the other hand for a lisp based solution 
> I CAN write a macro or a lisp function to do the
> core logic in lisp inside emacs by myself using narrow 
> and widen or transient mode. But here what I do not know 
> is how to load one file after another and then save it
> to a new name and close that buffer. Please just show me
> how to do a bunch of files in this way. I can generate the
> file names like this in bash:
Well, extending my solution to more files is easy:

(mapc #'(lambda (file)
          (q-2003-01-14 file)
          ;; rename the generated file if needed
          )
      (directory "pattern"))

That's all

Doing that all in Emacs Lisp isn't much more difficult.


> 
> for i in `du -a directory | grep file.txt | sed to remove some junk
> from du`; do
For getting a file listing in Common Lisp use directory; in Emacs Lisp
it's directory-files.

But I *strongly* suggest you post where you expect an answer. If it is
a shell problem, use some .shell group; if it's Emacs Lisp, use some
Emacs newsgroup; and if you want Common Lisp, post here.

Friedrich


* Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
  2003-01-14  8:42 ` gnuist006
  2003-01-14 10:23   ` Friedrich Dominicus
@ 2003-01-14 22:10   ` Wayne Throop
  1 sibling, 0 replies; 10+ messages in thread
From: Wayne Throop @ 2003-01-14 22:10 UTC (permalink / raw)


: gnuist006@hotmail.com (gnuist006)
: Even though the problem posed in this thread is still
: 
: *** UNSOLVED ***
: 
: let me extend my gratitude to Mr Christofer and Friedrich for their
: kind attempts to answer this.  I hope for some more help tonight. 

    perl -pe 's-\b(label="[^"]*")- ((($x=$1) =~ s./._.g),$x) -ge'

: I can write a bash wrapper loop but what I do not know here is how to
: implement the core logic of replacing an indefinite number of
: forward-slashes within a pattern of interest. 

    s/\//_/g

: I also want to use bash/sed/awk instead of perl. 

Why?

Oh, well.  You'd think this bit of awk-wardness would work
by analogy with the perl above

    awk '{gsub("label=\"[^\"]*\"", gensub("\/","_","g","&"));print}'

but it doesn't.  Hrm.  Maybe

    awk '{
        s=$0;
        m=match(s,"label=\"[^\"]*\"");
        if(m){
            pre =substr(s,1,RSTART-1);
            inf =substr(s,RSTART,RLENGTH);
            post=substr(s,RSTART+RLENGTH);
            gsub("/","_",inf);
            s= pre inf post;
        }
        print s;
    }'

Yeah, that works, at least on the cases I tested.  The perl is a tiny
bit cleaner, though; the perl version handles multiple labels on a line,
and the \b ensures the "label" isn't part of a larger word.  Both of
which are a bit tricky to do in awk.  Doable, just not as easy. 
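
For what it's worth, one way the multiple-label case might be done in
plain POSIX awk is to consume the line one match at a time. The sketch
below is an illustration, not a tested replacement for the script above:
the function name fix_labels_awk is made up, and the \b check is only
approximated by looking at the single character in front of "label".

```shell
fix_labels_awk() {
    awk '{
        rest = $0; out = ""
        # Consume the line one label="..." match at a time.
        while (match(rest, /label="[^"]*"/)) {
            pre = substr(rest, 1, RSTART - 1)
            hit = substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
            # Approximate \b: leave the match alone when "label" is the
            # tail of a longer word such as "xlabel".
            seen = out pre
            prev = substr(seen, length(seen), 1)
            if (prev !~ /[A-Za-z0-9_]/)
                gsub("/", "_", hit)
            out = out pre hit
        }
        print out rest
    }'
}
```

Lines without any label="..." fall straight through the loop and are
printed unchanged.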

Plus which, the perl looks more like line noise,
which is cool, and promotes job security.


Wayne Throop   throopw@sheol.org   http://sheol.org/throopw


* Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
  2003-01-14  0:03 A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files gnuist006
                   ` (2 preceding siblings ...)
  2003-01-14  8:42 ` gnuist006
@ 2003-01-14 18:18 ` ericjb
  2003-01-14 19:34   ` Stefan Monnier <foo@acm.com>
  2003-01-14 22:54 ` Alan Mackenzie
  2003-01-14 23:19 ` Kaz Kylheku
  5 siblings, 1 reply; 10+ messages in thread
From: ericjb @ 2003-01-14 18:18 UTC (permalink / raw)


gnuist006@hotmail.com (gnuist006) writes:

> Here is the type of lines I have in a file:
> 
> junk  label="junk1/junk2/junk3/.../junkn/" more junk
> 
> I want to find every line that has
> 
> label="..."
> 
> pattern
> 
> and then I want to replace every / by _ inside the
> quotes.
> 
> For the purposes of continuity, I want the script to look like this:
> 
> cat file |
> sed commands |
> awk commands |
> 
> etc.
> 
> I do not care if it all sed or awk or in what order.

Can we assume that the only double quotes are those surrounding the
label string?  If so, you can use that to split the input file into
three files, something like:

cat file | cut -f1 -d\" > file1
cat file | cut -f2 -d\" > file2
cat file | cut -f3 -d\" > file3
sed 's/\//_/g' < file2 > file2.new
paste -d\" file1 file2.new file3 > outputfile

I didn't test this, but it might work...

-- 
Eric Backus
R&D Design Engineer
Agilent Technologies, Inc.
425-335-2495 Tel


* Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
  2003-01-14 18:18 ` ericjb
@ 2003-01-14 19:34   ` Stefan Monnier <foo@acm.com>
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan Monnier <foo@acm.com> @ 2003-01-14 19:34 UTC (permalink / raw)


>>>>> "ericjb" == ericjb  <ericjb@lksejb.lks.agilent.com> writes:
>> Here is the type of lines I have in a file:
>> junk  label="junk1/junk2/junk3/.../junkn/" more junk
>> I want to find every line that has
>> label="..."
>> pattern
>> and then I want to replace every / by _ inside the
>> quotes.

sed '/label=".*"/s|/|_|g'
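
A substitution applied to the whole line also rewrites slashes outside
the quotes, which the original question rules out. One possible way to
confine the change, sketched here on the assumption that the label value
never contains an embedded double quote, is a sed loop that converts one
slash inside label="..." per pass and repeats until none is left. The
function name fix_labels is made up for illustration.

```shell
# fix_labels: hypothetical helper; reads stdin, writes stdout.
fix_labels() {
    # The pattern matches label=" followed by characters that are neither
    # a quote nor a slash, then one slash, which becomes "_".
    # "t again" jumps back to the label while a substitution was made.
    sed -e ':again' \
        -e 's|\(label="[^"/]*\)/|\1_|' \
        -e 't again'
}
```

Fed the line a/b label="x/y/z/" c/d, this prints a/b label="x_y_z_" c/d.
It does not check that "label" starts a word, so something like
xlabel="..." would be rewritten too.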

>> cat file |
>> sed commands |
>> awk commands |

I don't know if the Useless Use of Cat Award is still up for grabs,
so I'd recommend you don't bother running for it and just use
redirection instead:

  sed commands <file |
  awk commands |


-- Stefan


PS: This has nothing to do with Lisp, Emacs, or GNU, so I redirected
    the discussion to comp.unix.shell.


* Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
  2003-01-14  0:03 A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files gnuist006
                   ` (3 preceding siblings ...)
  2003-01-14 18:18 ` ericjb
@ 2003-01-14 22:54 ` Alan Mackenzie
  2003-01-14 23:19 ` Kaz Kylheku
  5 siblings, 0 replies; 10+ messages in thread
From: Alan Mackenzie @ 2003-01-14 22:54 UTC (permalink / raw)


gnuist006 <gnuist006@hotmail.com> wrote on 13 Jan 2003 16:03:15 -0800:
> Here is the type of lines I have in a file:

> junk  label="junk1/junk2/junk3/.../junkn/" more junk

> I want to find every line that has

> label="..."

> pattern

> and then I want to replace every / by _ inside the
> quotes.

Sounds like awk could be your tool of choice.  Using gawk:

cat file |
gawk 'BEGIN {FS = "\""; OFS = "\""}; /label="/ {gsub("/", "_", $2)}; {print}'

(or something very like it) will do the job.  Note:  I haven't tested
this.  The solution assumes that the "junk" at the beginning of each line
doesn't contain any "s.
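
That restriction could probably be lifted by checking every
quote-delimited field instead of only $2. The following is a sketch in
POSIX awk under the assumption that the quotes on each line are balanced
and never escaped; the function name fix_labels_fields is illustrative.

```shell
fix_labels_fields() {
    awk 'BEGIN { FS = OFS = "\"" }
    {
        # With FS set to a double quote, even-numbered fields are the
        # quoted strings, and the odd-numbered field before each one is
        # the text leading up to its opening quote.
        for (i = 2; i <= NF; i += 2)
            if ($(i - 1) ~ /label=$/)
                gsub("/", "_", $i)
        print
    }'
}
```

Only quoted strings that directly follow label= are touched, so other
quoted strings and any slashes outside quotes are left alone.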

I would guess that alternative solutions, whether in Emacs lisp or
perl or whatever would be much longer than this one-liner.  The
newsgroup comp.lang.awk might be a better place to ask such questions.
Alternatively, email me if the above gawk program doesn't "quite" work,
or you want me to explain it.

> gnuist.

-- 
Alan Mackenzie (Munich, Germany)
Email: aacm@muuc.dee; to decode, wherever there is a repeated letter
(like "aa"), remove half of them (leaving, say, "a").


* Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
  2003-01-14  0:03 A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files gnuist006
                   ` (4 preceding siblings ...)
  2003-01-14 22:54 ` Alan Mackenzie
@ 2003-01-14 23:19 ` Kaz Kylheku
  5 siblings, 0 replies; 10+ messages in thread
From: Kaz Kylheku @ 2003-01-14 23:19 UTC (permalink / raw)


gnuist006@hotmail.com (gnuist006) wrote in message news:<b00bb831.0301131603.34e9704c@posting.google.com>...
> cat file |

Doh!

> BTW, I can do this kind of operation inside emacs. However, I do not know
> how to write an lisp script.

In an October 2002 thread you (as gnuist007) started under the subject
line ``On refining regexp by adding exceptions systematically'' and in
the November thread ``Lambda Calculus and it [sic] relation to
LISP'' you received some comments regarding your use of newsgroups. 
This would be a good time to re-read some of them.
