* uniq without sort <-------------- GURU NEEDED
From: gnuist006 @ 2008-01-25  2:45 UTC
To: help-gnu-emacs

This is a tough problem, and needs a guru.

I know it is very easy to find unique or non-unique lines if you scramble
all of them by sorting them. It is trivially

echo -e "a\nc\nd\nb\nc\nd" | sort | uniq

$ echo -e "a\nc\nd\nb\nc\nd"
a
c
d
b
c
d

$ echo -e "a\nc\nd\nb\nc\nd"|sort|uniq
a
b
c
d

So it is TRIVIAL with sort.

I want uniq without sorting, so that the initial order is kept.

The algorithm is this. For every line, look above to see if there is
another line like it. If so, ignore it. If not, output it. I am sure I
could spend some time writing this in C. But what is the solution using
the shell? This way I get output that preserves the order of first
occurrence. It is needed in many problems.

Thanks to the star who can help
gnuist

* Re: uniq without sort <-------------- GURU NEEDED
From: Thierry Volpiatto @ 2008-01-25  7:56 UTC
To: gnuist006; +Cc: help-gnu-emacs

gnuist006@gmail.com writes:

> I want uniq without sorting the initial order.
>
> The algorithm is this. For every line, look above if there is another
> line like it. If so, then ignore it. If not, then output it. I am
> sure, I can spend some time to write this in C. But what is the
> solution using shell ? This way I can get an output that preserves the
> order of first occurrence. It is needed in many problems.

Here it is in Python, but the same can be done in Lisp or shell:

In [13]: B = ["a", "c", "d", "b", "e", "a", "d", "e"]

In [14]: A = []

In [15]: for i in B:
   ....:     if i not in A:
   ....:         A.append(i)
   ....:

In [16]: A
Out[16]: ['a', 'c', 'd', 'b', 'e']

--
See you,
Thierry
Pub key: http://pgp.mit.edu
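
The list-membership test in the session above rescans the whole result list for every element, which is fine for short inputs but grows quadratically with the number of distinct lines. A minimal sketch of the same idea using a set for the membership test follows. The function name `uniq_keep_order` is an illustrative assumption and does not come from the thread, and the sketch assumes the items are hashable (strings are).

```python
def uniq_keep_order(items):
    """Yield each item the first time it appears, preserving input order."""
    seen = set()              # items that have already been emitted
    for item in items:
        if item not in seen:  # average-case constant-time lookup
            seen.add(item)
            yield item


if __name__ == "__main__":
    B = ["a", "c", "d", "b", "e", "a", "d", "e"]
    print(list(uniq_keep_order(B)))  # prints ['a', 'c', 'd', 'b', 'e']
```

Because it is a generator, the same function could probably also be fed the lines of a file or of standard input one at a time, without loading everything first.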

* Re: uniq without sort <-------------- GURU NEEDED
From: Peter Dyballa @ 2008-01-25  9:11 UTC
To: gnuist006; +Cc: help-gnu-emacs

On 25.01.2008 at 03:45, gnuist006@gmail.com wrote:

> The algorithm is this. For every line, look above if there is another
> line like it. If so, then ignore it. If not, then output it. I am
> sure, I can spend some time to write this in C. But what is the
> solution using shell ?

Put the output that is to be made unique into an array. Mark each
duplicate with something invalid. Then filter the array so that all
invalid entries are eliminated.

--
Greetings

Pete

To drink without thirst and to make love all the time, madam, it is
only these which distinguish us from the other beasts.
– Beaumarchais
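
The three steps described above (load into an array, mark duplicates as invalid, filter) might be sketched as follows. This is only one reading of the suggestion, not code from the thread: the function name `uniq_mark_and_filter` and the sentinel `INVALID` are hypothetical names introduced for illustration.

```python
def uniq_mark_and_filter(lines):
    """Mark later duplicates with a sentinel, then filter the sentinels out."""
    INVALID = object()        # hypothetical marker; never equal to a real line
    marked = list(lines)      # step 1: put the input into an array (a copy)

    # Step 2: for each still-valid entry, mark every later equal entry.
    for i, line in enumerate(marked):
        if line is INVALID:
            continue          # already marked as a duplicate
        for j in range(i + 1, len(marked)):
            if marked[j] == line:
                marked[j] = INVALID

    # Step 3: keep only the entries that were never marked.
    return [line for line in marked if line is not INVALID]


if __name__ == "__main__":
    print(uniq_mark_and_filter(["a", "c", "d", "b", "c", "d"]))
    # prints ['a', 'c', 'd', 'b']
```

The nested loop makes this quadratic in the number of lines, so it is best seen as a direct transcription of the description rather than a recommendation for large inputs.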

* Re: uniq without sort <-------------- GURU NEEDED
From: thermate @ 2008-01-28 16:51 UTC
To: help-gnu-emacs

On Jan 26, 6:35 pm, gnuist...@gmail.com wrote:
> cat input|awk '!_[$0]++'    <---- I am interested in understanding
> this and other one liners.

I show you the equivalences line by line, with the reason for each
equivalence in a comment.

uniq without sort - a one liner w/o any pipes - based on an associative
array or symbol-value table
-----------------

NOTE: In tcsh each instance of NOT or ! must be replaced by \! ie, escaped.

# echo -e "a\nc\nd\nb\nc\nd\nc" |                            # the input data
awk ' ! count [ $0 ] ++ '                                <=> # print $0 is the default action
awk '!_[$0]++'                                           <=> # _ is a cryptic name for the associative array
awk ' !_[$0]++ { print $0 } '                            <=> # pattern action or true action
awk ' /.*/ { if ( !_[$0]++ ) { print $0 } } '            <=> # /.*/ matches any line; /*/ does not
awk ' /.*/ { if ( !_[$0]++ != 0 ) { print $0 } } '       <=> # as in C, zero is false in awk (so is the empty string)
awk ' /.*/ { if ( _[$0]++ == 0 ) { print $0 } } '        <=> ## NOTE all /.*/ can be omitted everywhere
awk ' /.*/ { if ( ++_[$0] == 1 ) { print $0 } } '        <=>
awk ' { _[$0]++ ; if ( _[$0] == 1 ){ print $0 } } '      <=> # omitting default pattern /.*/
awk ' /.*/ { a[$0]++ ; if ( a[$0] == 1 ){ print $0 } } '     # associative array a[index] where
                                                             # index is the line and value is the
                                                             # count. only if count==1 then print.

perl -ne ' if ( ! $count{ $_ } ++ ){ print $_ } '            # perl has $count{} and $_ and does not
                                                             # assume pattern, so no outer {}
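
For readers more at home in Python, the awk one-liner `!_[$0]++` discussed earlier in this thread can be unrolled into explicit steps. This is a minimal sketch under stated assumptions: the function name `first_occurrences` is hypothetical, and `defaultdict(int)` stands in for awk's behaviour of treating an unset array element as 0.

```python
from collections import defaultdict


def first_occurrences(lines):
    """Rough Python counterpart of awk '!count[$0]++'."""
    count = defaultdict(int)       # like an awk array: unseen keys read as 0
    out = []
    for line in lines:
        before = count[line]       # value *before* the increment (post-increment)
        count[line] += 1           # the ++ part: remember one more sighting
        if not before:             # the ! part: 0 is false, so this is the first sighting
            out.append(line)       # awk's default action: print the line
    return out


if __name__ == "__main__":
    data = "a\nc\nd\nb\nc\nd\nc".split("\n")
    print(first_occurrences(data))  # prints ['a', 'c', 'd', 'b']
```

The order of the two operations is the point: the test uses the count from before the increment, which is why only the first occurrence of each line gets through.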

* Re: uniq without sort <-------------- GURU NEEDED
From: Michele Dondi @ 2008-01-29 13:16 UTC
To: help-gnu-emacs

On Thu, 24 Jan 2008 18:45:24 -0800 (PST), gnuist006@gmail.com wrote:

>I want uniq without sorting the initial order.
>
>The algorithm is this. For every line, look above if there is another
>line like it. If so, then ignore it. If not, then output it. I am
>sure, I can spend some time to write this in C. But what is the
>solution using shell ? This way I can get an output that preserves the
>order of first occurrence. It is needed in many problems.

In shell I don't know. In Perl it's well known to be as trivial as

  perl -ne 'print unless $saw{$_}++' file

(And it's not even the most golfed-down solution!)

Michele
--
If, on the night she conceived the Duce,
Donna Rosa, touched by divine light,
had offered the blacksmith of Predappio
her backside instead,
that evening only Rosa would have been screwed,
and not the whole of Italy.
- Antifascist poem