collecting *unique* in loop

Discussion of Common Lisp
Post Reply
imba
Posts: 35
Joined: Sun Nov 21, 2010 2:13 pm

collecting *unique* in loop

Post by imba » Thu Nov 25, 2010 9:26 am

Hi.

In the following situation

Code: Select all

  (loop for a in l
        when cond
        collecting a)
How do I insure that every a is collected only *once*. Note: I don't want to apply something as "unique" to the result, but not to collect doublettes in the first time.

gugamilare
Posts: 406
Joined: Sat Mar 07, 2009 6:17 pm
Location: Brazil
Contact:

Re: collecting *unique* in loop

Post by gugamilare » Thu Nov 25, 2010 10:44 am

Use the function remove-duplicates.

imba
Posts: 35
Joined: Sun Nov 21, 2010 2:13 pm

Re: collecting *unique* in loop

Post by imba » Thu Nov 25, 2010 11:10 am

That isn't what I want. I want not to collect then in the first place.

Kompottkin
Posts: 94
Joined: Mon Jul 21, 2008 7:26 am
Location: München, Germany
Contact:

Re: collecting *unique* in loop

Post by Kompottkin » Thu Nov 25, 2010 12:18 pm

In that case, you will have to remember the set of collected items somehow. Like this, for example:

Code: Select all

CL-USER> (loop with collected-items = (fset:set)
               for x in '(1 2 3 1 2 4)
               unless (fset:contains? collected-items x)
                 do (fset:adjoinf collected-items x)
                 and collect x)
(1 2 3 4)
(This example makes use of the FSet library.)

Warren Wilkinson
Posts: 117
Joined: Tue Aug 10, 2010 11:24 pm
Location: Calgary, Alberta
Contact:

Re: collecting *unique* in loop

Post by Warren Wilkinson » Thu Nov 25, 2010 4:06 pm

Code: Select all

(loop with result = nil
        for a in '(a b a c a d a b a c a d)
        do (pushnew a result)
        finally (return (nreverse result)))

;; result ==> (a b c d)
If you don't care about order, you don't need nreverse at the end.
Need an online wiki database? My Lisp startup http://www.formlis.com combines a wiki with forms and reports.

Kompottkin
Posts: 94
Joined: Mon Jul 21, 2008 7:26 am
Location: München, Germany
Contact:

Re: collecting *unique* in loop

Post by Kompottkin » Fri Nov 26, 2010 7:24 am

Warren Wilkinson wrote:

Code: Select all

(loop with result = nil
      for a in '(a b a c a d a b a c a d)
      do (pushnew a result)
      finally (return (nreverse result)))
Yes, that's a simpler approach. It does have run time quadratic in the number of distinct items, though. This may or may not be an issue, of course, depending on the situation.

imba
Posts: 35
Joined: Sun Nov 21, 2010 2:13 pm

Re: collecting *unique* in loop

Post by imba » Fri Nov 26, 2010 10:47 am

Thank you very much! So what is faster: remove-duplicates or Warren Wilkinson's approach?

Kompottkin
Posts: 94
Joined: Mon Jul 21, 2008 7:26 am
Location: München, Germany
Contact:

Re: collecting *unique* in loop

Post by Kompottkin » Fri Nov 26, 2010 3:31 pm

imba wrote:So what is faster: remove-duplicates or Warren Wilkinson's approach?
Some naive testing suggests that REMOVE-DUPLICATES is significantly faster (0.03s vs. 11s for 100,000 items generated by RANDOM). In fact, it even seems to handle 1,000,000 items much better than I expected. One has to be very careful with such microbenchmarks, though, especially on SBCL, whose compiler is wickedly smart.

What's really weird, though, is that DELETE-DUPLICATES is slower than REMOVE-DUPLICATES by a couple of orders of magnitude. A bug in this version of SBCL maybe?

It probably won't make much of a difference if your input list is fewer than a couple of thousand items long, anyway. You know what they say about premature optimisation...

Code: Select all

CL-USER> (time (remove-duplicates (loop for x from 1 to 100000 collect (random 1000000000))))
Evaluation took:
  0.027 seconds of real time
  0.027461 seconds of total run time (0.025240 user, 0.002221 system)
  100.00% CPU
  ...
(960529751 650859964 696275401 493130535 258563691 736847323 858363769 45617352
 165142801 383460513 ...)

Code: Select all

CL-USER> (time
          (loop with collected-items = (make-hash-table)
                for i from 1 to 100000
                for x = (random 1000000000)
                unless (gethash x collected-items nil)
                  do (setf (gethash x collected-items) t)
                  and collect x))
Evaluation took:
  0.049 seconds of real time
  0.047696 seconds of total run time (0.042030 user, 0.005666 system)
  [ Run times consist of 0.010 seconds GC time, and 0.038 seconds non-GC time. ]
  97.96% CPU
  ...
(238510722 35471481 364524553 593573298 940395904 417695290 814711312 20768734
 121981963 754212048 ...)

Code: Select all

CL-USER> (time
          (loop with collected-items = (fset:set)
                for i from 1 to 100000
                for x = (random 1000000000)
                unless (fset:contains? collected-items x)
                  do (fset:adjoinf collected-items x)
                  and collect x))
Evaluation took:
  0.491 seconds of real time
  0.478523 seconds of total run time (0.424767 user, 0.053756 system)
  [ Run times consist of 0.170 seconds GC time, and 0.309 seconds non-GC time. ]
  97.56% CPU
  ...
(546310694 336599977 128592735 175916988 847899506 834855340 777332112
 579897313 166227745 363405174 ...)

Code: Select all

CL-USER> (time
          (loop with result = nil
                for x from 1 to 100000
                do (pushnew (random 1000000000) result)
                finally (return result)))
Evaluation took:
  11.351 seconds of real time
  11.245011 seconds of total run time (11.209439 user, 0.035572 system)
  99.07% CPU
  ...
(129996452 83142730 830379502 254691783 321222762 91109236 181730054 188146466
 486221870 191407494 ...)

Code: Select all

CL-USER> (time (delete-duplicates (loop for x from 1 to 100000 collect (random 1000000000))))
Evaluation took:
  98.663 seconds of real time
  98.295298 seconds of total run time (98.177997 user, 0.117301 system)
  99.63% CPU
  ...
(515034309 845470080 999353875 760234271 406270440 283415072 245652264
 343700715 42652215 722056408 ...)

Warren Wilkinson
Posts: 117
Joined: Tue Aug 10, 2010 11:24 pm
Location: Calgary, Alberta
Contact:

Re: collecting *unique* in loop

Post by Warren Wilkinson » Sat Nov 27, 2010 2:07 am

Its funny you mention the destructive version being slower. In indianerrostock's post viewtopic.php?f=2&t=901 he invited others to try to beat his solution to a problem of replacing "#NAME#" tokens in a template string with the names of sports, repeated several times.

His initial solution of (search)ing for "#NAME#" and outputting a new string was many times faster than my highly optimized version that pre-recorded the "#NAME#" locations and destructively modified a single string. Mind blowing stuff.

On topic though, if your list was sorted you can do faster duplicate detection, but it still might not be faster than remove-duplicates, which is a function I actually didn't know existed... I used

Code: Select all

(defun unique (list) (reduce #'adjoin list :initial-element nil))
Need an online wiki database? My Lisp startup http://www.formlis.com combines a wiki with forms and reports.

Post Reply