I've written an algorithm to count the number of inversions in a list of integers (pairs of integers that are out of order), but it runs very slowly. It takes ~400 seconds to run on a list of 100,000 integers. In contrast, the same algorithm written in Java runs in less than a second.

The algorithm itself is basically merge-sort, a binary recursive call, but counting the number of merges that would need to be done without actually doing them.

This is my basic Lisp form:

- Code: Select all
`(defun count-inversions (lst &optional (predicate #'<))`

(if (or (null lst) (null (cdr lst))) ;recursive base case

0

(let* ((half (ceiling (/ (length lst) 2)))

(left-list (subseq lst 0 half))

(right-list (subseq lst half)))

(+ (loop for a in left-list ; loop over the left and right halves of the list

summing (loop for b in right-list

counting (not (funcall predicate a b)))) ; count inversions between left and right lists

(count-inversions left-list predicate)

(count-inversions right-list predicate))))) ;recur on left and right lists

And this is that same form with various type declarations and optimization, it shaves about sixty seconds of the running time, but it still takes almost six minutes:

- Code: Select all
`(defun count-inversions (lst &optional (predicate #'<))`

(declare (optimize (speed 3)

(compilation-speed 0)

(debug 0)

(safety 0)))

(declare (type list lst))

(declare (type (function (fixnum fixnum)) predicate))

(if (or (null lst) (null (cdr lst)))

0

(let* ((half (the fixnum (ceiling (/ (the fixnum (length lst)) 2))))

(left-list (subseq lst 0 half))

(right-list (subseq lst half)))

(declare (type fixnum half))

(declare (type list left-list))

(declare (type list right-list))

(+ (the bignum (loop for a in left-list

summing (loop for b in right-list

counting (not (funcall

(the function predicate)

(the fixnum a)

(the fixnum b))))))

(the bignum (count-inversions left-list predicate))

(the bignum (count-inversions right-list predicate))))))

I'm using SBCL (+emacs +slime) on Ubuntu.

Why does this take so long? Is there anyway to get the running time comparable to Java? I have read that properly optimized CL can be competitive with C, so I would hope that this algorithm can be made better than a hundred times slower than Java.

Any help is appreciated,

tensorproduct