As a follow-up to yesterday’s post about using intermediate data structures, I wanted to add some concrete numbers to my Big-Oh runtime statements.
I came in to work this morning and got my hands on what I thought was a relatively large sample set of data, and set up an operation that would populate an array with 2000 items and then “keep” items from an array with 350 elements. The result was that 300 elements were left in my array.
The original algorithm I had to work with was as follows:
    keep( source, keepers ) {
        for each element S in source
            found = false
            for each element K in keepers
                if ( S == K )
                    found = true
                    break
            if ( not found )
                delete S from source
    }
So source is size 2000, keepers is size 350, and I know from observation that I’m going to delete (2000 − 300) = 1,700 elements from source.
The overall (worst-case) runtime then is 2000 * 350 + 2000 * 1700 = 700,000 + 3,400,000 = 4,100,000 steps! (The first term is the linear scans of keepers; the second is the worst-case array shuffling for the 1,700 deletions.)
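For the curious, here’s a rough C++ rendering of that loop (just a sketch, not the actual code from the project; I’m assuming plain std::vector<int>s for source and keepers):

    #include <vector>

    // Naive version: for every element of source, scan keepers linearly,
    // and erase misses from the vector (which shifts everything after them).
    void keep(std::vector<int>& source, const std::vector<int>& keepers) {
        for (auto it = source.begin(); it != source.end(); /* advanced below */) {
            bool found = false;
            for (int k : keepers) {        // O(|keepers|) scan per element
                if (*it == k) { found = true; break; }
            }
            if (!found)
                it = source.erase(it);     // O(|source|) shift per deletion
            else
                ++it;
        }
    }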
My re-implementation using intermediate data structures looks roughly like this:
    keep( source, keepers ) {
        Populate set S(s) with elements of source
        Populate set S(k) with elements of keepers
        for each element E in source
            if ( E is not found in S(k) )
                remove E from S(s)
        Reset source array to be empty with size length(S(s))
        Copy contents of S(s) into source
    }
The runtime here then is: 2000*log(2000) + 350*log(350) + 2000*log(350) + 1700 ≈ 43,500 steps (using base-2 logs).
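In real code that might come out something like this (again a sketch; I’m using std::set to get the log-time lookups the math above assumes):

    #include <set>
    #include <vector>

    // Set-based version: build both sets up front, then do a single pass
    // over source with log-time membership tests instead of linear scans.
    void keep(std::vector<int>& source, const std::vector<int>& keepers) {
        std::set<int> ss(source.begin(), source.end());    // 2000 log 2000
        std::set<int> sk(keepers.begin(), keepers.end());  // 350 log 350
        for (int e : source) {                             // 2000 iterations
            if (!sk.count(e))                              // log 350 lookup
                ss.erase(e);                               // drop non-keeper
        }
        source.assign(ss.begin(), ss.end());               // copy survivors back
    }

One caveat worth noting: a std::set sorts and deduplicates, so unlike the array version the result comes back ordered and with duplicates collapsed. If that matters, you could instead build only the keeper set and append the survivors to a fresh vector in one pass, which also drops the 2000 log 2000 term.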
Now in my book, 43,500 is significantly fewer steps than 4,100,000. But let’s face it, when your average user is running well over 2 billion cycles a second, and you’re talking about arrays of pointers, four million operations is nothing :-\.
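If you want to see the difference (or lack thereof) on your own machine, a quick-and-dirty harness like this is enough (sketch; link against whichever keep() above you’re measuring, and note the input data here is made up for illustration):

    #include <chrono>
    #include <cstdio>
    #include <vector>

    void keep(std::vector<int>& source, const std::vector<int>& keepers); // one of the versions above

    int main() {
        std::vector<int> source, keepers;
        for (int i = 0; i < 2000; ++i) source.push_back(i);
        for (int i = 0; i < 350; ++i)  keepers.push_back(i * 6); // strided so only some match source

        auto start = std::chrono::steady_clock::now();
        keep(source, keepers);
        auto stop  = std::chrono::steady_clock::now();

        std::printf("%lld microseconds, %zu elements kept\n",
            (long long)std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count(),
            source.size());
    }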
I guess it’s time to hunt for a much larger supply of data :-\