It never ceases to amaze me how drastic an impact our choice of data structures has on application performance, and how little thought many developers seem to give it. This week I was charged with optimizing a specific feature of our application at work involving set logic. Since set logic is so common and so well defined, I was nervous that there'd be little improvement to be had short of changing how the data in question was physically stored. Much to my horror, I found several blocks of code that did something like this:
```cpp
void FindMatches( CObArray &aryObjects1, CObArray &aryObjects2 )  // name mine; elided in the original
{
    for ( int i = aryObjects1.GetSize() - 1; i >= 0; i-- )
    {
        CObject *obj1 = aryObjects1.GetAt( i );
        for ( int j = aryObjects2.GetSize() - 1; j >= 0; j-- )
        {
            CObject *obj2 = aryObjects2.GetAt( j );
            if ( obj1 == obj2 )
            {
                // do something
            }
        }
    }
}
```
So right off the bat I have two major complaints:
- Who picks arrays to perform set logic? Especially a CObArray — use a SET CLASS for SET LOGIC, I BEG OF YOU.
- Why are we looping backwards? There’s nothing wrong with it, but it’s much more intuitive to start with 0 and work up.
Let's be more concrete, though, and talk about the specific method I worked on first: set difference. The method takes two CObArray instances and removes all occurrences of items in the first from the second. The "do something" block above, then, looked like this:
```cpp
delete obj2;
aryObjects2.RemoveAt( j );
```
While this seems innocent enough, that CObArray::RemoveAt(int) call just bumped your runtime from a horrendous O(n^2) to an even worse O(n^3)! Remember, this is an array, not a list: removing an element means shifting every subsequent element down one slot, so each deletion is itself O(n)…
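To see why, here's a minimal standard-C++ sketch of what an array-style deletion actually does under the hood (std::vector standing in for CObArray, and the function name is mine):

```cpp
#include <vector>
#include <cstddef>

// Erasing index k from an array shifts every later element left one
// slot, so a single RemoveAt-style deletion is O(n) all by itself.
void RemoveAtIndex( std::vector<int> &v, std::size_t k )
{
    for ( std::size_t i = k; i + 1 < v.size(); ++i )
        v[i] = v[i + 1];   // shift the tail left one slot
    v.pop_back();          // drop the now-duplicated last element
}
```

Do that inside a doubly nested loop and the shifting cost multiplies straight into the total.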
My immediate thought was to just swap out the data structure for something better, like a set, but as I found out, you can't always walk into production code and start swapping out structures. There were thousands of places in the application that relied on this array being a CObArray instance :-\. So I decided to use an intermediate data structure to optimize the task at hand.
The STL provides a number of template classes implementing various data structures. I'll show you my rewrite first, and then we'll discuss its subtleties:
```cpp
#include <set>

struct MyComparator
{
    bool operator()( CObject *obj1, CObject *obj2 ) const
    {
        return obj1->Compare( obj2 ) < 0;
    }
};

typedef std::set< CObject*, MyComparator > ObjectSet;

// Removes all objects occurring in aryObjects1 from
// aryObjects2 in O( n log n ) time.
void RemoveAll( CObArray &aryObjects1, CObArray &aryObjects2 )
{
    ObjectSet myset;
    for ( int i = 0; i < aryObjects2.GetSize(); ++i )
    {
        myset.insert( aryObjects2.GetAt( i ) );
    }

    ObjectSet::iterator itr;
    for ( int j = 0; j < aryObjects1.GetSize(); ++j )
    {
        itr = myset.find( aryObjects1.GetAt( j ) );
        if ( itr != myset.end() )
        {
            delete *itr;
            myset.erase( itr );
        }
    }

    aryObjects2.RemoveAll();
    aryObjects2.SetSize( static_cast<int>( myset.size() ) );
    int i = 0;
    for ( itr = myset.begin(); itr != myset.end(); ++itr )
    {
        aryObjects2.SetAt( i++, *itr );
    }
}
```
It works by first creating a set with all the elements from aryObjects2. This takes O(n log n) time. I then iterate over aryObjects1, searching for each item in the set I just built. This takes O(log n) per lookup. If I find the item, I delete it from the set, which is an O(log n) operation as well. The final step is to copy the items from the set back into the target array. Note that I allocate the memory space all at once to avoid having to reallocate and copy the array items each time it grows.
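The same intermediate-set technique can be sketched with only the standard library (no MFC; ints stand in for CObject pointers, and the names are mine):

```cpp
#include <set>
#include <vector>

// Remove every element of `toRemove` from `target` by routing `target`
// through an intermediate std::set: O(n log n) total instead of the
// O(n^3) of the nested-loop-plus-RemoveAt original.
std::vector<int> SetDifference( const std::vector<int> &target,
                                const std::vector<int> &toRemove )
{
    std::set<int> s( target.begin(), target.end() );  // build: O(n log n)
    for ( std::size_t i = 0; i < toRemove.size(); ++i )
        s.erase( toRemove[i] );                       // O(log n) each
    return std::vector<int>( s.begin(), s.end() );    // copy back once
}
```

Note that, exactly as in the MFC version above, going through a set both deduplicates the elements and hands them back in sorted order rather than their original order.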
This last point can cause major problems for you on its own. I used a similar technique to optimize an AddAll method as well: it had an O(n^3) loop to detect duplicates, and then at the end did something like this:
```cpp
aryObjects1.RemoveAll();
for ( int j = 0; j < aryTemp.GetSize(); j++ )
{
    aryObjects1.Add( aryTemp.GetAt( j ) );
}
```
While this looks like O(n) on the surface, my handy debugger showed me that it was actually "growing" the array one insert at a time: for each insert, it allocated new memory space, copied the entire array over, and then added the item. That's O(n^2)! By simply realizing that we already knew how big the array was going to be, and preallocating it up front with SetSize, we eliminate an entire factor of n from the runtime (that one's for you, Doug ;))!
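Here's the preallocation idea as a standard-C++ sketch (names mine). One caveat: std::vector::push_back already grows geometrically, so reserve only saves a handful of regrowths there, whereas a CObArray left to pick its own grow-by can reallocate far more often:

```cpp
#include <vector>
#include <cstddef>

// Copy `src` into a fresh vector, sized up front so the buffer is
// allocated exactly once instead of being regrown insert by insert.
std::vector<int> CopyPreallocated( const std::vector<int> &src )
{
    std::vector<int> dst;
    dst.reserve( src.size() );          // one allocation, no regrowth
    for ( std::size_t i = 0; i < src.size(); ++i )
        dst.push_back( src[i] );        // guaranteed not to reallocate
    return dst;
}
```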
So by simply introducing an intermediate data structure to perform a specific task, you can greatly improve your runtime performance without having to recode everything around a new data structure.