Pixelate:Issue 2/Code optimizations

From Allegro Wiki

Jump to: navigation, search

Contents

Part I: Introduction

Allegroids,

In this little article, I will introduce some of you to the concept of code optimization. I will also point you to some tools and techniques to help you get more performance out of your program.

Plan well

This must be the most important part of them all. If you plan well your code, you can structure it in ways that will benefit the execution speed. It's at this stage that you will consider your data layout and the algorithms to manipulate it. Needless to say, a good algorithm will beat a poor one, even if the latter has been hand optimized in assembly. This is better demonstrated in Part II of this guide.

If you can, research better algorithms. For example. say plan to sort things at one point in your program. This could be a list of polygons, player stats, and so on. Have a look on the 'net, ask knowledgeable friends, or even (gulp) use your local library, and search for a better method. Most likely some other programmer has already tackled the problem and has found a better solution. Chances are this solution is better than anything else you'll ever come up with. Of course, understanding the algorithm is even better, as you'll be able not only to adapt it to your particular situation, but also perhaps improve on it by making relevant simplifications.

Another aspect of good planning is to make sure you don't do any redundant or superfluous work. Allow me to elaborate using my own personal experience on the matter. For my Baccalaureate's computer class, we had to build a project in roughly 6 months, and ship it to France to be evaluated. Since it was a team project, and both my partner and myself thought of ourselves as being (relatively) very good programmers at the time, we decided to do a 3D game. Big Mistake. But let's not dwell on that. What I was getting to is that our implementation of the 3D engine would, for every frame, build up the polygon list to be displayed, sort them by average z value (so that close polygons overdraw the ones that are further away), then do all that 3D transformation voodoo, and finally send them to Allegro's rasterizer to be drawn. Obviously we were writing the project as we went on. We were barely getting 25 fps in 320x200x8 on a Cerleron 466 MHz and wondered how come it was so slow. After a bit of analysis we found our bottleneck. The polygon list would almost never change during the game, and if it would, it would be by very small amounts. Yet, we would rebuild the entire list from our object data to a polygon list, *each* and *every* frame! The solution we adapted is to only rebuild the polygon list when it was modified. This magically boosted the fps rate to about 30. A 20% improvement! Yet it was not enough. As you can see, not planning out game or application can have consequences on performance.

Being at the right place and at the right time

You toil for weeks at hand-crafting a function in assembly using every trick in the book, and manage to double its speed. But what if that function takes only 5% of CPU time? Then you only gain a small 5% improvement. However, if a routine is taking 80% of the CPU time, then focus your optimization there. If you can make it 6% faster, you've already gained the same 5% as before! (And with minimal effort this time). This is why it's essential that you know when and where to apply your optimisation techniques. Now you ask, how can I know what is too slow, what takes the most processor time? It's often difficult (although possible) to do it simply by looking at the source code. This is not what I'm suggesting - it's rather awkward. Instead, get a good profiler. A profiler is a program that will analyze the code while its running and determine its speed parameters for you. Both DJGPP and Mingw come with one, and so does MSVC.

For gcc compilers (such as Mingw and DJGPP), all you need to do is compile your project with the -pg option. This magic switch will instruct the compiler to add code that will profile your program. Now you only need to run it for the data to be generated. You'll notice gcc created a file called gmon.out. This is the profiling data, although it's not in any human-readable form. You need to run gprof on your program for the conversion to be made. The following command will extract symbol information, compose the log file and write it to a file called 'log.txt' gprof mygame.exe > log.txt

Alternatively, if you use MSVC, then you need to compile your project with both the "Enable profiling" and "Generate Map data" options. Now open a command prompt in your program's directory and run:

ftime myprog > log.txt

Note that you may need to run vcvars32 beforehand, depending on your setup. That batch file will prepare the program for profiling, run it, then create the profiling data and write it in 'log.txt'

You can now examine this generated file to determine what is eating all those cycles. Here's a clip from the profiling data for Zasx

(using MSVC**):

Func Func+Child Hit Time % Time % Count Function

3397.628 26.2 3397.628 26.2 31159 _draw_trans_sprite (ald3935.dll)
1923.454 14.8 1923.454 14.8 420347 _putpixel (ald3935.dll)
1823.861 14.1 1823.861 14.1 2 _set_gfx_mode (ald3935.dll)
1088.186 8.4 1089.607 8.4 58248 _check_pp_collision_normal (ppcol.obj)
716.736 5.5 716.736 5.5 8272 _textout (ald3935.dll)
502.100 3.9 2749.555 21.2 4106 _draw_stars (star.obj)
407.832 3.1 407.832 3.1 322 _rotate_sprite (ald3935.dll)
405.524 3.1 405.524 3.1 4116 _textprintf_right (ald3935.dll)
352.158 2.7 352.158 2.7 11688 _line (ald3935.dll)
318.976 2.5 318.976 2.5 1 _install_sound (ald3935.dll)
205.914 1.6 205.914 1.6 12176 _masked_blit (ald3935.dll)
161.990 1.2 161.990 1.2 20530 _rectfill (ald3935.dll)
148.057 1.1 8233.198 63.5 4106 _DrawGame (main.obj)
125.587 1.0 125.587 1.0 4126 _textprintf (ald3935.dll)


This is only a small part of it. The whole thing is about six pages long. Notice that the function taking up the most time is Allegro's draw_trans_sprite. In fact, I use it for doing additive coloring. But let's not occupy ourselves with that - there is very little we can do to speed it up without writting our own version. Instead, let me direct your attention to the DrawGame function. Although the function itself takes 1.1% of the CPU's time, its children (or the functions that it calls) take up a whooping 63.5%! So now the drawing code occupies most of CPU time. Now we should concentrate our optimizations effort on reducing the work done by the graphics display code - Any small change will have a large effect on the total frame rate as we will see later on.

  • At the time of this writing, I couldn't get Mingw or DJGPP to properly output profiling information. If I ever get that to work I'll post it in a future article. Again, sorry for the inconvenience.

Working the algorithms

The first idea is that you need to pick algorithms which have a smaller complexity. Complexity is usually denoted by Big-Oh - or a function showing its Order of Magnitude relative to input. Let me explain using a (rather extreme) example. Let's assume you just wrote your polygon rasterizer and most of your 3D engine. Now comes the time to sort the polygons to draw them back to front so that the scene appears correctly (the polygons in front will overdraw those in the back). You haven't done much planning nor researched possible algorithms for the sort. So you decide to write down the first thing that comes to mind - the dreaded Bubble Sort (tm). The code for this algorith looks something like:

for (i = 0; i < num; i++) {
    for (j = 0; j < num; j++) {
         if (poly[i]->z > poly[j]->z)
              swap(poly[i], poly[j])
    }
}

This code will loop through every polygon and make sure it's average 'z' value is smaller than all the ones after it. If not, then it will simply swap them. Simple, no? Now this algorithm has a Big-Oh of O(n^2) (that's n squared). This means that as your number of polygons increase, the time it takes to sort them will increase at the rate of the *square* of n. Hence, if you double the number of polygons, you quadruple the time it will take to do the sort. This is clearly unacceptable, even for a small 'n'.

A better algorithm is required. Now let's assume you never heard of qsort (which has an average complexity of n*ln(n), much much better than our n^2). The previous algorithm will go through every possible combination of 'i' and 'j' to sort the array. But we don't need to! If we know part of the array is already sorted, we don't have to run through it! However, simply starting 'j' at 'i' would not work in our case as it's possible that a small value appears later in the array. The idea is that if we can find the smallest value, and insert it at the first position, we are guarantied that the rest of the array will contain values which are greater than it (obviously). So we simply repeat the process for every number. Let's say we have the array: 4 6 3 2 We know that '2' should be the first element, so we swap it with '4'. This gives us: 2 6 3 4. Now we know that anything smaller than '2' cannot appear after the first position in the array. Thus the first element of the array is properly sorted. Now, we must look for the smallest value in the rest of the array, and insert that at its correct position. The smallest of {6, 3, 4} is 3 of course. Since we started counting at the second element, we will place '3' there. Now our array looks like: 2 3 6 4. The first two elements are sorted! If we repeat once more, we get the full sorted array of 2 3 4 6. The code for this algorithm is trivial:

#define SIZE 4
int array[SIZE] = {4, 6, 3, 2};
int min_val = 0;
int min_pos = -1;

for (i = 0; i < (SIZE - 1); i++) {
    /* Find the smallest value */
    min_pos = -1;
    for (j = i; j < SIZE; j++) {
         if (array[j] < min_val || min_pos == -1) {
         min_val = array[j];
         min_pos = j;
         }
    }
    swap(array[i], array[min_pos]);
}

Ok, it's not as simple as the previous one. Let's analyze it anyway. Since we only go through the second loop SIZE - i times, we end up running the inner loop n(n-1)/2 times. (You can count it yourself if you wish). Much better than before! However, this still has a complexity of O(n^2), so it's not *that* much better.

Next Time

  • Minimizing work done on graphics
  • Searching and Sorting


Send complaints to void_star_@excite.com


Issue 2: The Danger Issue
Previous:
Newbie coders diary
Present:
Code optimizations
Next:
Particle Tutorial
Personal tools