|
when speed matters
Multiplication on a Z80
|
|
Most programmers won’t care about the fact that multiplication wasn’t built into the processor. If you use a higher level programming language it will most certainly provide the necessary commands or code libraries which will solve multiplication arithmetics for us. The only people who will worry about the missing CPU instructions are the people who directly are affected: the assembly language programmers. In our MSX case the only ones who really worry about multiplication are demo-programmers; they need a really fast solution for performance reasons. More normal applications can still use a relatively slow multiplication, since once assembled, ML is still the most performant and for the occasional multiplication one won’t see the difference between 40 or 140 microseconds. If you need however thousands of multiplications like in realtime 3D algorithms or a realtime spectrum analyzer this 1000 times 100 microseconds will be a difference of 100 milliseconds between possible screen updates. With the fast code you get at maximum 1/40 ms = 25 Hz, so 25 frames per second. With the slow code you will obtain 7 frames per second. This means that your framerate can vary 357 percent depending on the code!
The solutions
Solution 1
This is almost literally the definition of multiplication like most of us have learned in elementary school. 5 times 3 equals 3+3+3+3+3, 8 times 2 equals 2+2+2+2+2+2+2+2 etc. It is clear that this approach can be optimized quite a bit, like making sure that register B contains the smallest value. Extra functionality can be easily implemented like adapting the routine so that it can multiply if there are one or two bytes in two’s complement format and accordingly outputting a two’s complements value in HL.
Advantages
Disadvantages
Solution 2
If we have the data stored in this format we can use the following code to multiply:
Advantages
Disadvantages
The perfect compromise
First, the lookup table. This one must start at #xx00. Most demo programmers will already have the habit of aligning lookup tables this way. It speeds up the indexing since there is no offset to take into account and all lookups can be calculated using boolean operations on addresses. We did this in the previous example also, it was not by coincidence that the lookup table started at #4000.
The precalced table may seem long and boring — which it is actually — but the code itself is much shorter and cleaner:
Some people would by now feel the urge to take paper and pencil and they would try to figure out how this works. If you are one of those people I’ll give you a hint and afterwards you will find it much easier to figure out how it works. Ok, here is the hint. MULTAB is made using the folowing pseudo-code:
Those who feel they can tackle the program alone can start working now. For those who didn’t find a piece of paper or whose pencils are blunt I will explain the code. Nobody would read this far and then appear too lazy to figure it out himself, right? Now hold on, here we go. Everybody has once during his scolar career heard about the ‘special products’. And we learned that (a+b)(a-b)=a²-b² and that (a+b)²=a²+2ab+b². First of all, we make a lookup table for the function f(x)=(x²)/4. The trick is that now we can say that a*b=f(a+b)-f(a-b). Let’s prove that this is true. We are going to work out this function:
Let’s use a simple example to make it clearer:
Our pre-built list contains the values for x²/4 for all x’s in the range of -128 up until +127. So the constraints for the registers A and D in our program are -128 <= A+D <=127 and -128 <= A-D <=127. If we are sure that A and D are always between -64 and +63 these conditions will always be true. A little remark however for those who saw the light. Our lookup table doesn’t contain x²/4 but int(x²/4)! So for the case of 3 we don’t get 2.25 but 2 as result of the lookup. Rest assured however that this doesn’t affect the final result. I hope that this article has made it clear that, even on a slow CPU, fast solutions exist to what seem to be long and tedious problems. Just as in real life, there are multiple solutions to one and the same problem, every solution having its good and bad sides. For those who were looking for a fast multiplying routine, feel free to use the given solution. I’m waiting to see the results of your imagination. |
|