Monday, June 17, 2013

What are switch statements under the hood?

Location: Salt Lake City, UT, USA
If you don't know what a switch/case block is (often called a switch statement, or switch condition), then this post will likely not be of any use to you.  I apologize, and promise that this will be the last very technical post I make for a great while.  If you're a programmer and don't know what I am talking about, break out google and learn about switches, then come back so you know how they work and can properly take advantage of this wonderful tool!

Switches are great!  Any time that you have several 'if / else if / else' clauses chained together to represent a number of potential paths, you have a great candidate for a switch.  At the very least a switch statement will likely give you code that is much more clear and easy to read.  They are concisely and explicit.  When given the opportunity to shine, a switch can dramatically improve the speed of your code.

To see how switches can improve your code you need to understand how they are being performed 'under the hood', and to understand this we have to actually understand a little about the machine code that your source code is compiled into.  We do this typically with an assembly language, but I will not assume you are fluent in assembly.  Instead I will try to describe what is happening.

Everything in your program is stored in memory before making it to the processor.  Instructions (commands), variables, strings, everything.  Processors only understand rather simple commands, things like move this value from memory location A to memory location B, jump to instruction located at memory location B and continue processing from there, compare values in A and B, add two values, and other basic operations.  These basic operations can be stringed together to accomplish some higher level task.  For instance, you could write a basic assembly program to find the absolute value between A and B:

0Test if A is greater than B
1If test is true jump to address 4
2Subtract A from B (answer is in B)
3Jump to address 5
4Subtract B from A (answer is in B)
5Return to caller

When we are finished running this code, the result is stored in B (A and B both represent registers, memory on the processor to store things we are currently operating on, usually 4 to 8 bytes each).  We could have accomplished the same thing other ways, for instance, we could have just subtracted them, tested if the answer was less than 0 (negative), and if so we could have multiplied by -1, or even flipped the bits and added one.  In programming there are almost always several ways to do the same thing.

Now that we understand a little bit about how assembly works we can talk about something called a jump table.  A jump table is not something explicitly offered by assembly.  Instead it is a logical programming construction, something someone made up as a way to use assembly to create something useful.  To see how they work, it helps to lay out a scenario in which we would use them.  For our example we will say that we have a game, and the player should be able to choose between 6 different classes.  In C we might make some constants for each class to map the class name to a number which will be used internally to determine the class, in Java we may use an enum to do the same thing.  Regardless of how we represent it, we will abstract our classes to numeric codes 0 through 5.  Here's how this may look as an if/else if block:

if (chosenClass == WARRIOR)
    hitPoints = 50;
else if (chosenClass == MAGE)
    hitPoints = 15;
else if (chosenClass == WARLOCK)
    hitPoints = 20;
else if (chosenClass == ROGUE)
    hitPoints = 30;
else if (chosenClass == NINJA)
    hitPoints = 40;
else if (chosenClass == BARD)
    hitPoints = 10;

Note: ALL CAPS represent enums or constants, numbers, 0-5.

A switch form of the above code might look like this:

switch (chosenClass) {
    case WARRIOR: hitPoints = 50; break;
    case MAGE:    hitPoints = 15; break;
    case WARLOCK: hitPoints = 20; break;
    case ROGUE:   hitPoints = 30; break;
    case NINJA:   hitPoints = 40; break;
    case BARD:    hitPoints = 10; break;

So I hope that by looking at my previous example of pseudo assembly, you can see the obvious way that this may be implemented in machine code.  A few comparisons for equality and jumps to different addresses and you're done.  But since I am writing about it, there must be a better way.  Let me propose that instead of doing comparisons, we do some calculating.  We have 6 total choices, and the range is 0 to 5, this is important.  Note that there are no gaps, though this is not a necessity.  Also recognize that once we have made a choice, we have our 6 possible outcomes and that each of these in machine code is two operations: one to set a memory location or register value, and another to jump past our choices (the equivalent of break), twelve total lines of code.  We now can write something with this logic (the chosenClass is held in A, hitPoints is held in B):

00Multiply A by 2 and store the result in A
01Jump to address stored in (A + 2)
02Set B to 50
03Jump to address 14
04Set B to 15
05Jump to address 14
06Set B to 20
07Jump to address 14
08Set B to 30
09Jump to address 14
10Set B to 40
11Jump to address 14
12Set B to 10
13Jump to address 14
Our code has a really cool feature to it.  Where the logic in our C/Java style code using if/else blocks may have to test our condition up to six times, our assembly code is able to identify the correct code with two basic math operations and a direct jump to a new memory location.  No matter which choice is made, we will never execute more than four instructions.  In computer science we refer to this as constant complexity, and this is really the holy grail of algorithms.  Anytime you see complexity go constant like this, you know you are looking at something special.

So how is this working?  First we look at how large the commands that we are jumping to are, in this case they are each 2 instructions, Set B to a value, and jump beyond the switch (to instruction address 14).  To account for this we multiply A by 2.  We then look at where these commands start, they start at address 03, so we add 2 to A (2 because we are using 0 as a number), and this defines where we jump to in memory.  If the switch input is 3, we multiply by 2, add 2, and we find that we should be jumping to address 08, set B to 30, which properly corresponds to our ROGUE, which is correct for an input of 3.

But this is only a simplified version of a jump table, contrived to show just the basic logic and illustrate how jump tables avoid complexity and as a result can substantially speed up your code.  Our logic here is great to illustrate those points, but it is terribly limited in application.  What if the code that we executed on a matched condition was not uniform in length?  Suddenly our trick of multiplying by 2 fails!  What if the numbers don't start at 0, or 1, what if they start at 1430 and go to 1634, a range of 204?  Our code again breaks.  What if there are gaps in the numbers used?  Using my example code, this quite real possibility is completely undefined -- entirely unacceptable for computers.  Being undefined simply means that we have to define some behavior which will consistently generate correct results.  For our needs this is actually quite easy, but to reveal our strategy we first have to resolve the first issue, that of varying length conditional blocks, the code we want to run when we match is not of uniform length.  We can easily handle ranges that do not start at 0 or 1 by converting them to our range.  1430 to 1634 is the same as 0 to 203, so we can do a little basic subtraction and convert ranges, and again, all this is done in constant complexity!

We can solve the issue of non uniform lengths by using an intermediary list which has some information about how these different lengths impact the addresses at which each starts.  As with everything on a computer, our list goes into memory with each choice being a memory address (4-8 bytes), more specifically, the memory address of one of the instruction blocks.  We then can calculate a position into this list, and retrieve the memory address to jump to for the associated instructions.  We can count on the memory locations being either a multiple of 4 or 8 in our list, and we now can use our simple logic above to jump to the correct locations.  Let me illustrate this with code, using 4 as the multiplier to simulate a 32 bit system:

00Multiply A by 4 and store the result in A
01Retrieve the value stored in our LIST at offset A, store it in A
02Jump to address stored in A
03Set B to 50
04Set C to 12
05Jump to address 20
06Set B to 15
07Jump to address 20
08Set B to 20
09Set C to 18
10Jump to address 20
11Set B to 30
12Jump to address 20
13Set B to 40
14Set C to 100
15Jump to address 20
16Set B to 10
17Jump to address 20
18NO OPERATION (processor just does nothing for this instruction)
19Jump to address 20
Our LIST looks like this:

So again, lets say we want to pick class 3, we multiply that by 4, and get 12.  We look up the address stored at offset 12 in the LIST, and find address 11.  We jump to the address stored in our list, 11 and we find we should set B to 30 and jump past to the end.  Again, this corresponds to the ROGUE, number 03.  Our algorithm is correct, and we now have greatly improved the usability of our logic at the cost of a little extra memory.  We retain our constant complexity, which is key.  No matter how large a list of conditions we are working with, we will never had to run more than three instructions to get there, and never more than 5 or 6 to be finished!

Our last task is defining an action when the values have gaps.  This also allows us to handle a default case if we desire.  The first thing to notice about the code above is that we have more items in our list than we have possible choices.  We have added an address at offset 24, which would correspond to a 7th element.  This is a default case, and corresponds to instruction address 18, where we just do nothing and then jump to the end (our code has no default case).  Handling gaps is simple, we just add an entry in our LIST for that number, and set the value to be the instruction address after our code, in this case address 20.  That way if we reach a number with no corresponding instructions, we can simply jump right past our code and continue processing.  Again, all of this maintains constant complexity, our ultimate goal!

An astute observer may notice a flaw in this logic.  What if our range is huge with a lot of gaps.  What if we have conditions to match 0, 4, 10, 50, 945328, and 1 << 63?  We have 6 possible values, but a range that goes from 0, to some obscenely large value (2^63).  Not only does do our gaps occupy substantially more space than our valid options, but our list must have 2^63 entries!  You could never hope to hold all this in the few gigabytes (or event terabytes) of memory available to us.  Luckily modern compilers are smart enough to recognize the conditions which will lead to excessive wasted space like this, and will only compile code into jump tables when it can determine that the range and gap density is low enough to make a jump table efficient.  In other words, not all conditional chains or switches will result in a jump table.  Only those which will lend themselves to increased efficiency, at a cost that is not disproportionately large to memory, will be converted into jump tables.

I think that's about it, but if you think I left something out, notice any mistake, have questions, or anything at all, leave a comment!

No comments:

Post a Comment