
Thursday, June 20, 2013

Judicious Process vs. Due Process

Location: Salt Lake City, UT, USA
Most people living in the US have heard of due process, and though few of us have probably studied the term, I think most of us have an idea what it means. Or so I thought, before I read Eric Holder's speech in which he outlines the legal justification for the sanctioned killing of US citizens, bypassing the courts.  I'll get to that a little later, because the whole topic made me go back and do some reading, and though much of what I found was written in legalese, I put the time in to try to figure out what it all said.

I don't think it helped though, because I am right back where I started.  I still think due process just refers to all the rights afforded to everyone: free speech, trial by jury, remaining silent, representation, among others.  There are a few different contexts in which the term is used, which can be confusing, but luckily, to address Mr. Holder's use of the term, we only need to look at how it is used in the Constitution:
No person shall be held to answer for a capital, or otherwise infamous crime, unless on a presentment or indictment of a Grand Jury, except in cases arising in the land or naval forces, or in the Militia, when in actual service in time of War or public danger; nor shall any person be subject for the same offense to be twice put in jeopardy of life or limb; nor shall be compelled in any criminal case to be a witness against himself, nor be deprived of life, liberty, or property, without due process of law; nor shall private property be taken for public use, without just compensation.
This is, in fact, the entire Fifth Amendment to the Constitution, part of our rights as citizens of the US.  Note the clause which references due process of law.  It's odd that the entirety of the amendment is quite explicit and direct, yet the meaning of due process has always been left more to context than spelled out explicitly.

Now let's look at the meat of what Mr. Holder said:
Some have argued that the President is required to get permission from a federal court before taking action against a United States citizen who is a senior operational leader of al Qaeda or associated forces.  This is simply not accurate.  “Due process” and “judicial process” are not one and the same, particularly when it comes to national security.  The Constitution guarantees due process, not judicial process.
Now, first I have to say that this paragraph is but one in a very long speech.  There is some context there.  He talks about the successes we have had in prosecuting terrorists, and all the plots that have been averted, and he even says a few things I agree with (damn few).  But the quote above is something I find very disturbing, and I think deserves a lot more attention than it got.

If you haven't deciphered it yet, he is basically saying that the 5th Amendment doesn't guarantee that if you are accused of something serious, say killing someone, you will face a jury, or even get a court hearing.  That is what is meant by "judicial process."  Instead, what Mr. Holder is saying is that due process is an abstract idea, that it simply means some entity has to review the case.  I would assume (perhaps hope is a better word) that some standard is being applied, such as requiring that the entity given this power be impartial; at least, I think that is what is being implied.  He goes on to make a historical argument that the president has traditionally been charged with maintaining national security.

THE ABOVE PARAGRAPH SCARES THE SHIT OUT OF ME!  In my mind we're basically being told that now the president has the legal authority to decide which citizens are a threat to our country, and summarily execute them.

To form a more specific rebuttal, I will start by pointing out that the entire 5th Amendment is one sentence.  One cannot look at the term "due process of law" without also looking at the first few words: "No person shall be held to answer for a capital, or otherwise infamous crime, unless on a presentment or indictment of a Grand Jury".  There is a clear call for an indictment (an instrument of the judiciary), and a Grand Jury.  I think this is probably the foundation for a legal argument that I just don't have the experience to produce.  But I think the implications of Mr. Holder's opinion reach so far that we as citizens have an obligation to stand up and oppose it.

Some people may look at this debate and not understand why it matters so much.  They look at the context and think that killing al-Qaeda operatives is probably a good thing, and they may be right, but the implications of what Mr. Holder is saying go deeper; in fact, they question a fundamental underpinning of our society: the need for courts.  This opinion invalidates the idea that accused people are guaranteed the right to a trial in a court and judgment by a jury.  Due process becomes this abstract idea that something, somewhere, will review the situation to make sure it is all nice and legal.  It's the main ingredient required to form a tyranny.

Since I am not a lawyer, I can't give the eloquent rebuttal that I'm sure Mr. Holder would, and I certainly can't cite the precedents.  But I can tell you one thing: we are having our rights stripped from us by people like Eric Holder, and we the people need to stand up and start making some noise, because the deeper we get the harder it's going to be to climb back out.  If we don't fight to preserve the ideas we were founded upon, this incredible American experiment is going to fail.

Tell me why I am wrong!

Tuesday, June 18, 2013

Colorado River - Ruby / Horsethief Canyons - Floating!

Location: Grand Junction, USA
Again, we got a group together, loaded up the boats, and hit the Colorado River to raft through Ruby and Horsethief canyons.  We went for twice as long as our previous trip, and I took some video over two of the float days.  Below is the video.  If you are looking for more textual information about this trip, see my previous post on the area!




Feel free to leave feedback!

Monday, June 17, 2013

What are switch statements under the hood?

Location: Salt Lake City, UT, USA
If you don't know what a switch/case block is (often called a switch statement, or switch condition), then this post will likely not be of any use to you.  I apologize, and promise that this will be the last very technical post I make for a great while.  If you're a programmer and don't know what I am talking about, break out Google and learn about switches, then come back to learn how they work under the hood so you can properly take advantage of this wonderful tool!

Switches are great!  Any time you have several 'if / else if / else' clauses chained together to represent a number of potential paths, you have a great candidate for a switch.  At the very least a switch statement will likely give you code that is much clearer and easier to read.  They are concise and explicit.  And when given the opportunity to shine, a switch can dramatically improve the speed of your code.

To see how switches can improve your code you need to understand how they are performed 'under the hood', and to understand this we have to understand a little about the machine code that your source code is compiled into.  This is typically discussed in terms of an assembly language, but I will not assume you are fluent in assembly; instead, I will try to describe what is happening.

Everything in your program is stored in memory before making it to the processor.  Instructions (commands), variables, strings, everything.  Processors only understand rather simple commands: move this value from memory location A to memory location B, jump to the instruction located at memory location C and continue processing from there, compare the values in A and B, add two values, and other basic operations.  These basic operations can be strung together to accomplish some higher-level task.  For instance, you could write a basic assembly program to find the absolute difference between A and B:

Address  Command
0        Test if A is greater than B
1        If test is true jump to address 4
2        Subtract A from B (answer is in B)
3        Jump to address 5
4        Subtract B from A (answer is in B)
5        Return to caller

When we are finished running this code, the result is stored in B (A and B both represent registers, memory on the processor used to store the things we are currently operating on, usually 4 to 8 bytes each).  We could have accomplished the same thing other ways; for instance, we could have just subtracted them, tested if the answer was less than 0 (negative), and if so multiplied by -1, or even flipped the bits and added one.  In programming there are almost always several ways to do the same thing.
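
To make the same idea concrete in a higher-level language, here is a minimal Java sketch of the two approaches just described: compare-then-subtract, and subtract-then-negate.  The class and method names are my own, purely for illustration.

public class AbsDiff {
    // Compare first, then subtract the smaller value from the larger one.
    static int absDiffCompare(int a, int b) {
        if (a > b)
            return a - b;
        return b - a;
    }

    // Subtract first, then negate if the result is negative.
    // Flipping the bits and adding one is the same as multiplying by -1.
    static int absDiffNegate(int a, int b) {
        int d = b - a;
        if (d < 0)
            d = (~d) + 1;   // equivalent to d = -d
        return d;
    }

    public static void main(String[] args) {
        System.out.println(absDiffCompare(3, 10));  // 7
        System.out.println(absDiffNegate(10, 3));   // 7
    }
}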

Now that we understand a little bit about how assembly works, we can talk about something called a jump table.  A jump table is not something explicitly offered by assembly.  Instead it is a logical programming construction, something someone made up as a way to use assembly to create something useful.  To see how jump tables work, it helps to lay out a scenario in which we would use one.  For our example we will say that we have a game, and the player should be able to choose between 6 different classes.  In C we might make a constant for each class, mapping the class name to a number which will be used internally; in Java we might use an enum to do the same thing.  Regardless of how we represent it, we will abstract our classes to numeric codes 0 through 5.  Here's how this may look as an if/else if block:

if (chosenClass == WARRIOR)
    hitPoints = 50;
else if (chosenClass == MAGE)
    hitPoints = 15;
else if (chosenClass == WARLOCK)
    hitPoints = 20;
else if (chosenClass == ROGUE)
    hitPoints = 30;
else if (chosenClass == NINJA)
    hitPoints = 40;
else if (chosenClass == BARD)
    hitPoints = 10;

Note: ALL CAPS represent enums or constants, numbers, 0-5.

A switch form of the above code might look like this:

switch (chosenClass) {
    case WARRIOR: hitPoints = 50; break;
    case MAGE:    hitPoints = 15; break;
    case WARLOCK: hitPoints = 20; break;
    case ROGUE:   hitPoints = 30; break;
    case NINJA:   hitPoints = 40; break;
    case BARD:    hitPoints = 10; break;
}

So I hope that by looking at my previous example of pseudo assembly, you can see the obvious way this may be implemented in machine code: a few comparisons for equality and jumps to different addresses and you're done.  But since I am writing about it, there must be a better way.  Let me propose that instead of doing comparisons, we do some calculating.  We have 6 total choices, and the range is 0 to 5; this is important.  Note that there are no gaps, though this is not a necessity.  Also recognize that once we have made a choice, we have our 6 possible outcomes, and that each of these in machine code is two operations: one to set a memory location or register value, and another to jump past our choices (the equivalent of break), for twelve instructions in total.  We can now write something with this logic (chosenClass is held in A, hitPoints is held in B):

Address  Command
00       Multiply A by 2 and store the result in A
01       Jump to address (A + 2)
02       Set B to 50
03       Jump to address 14
04       Set B to 15
05       Jump to address 14
06       Set B to 20
07       Jump to address 14
08       Set B to 30
09       Jump to address 14
10       Set B to 40
11       Jump to address 14
12       Set B to 10
13       Jump to address 14
Our code has a really cool feature to it.  Where the logic in our C/Java-style code using if/else blocks may have to test our condition up to six times, our assembly code is able to reach the correct code with two basic math operations and a direct jump to a new memory location.  No matter which choice is made, we will never execute more than four instructions.  In computer science we refer to this as constant time complexity, and it is really the holy grail of algorithms.  Any time you see complexity go constant like this, you know you are looking at something special.

So how is this working?  First we look at how large the blocks we are jumping to are: in this case each one is 2 instructions, set B to a value and jump beyond the switch (to instruction address 14).  To account for this we multiply A by 2.  We then look at where these blocks start; they start at address 02, so we add 2 to A (the address of the first block; we can add it directly because our class codes start at 0), and this gives the address to jump to.  If the switch input is 3, we multiply by 2, add 2, and find that we should be jumping to address 08, "Set B to 30", which corresponds to ROGUE, the correct result for an input of 3.
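
If the assembly feels too far removed, here is one way to picture the same dense-case trick at the source level.  Because the class codes are the contiguous values 0 through 5, the whole switch collapses into a single indexed lookup.  This is only an analogy for what the compiled jump table does, not literally what the compiler emits; the class and names below are mine.

public class ClassHitPoints {
    // Hit points indexed by class code: WARRIOR=0, MAGE=1, WARLOCK=2,
    // ROGUE=3, NINJA=4, BARD=5 (the same 0-5 mapping used above).
    static final int[] HIT_POINTS = { 50, 15, 20, 30, 40, 10 };

    static int hitPointsFor(int chosenClass) {
        // One indexed lookup instead of up to six comparisons.
        return HIT_POINTS[chosenClass];
    }

    public static void main(String[] args) {
        System.out.println(hitPointsFor(3)); // ROGUE -> 30
    }
}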

But this is only a simplified version of a jump table, contrived to show the basic logic and to illustrate how jump tables avoid complexity and, as a result, can substantially speed up your code.  Our logic here is great for illustrating those points, but it is terribly limited in application.  What if the code we execute on a matched condition is not uniform in length?  Suddenly our trick of multiplying by 2 fails.  What if the numbers don't start at 0 or 1?  What if they start at 1430 and go to 1634, a range of 204?  Our code again breaks.  What if there are gaps in the numbers used?  Using my example code, this quite real possibility is completely undefined -- entirely unacceptable for computers.  Being undefined simply means that we have to define some behavior which will consistently generate correct results.  For our needs this is actually quite easy, but to get there we first have to resolve the first issue: the code we want to run on a match is not of uniform length.  Ranges that do not start at 0 or 1 are easy to handle by converting them to our range: 1430 to 1634 is the same as 0 to 204, so a little basic subtraction converts the range, and again, all of this is done in constant time.

We can solve the issue of non-uniform lengths by using an intermediary list which records where each block starts, however long the blocks before it were.  As with everything on a computer, our list goes into memory, with each entry being a memory address (4-8 bytes); more specifically, the memory address of one of the instruction blocks.  We can then calculate a position into this list and retrieve the memory address to jump to for the associated instructions.  Because each entry is a fixed 4 or 8 bytes wide, the offsets into the list are multiples of 4 or 8, and we can reuse the simple multiply-and-jump logic above to land on the correct entry.  Let me illustrate this with code, using 4 as the multiplier to simulate a 32-bit system:

Address  Command
00       Multiply A by 4 and store the result in A
01       Retrieve the value stored in our LIST at offset A, store it in A
02       Jump to address stored in A
03       Set B to 50
04       Set C to 12
05       Jump to address 20
06       Set B to 15
07       Jump to address 20
08       Set B to 20
09       Set C to 18
10       Jump to address 20
11       Set B to 30
12       Jump to address 20
13       Set B to 40
14       Set C to 100
15       Jump to address 20
16       Set B to 10
17       Jump to address 20
18       NO OPERATION (processor just does nothing for this instruction)
19       Jump to address 20
Our LIST looks like this:

Offset  Value
00      03
04      06
08      08
12      11
16      13
20      16
24      18
So again, let's say we want to pick class 3: we multiply that by 4 and get 12.  We look up the value stored at offset 12 in the LIST and find address 11.  We jump to address 11, where we set B to 30 and jump past to the end.  Again, this corresponds to ROGUE, number 3.  Our algorithm is correct, and we have greatly improved the usability of our logic at the cost of a little extra memory.  We retain our constant complexity, which is key.  No matter how large a list of conditions we are working with, we will never have to run more than three instructions to reach the right block, and never more than five or six to be finished!

Our last task is defining an action when the values have gaps.  This also allows us to handle a default case if we desire.  The first thing to notice about the code above is that we have more items in our list than we have possible choices.  We have added an address at offset 24, which would correspond to a 7th element.  This acts as a default case, and corresponds to instruction address 18, where we just do nothing and then jump to the end (since our original switch has no default case, doing nothing is the correct behavior).  Handling gaps is simple: we just add an entry in our LIST for that number and set its value to the instruction address just after our code, in this case address 20.  That way, if we reach a number with no corresponding instructions, we simply jump right past our code and continue processing.  Again, all of this maintains constant complexity, our ultimate goal!
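
The same picture extends to gaps and a default case.  Sticking with the array analogy from earlier (again, an illustration of the idea rather than literal compiler output), a gap or out-of-range code simply routes to the default branch.  The extra class code, the sentinel value, and the default hit points below are all hypothetical.

public class ClassHitPointsWithDefault {
    static final int DEFAULT_HIT_POINTS = 25;  // hypothetical default branch

    // Suppose codes 0-6 are possible but code 6 is a gap with no rules yet.
    // A negative entry stands in for "jump to the default block".
    static final int[] HIT_POINTS = { 50, 15, 20, 30, 40, 10, -1 };

    static int hitPointsFor(int chosenClass) {
        if (chosenClass < 0 || chosenClass >= HIT_POINTS.length)
            return DEFAULT_HIT_POINTS;              // outside the table entirely
        int hp = HIT_POINTS[chosenClass];
        return (hp < 0) ? DEFAULT_HIT_POINTS : hp;  // gap entry -> default
    }

    public static void main(String[] args) {
        System.out.println(hitPointsFor(6));   // gap -> 25
        System.out.println(hitPointsFor(42));  // out of range -> 25
    }
}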

An astute observer may notice a flaw in this logic.  What if our range is huge, with a lot of gaps?  What if we have conditions to match 0, 4, 10, 50, 945328, and 1 << 63?  We have 6 possible values, but a range that goes from 0 to some obscenely large value (2^63).  Not only do our gaps occupy substantially more space than our valid options, but our list would need 2^63 entries!  You could never hope to hold all of this in the few gigabytes (or even terabytes) of memory available to us.  Luckily, modern compilers are smart enough to recognize the conditions which would lead to excessive wasted space like this, and will only compile code into jump tables when they can determine that the range is small enough, and the gap density low enough, to make a jump table efficient.  In other words, not all conditional chains or switches will result in a jump table.  Only those which lend themselves to increased efficiency, at a memory cost that is not disproportionately large, will be converted into jump tables.
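
For Java in particular, you can watch the compiler make this decision.  A dense switch normally compiles to a tableswitch bytecode instruction (the jump-table form), while a sparse one usually falls back to a lookupswitch (a searched list of key/target pairs).  The little class below is my own throwaway example; compiling it and disassembling with javap -c should show which form each method got.

public class SwitchShapes {
    static int dense(int x) {            // contiguous 0-5: expect tableswitch
        switch (x) {
            case 0: return 50;
            case 1: return 15;
            case 2: return 20;
            case 3: return 30;
            case 4: return 40;
            case 5: return 10;
            default: return 0;
        }
    }

    static int sparse(int x) {           // huge gaps: expect lookupswitch
        switch (x) {
            case 0:      return 1;
            case 4:      return 2;
            case 10:     return 3;
            case 50:     return 4;
            case 945328: return 5;
            default:     return 0;
        }
    }
}
// javac SwitchShapes.java && javap -c SwitchShapes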

I think that's about it, but if you think I left something out, notice any mistake, have questions, or anything at all, leave a comment!

Tuesday, June 11, 2013

Keeping Fish and the Aquatic Nitrogen Cycle!

Location: Salt Lake City, UT, USA
I don't know about the rest of you, but I love fish!  I have been an avid enthusiast and hobbyist since I was young.  My father and I were both introduced to the hobby at the same time, stopping on a whim in a pet store where the manager was a cichlid fanatic.

Marlin the clownfish
One of the big barriers to keeping fish is knowing how to keep them alive.  Unfortunately there is a lot of misinformation about keeping fish out there.  Really, to keep fish, you only need two things besides the obvious of water and something to hold the water.  You need to regulate the temperature (usually a heater is quite sufficient), and you need to filter the water.  I've already said about all there is to say about regulating temperature.  The only thing to add is that some tanks, usually with very powerful lights, will require chilling instead of heating.

Marine aquarium filtration (salt water)
There are several types of filtration: mechanical, chemical, and biological.  Mechanical filtration is what most people think of when they think of filters.  This involves physically trapping non-dissolved matter, usually in some sort of mesh.  Though this does a great job of removing larger visible impurities, the primary benefit of mechanical filtration is that it can help keep your water clearer, which makes little difference to the fish but can significantly increase your enjoyment in watching them.  Mechanical filters range anywhere from a foam pad, to a micron filter further refined with diatomaceous earth (which will even strain out mid-water algal and bacterial blooms), to a method called foam fractionation which is used in marine aquariums.

Chemical filtration uses substances that absorb impurities from the water.  The most common substance used is activated carbon, though other substances might be used depending on what chemical absorption is required.  This can look like biological filtration in some regards, since it can remove some of the same substances that biological filters take care of, but it accomplishes this by absorbing the impurities.  What most people don't understand about chemical filtration is that it is a band-aid for a real issue.  There is no way to know how effective it is being, and once it fills up, it starts to leach chemicals back into the water.  Chemical filtration, in my opinion, should only be used for very short periods of time between changes (8-24 hours), in emergency situations, while you are working to fix the underlying cause of the issue.

Cleaner shrimp working on a polyp coral
So that leaves us with biological filtration, the core of keeping fish, and the core of biological filtration is something called a nitrogen cycle.  Understanding this is really the key foundation of keeping anything alive that lives in water.  It's all you really need to know to keep freshwater fish alive, and the first stepping stone toward keeping marine fish (salt water) and invertebrates, like corals and anemones.  There are a lot of things that go into water, but true waste is primarily nitrogen, in the form of ammonia.  Even uneaten food and dead organisms rotting are ultimately producing ammonia.

Ammonia is a poison to fish, and a high enough concentration will lead to all sorts of problems, resulting in a dead tank.  We MUST filter this out!  We look to nature and find that there are aquatic bacteria which actually eat ammonia!  Nitrosomonas basically eat ammonia and produce another form of nitrogen called nitrite.  This form of nitrogen is not as toxic as ammonia, but it is still a toxin in a fish environment and must be removed.  Again we look to nature and find another little aquatic bacterium called Nitrobacter, which eats nitrite.  Like its cousin, Nitrobacter produces a byproduct: nitrate.  Nitrate is a substance you might have heard of before.  It is a form of nitrogen commonly used in fertilizers, and is relatively non-toxic.  This whole process of converting ammonia to nitrate is the nitrogen cycle, and basically all life depends on it.

Understanding this transfer from ammonia to nitrite, then nitrite to nitrate, is the foundation for understanding how nature filters water.  But how do we apply this to our tanks?  We look at how these bacteria live in water, and we find that they flourish best when they have a lot of surface area to colonize (rocks and sand provide this in nature) and a lot of oxygen, which in aquatics means swift-moving water with a lot of surface agitation and a large area for exchange with the air.  So in your aquarium you need good water-to-air surface area (go wide or long, not tall), good water movement (most fish really want a lot more than they are given), and a little surface agitation.  It is not the air bubbles rising that get air into the water; it is their breaking at the surface that promotes gas exchange.

20 gallon long, a lot of surface area.  Notice the rippling effect, great for gas exchange!
Provide a lot of surface for bacteria to live on.  Under-gravel filters of days gone by used the gravel as an actual medium for bacteria to grow on, but those filters tend to trap waste under the filter plates where it never gets removed, and the flow is not strong enough to provide oxygen.  This results in an anaerobic (non-oxygen-breathing) bacteria colony, which you do not want.  Instead, fill the tank with porous rock with a lot of surface area, and get good powerhead-style pumps to keep things moving.  Lace rock and some lava rocks work well; I suggest you consult with your local fish store to pick out rocks that won't leach bad stuff into your water.  There are also a lot of filters that provide media for bacteria growth, from Marineland's power wheels, to bio-balls, to canister inserts, but I prefer to stick with the more time-tested approach of having the system support itself (I do advocate sumps, separate tanks plumbed together, to help increase water volume and move some of the bacteria-supporting rocks out of the main display tank).

Armed with well oxygenated water, good strong water flow, and plenty of rocks for bacteria to colonize, your tank will process its own waste into harmless nitrate that can be removed by aquatic plants, or with regular water changes.  Failure to remove nitrate can lead to algae growth.

Aquarium Sump
A few things to keep in mind.  Bacteria take some time to grow, and will not grow until there is an excess of food (ammonia and nitrite).  When first starting out it is important to understand that you will have to cultivate this bacteria slowly as the tank matures.  This initial 'cycle' typically takes 2-3 months (it can vary from 1 month to a year depending on a lot of factors, but assume 2-3 months).  To get ammonia in the tank you need something alive.  Just one fish per 20-40 or so gallons will work well, assuming they are fairly small; fewer if they are larger fish.  You should already have a vision for what you want in the tank, so you should know which fish to get (the least aggressive variety is best to start with).  Get a test kit for ammonia and check it every 3-7 days.  You should see the ammonia go up, then plateau for a week or two.


You should be about a month in already and frustrated at how boring your tank is so far.  That is OK!  Just sit on that emotion, and in the end you will come out with a work of art and a piece of pure beauty that everyone oohs and aahs over.  Get a nitrite test kit and you should start to see the nitrite going up slowly.  Keep an eye on the nitrite, and watch how, after a week or two, the ammonia drops down to fairly safe levels as the nitrite rises.  You could add a few more fish at this point, but only a few; go really slow.  The nitrite will go through the same pattern the ammonia did: it will plateau, then start to drop as the new bacteria colony grows to consume the excess resources.  You want to get a nitrate test as well, and again, you can watch the nitrate rise as the nitrite falls to safe levels.  Once this is done you have 'cycled' the tank, and have a full nitrogen cycle going on!  Pretty cool, huh?

A little Blenny!
Again, go slow from here.  Each time you add more fish, the ammonia is going to spike again, more bacteria will grow, ammonia will drop as nitrite grows, then nitrite drops as nitrate grows.  Too much of a swing and your fish will suffer, and likely get sick.  So take it slow, give yourself a good year, maybe even two, to fully stock the tank, and you will end up with something you love instead of an algae-infested death trap that you loathe to look at.

Six-Line Wrasse

If you have any questions about fish or fishkeeping, ASK!






Monday, June 3, 2013

Java - Pass by Reference Value!

Location: Salt Lake City, UT, USA
To maintain a little bit of the geeky theme I have had going this last week, I wanted to talk a little about something I think is often misunderstood by those beginning in programming, and can be especially confusing to those moving to Java from other languages: the subject of reference vs value.

The primary place where I see this confused is in method parameter passing, but a better understanding of this can pay off in other areas as well.  Also, one point of note: in this context the terms function and method are interchangeable.  Function is the traditional term, borrowed from mathematics and used in languages like C; method is simply an alternate term, favored in Java.

Before we start to discuss the differences between them, we should ensure we understand the terms:

Value is what we would call an immediate.  It is a section in memory where an actual value sits.  It could be any data type, an int, double, char[], or even a struct.  On the inside this is just a set amount of memory, contiguous bytes, the number of which depends on the size (or width) of the datatype being stored there.

Reference is simply a memory address which points to a 'value'.  References are held in variables; in C/C++ they are called pointers, which is an ideal name for them, as they literally point to somewhere else in memory.  I will continue to call them pointers here.  The size of a pointer is simply the size (or again, width) of the memory addresses on your hardware (either 4 or 8 bytes typically), and will never be any different, because it simply stores a memory address that points to some value.

Values are fairly straightforward to understand, but references are more difficult, mainly because references are also values: special values that hold memory addresses.  This means that references can point to other references, forming a chain which hopefully ends at a real object that has been allocated in memory.  A reference which does not point to a valid allocated object in memory is considered a 'null pointer'; it should not be used for anything other than writing a valid address into it.



The most succinct way I can think to describe these relationships is this: a value is an actual value stored in memory.  A reference is the address of the memory location where a value is stored.  And a pointer is a special type of value that only holds references (memory addresses); it is the concrete manifestation of a reference.



One of the useful reasons for having references is to facilitate parameter passing, and that is primarily what I intend to address today.

Say you have an exponent or power function: it takes ints A and B and returns a long, A to the power of B (A^B).  On our imaginary system an int is 32 bits and a long is 64, so we have a total of 64 bits of input to our function, or 8 bytes of data; in computer speak this is basically nothing.  To run this function we allocate 8 bytes, copy over the values that the method caller provided, and crunch the numbers.  When it's time to return our long (which we had to allocate within our function) we again have a small amount of data, and we simply copy it over to a location where the calling function knows to find it.  This method of passing parameters to a function is called pass by value.  We are literally passing the value of the variable provided to the function (or more precisely, an identical copy of that value).
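
As a concrete illustration of pass by value, here is a minimal Java version of such a power function.  The names are my own and the loop is deliberately naive; the point is that the caller's variables are untouched, because the method only ever sees copies of the two ints.

public class PowerDemo {
    // A^B for a non-negative exponent; the method works on copies of a and b.
    static long power(int a, int b) {
        long result = 1;
        for (int i = 0; i < b; i++)
            result *= a;
        return result;
    }

    public static void main(String[] args) {
        int base = 2, exp = 10;
        long p = power(base, exp);
        System.out.println(p);                 // 1024
        System.out.println(base + " " + exp);  // still "2 10": only copies were passed
    }
}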

Now say you have a sort function: it takes a linked list, has no return value (a void return type), and sorts the list provided in place.  When the sort function is called, what happens?  Given the scenario in the previous paragraph, we would assume that enough memory would be allocated to hold the linked list, the values held in the original list would be copied over to this new list, and we would sort the values within this new list.  But this has a few issues.  For one, we couldn't have a return type of void, because without returning we have no way of getting our copy of the list back to the caller.  But even if we return our new list, what if our list is big?  I mean really big?  What if our list was more than half our available memory, and we couldn't copy the whole thing to pass to the function?

Do we really require twice as much space as our list occupies just to call a sort method?  Of course not!  This is where references come in.  Instead of copying our list over to our function, we can copy the address of our list (which is what we mean by a reference), and then the method can access it without having to copy or move it at all.  Remember, a reference is typically 32 to 64 bits, basically nothing compared to even a small linked list.  This is referred to as passing by reference, since we don't copy the actual value we want to operate on; we pass its memory address so that the function can work directly on the object.
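
In Java terms, that looks like the sketch below: the method receives a copy of the reference to the list, so it can sort the caller's list in place without the list contents ever being copied.  This is just a minimal example leaning on Collections.sort, not a sorting routine of mine.

import java.util.Collections;
import java.util.LinkedList;

public class SortInPlace {
    // Only a reference (a few bytes) is passed, no matter how big the list is.
    static void sortInPlace(LinkedList<Integer> list) {
        Collections.sort(list);  // modifies the caller's list directly
    }

    public static void main(String[] args) {
        LinkedList<Integer> numbers = new LinkedList<Integer>();
        Collections.addAll(numbers, 3, 1, 2);
        sortInPlace(numbers);
        System.out.println(numbers);  // [1, 2, 3]
    }
}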

So you can probably see why we need pass by reference, but why then do we bother with pass by value?  The main reason that I am aware of is that references do not allow a caller to protect their data from a function.  When a function is passed a reference, there is a risk that the function will modify input data which the calling function is still using, and this could cause the calling function to violate its assumptions about the state of its own data.  When you pass by value you guarantee that none of your parameters are modified without your knowledge, as the called function is only operating on copies.  There is also a question of optimization.  If you have a function that takes a byte as input, passing by reference might use four times as much memory to pass parameters as passing by value; on a 64-bit system it might be eight times.  Just as we might want to optimize functions that handle very large inputs, we should also be able to optimize functions with small inputs.

Java does away with the question of reference vs value by making all primitive data types (char, short, int, long, float, double, etc.) pass by value, and all non-primitives pass by reference-value.  Java has no explicit pointers because every variable of a non-primitive type is, under the hood, a pointer to the object.  There is no special syntax involved, because this is simply how all non-primitive variables work: they are all pointers to the object.  So what is passed to a method in Java is a copy of that reference, or in other words a totally new variable (which under the hood is a pointer) holding a copy of the address of the object the caller passed in.  This distinction can be a little tricky to sort out at first, and difficult to describe in text, so you might want to read that a few times...

Some of the more insidious 'bugs' I have seen in beginning students' Java code are related to not understanding how method parameters are passed under the hood: modifying the contents of an object will be visible to the caller, but reassigning the parameter variable to a new object entirely will result in no change to the object passed in by the caller.  In other words, there is a big difference between:

public static void copyList(LinkedList a, LinkedList b) {
    a.clear();
    a.addAll(b);
}

and

public static void copyList(LinkedList a, LinkedList b) {
    a = b;
}

The first example above will result in the calling function having two lists that contain the same nodes in the same order, as we have simply performed operations on the original lists provided by the caller.  In the second example we have assigned the address of object b to the variable named a.  The variable a now points to the same object that the variable b points to.  In fact, within the second method we no longer even have a reference to the list originally passed in as parameter 'a' at all!  Our second method has not changed the original inputs as expected; it has simply changed how the method references those lists, duplicating the reference to the list passed in as 'b' and losing the reference to the list passed in as 'a'.
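
A quick way to see the difference is to call both versions and look at what the caller observes afterwards.  Here is a minimal sketch using the two methods above, renamed so they can live in one class; the list contents are arbitrary.

import java.util.Collections;
import java.util.LinkedList;

public class CopyListDemo {
    static void copyListContents(LinkedList<String> a, LinkedList<String> b) {
        a.clear();
        a.addAll(b);    // mutates the caller's list 'a'
    }

    static void copyListReference(LinkedList<String> a, LinkedList<String> b) {
        a = b;          // only reassigns the local copy of the reference
    }

    public static void main(String[] args) {
        LinkedList<String> a = new LinkedList<String>();
        LinkedList<String> b = new LinkedList<String>();
        Collections.addAll(a, "x");
        Collections.addAll(b, "y", "z");

        copyListReference(a, b);
        System.out.println(a);  // still [x]: the caller's list never changed

        copyListContents(a, b);
        System.out.println(a);  // [y, z]: the same list object was modified
    }
}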

This is a subtle but important distinction to understand if you hope to write functional Java code.  If you are struggling, leave a comment and I might be able to help make it more clear!

I had intended to write a little about the stack and the heap (dynamic vs static memory) and why this distinction of value vs reference exists in the first place, but this post has already grown larger than I would like, so I will save that discussion for another day!

Sunday, June 2, 2013

Java file organization - part 2

Location: Salt Lake City, UT, USA
So last post I showed how I built a few quick file utilities in Java.  If you will recall, I have a very large music collection which I copy to (currently) 8 USB thumb drives for use in my car.  I do not want bands broken up between disks, and I want to avoid copying files that my car stereo cannot read.  So I wrote a few Java utilities to help me automate this task as my music collection grows.

This post I am going to show two more utilities, and then I'll show how I use these methods to build my disks.  We will need a method to copy files and a method to delete files (to clean out the old files when I rebuild the disks).  I'll start with the remove method because it is the smallest.

 private static void removeAllChildren(File root) {
  LinkedList<File> toRemove = Tools.getAllChildren(root, false, true,
    false, false), toCheck = Tools.getImmediateChildren(root, true,
    false, false, false);

  Iterator<File> i = toRemove.iterator();
  while (i.hasNext())
   i.next().delete();

  toRemove.clear();
  while (!toCheck.isEmpty()) {
   File file = toCheck.pop();
   LinkedList<File> temp = Tools.getImmediateChildren(file, true,
     false, false, false);
   toCheck.addAll(temp);
   toRemove.addAll(temp);
   toRemove.add(file);
  }

  Collections.reverse(toRemove);
  i = toRemove.iterator();
  while (i.hasNext())
   i.next().delete();
 }


Basically we give it a folder, and it will remove all children.  Our general strategy is to first collect all files and simply remove them.  Next we iterate through toCheck and collect all the remaining directories and store them for removal as we continue to check them.  We also add the current directory to the list to remove.  This is similar to the logic we used in the last post to build our folder listing.  Finally, we reverse toRemove so that we are removing the deepest nested directories first, and we iterate again to remove them all.  This should all be rather straightforward.

The next method is a bit more complicated, but not too bad.  We're using NIO channels to perform the actual copying of the contents.

 private static long copyFile(File fromFile, short diskNum, File newPath,
   File oldPath) {

  String disk = Short.toString(diskNum);
  if (diskNum < 10)
   disk = "0" + disk;

  String path = fromFile.getAbsolutePath().replace(
    oldPath.getAbsolutePath(),
    newPath.getAbsolutePath() + File.separator + disk);

  File toFile = new File(path.replace(fromFile.getName(), ""));
  if (!toFile.exists())
   toFile.mkdirs();

  toFile = new File(path);
  try {
   toFile.createNewFile();
  } catch (IOException e1) {
   // Choose an action here!
   return 0;
  }

  FileInputStream source = null;
  FileOutputStream destination = null;
  long size = 0;

  try {
   source = new FileInputStream(fromFile);
   destination = new FileOutputStream(toFile);

   FileChannel sourceFileChannel = source.getChannel();
   FileChannel destinationFileChannel = destination.getChannel();

   size = sourceFileChannel.size();
   sourceFileChannel.transferTo(0, size, destinationFileChannel);

  } catch (FileNotFoundException e) {
   // Choose an action here!
   return 0;
  } catch (IOException e) {
   // Choose an action here!
   return 0;
  } finally {
   try {
    if (source != null)        // the streams may never have been opened
     source.close();
    if (destination != null)
     destination.close();
   } catch (IOException e) {
   }
  }
  return size;
 }

This method will return a long representing the size of the copied file (0 if the copy fails).  The input is the file which is being copied, a disk number (used to name the target directory), a new root path, and the old root path.  These paths are used to determine where to copy the file to.  For instance, I have my music stored as /usr/home/me/music/<artist>/<album>/, so the oldPath would be /usr/home/me/music/.  I want to store these disks to /mnt/<disknum>, so the newPath would be /mnt/.  Everything else about the path is preserved, so if our old file was /usr/home/me/music/coolband/bestalbum/bestsong.mp3, it would be copied to /mnt/01/coolband/bestalbum/bestsong.mp3 (assuming it is destined for the first disk).
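
The path rewriting itself is just string replacement plus the zero-padded disk number.  Here is a standalone sketch of that piece, using the example paths above; it is purely illustrative and not part of the utility itself (the expected output assumes a Unix-like system).

import java.io.File;

public class PathRewriteDemo {
    public static void main(String[] args) {
        File fromFile = new File("/usr/home/me/music/coolband/bestalbum/bestsong.mp3");
        File oldPath = new File("/usr/home/me/music");
        File newPath = new File("/mnt");
        short diskNum = 1;

        // Zero-pad the disk number: 1 -> "01"
        String disk = Short.toString(diskNum);
        if (diskNum < 10)
            disk = "0" + disk;

        // Swap the old root for the new root plus the disk directory.
        String path = fromFile.getAbsolutePath().replace(
                oldPath.getAbsolutePath(),
                newPath.getAbsolutePath() + File.separator + disk);

        System.out.println(path);  // /mnt/01/coolband/bestalbum/bestsong.mp3
    }
}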

The meat of this method starts out by building the new path, creating any directories needed to contain the file, and creating the actual file.  From there we connect IO streams to the source and destination files, and retrieve their channels for reading and writing.  Once we have the size stored, the actual instruction to transfer the file is simple: we call the transferTo method on the source channel, giving it the starting position, the number of bytes to transfer, and the destination channel.  The rest of the method is exception handling in case something goes wrong, and we finish by returning the size of the transfer.  It is interesting to note that the file copy itself is only one line, and even with setting up the IO it's still less than ten.  Half, if not more, of the coding work involved is admin work like generating path names and creating parent directories.

Armed with all these tools we are ready for the main function!

 private static long maxSize = 64000500000L;
 private static short maxDir = 998;
 private static int maxFiles = 20480;
 private static File rootDir = new File("/path/to/masters/");
 private static File newLocation = new File("/path/to/copy/to/");

 public static void main(String[] args) {

  long currSize = 0;
  int totalFiles = 0;
  short totalDirs = 0;
  LinkedList<LinkedList<File>> diskList = new LinkedList<LinkedList<File>>();
  LinkedList<File> disk = new LinkedList<File>();
  diskList.add(disk);

  LinkedList<File> headList = Tools.getImmediateChildren(rootDir, true,
    false, false, true);
  Collections.sort(headList);
  Iterator<File> i = headList.iterator();

  while (i.hasNext()) {

   File head = i.next();
   LinkedList<File> files = Tools.getAllChildren(head, false, true,
     true, true);
   LinkedList<File> dirs = Tools.getAllChildren(head, true, false,
     true, true);

   short numDirs = (short) dirs.size();
   int numFiles = files.size();
   long size = 0;
   for (File file : files)
    size += file.length();

   if (currSize + size > maxSize || totalFiles + numFiles > maxFiles
     || totalDirs + numDirs > maxDir) {
    disk = new LinkedList<File>();
    diskList.add(disk);
    currSize = 0;
    totalFiles = 0;
    totalDirs = 0;
   }

   disk.addAll(files);
   currSize += size;
   totalFiles += numFiles;
   totalDirs += numDirs;
  }

  removeAllChildren(newLocation);

  for (short diskNum = 1; !diskList.isEmpty(); diskNum++) {
   i = diskList.pop().iterator();
   while (i.hasNext())
    copyFile(i.next(), diskNum, newLocation, rootDir);
  }
 }

We start out by defining some limitations imposed by the hardware involved.  The maxSize is a limit of my USB thumb drives (approximately 64*10^9 bytes), and maxDir and maxFiles are limitations of my player.  The rootDir is where we will build our disks from, our master list.  The newLocation is where we will copy our files to build our disks.

For the meat of our method we define a few numeric variables to track the size, number of files, and number of directories in each disk, and we define a diskList to hold each listing of files which will make up a disk.  Note that this is a nested list, meaning it is a linked list of linked lists.  We have created one linked list that stores other linked lists of files, and we will add to it as we assemble them.  Next we create our first file list (representing a disk) and add it to the disk list.  The last step before we start to assemble our disks is to gather a list of all the artists, each of which has a separate directory under the rootDir; we call this list the headList, and we sort it so that the disks are split up alphabetically by artist/band.

Once this is all set up we simply iterate through each artist/band and collect a list of their files, a list of their directories, and a total of their size.  If they would overflow the current disk we start a new disk; either way, we then add their files to the current disk's list and move on to the next artist.  Once this is all done and every disk's file list is assembled, we remove everything from our newLocation (this assumes that the old disks built there will be superseded by the ones about to be copied), and we copy all the files to their associated disks.  Note that this last loop would be slightly faster if it iterated through diskList instead of using pop(), because pop() incurs the overhead of un-linking a list node, though compared to the time the copy itself takes, this is negligible...

There are lots of other things you can do to automate maintenance on large digital collections like this.  Do you have any ideas?

Handy way to negate a two's complement number: (~n)+1
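
A two-line Java check of that identity, just to see it in action (the class name is arbitrary):

public class NegateDemo {
    public static void main(String[] args) {
        int n = 42;
        System.out.println((~n) + 1);   // -42
        System.out.println(~(-7) + 1);  // 7
    }
}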