Friday, July 19, 2013

More geekiness - How is memory on a computer used?

Location: Salt Lake City, UT, USA
Since it has been a while since my last geeky post, I figured it was time. Continuing the low-level theme, I am going to delve into memory a bit deeper than I have in the past. This will be a two-part post. This post will cover the hardware and operating system view of memory. My next post will cover the view of memory which programs see.

On your computer you have some amount of memory. You might have 2-4 GB, a little less, or a lot more, but beyond the abstract idea that your programs go in there when they run, you likely have little idea what your computer actually does with this memory. You may have seen something about virtual memory, seen something about swapping, or even heard people talking about locality (specifically, spatial locality), and not understood what they meant. I aim to clear up some of this confusion. This post is written from a programming perspective.

The memory on your computer, typically referred to as RAM (random access memory), is one large array of contiguous bytes. If you have 2 GB of memory, you have 2*2^30 (two times two to the thirtieth power) total bytes, or 2147483648 bytes, numbered from 0 to 2147483647. This numbering forms an address system for the memory array, thought of as physical memory. This is a very logical way to address memory, and lends itself to very fast hardware operations; however, this method of addressing poses a problem for applications running on the computer. To understand why, we must look at how programs are actually run by your hardware (I will go into much more detail on this subject in my next post).

When a program is run on your computer, the code which makes up the program (called the code segment) is loaded into memory. Spread liberally throughout this code are commands that jump to new memory addresses, load values from memory, save values to memory, and generally manipulate memory in any number of ways. If your computer is only running one program this is not a problem, but if you happen to run a second program, the computer will have to place it in a different part of memory. Since all modern computers run more than one program at a time, a program can never know ahead of time where in memory it will be placed, yet the addresses it uses must be defined and resolved at compile time, before the program is run. To resolve this issue, chip makers such as Intel added hardware support for an addressing scheme called virtual memory (not to be confused with your operating system's swap file, which is sometimes called virtual memory by Windows users).

Virtual memory is an addressing scheme where each running program is presented with its own address 'space'. Each running program (or more properly, each process) sees a memory space that appears to be its own computer. This memory space is not real in the sense that it does not directly correspond to the physical memory installed on the computer; it is a completely logical memory space, an abstraction. Since there is a disconnect between the physical memory and what each program sees, a special piece of hardware called a memory management unit (MMU) is added to the computer to translate these logical addresses to physical addresses in your memory array. Doing the translation in hardware is important because it is very fast. This way each program can address its own memory as if it were the only program running, but your physical memory can be shared among many programs running at once. Virtual memory and the increasing use of multi-tasking operating systems are a large contributing factor in rising memory requirements.
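To make this concrete, here is a minimal sketch in Java (my own toy example with invented names; a real MMU does this in hardware using tables the operating system maintains, not hash maps) of the key idea: two processes can use the exact same logical address, and each gets translated to a different physical location.

import java.util.HashMap;
import java.util.Map;

public class TranslationSketch {
    public static void main(String[] args) {
        // Each process gets its own logical-to-physical mapping.
        Map<Long, Long> processA = new HashMap<>();
        Map<Long, Long> processB = new HashMap<>();

        // Both processes use logical address 0x1000, but the MMU
        // sends each one to a different physical address.
        processA.put(0x1000L, 0x74000L);
        processB.put(0x1000L, 0x98000L);

        System.out.println("A's 0x1000 -> physical 0x" + Long.toHexString(processA.get(0x1000L)));
        System.out.println("B's 0x1000 -> physical 0x" + Long.toHexString(processB.get(0x1000L)));
    }
}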

So on top of this physical memory we have a layer of abstraction giving us logical addressing. But there is another traditional issue in computer memory which must be addressed before we can use (program for) a modern operating system: fragmentation and total space. What do I mean by fragmentation? When we are laying out programs in memory, we need to make sure those programs do not run out of space. We also want all the memory for a program to be near the other memory for that program, a concept called spatial locality. This is important because many operations within a program will run much faster if the memory accessed is sequential (because memory is transferred in chunks), and cache schemes depend to some degree on spatial locality. To achieve this we over-allocate memory to each program, leaving unused memory that is unavailable to anyone else: fragmentation. There is also the issue of running out of memory altogether. Programs regularly request memory as they run, and many will crash if such a request cannot be satisfied, and as we run more and more programs we use more and more memory.

To deal with these issues by making it appear that we have nearly unlimited memory (limited only by the address width), we turn to a vast but slow pool of storage: your computer's hard disk. To accomplish this we introduce the abstract idea of a page, a limited range of memory of a predefined size which we can assign to a single process (program). Pages could be of any size, but as with most things in memory they are typically sized as a power of 2, aligning properly with addresses that are powers of 2. We can then move a page back and forth between physical memory (via logical addresses) and the hard drive, depending on how frequently it is needed and how full the physical memory is. This paging is controlled by the operating system, which builds and maintains a table of all the pages of memory allocated, and manages the access and location of the data stored in that memory, whether it is in physical memory or on disk. The file on the disk which holds the memory not currently in physical RAM is called a page file (the proper name) or swap file (a descriptive one). Many people improperly or informally refer to this paging as virtual memory, though as I mentioned earlier, this can lead to confusion.
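As a rough sketch of what the MMU and operating system do together (again in Java with invented names, assuming 4 KB pages, which is a common size), translating an address means splitting it into a page number and an offset, looking the page number up in the process's page table, and faulting if the page is currently out on disk:

public class PagingSketch {
    static final int PAGE_SIZE = 4096;      // assume 4 KB pages
    // pageTable[n] holds the physical frame for virtual page n,
    // or -1 if that page is currently swapped out to disk.
    static long[] pageTable = { 7, 3, -1, 12 };

    static long translate(long virtualAddress) {
        long pageNumber = virtualAddress / PAGE_SIZE;
        long offset     = virtualAddress % PAGE_SIZE;
        long frame = pageTable[(int) pageNumber];
        if (frame == -1) {
            // A real MMU raises a page fault here; the OS would read
            // the page back in from the page file and retry.
            throw new IllegalStateException("page fault: page " + pageNumber);
        }
        return frame * PAGE_SIZE + offset;
    }

    public static void main(String[] args) {
        System.out.println(translate(5000)); // page 1, offset 904 -> 3*4096 + 904 = 13192
    }
}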

So if you made it this far, at the very least, the next time you get an error that refers to a page or paging you will know what is causing the issue...  Then you will restart your computer like everyone else!

The only other thing I can think to say on the subject of virtual memory and paging is that they work together. Your computer, consisting of a CPU, MMU, memory, and busses (which transmit data between places), handles the virtual memory translation, and this logical view presented by the hardware is what is manipulated by the software on your computer, including your operating system. In fact, your operating system is responsible for talking to the hardware, and it is worth noting that the OS can control how the hardware functions at this level (through user rings and processor modes, which I will not go into here). Pages, on the other hand, are a creation of the operating system. So if logical memory is an abstraction of physical memory, pages are an abstraction of logical memory.


Computers are a layered collection of abstractions!


I will leave you with that thought, and I will follow up this post with another about how programs work with memory, and how applications are structured in memory.  If you had trouble following my post on switch statements and jump tables, this next post will help clear it up a lot!

Tuesday, July 16, 2013

Fast lane or passing lane?

Location: Salt Lake City, UT, USA
Something that irks me, and I know I am not alone, is people who monopolize the passing lane. I know I am not alone because this is one of the few things I consistently complain about to those around me, and everyone I know immediately agrees and commiserates with me. However, where I will complain about misuse of a passing lane, most people I know will complain about misuse of a fast lane, and therein, I believe, lies part of the problem.

Fast is an abstract idea. Ask 10 people what fast is, and you will likely get 10 different answers. By thinking of the far left lane as a fast lane you open this lane up to everyone's individual idea of what is fast. This could be anything from what someone feels is the appropriate speed for conditions, to the speed limit, to as fast as a car can be pushed. The result: the left lane becomes the same as every other lane, a mire of traffic to impede anyone who may want to travel faster than its slowest inhabitant.

This is the importance of defining this lane as a passing lane (which is how state law often defines the left lane of a multi-lane highway). Passing is not an abstract idea. You either are passing cars on your right, or you are not. This creates a very clear standard: if you are not passing cars, then you should not be in the passing lane. If you are blocking the passing lane by not passing cars, you are being an anti-social ass.

Short post today!  What do you think of people monopolizing the passing lane?  What do you call it?

Wednesday, July 10, 2013

Benefiting disproportionately from the common wealth

Location: Salt Lake City, UT, USA
Something that has been bandied around a lot for the past decade or so is this idea of 'job creators'. This is typically used as a political term tied to an economic strategy that came about in the late 70's and early 80's: 'trickle down economics'. This is the idea that the wealthy should be allowed to keep much more of their money than they previously had, the justification being that they would take all that money and invest it in things that would create jobs and raise the standard of living for everyone.

At first thought, this may seem like a logical strategy. But as history has shown, it is simply a pipe-dream. In fact, since that time the rich have gotten richer and the middle class has stagnated, a fact that is clearly seen when one looks at the statistics over the last 30 or so years. The reality is that the prosperity of our nation is fueled by the consumption of the middle class, and if middle class incomes do not keep up with inflation, our economy cannot grow. It can appear to grow, as it does now when so much wealth is created on paper out of thin air, but a stagnant middle class is a stagnant economy, no matter how rich the upper 0.1% appears on paper. That is not to say the rich are just as poor as the rest of us, far from it; this paper-inflated wealth is spread nicely around among us all, and many of us hold it through retirement accounts.

The struggle over income taxes has been in the news recently, and I will not rehash it further; instead I want to draw attention back to a tax issue which is almost always overlooked, the estate tax. Opponents of this tax will refer to it as the death tax, but in honesty this would be similar to referring to a will as "the dead guy's demands." The reality is that the tax is levied against the remaining estates of the most wealthy in our society: estates valued at over 5 million dollars, more than twice what most people will make in their lifetimes. This matters because the arguments against the estate tax are not irrational arguments, and most rational people understand them.

Why should someone be penalized because they are successful? That is what all rational arguments on this issue boil down to, and it is a valid point. I have heard many people attempt to respond to this question, and in my opinion, all but one group has failed to make an argument that acknowledges the moral question raised. To answer it, I must show that the estate tax is not a penalty, but the repayment of a debt to the common wealth that we all create, own, and are responsible for.

I steal a statement from Bill Gates Sr. (yes, the father of the richest man on earth, at the moment), and assert that the rich deserve to pay disproportionately in taxes because the rich have benefited disproportionately from the common wealth. What Mr. Gates is talking about here is important because it hints at one of the other true sources of the wealth we are capable of creating: the public infrastructure.

I have heard others talk about this. Obama tried to point it out in a speech and was accused of being anti-business and of saying business owners didn't work for what they have. I am not a fan of Obama, and he has made his own defense against this, but his original point is quite valid. He said that along with all the hard work that individuals put into their businesses, there is also work put in by the community. The most obvious example is the public system of roads and transportation that brings goods and supplies, as well as customers. Obama talks about the police and firemen who protect businesses, funded by the common wealth held by all of us.

It goes much deeper than that though.  We have a patent system which was created by and for all of us.  This patent system has been an enormous source of wealth for countless businesses, along with trademark and copyright law.  When businesses have their patents, written works, or trademarks stolen, they enjoy the services of another public sector entity, the courts.  When businesses seek to hire new talent, they do not have to invest significant sums in extensive ground-up training systems because they can walk over to the publicly funded schools in their area and find people trained in a myriad of skills which they require.

This is how those who are able to amass large amounts of money benefit from the common holdings of we the people, the common wealth. This is why the rich benefit disproportionately relative to the rest of the public. They live in a society which has built an infrastructure that is the greatest generator of wealth in the modern world. The wealthy shoulder a moral responsibility to give back in proportion to the benefit they have drawn from a system created by others to give them that opportunity. This is indeed the moral imperative. Some of the wealthy will return disproportionately to society if left to their own devices; however, this is by no means the norm. Anyone who looks at the history of wealth distribution can see that this repayment to society cannot be voluntary.

Through most of our nation's history of taxation, the wealthy have been taxed at a much higher rate than the middle and lower classes. For much of that time this meant top rates of 60-80%, and the top marginal rate rose as high as 94%. Effective rates now are in the 12-15% range for the richest of Americans, while I believe my effective rate last year (which included a one-time credit for money I spent on tuition) was 35%, and I look forward to an effective rate closer to 40% next year. I make almost exactly the median income at my current job. If I pay taxes and don't spend a single penny of what I make, it would take me 142 years to earn the 5 million dollars required to owe the estate tax.

If you look at where true economic prosperity comes from, the middle class consumers and the public infrastructure created through taxes, it becomes clear why the wealthy should pay proportionally more in taxes than the rest of the population.  I think the estate tax is a perfect tool in this quest because it ensures that the wealthy return to the system for the benefit they have received.  I think that humans are greedy beings, and that the estate tax helps to encourage people to create amazing things by allowing them to employ their money in their choice of pursuits, to hold their reward, but that at some point a portion of that accumulated wealth must be returned to the public which is charged with creating the infrastructure to ensure that future generations are allowed to enjoy the same prosperity that we can.

If humans were not humans and the world was perfect, perhaps this idea of trickle down economics would be viable, but the reality is that it is the job of government to structure markets in such a way as to encourage positive outcomes and discourage negative outcomes, and this idea of completely unregulated economies and markets is a daydream that is doing irrevocable harm to our society.

As always (and dammit, will someone someday take me up on this), tell me I'm wrong!  Or perhaps why I am right?

And here are a few videos that inspired my post today:





Saturday, July 6, 2013

False equivalency - drinking Kool-Aid

Location: Salt Lake City, UT, USA
I have heard this one in lots of situations, but the use that really gets under my skin is in politics and business, perpetuated by the news media. The term I refer to is 'drinking the Kool-Aid'. For this one a little bit of history is in order (apologies if you lived through this and know the story).

In the 50's an evangelical minister named Jim Jones founded a church called the Peoples Temple. In the mid 60's Jones moved with most of his followers from Indiana to California. The story of their time in California is too long to fully describe here, but to make it short: though their membership continued to grow and Jim Jones had become politically well connected, the Peoples Temple came under increasing scrutiny, fueled by reports from former members and from current members' families organized through a group named the Concerned Relatives. Under mounting pressure in the US, and with increasingly socialist/communist aspirations for his group, Jim Jones negotiated with the Guyanese government for land in Guyana to establish a settlement, which would later be called Jonestown.

The story of the Peoples Temple in Jonestown is another long and involved one which I will skip here, but you need to understand that Jonestown was built deep within the Guyanese jungle, on land the natives did not want because it was so difficult to work. Life in Jonestown was hard, but the people there were (mostly) committed. At its height around 1000 people lived there, commune style. Back in the US, however, allegations were flying about people being held against their will, and about crimes as serious as murder, which left a lot of questions about the Peoples Temple and its leadership. The allegations became so widespread that Congressman Leo Ryan of California arranged a delegation to visit Jonestown himself, accompanied by several members of the press and family members from the Concerned Relatives group. The visit to Jonestown started out well, despite scheduling issues and trouble getting permission to visit the site once the delegation arrived in Guyana. Ultimately, of the more than 900 present, somewhere around 8-12 requested to leave with the congressman, and were granted permission by both the congressman and Jim Jones. This stirred up a very emotional scene as a parent who asked to leave attempted to take the children from a parent who was choosing to stay. There was a small scuffle involving one such split family and a knife, after which it was suggested that the congressman and his delegation, which now included the defectors, leave for the evening while tempers died down. As the delegation was preparing to depart from the airstrip, however, several Peoples Temple members arrived and opened fire, killing five people, including Congressman Ryan.

It is unknown whether these killings were directed by Jim Jones (many accounts report that Jones ordered them, but the evidence for this is circumstantial), yet there is no question that when the perpetrators returned, Jim Jones knew what had happened. Fearing US retaliation for the killings, Jim Jones commanded his people to commit mass suicide by drinking fruit punch (Flavor Aid or Kool-Aid) mixed with cyanide. Over 900 Temple members died that day, most by cyanide poisoning; Jim Jones himself died of a gunshot wound. A handful of members who were present that day survived: one was hard of hearing and slept through the assembly bell, and another saw what was happening, lied about needing to get something, then fled and hid in the jungle. The Jonestown youth basketball team survived because they were elsewhere at the time.

This is the origin of the phrase 'drinking the kool-aid'.  It refers to a large group of people following an ideology or personality to the point of taking their own lives and the lives of their families.

Now, let's compare this to how the term is commonly used.

In 1984 President Reagan used the phrase 'drink the Kool-Aid' to describe three civil rights activists: Jesse Jackson, Vernon Jordan Jr., and Benjamin Hooks. He went further, accusing them of dragging the country into a political Jonestown during the previous election. Now, let's look at this. Over the 20th century many black people were unjustly killed in America. Civil rights leaders looked at the law of our land and held faith that within that framework was freedom for their people, and an end (or at least a sharp decline) to the threat against their lives. Perhaps you could see this faith in our system as blind faith, and though the civil rights activists eventually won, they certainly suffered greatly to win, and many were killed for their beliefs. So in that regard perhaps Reagan was right: for a while the civil rights leaders did drink the Kool-Aid. They had blind faith that this country would protect them, and many died for that faith. Unfortunately, this is not what Reagan meant.

As the dot-com bubble burst, the New York Times noted that the phrase was commonly being used to refer to investors who thought the market would never stop going up. So in this context we are comparing losing money to taking your own life and murdering your family. I'm not even going to elaborate on how ridiculous this comparison is.

The way this phrase is used in political commentary provides some of the worst examples.  Often it is bandied about by pundits, such as Bill O'Reilly, to simply denote that they find no logic in an alternative viewpoint.  He has some great quotes on the subject:
"Most talk radio is conservative-dominated, ideologue, Kool-Aid–drinking idiots."
"You are a blind ideologue who even if somebody's nice to you, won't admit it because you're… Talk about a Kool-Aid drinker!"
In the first situation he is just talking about his competition; in the second he simply doesn't like this lady and finds her logic inconsistent. Neither situation bears any resemblance to the events at Jonestown. I pick on O'Reilly because he has got to be the absolute worst, but this issue pervades many other pundits. O'Reilly has a whole recurring bit about Kool-Aid drinkers where he lambastes anyone who disagrees with him, complete with Kool-Aid Man graphics and all.

I have seen this phrase used to describe people who voted for Obama, acceptance of a project at work, even employee dress codes. I once found a reference where it was used to describe a marriage...

One funny thing I found while doing a little research on the uses of this phrase was site after site of people asking what the Kool-Aid reference was in regards to Obama. Many reported thinking it was a racial slur, like a reference to fried chicken or watermelon might be. I was saddened that so many would be ignorant of the origin of the phrase, but I couldn't help but chuckle at how revealing it is about our culture.

The sad part is that they do it knowingly. Reagan and O'Reilly lived through Jonestown, and both had a clear understanding of the tragedy that unfolded there. Yet both made a choice to exploit those events for their own benefit. They tarnish the memories of those who died at Jonestown by comparing their plight to that of a media star battling for ratings, or corporate executives losing money. They are disingenuous with their audiences when they assume that their arguments are so weak that they must be propped up by the raw emotion that references to tragic events bring up.

This is not unique to Jonestown; just look at the comparisons in modern culture to the Holocaust, Hitler, or communist Russia. Few if any are appropriate, and almost all are perfect examples of gross false equivalency.

As always, tell me I'm wrong!

Thursday, June 20, 2013

Judicial Process vs. Due Process

Location: Salt Lake City, UT, USA
Most people living in the US have heard of due process, and though few of us have probably studied the term, I think most of us have an idea what it means. Or so I thought, before I read Eric Holder's speech in which he outlines the legal justification behind sanctioned killings of US citizens, bypassing the courts. I'll get to that a little later, because the whole topic made me go back and do some reading, and though much of what I found was written in legalese, I put the time in to try to figure out what it all said.

I don't think it helped though, because I am right back where I started. I still think due process just refers to all the rights afforded to everyone: free speech, trial by jury, remaining silent, representation, among others. There are a few different contexts in which the term is used, which can be confusing, but luckily, to address Mr. Holder's use of the term, we only need to look at how it is used in the Constitution:
No person shall be held to answer for a capital, or otherwise infamous crime, unless on a presentment or indictment of a Grand Jury, except in cases arising in the land or naval forces, or in the Militia, when in actual service in time of War or public danger; nor shall any person be subject for the same offense to be twice put in jeopardy of life or limb; nor shall be compelled in any criminal case to be a witness against himself, nor be deprived of life, liberty, or property, without due process of law; nor shall private property be taken for public use, without just compensation.
This is, in fact, the entire Fifth Amendment to the Constitution, part of our rights as citizens of the US. Note the clause referencing "due process of law." It's odd that the rest of the amendment is quite explicit and direct, yet the meaning of due process has always been left more to context than made explicit.

Now let's look at the meat of what Mr. Holder said:
Some have argued that the President is required to get permission from a federal court before taking action against a United States citizen who is a senior operational leader of al Qaeda or associated forces.  This is simply not accurate.  “Due process” and “judicial process” are not one and the same, particularly when it comes to national security.  The Constitution guarantees due process, not judicial process.
Now, first I have to say that this paragraph is but one in a very long speech. There is some context there. He talks about the successes we have had in prosecuting terrorists, and all the plots that have been averted, and he even says a few things I agree with (damn few). But the quote above is something I find very disturbing, and I think it deserves a lot more attention than it got.

If you haven't deciphered it yet, he is basically saying that the Fifth Amendment does not guarantee that if you are accused of something serious, say killing someone, you will face a jury, or even get a court hearing. That is what 'judicial process' would mean. Instead, Mr. Holder is saying that due process is an abstract idea, that it simply means some entity has to review the case. I would assume (perhaps hope is a better term) that there is some standard being applied, such as that the entity given this power be impartial; at least I think that is being implied. He goes on to make a historical argument that the president has traditionally been charged with maintaining national security.

THE ABOVE PARAGRAPH SCARES THE SHIT OUT OF ME!  In my mind we're basically being told that the president now has the legal authority to decide which citizens are a threat to our country, and summarily execute them.

To form a more specific rebuttal, I will start by pointing out that the entire Fifth Amendment is one sentence. One cannot look at the phrase "due process of law" without also looking at the first few words: "No person shall be held to answer for a capital, or otherwise infamous crime, unless on a presentment or indictment of a Grand Jury". There is a clear call for an indictment (an instrument of the judiciary), and a Grand Jury. I think this is probably the foundation for a legal argument that I just don't have the experience to produce. But I think the implications of Mr. Holder's opinion reach so far that we as citizens have an obligation to stand up and oppose it.

Some people may look at this debate and not understand why it matters so much. They look at the context and think that killing al-Qaeda members is probably a good thing, and they may be right, but the implications of what Mr. Holder is saying go deeper; in fact, they question a fundamental underpinning of our society, the need for courts. This opinion invalidates the idea that accused people are guaranteed the right to a trial in a court and judgment by a jury. Due process becomes an abstract idea that something, somewhere, will review the situation to make sure it is all nice and legal. It's the main ingredient required to form a tyranny.

Since I am not a lawyer, I can't give the eloquent rebuttal that I'm sure Mr. Holder would, and I certainly can't cite the precedents. But I can tell you one thing. We are having our rights stripped from us by people like Eric Holder, and we the people need to stand up and start making some noise, because the deeper we get the harder it's going to be to climb back out. If we don't fight to preserve the ideas that we were founded upon, this incredible American experiment is going to fail.

Tell me why I am wrong!

Tuesday, June 18, 2013

Colorado River - Ruby / Horsethief Canyons - Floating!

Location: Grand Junction, USA
Again, we got a group together, loaded up the boats, and hit the Colorado River to raft through Ruby and Horsethief canyons.  We went for twice as long as our previous trip, and I took some video over two of the float days.  Below is the video.  If you are looking for more textual information about this trip, see my previous post on the area!




Feel free to leave feedback!

Monday, June 17, 2013

What are switch statements under the hood?

Location: Salt Lake City, UT, USA
If you don't know what a switch/case block is (often called a switch statement, or switch condition), then this post will likely not be of any use to you. I apologize, and promise that this will be the last very technical post I make for a great while. If you're a programmer and don't know what I am talking about, break out Google and learn about switches, then come back so you know how they work and can properly take advantage of this wonderful tool!

Switches are great! Any time you have several 'if / else if / else' clauses chained together to represent a number of potential paths, you have a great candidate for a switch. At the very least a switch statement will likely give you code that is much clearer and easier to read. Switches are concise and explicit. And when given the opportunity to shine, a switch can dramatically improve the speed of your code.

To see how switches can improve your code you need to understand how they are performed 'under the hood', and to understand this we have to learn a little about the machine code that your source code is compiled into. We typically look at machine code through an assembly language, but I will not assume you are fluent in assembly. Instead I will try to describe what is happening.

Everything in your program is stored in memory before making it to the processor. Instructions (commands), variables, strings, everything. Processors only understand rather simple commands, things like: move this value from memory location A to memory location B, jump to the instruction at memory location B and continue processing from there, compare the values in A and B, add two values, and other basic operations. These basic operations can be strung together to accomplish some higher level task. For instance, you could write a basic assembly program to find the absolute value of the difference between A and B:

Address  Command
0        Test if A is greater than B
1        If test is true, jump to address 4
2        Subtract A from B (answer is in B)
3        Jump to address 5
4        Subtract B from A (answer is in B)
5        Return to caller

When we are finished running this code, the result is stored in B (A and B both represent registers, memory on the processor used to store the things we are currently operating on, usually 4 to 8 bytes each). We could have accomplished the same thing other ways; for instance, we could have just subtracted, tested whether the answer was less than 0 (negative), and if so multiplied by -1, or even flipped the bits and added one. In programming there are almost always several ways to do the same thing. A minimal Java sketch of that last trick follows.
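Here is that sketch (my own example, not how any particular compiler implements it): flipping the bits and adding one is exactly how two's complement negation works, so we can compute the absolute difference without a branch at all.

public class AbsDiff {
    // Compute |a - b| without a branch, using two's complement tricks.
    static int absDiff(int a, int b) {
        int d = a - b;
        int mask = d >> 31;        // 0 if d >= 0, -1 (all one bits) if d < 0
        return (d ^ mask) - mask;  // flips the bits and adds 1 only when negative
    }

    public static void main(String[] args) {
        System.out.println(absDiff(3, 10)); // 7
        System.out.println(absDiff(10, 3)); // 7
    }
}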

Now that we understand a little about how assembly works, we can talk about something called a jump table. A jump table is not something explicitly offered by assembly. Instead it is a logical programming construction, something someone made up as a way to use assembly to build something useful. To see how jump tables work, it helps to lay out a scenario in which we would use them. For our example we will say that we have a game, and the player should be able to choose between 6 different classes. In C we might make a constant for each class to map the class name to a number used internally to identify the class; in Java we might use an enum to do the same thing. Regardless of how we represent it, we will abstract our classes to numeric codes 0 through 5. Here's how this may look as an if/else if block:

if (chosenClass == WARRIOR)
    hitPoints = 50;
else if (chosenClass == MAGE)
    hitPoints = 15;
else if (chosenClass == WARLOCK)
    hitPoints = 20;
else if (chosenClass == ROGUE)
    hitPoints = 30;
else if (chosenClass == NINJA)
    hitPoints = 40;
else if (chosenClass == BARD)
    hitPoints = 10;

Note: the ALL CAPS names represent enums or constants, i.e. the numbers 0-5.

A switch form of the above code might look like this:

switch (chosenClass) {
    case WARRIOR: hitPoints = 50; break;
    case MAGE:    hitPoints = 15; break;
    case WARLOCK: hitPoints = 20; break;
    case ROGUE:   hitPoints = 30; break;
    case NINJA:   hitPoints = 40; break;
    case BARD:    hitPoints = 10; break;
}

So I hope that by looking at my previous example of pseudo-assembly, you can see the obvious way this might be implemented in machine code: a few comparisons for equality, jumps to different addresses, and you're done. But since I am writing about it, there must be a better way. Let me propose that instead of doing comparisons, we do some calculating. We have 6 total choices, and the range is 0 to 5; this is important. Note that there are no gaps, though this is not a necessity. Also recognize that once we have made a choice, we have our 6 possible outcomes, and each of these in machine code is two operations: one to set a memory location or register value, and another to jump past our choices (the equivalent of break), twelve total lines of code. We can now write something with this logic (the chosenClass is held in A, hitPoints is held in B):

Address  Command
00       Multiply A by 2 and store the result in A
01       Jump to address (A + 2)
02       Set B to 50
03       Jump to address 14
04       Set B to 15
05       Jump to address 14
06       Set B to 20
07       Jump to address 14
08       Set B to 30
09       Jump to address 14
10       Set B to 40
11       Jump to address 14
12       Set B to 10
13       Jump to address 14
Our code has a really cool feature. Where the if/else logic in our C/Java style code may have to test the condition up to six times, our assembly code identifies the correct code with two basic math operations and a direct jump to a new memory location. No matter which choice is made, we will never execute more than four instructions. In computer science we call this constant complexity, or O(1), and it is really the holy grail of algorithms. Any time you see complexity go constant like this, you know you are looking at something special.

So how is this working? First we look at how large the blocks of commands we are jumping to are; in this case they are each 2 instructions: set B to a value, then jump beyond the switch (to instruction address 14). To account for this we multiply A by 2. We then look at where these blocks start; they start at address 02, so we add 2 to A, and this gives us the address to jump to. If the switch input is 3, we multiply by 2, add 2, and find that we should jump to address 08, "Set B to 30", which properly corresponds to our ROGUE, the correct result for an input of 3.

But this is only a simplified version of a jump table, contrived to show the basic logic and illustrate how jump tables avoid complexity and as a result can substantially speed up your code. Our logic here is great for illustrating those points, but it is terribly limited in application. What if the code executed on a matched condition were not uniform in length? Suddenly our trick of multiplying by 2 fails! What if the numbers don't start at 0 or 1, but at 1430 and go to 1634, a range of 204? Our code again breaks. What if there are gaps in the numbers used? With my example code, this quite real possibility is completely undefined -- entirely unacceptable for computers. Being undefined simply means we have to define some behavior which will consistently generate correct results. For our needs this is actually quite easy, but to reveal our strategy we first have to resolve the first issue, varying-length conditional blocks: the code we want to run on a match is not of uniform length. (Ranges that do not start at 0 or 1 are easy to handle: 1430 to 1634 is the same as 0 to 204, so a little basic subtraction rebases the range, and again, all of this is done in constant complexity!)

We can solve the issue of non-uniform lengths by using an intermediary list which accounts for how these different lengths affect the address at which each block starts. As with everything on a computer, our list goes into memory, with each choice represented by a memory address (4-8 bytes), specifically the address of one of the instruction blocks. We can then calculate a position in this list and retrieve the memory address to jump to for the associated instructions. We can count on the entries in our list being a multiple of 4 or 8 bytes apart, so the simple arithmetic from above still finds the correct locations. Let me illustrate this with code, using 4 as the multiplier to simulate a 32 bit system:

Address  Command
00       Multiply A by 4 and store the result in A
01       Retrieve the value stored in our LIST at offset A, store it in A
02       Jump to address stored in A
03       Set B to 50
04       Set C to 12
05       Jump to address 20
06       Set B to 15
07       Jump to address 20
08       Set B to 20
09       Set C to 18
10       Jump to address 20
11       Set B to 30
12       Jump to address 20
13       Set B to 40
14       Set C to 100
15       Jump to address 20
16       Set B to 10
17       Jump to address 20
18       NO OPERATION (processor just does nothing for this instruction)
19       Jump to address 20
Our LIST looks like this:

Offset  Value
00      03
04      06
08      08
12      11
16      13
20      16
24      18
So again, let's say we want to pick class 3. We multiply that by 4 and get 12. We look up the address stored at offset 12 in the LIST and find address 11. We jump to address 11, where we set B to 30 and jump past to the end. Again, this corresponds to the ROGUE, number 3. Our algorithm is correct, and we have greatly improved the usability of our logic at the cost of a little extra memory. We retain our constant complexity, which is key. No matter how large a list of conditions we are working with, we will never have to run more than three instructions to reach the matching block, and never more than 5 or 6 to be finished!

Our last task is defining an action when the values have gaps. This also allows us to handle a default case if we desire. The first thing to notice about the code above is that we have more items in our list than we have possible choices. We added an address at offset 24, which would correspond to a 7th element. This is a default case, and it corresponds to instruction address 18, where we just do nothing and then jump to the end (our switch has no default case). Handling gaps is simple: we just add an entry in our LIST for the missing number, and set its value to the instruction address just past our code, in this case address 20. That way, if we reach a number with no corresponding instructions, we simply jump right past our code and continue processing. Again, all of this maintains constant complexity, our ultimate goal!
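To tie all of this together, here is a small Java sketch that simulates a jump table in software (the names and values are my own; a real compiler stores raw addresses rather than objects): an array plays the role of the LIST, the class code minus the range's base value is the index, and out-of-range inputs and gaps all point at a shared default entry.

public class JumpTableSketch {
    static final int BASE = 0; // subtract this first if the range started at, say, 1430

    public static void main(String[] args) {
        // Stand-ins for the instruction blocks; index = class code - BASE.
        // A real jump table stores the memory address of each block instead.
        Runnable[] table = new Runnable[7];
        int[] hitPoints = new int[1];
        table[0] = () -> hitPoints[0] = 50; // WARRIOR
        table[1] = () -> hitPoints[0] = 15; // MAGE
        table[2] = () -> hitPoints[0] = 20; // WARLOCK
        table[3] = () -> hitPoints[0] = 30; // ROGUE
        table[4] = () -> hitPoints[0] = 40; // NINJA
        table[5] = () -> hitPoints[0] = 10; // BARD
        table[6] = () -> { };               // default: do nothing (a gap would point here too)

        int chosenClass = 3;                             // ROGUE
        int index = chosenClass - BASE;                  // rebase the range to start at 0
        table[index < 0 || index > 5 ? 6 : index].run(); // one bounds check, one indexed jump
        System.out.println(hitPoints[0]);                // prints 30
    }
}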

An astute observer may notice a flaw in this logic. What if our range is huge, with a lot of gaps? What if we have conditions to match 0, 4, 10, 50, 945328, and 1 << 63? We have 6 possible values, but a range that goes from 0 to some obscenely large value (2^63). Not only do the gaps occupy substantially more space than the valid options, our list would need 2^63 entries! You could never hope to hold all this in the few gigabytes (or even terabytes) of memory available to us. Luckily, modern compilers are smart enough to recognize the conditions which would lead to this kind of excessive wasted space, and will only compile code into jump tables when they determine that the range and gap density make a jump table efficient. In other words, not all conditional chains or switches will result in a jump table; only those which lend themselves to increased efficiency, at a memory cost that is not disproportionately large, will be converted into jump tables.
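For what it's worth, you can see this tradeoff baked directly into the Java platform: JVM bytecode has two switch instructions, tableswitch (a true jump table) and lookupswitch (a searched list of key/target pairs), and javac picks between them based on how dense the case values are. A quick, made-up pair of switches to illustrate which shape tends to compile to which:

public class SwitchShapes {
    static int dense(int x) {
        // Dense, gap-free cases like these typically become a tableswitch.
        switch (x) {
            case 0: return 50;
            case 1: return 15;
            case 2: return 20;
            case 3: return 30;
            default: return 0;
        }
    }

    static int sparse(int x) {
        // Widely scattered cases like these typically become a lookupswitch.
        switch (x) {
            case 0:      return 1;
            case 50:     return 2;
            case 945328: return 3;
            default:     return 0;
        }
    }

    public static void main(String[] args) {
        // Running javap -c SwitchShapes shows which instruction each compiled to.
        System.out.println(dense(3) + " " + sparse(50));
    }
}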

I think that's about it, but if you think I left something out, notice any mistake, have questions, or anything at all, leave a comment!

Tuesday, June 11, 2013

Keeping Fish and the Aquatic Nitrogen Cycle!

Location: Salt Lake City, UT, USA
I don't know about the rest of you, but I love fish! I have been an avid enthusiast and hobbyist since I was young. My father and I were both introduced to the hobby at the same time, after stopping on a whim in a pet store whose manager was a Cichlid fanatic.

Marlin the clownfish
One of the big barriers to keeping fish is knowing how to keep them alive. Unfortunately there is a lot of misinformation out there about keeping fish. Really, besides the obvious water and something to hold the water, you only need two things to keep fish: you need to regulate the temperature (a heater is usually quite sufficient), and you need to filter the water. I've already said about all there is to say about regulating temperature. The only thing to add is that some tanks, usually those with very powerful lights, will require chilling instead of heating.

Marine aquarium filtration (salt water)
There are several types of filtration: mechanical, chemical, and biological. Mechanical filtration is what most people think of when they think of filters. This involves physically trapping non-dissolved matter, usually in some sort of mesh. Though this does a great job of removing larger visible impurities, the primary benefit of mechanical filtration is that it keeps your water clearer, which makes little difference to the fish but can significantly increase your enjoyment in watching them. Mechanical filters range anywhere from a foam pad, to a micron filter further refined by diatomaceous earth (which will even filter out mid-water algal and bacterial blooms), to a method called foam fractionation used in marine aquariums.

Chemical filtration uses substances that absorb impurities from the water. The most common is activated carbon, though other substances might be used depending on what needs to be absorbed. This can look like biological filtration in some regards, since it can remove some of the same substances that biological filters take care of, but it accomplishes this by absorbing the impurities. What most people don't understand about chemical filtration is that it is a band-aid for a real issue. There is no way to know how effective it is being, and once the medium fills up, it starts to leach chemicals back into the water. Chemical filtration, in my opinion, should only be used for very short periods between changes (8-24 hours), in emergency situations, while you work to fix the underlying cause of the issue.

Cleaner shrimp working on a polyp coral
So that leaves us with biological filtration, the core of keeping fish, and the core of biological filtration is the nitrogen cycle. Understanding this is the key foundation of keeping anything alive that lives in water. It's all you really need to know to keep freshwater fish alive, and the first stepping stone toward keeping marine (salt water) fish and invertebrates like corals and anemones. A lot of things go into water, but true waste is primarily nitrogen, in the form of ammonia. Even uneaten food and rotting dead organisms ultimately produce ammonia.

Ammonia is poisonous to fish, and a high enough concentration will lead to all sorts of problems, ending in a dead tank. We MUST filter this out! We look to nature and find that there are aquatic bacteria which actually eat ammonia! Nitrosomonas consume ammonia and produce another form of nitrogen called nitrite. Nitrite is not as toxic as ammonia, but is still a toxin in a fish environment and must be removed. Again we look to nature and find another little aquatic bacterium, Nitrobacter, which eats nitrite. Like its cousin, Nitrobacter produces a byproduct: nitrate. Nitrate is a substance you may have heard of before; it is a form of nitrogen commonly used in fertilizers, and is relatively non-toxic. This whole process of converting ammonia to nitrate is the nitrogen cycle, and basically all aquatic life depends on it.

Understanding this transfer from ammonia to nitrite, then nitrite to nitrate, is the foundation for understanding how nature filters water. But how do we apply this to our tanks? We look at how these bacteria live in water and find that they flourish best when they have a lot of surface area to colonize (rocks and sand provide this in nature) and a lot of oxygen, which in aquatics means swift-moving water with a lot of surface agitation and a large area for gas exchange with the air. So your aquarium needs good water-to-air surface area (go wide or long, not tall), good water movement (most fish really want a lot more than they are given), and a little surface agitation. It is not air bubbles rising through the water that gets air into it; it is their breaking at the surface that promotes gas exchange.

20 gallon long, a lot of surface area.  Notice the rippling effect, great for gas exchange!
Provide a lot of surface for bacteria to live on. The under-gravel filters of days gone by used the gravel as the medium for bacteria to grow on, but those filters tend to trap waste under the filter plates where it never gets removed, and the flow is not strong enough to provide oxygen. This results in an anaerobic (non-oxygen-breathing) bacteria colony, which you do not want. Instead, fill the tank with porous rock with a lot of surface area, and get good powerhead style pumps to keep things moving. Lace rock and some lava rocks work well; I suggest you consult with your local fish store to pick rocks that won't leach bad stuff into your water. There are also a lot of filters that provide media for bacteria growth, from Marineland's BIO-Wheels, to bio-balls, to canister inserts, but I prefer the more time-tested approach of having the system support itself (I do advocate sumps, separate tanks plumbed together, to help increase water volume and move some of the bacteria-supporting rock out of the main display tank).

Armed with well oxygenated water, good strong water flow, and plenty of rocks for bacteria to colonize, your tank will process its own waste into harmless nitrate that can be removed by aquatic plants, or with regular water changes.  Failure to remove nitrate can lead to algae growth.

Aquarium Sump
A few things to keep in mind. Bacteria take some time to grow, and will not grow until there is an excess of food (ammonia and nitrite). When first starting out it is important to understand that you will have to cultivate this bacteria slowly as the tank matures. This initial 'cycle' typically takes 2-3 months (it can vary from 1 month to a year depending on a lot of factors, but assume 2-3 months). To get ammonia into the tank you need something alive. Just one fairly small fish per 20-40 or so gallons will work well, fewer if they are larger fish, and since you should already have a vision for what you want in the tank, you should know which fish to get (the least aggressive variety is best to start with). Get a test kit for ammonia and check it every 3-7 days. You should see the ammonia go up, then plateau for a week or two.


You should be about a month in already, and frustrated at how boring your tank is so far. That is OK! Just sit on that emotion, and in the end you will come out with a work of art, a piece of pure beauty that everyone oohs and ahhs over. Get a nitrite test kit and you should start to see the nitrite going up slowly. Keep an eye on the nitrite, and watch how, after a week or two, the ammonia drops down to fairly safe levels as the nitrite rises. You could add a few more fish at this point, but only a few; go really slow. The nitrite will follow the same pattern ammonia did: it will plateau, then start to drop as the new bacteria colony grows to consume the excess resources. You want to get a nitrate test kit as well, and again, you can watch nitrate rise as nitrite falls to safe levels. Once this is done you have 'cycled' the tank, and have a full nitrogen cycle going on! Pretty cool, huh?

A little Blenny!
Again, go slow from here. Each time you add more fish, the ammonia is going to spike again, more bacteria will grow, ammonia will drop as nitrite rises, then nitrite will drop as nitrate rises. Too much of a swing and your fish will suffer, and likely get sick. So take it slow, give yourself a good year, maybe even two, to fully stock the tank, and you will end up with something you love instead of an algae-infested death trap that you loathe to look at.

Six-Line Wrasse

If you have any questions about fish or fishkeeping, ASK!






Monday, June 3, 2013

Java - Pass by Reference Value!

Location: Salt Lake City, UT, USA
To maintain a little bit of the geeky theme I have had going this last week, I wanted to talk a little about something I think is often misunderstood by those beginning in programming, and can be especially confusing to those moving to Java from other languages: the subject of reference vs value.

The primary place I see this confused is in method parameter passing, but a better understanding of it pays off in other areas as well. One point of note: in computer science the terms function and method are interchangeable. Function is the traditional term, borrowed from mathematics and used in languages like C. Method is simply an alternate term, favored in Java. I will use them interchangeably here.

Before we start to discuss the differences between them, we should ensure we understand the terms:

A value is what we would call an immediate. It is a section of memory where an actual value sits. It could be any data type: an int, double, char[], or even a struct. On the inside this is just a set amount of memory, contiguous bytes, the number of which depends on the size (or width) of the datatype stored there.

A reference is simply a memory address which points to a value. References are held in variables; in C/C++ these are called pointers, which is an ideal name, as they literally point somewhere else in memory. I will continue to call them pointers here. The size of a pointer is simply the size (again, width) of a memory address on your hardware (typically 4 or 8 bytes), and will never be any different, because it only ever stores a memory address that points to some value.

Values are fairly straightforward to understand, but references are more difficult, mainly because references are also values, special memory-address values. This means that references can point to other references, forming a chain which hopefully ends at a real object that has been allocated in memory. A reference which does not point to a valid allocated object is a 'null pointer'; it should not be used except to write a valid address into it.



The most succinct way I can state these relationships: a value is an actual value stored in memory. A reference is the address of a memory location where a value is stored. And a pointer is a special type of value that only holds references (memory addresses); it is the manifestation of a reference.



One of the useful reasons for having references is to facilitate parameter passing, and that is primarily what I intend to address today.

Say you have an exponent or power function: it takes ints A and B and returns a long, A to the power of B (A^B). On our imaginary system an int is 32 bits and a long is 64, so we have a total of 64 bits of input to our function, or 8 bytes of data; in computer terms this is basically nothing. To run this function we allocate 8 bytes, copy over the values the caller provided, and crunch the numbers. When it's time to return our long (which we allocated within our function), we again have a small amount of data, and we simply copy it to a location where the calling function knows to find it. This method of passing parameters is called pass by value. We are literally passing the value of the variable provided to the function (or more precisely, an identical copy of that value).
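Here is a minimal sketch of such a power function in Java (my own example, with a simple loop standing in for the number crunching). Note that modifying the parameters inside the method has no effect on the caller's variables, because the method only ever holds copies:

public class PowDemo {
    // a and b arrive as copies of the caller's values (pass by value).
    static long pow(int a, int b) {
        long result = 1;
        while (b > 0) {    // decrementing b changes only our local copy
            result *= a;
            b--;
        }
        return result;
    }

    public static void main(String[] args) {
        int base = 2, exp = 10;
        System.out.println(pow(base, exp)); // 1024
        System.out.println(exp);            // still 10; the caller's copy is untouched
    }
}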

Now say you have a sort function: it takes a linked list and has no return value (a void return type); it sorts the list provided in place. What happens when this sort function is called? Given the scenario in the previous paragraph, we would assume that enough memory would be allocated to hold the linked list, the values held in the original list would be copied into this new list, and we would sort the values within this new list. But this has a few issues. For one, we couldn't have a return type of void, because without returning, we have no way of getting our copy of the list back to the caller. But even if we return our new list, what if our list is big? I mean really big? What if our list occupies more than half of available memory, so we can't copy the whole thing to pass to the function?

Do we really require twice as much space as our list occupies just to call a sort method? Of course not! This is where references come in. Instead of copying our list into our function, we copy the address of our list (which is what we mean by a reference), and the method can then access the list without copying or moving it at all. Remember, a reference is typically 32 to 64 bits, basically nothing compared to even a small linked list. This is called passing by reference: rather than copying the value we want to operate on, we pass its memory address so the function can work directly on the object.
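In Java this is exactly how objects behave. A hypothetical in-place sort like the one below receives only a copy of the caller's reference, never a copy of the list itself, which is why its changes are visible to the caller:

import java.util.Collections;
import java.util.LinkedList;

public class SortDemo {
    // 'list' is a copy of the caller's reference: a few bytes,
    // no matter how many elements the list holds.
    static void sortInPlace(LinkedList<Integer> list) {
        Collections.sort(list); // operates directly on the caller's object
    }

    public static void main(String[] args) {
        LinkedList<Integer> numbers = new LinkedList<>();
        Collections.addAll(numbers, 3, 1, 2);
        sortInPlace(numbers);
        System.out.println(numbers); // [1, 2, 3] -- the original list was sorted
    }
}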

So you can probably see why we need pass by reference, but why then do we bother with pass by value? The main reason I am aware of is that references do not allow a caller to protect its data from a function. When a function is passed a reference, there is a risk the function will modify input data that the calling function is still using, causing the caller's assumptions about the state of its own data to be violated. When you pass by value, you guarantee that none of your variables are modified without your knowledge, as the called function operates only on copies. There is also a question of optimization. If you have a function that takes a byte as input, passing by reference might use four times as much memory for parameters as passing by value; on a 64 bit system it might be 8 times. Just as we want to optimize functions that handle very large inputs, we should also be able to optimize functions with small inputs.

Java does away with the question of reference vs. value by making all primitive data types (char, short, int, long, float, double, etc.) pass by value, and having all non-primitives pass their references by value.  Java has no pointer syntax because every variable of a non-primitive type is, under the hood, a pointer to the object; there is no special notation involved, this is simply how all non-primitive variables work.  So what is passed to a method in Java is a copy of that reference, or in other words a totally new variable (which under the hood is a pointer) holding a copy of the address stored in the variable the caller passed in.  This distinction can be a little tricky to sort out at first, and difficult to describe in text, so you might want to read that a few times...

Some of the more insidious 'bugs' I have seen in beginning students' Java code are related to not understanding how method parameters are passed under the hood: modifying the contents of an object will be reflected in the calling method's copy of the input, but reassigning the parameter variable to a new object entirely will result in no change to the object passed in by the caller.  In other words, there is a big difference between:

public static void copyList(LinkedList a, LinkedList b) {
    a.clear();
    a.addAll(b);
}

and

public static void copyList(LinkedList a, LinkedList b) {
    a = b;
}

The first example above will result in the calling function having two lists that contain the same nodes in the same order, as we have simply performed operations on the original lists provided by the caller.  In the second example we have assigned the address of object b to the variable named a.  The variable a now points to the same object that the variable b points to.  In fact, within the second method we no longer even have a reference to the list originally passed in as parameter 'a' at all!  Our second method has not changed the original inputs as expected; it has simply changed how the method references those lists, duplicating the reference to the list passed in as 'b' and losing the reference to the list passed in as 'a'.

This is a subtle but important distinction to understand if you hope to write working Java code.  If you are struggling, leave a comment and I might be able to help make it more clear!

I had intended to write a little about the stack and the heap (dynamic vs static memory) and why this distinction of value vs reference exists in the first place, but this post has already grown larger than I would like, so I will save that discussion for another day!

Sunday, June 2, 2013

Java file organization - part 2

Location: Salt Lake City, UT, USA
So last post I showed how I built a few quick file utilities in Java.  If you will recall, I have a very large music collection which I copy to (currently) 8 USB thumb drives for use in my car.  I do not want bands broken up between disks, and I want to avoid copying files that my car stereo cannot read.  So I wrote a few Java utilities to help me automate this task as my music collection grows.

This post I am going to show two more utilities, and then I'll show how I use these methods to build my disks.  We will need a method to copy files and a method to delete files (to clean out the old files when I rebuild the disks).  I'll start with the remove method because it is the smallest.

 private static void removeAllChildren(File root) {
  // Collect every file (at any depth) under root, plus root's
  // immediate subdirectories to seed the directory walk.
  LinkedList<File> toRemove = Tools.getAllChildren(root, false, true,
    false, false), toCheck = Tools.getImmediateChildren(root, true,
    false, false, false);

  // First pass: delete all of the files.
  Iterator<File> i = toRemove.iterator();
  while (i.hasNext())
   i.next().delete();

  // Second pass: walk the directories breadth-first, queuing each
  // directory's children to be checked and recording the directory
  // itself for removal.
  toRemove.clear();
  while (!toCheck.isEmpty()) {
   File file = toCheck.pop();
   toCheck.addAll(Tools.getImmediateChildren(file, true, false, false,
     false));
   toRemove.add(file);
  }

  // Reverse the breadth-first ordering so the most deeply nested
  // directories are deleted first; a directory can only be deleted
  // once it is empty.
  Collections.reverse(toRemove);
  i = toRemove.iterator();
  while (i.hasNext())
   i.next().delete();
 }


Basically, we give it a folder and it will remove all of that folder's children.  Our general strategy is to first collect every file, at any depth, and simply delete them all.  Next we walk toCheck breadth-first: each directory we pop has its subdirectories queued for checking and is itself added to toRemove.  This is similar to the logic we used in the last post to build our folder listing.  Finally, we reverse toRemove so that we are removing the most deeply nested directories first (a directory can only be deleted once it is empty), and we iterate over it to remove them all.  This should all be rather straightforward.

The next method is a bit more complicated, but not too bad.  We're using NIO channels to perform the actual copying of the contents.

 private static long copyFile(File fromFile, short diskNum, File newPath,
   File oldPath) {

  String disk = Short.toString(diskNum);
  if (diskNum < 10)
   disk = "0" + disk;

  String path = fromFile.getAbsolutePath().replace(
    oldPath.getAbsolutePath(),
    newPath.getAbsolutePath() + File.separator + disk);

  File toFile = new File(path.replace(fromFile.getName(), ""));
  if (!toFile.exists())
   toFile.mkdirs();

  toFile = new File(path);
  try {
   toFile.createNewFile();
  } catch (IOException e1) {
   // Choose an action here!
   return 0;
  }

  FileInputStream source = null;
  FileOutputStream destination = null;
  long size = 0;

  try {
   source = new FileInputStream(fromFile);
   destination = new FileOutputStream(toFile);

   FileChannel sourceFileChannel = source.getChannel();
   FileChannel destinationFileChannel = destination.getChannel();

   size = sourceFileChannel.size();
   sourceFileChannel.transferTo(0, size, destinationFileChannel);

  } catch (FileNotFoundException e) {
   // Choose an action here!
   return 0;
  } catch (IOException e) {
   // Choose an action here!
   return 0;
  } finally {
   // Close whichever streams were successfully opened; a failure on
   // one close should not prevent the other from closing.
   try {
    if (source != null)
     source.close();
   } catch (IOException e) {
   }
   try {
    if (destination != null)
     destination.close();
   } catch (IOException e) {
   }
  }
  return size;
 }

This method will return a long representing the size of the copied file (0 if the copy fails).  The input is the file which is being copied, a disk number (used to name the target directory), a new root path, and the old root path.  These paths are used to determine where to copy the file to.  For instance, I have my music stored as /usr/home/me/music/<artist>/<album>/, so the oldPath would be /usr/home/me/music/.  I want to store these disks to /mnt/<disknum>, so the newPath would be /mnt/.  Everything else about the path is preserved, so if our old file was /usr/home/me/music/coolband/bestalbum/bestsong.mp3, it would be copied to /mnt/01/coolband/bestalbum/bestsong.mp3 (assuming it is destined for the first disk).

The meat of this method starts out by building the new path, creating any directories needed to contain the file, and creating the actual destination file.  From there we connect IO streams to the source and destination files, and retrieve their channels for reading and writing.  Once we have the size stored, the actual instruction to transfer the file is simple: we call the transferTo method on the source channel, giving it the start position, the number of bytes to transfer, and the destination channel.  The rest of the method is exception handling in case something goes wrong, and we finish by returning the size of the transfer.  It is interesting to note that the file copy itself is only one line, and even with setting up the IO it's still fewer than ten.  Half if not more of the coding work is admin work like generating path names and creating parent directories.
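
As a quick usage sketch, copying the example song above onto the first disk would look something like this:

  long copied = copyFile(new File("/usr/home/me/music/coolband/bestalbum/bestsong.mp3"),
    (short) 1, new File("/mnt/"), new File("/usr/home/me/music/"));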

Armed with all these tools we are ready for the main function!

 private static long maxSize = 64000500000L;
 private static short maxDir = 998;
 private static int maxFiles = 20480;
 private static File rootDir = new File("/path/to/masters/");
 private static File newLocation = new File("/path/to/copy/to/");

 public static void main(String[] args) {

  long currSize = 0;
  int totalFiles = 0;
  short totalDirs = 0;
  LinkedList<LinkedList<File>> diskList = new LinkedList<LinkedList<File>>();
  LinkedList<File> disk = new LinkedList<File>();
  diskList.add(disk);

  LinkedList<File> headList = Tools.getImmediateChildren(rootDir, true,
    false, false, true);
  Collections.sort(headList);
  Iterator<File> i = headList.iterator();

  while (i.hasNext()) {

   File head = i.next();
   LinkedList<File> files = Tools.getAllChildren(head, false, true,
     true, true);
   LinkedList<File> dirs = Tools.getAllChildren(head, true, false,
     true, true);

   short numDirs = (short) dirs.size();
   int numFiles = files.size();
   long size = 0;
   for (File file : files)
    size += file.length();

   if (currSize + size > maxSize || totalFiles + numFiles > maxFiles
     || totalDirs + numDirs > maxDir) {
    disk = new LinkedList<File>();
    diskList.add(disk);
    currSize = 0;
    totalFiles = 0;
    totalDirs = 0;
   }

   disk.addAll(files);
   currSize += size;
   totalFiles += numFiles;
   totalDirs += numDirs;
  }

  removeAllChildren(newLocation);

  for (short diskNum = 1; !diskList.isEmpty(); diskNum++) {
   i = diskList.pop().iterator();
   while (i.hasNext())
    copyFile(i.next(), diskNum, newLocation, rootDir);
  }
 }

We start out by defining some limitations imposed by the hardware involved.  The maxSize is a limit of my USB thumb drives (approximately 64*10^9 bytes), and the maxDir and maxFiles are limitations of my player.  The rootDir is where we will build our disks from, our master list.  The newLocation is where we will copy our files to build our disks.

For the meat of our method we define a few numeric variables to track the size, number of files, and number of directories in each disk, and we define a diskList to hold each listing of files which will make up each disk.  Note that this is a nested list: a linked list of linked lists.  We have created one linked list that stores other linked lists of files, and we will add to it as we assemble each disk's list.  Next we create our first file list (representing a disk) and add it to the disk list.  The last step before we start to assemble our disks is to gather a list of all the artists, each of which has a separate directory under the rootDir.  We call this list the headList, and we sort it so that the disks are split up alphabetically by artist/band.

Once this is all set up we simply iterate through each artist/band and collect a list of their files, a list of their directories, and a total of their size.  If they would overflow our current disk we start a new disk; otherwise we add them to our current file list and move on to the next.  Once this is all done and every disk's file list is assembled, we remove everything from our newLocation (this assumes the old disks built there will be superseded by the ones about to be copied), and we copy all the files to their associated disks.  Note that this last for loop would be faster if it iterated through diskList instead of using pop(), because pop() incurs the overhead of unlinking a list node, though compared to the time that the copy itself takes, this is negligible...

There are lots of other things you can do to automate maintenance on large digital collections like this.  Do you have any ideas?

Handy way to negate a two's complement number: (~n)+1
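
A quick sanity check in Java:

  int n = 42;
  System.out.println((~n) + 1); // prints -42: flip every bit, then add one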

Monday, May 27, 2013

Java file organization - part 1

Location: Salt Lake City, UT, USA
In my car I keep all my music on a collection of (currently 8) 64GB USB thumb drives.  It's great!  I can fit years' worth of music into a case that would only hold 5-10 CDs or DVDs.  I have a directory for each artist, which contains a separate folder for each album.  I can generate this list of artists/albums very easily, print and laminate it, and keep it in my car to easily find what I am looking for.

This arrangement really has only two drawbacks: I have to split my music library into chunks of less than 59.6GB (since disk manufacturers define a GB as 10^9 bytes, while your computer defines it as 2^30), and my car stereo will only support files of the types mp3, wma, m4a, and wav.  Files like videos, album covers, and playlists (.m3u) are useless to transfer for my car.  Splitting my collection efficiently and weeding out the unneeded files would take me ages, and my collection continues to expand.  I would be fumbling through menus, drag & drop, and shortcut keys forever doing it all by hand.
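
To see where the 59.6 figure comes from:

 64 × 10^9 bytes ÷ 2^30 bytes/GB ≈ 59.6 GB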


What are computers but the most versatile tools created by man?  Why spend my days doing something that a computer can do so much faster?  To my rescue comes Java, a handy little language with a short development time for small widgets like the ones I would need, but with the power and speed to efficiently manage a task as massive as mine.

The other day I took about five hours and wrote up a set of utilities to handle the vast majority of my music file management tasks, including building and copying my disk lists.  I like little utilities like this; they help me keep my skills limber.

I figured this might be a good topic for a little how-to on some of the stuff I did.  I will assume you know some basic Java as well as the LinkedList class.

The heart to any file organization is being able to find all the children of a given directory.  So to this end I wrote a helper method and two basic filters to do just that:

 public static String validExtRegex = "^.*?(\\.mp3|\\.wma|\\.m4a|\\.wav)$";

 public static LinkedList<File> getImmediateChildren(File root,
   boolean getDirs, boolean getFiles, boolean onlyValidFiles,
   boolean skipEmptyDirs) {

  LinkedList<File> list = new LinkedList<File>();

  if (!root.isDirectory())
   return list;

  if (getDirs) {
   File[] children = root.listFiles(isDirectory);
   for (int i = 0; i < children.length; i++) {
    if (skipEmptyDirs) {
     // Look inside this child directory (not the sibling array) to
     // decide whether it counts as empty: it must contain either a
     // subdirectory or a file we care about to be kept.
     File[] contents = children[i].listFiles();
     if (contents.length < 1)
      continue;
     boolean skip = true;
     for (File f : contents)
      if (f.isDirectory()
        || (f.isFile() && (!onlyValidFiles || f.getName()
          .toLowerCase().matches(validExtRegex)))) {
       skip = false;
       break;
      }
     if (skip)
      continue;
    }
    list.add(children[i]);
   }
  }

  if (getFiles) {
   File[] children = root.listFiles(isFile);
   for (int i = 0; i < children.length; i++) {
    if (onlyValidFiles
      && !children[i].getName().toLowerCase()
        .matches(validExtRegex))
     continue;
    list.add(children[i]);
   }
  }

  return list;
 }

 public static FileFilter isDirectory = new FileFilter() {
  @Override
  public boolean accept(File pathname) {
   return pathname.isDirectory();
  }
 };

 public static FileFilter isFile = new FileFilter() {
  @Override
  public boolean accept(File pathname) {
   return pathname.isFile();
  }
 };

We start by defining a regular expression (regex) to match known good file types.  A discussion of regex is beyond the scope of this post, but this one basically says to match anything which ends in .mp3, .wma, .m4a, or .wav.
The method takes a File object, which should be the directory whose children we want to retrieve; the rest of the parameters are booleans controlling whether we should return directories, whether we should return files, whether we should only include valid files, and whether we should skip empty directories.
The real meat of this method is the root.listFiles(isDirectory) and root.listFiles(isFile) calls, which use the FileFilter objects we create below the method (known as anonymous inner classes) to return only directories or only files from File's listFiles() method.  If you are unfamiliar with the syntax used to create these objects, search Google for 'anonymous inner class'.  The only real complexity in this method is in handling the exclusion of empty directories and filtering out unmatched file types, but the logic involved should be self-evident if you have done much coding.
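
As a quick sketch of what the regex accepts (the file names are made up):

  System.out.println("01 - Best Song.MP3".toLowerCase().matches(validExtRegex)); // true
  System.out.println("cover.jpg".toLowerCase().matches(validExtRegex)); // false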

This method is great, but what we really need is the ability to list files/directories to any depth.  There are two ways to go about this sort of task: recursive and iterative.  Though many people are tempted to go for the elegance of a recursive solution, for arbitrarily large data sets the stack overhead makes it less than ideal, so I have instead written an iterative solution:

 public static LinkedList<File> getAllChildren(File root, boolean getDirs,
   boolean getFiles, boolean onlyValidFiles, boolean skipEmptyDirs) {

  LinkedList<File> list = new LinkedList<File>(), dirsToCheck = new LinkedList<File>();

  if (root.isDirectory())
   dirsToCheck.add(root);

  while (!dirsToCheck.isEmpty()) {
   File file = dirsToCheck.pop();
   LinkedList<File> childDirs = getImmediateChildren(file, true,
     false, onlyValidFiles, skipEmptyDirs);
   LinkedList<File> childFiles = getImmediateChildren(file, false,
     true, onlyValidFiles, skipEmptyDirs);
   if (getFiles)
    list.addAll(childFiles);
   if (getDirs)
    list.addAll(childDirs);
   dirsToCheck.addAll(childDirs);
  }

  return list;
 }

This method should be fairly straightforward.  We are simply using a list named dirsToCheck to collect each directory we find, and we continue to remove and check directories until the list is empty.  This is known as a breadth-first search (as opposed to a depth-first search) because it examines each complete level of the hierarchy before moving to the next.  This is a fairly simple method; most of the real work is being done in the first method we created.
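
For example, gathering every valid music file under my master directory, while skipping empty folders, would look something like this (the path is made up):

  LinkedList<File> allSongs = Tools.getAllChildren(new File("/path/to/masters/"),
    false, true, true, true);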

In my next post I will show how I build my file lists to be written to thumb drives.  This will include deleting and copying files!

Friday, May 24, 2013

How I quit smoking

Location: Salt Lake City, UT, USA

In April of 2011 I quit smoking.  I had quit before, around 2004, with the help of Zyban (also known as Wellbutrin), but after going through a period of my life with a lot of stress, I had started again in 2006.  I can proudly say that since April '11 I have tried exactly one cigarette, and the taste was so disgusting that I put it out after a few drags.  I haven't had the slightest inkling to try one since.


I loved smoking.  I certainly was addicted, immensely so, but beyond that I was in love with the act of smoking.  At any point in my day when I was bored I could have a cigarette, and ten minutes would be filled with addictive pleasure.  It provided an oral fixation as well as something to do with my hands.  I also love old-fashioned things, trinkets; my Zippos and cigarette cases were things I proudly carried and used regularly, having long ago become part of my personal style.  On the psychological front, smoking is a way to instantly differentiate myself from the average straight-laced Mormon, and growing up and living in Utah, there is value in such a display.


The key to my quitting was being able to do so without giving up the things I loved about smoking.  I started using an e-cigarette.  I hadn't planned on quitting smoking entirely; I had an idea in my head that I would use the e-cigarette most of the time, but still enjoy a real cigarette once a week or so.  But once I had become accustomed to the e-cigarette I just lost my desire to smoke.  There was simply no need when the e-cigarette took care of all the things I loved about smoking, and carried none of the drawbacks.


To those who are not aware of e-cigarettes, or those who have been caught up in all of the misinformation which surrounded them when they first became available, I will give a quick rundown.  An e-cigarette consists of two primary components: a battery and something called an atomizer.  Atomizer is a fancy word for a class of devices which break a liquid into a fine mist ('atomize' in the classical sense); inhalers are one type of atomizer.  They work in many different ways, but in an e-cigarette the atomizer contains a small heating coil which literally works to make steam.  There are two styles of atomizers.  The more popular ones are built into a cartridge which holds the e-juice; the other style uses an atomizer surrounded by a mesh, designed to drip e-juice on directly.

The key to an e-cigarette, of course, is the e-juice, which is a mixture of nicotine with a base and sometimes flavoring.  E-cigarettes work by vaporizing e-juice, or more accurately, by steaming it with the heat from the atomizer.  The base used in e-juice is typically propylene glycol, a substance commonly used as a base for medicine and food (it's used in nebulizing inhalers).  Some people are allergic to propylene glycol, so sometimes vegetable glycerin is used instead.  Nicotine is a poison, produced by the tobacco plant to deter insects.  It is the active ingredient in cigarettes, the substance you become addicted to.  In small doses, with a built-up tolerance (addiction), it is a relatively safe substance; it causes an increase in heart rate similar to the effects of caffeine.  Usually a flavoring agent is added to the mixture.  My go-to flavor is Chocolate-Vanilla-Hazelnut, but there are literally thousands of flavors made by different companies.

The choice between different types and brands of e-cigarettes is such a personal thing, but I will tell my own experience and hope that it is enlightening.

I started out buying an e-cigarette package from a local smoke shop.  The brand was Smoke 51, designed to work with cartridges.  They advertised a cartridge as being equivalent to a pack of cigarettes, with 5 cartridges costing about $12.  I used this for two to three months and was satisfied, but I did have two complaints: the cartridges would leak e-juice onto my lips, which tasted disgusting, and I also found them imprecise; the way they wick the juice to the atomizer meant that toward the end of their life the cartridges started to give off wispy, unsatisfying draws, even though the cartridge still contained a lot of juice.  Also, the batteries only tended to last me 6-8 hours.

To deal with the short battery life I bought a much cheaper unit from the corner store as a backup (I don't recall the brand).  This brand had a different design, using a separate atomizer and cartridge.  The atomizer was a drip atomizer with a wire mesh, and the cartridge was designed to fit on it and wick to the mesh.  After a few days of using the cartridges I got tired of the same issues with juice leaking into my mouth and started using this e-cigarette as a drip atomizer instead of using the cartridges.  It worked very well, and I was much much more satisfied with this setup.  I put away my Smoke 51 unit and never picked it back up.


I liked dripping onto my new e-cig, but the one I had gotten was as cheap as they get.  It only held a charge for 4-6 hours, and the thing sat crooked when you screwed the atomizer onto the battery.  Armed with the experience I had gained from vaping for nearly six months, I went online to find solutions to my previous complaints about my e-cigarette experience.  I found a company called Volcano E-Cigarettes, which seemed highly recommended and had a very active forum where their customers had built quite a neat little community.  The product which drew me in was a well-crafted version of the typical e-cig, but it came as a pair, along with a handy carrying case which also acted as a charger, holding enough charge to top up both batteries a few times.  I went with a cartridge-like option, a clear tube which held a large volume of e-juice, but I switched back to using a drip atomizer when I had issues with the tanks leaking.

After using this portable charging case, which they called the Magma, I was sold, and began trying their different flavors, eventually hitting on Choconilla-Haze, which I have used exclusively for over a year now.  I also became intrigued by a more high-end e-cig which Volcano was selling, called the Inferno, more appropriately referred to as an e-cigar due to its oversized battery.  The Inferno came in two styles, a larger battery and a smaller battery that can charge via USB while in use; both styles came in the starter kit.  I quickly gave away the larger battery, as I found that the USB pass-through battery was enough to last through a long day and was convenient to keep plugged in while I used it at my desk.  After using the Inferno for a few months I gave away all my other e-cigs and got a second Inferno battery as a backup.  I continue to use the Inferno to this day.


Volcano recently came out with a new e-cigarette named the LavaTube.  It is much larger than the Inferno, and much, much larger than a typical e-cigarette.  It uses a removable battery and has variable voltage, along with improvements under the hood.  The variable voltage means that you can adjust how dense or 'hot' the draw will be; I tend to run mine at 3.8 volts.  The removable batteries mean that I can buy a number of them (I have 4) and charge two at a time in a separate charger; I can go backpacking for 4-7 days without needing to recharge this way.  I use my LavaTube mostly at my desk, as it is a little large to carry in my shirt pocket, and I carry the Inferno out and about with me.


There are lots of different companies out there making cool e-cigarettes, so Volcano is not the only game in town, and I encourage you to go see what's available.  I can say that with the few problems I have had (one unit that failed after a few months, and one shipment of the wrong flavor of juice), they were quick to correct the issues and very easy to deal with.  I have placed maybe thirty or so orders with them in the past two years.

And last, if you made it this far, you are likely committed to quitting smoking, so I want to explain a difference between the nicotine you get in a cigarette and the pure nicotine found in e-juice.  Cigarette manufacturers use a chemical process to alter the nicotine in cigarettes so that it is absorbed into your bloodstream much faster than it would be naturally.  Because of this, cigarette smokers are trained to expect the immediate response (the fix) that smoking gives.  Pure nicotine can take 5-10 minutes to fully absorb into the bloodstream, and as such, transitioning from cigarettes to e-juice can be less satisfying than people expect.  Cigarettes are a game of rewarding a desire; e-cigarettes are more about establishing a constant level of nicotine in your system.  By the time you crave a cigarette, it is too late to get a quick cure with an e-cigarette.  Instead, you will need to get used to using the e-cigarette for maintenance, using it regularly to maintain the level you are accustomed to.

In closing, I know a lot has been said about the health risks of e-cigarettes, so I thought I should address this, as I have done extensive research on the subject.  The medical science on nicotine and propylene glycol is fairly extensive, and both substances are safe as they are intended to be used in e-cigarettes.  The flavoring used is typically USDA-grade and food-safe.  However, there is one big warning here: KNOW WHERE YOU GET YOUR E-JUICE FROM.  Buy from a no-name, fly-by-night operation and you have no idea what they put in there.  Buy from a reputable source where you can confirm the ingredients and the methods used in preparation; this is very important.  This is another reason I went with Volcano, who have a large brick-and-mortar business located in the US, employing food-safe laboratories and methods.  The same can be said for hardware, such as batteries and atomizers.  Batteries can malfunction, and parts can be made with toxic substances.  Again, ensure you are purchasing from a reputable manufacturer who cares about what goes into their products.

Since quitting smoking and taking up the e-cigarette I feel great, I don't stink all the time, and I expect to be around a lot longer!  If you have any questions for me, leave them in the comments!