Sunday, June 2, 2013

Java file organization - part 2

Location: Salt Lake City, UT, USA
So last post I showed how I built a few quick file utilities in Java. If you will recall, I have a very large music collection that I copy to (currently) 8 USB thumb drives for use in my car. I do not want bands broken up between disks, and I want to avoid copying files that my car stereo cannot read. So I wrote a few Java utilities to help me automate this task as my music collection grows.

In this post I am going to show two more utilities, and then I'll show how I use these methods to build my disks. We will need a method to copy files and a method to delete files (to clean out the old files when I rebuild the disks). I'll start with the remove method because it is the smallest.

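 // Deletes everything under root, leaving root itself in place.
 // Needs java.io.File, java.util.Collections, java.util.LinkedList and
 // the Tools class from part 1.
 private static void removeAllChildren(File root) {
  // Step 1: delete every file anywhere below root.
  LinkedList<File> toRemove = Tools.getAllChildren(root, false, true,
    false, false);
  for (File file : toRemove)
   file.delete();

  // Step 2: walk the (now file-free) tree breadth-first, recording each
  // directory exactly once. A directory is always recorded before its
  // subdirectories, because they are only queued when it is popped.
  LinkedList<File> toCheck = Tools.getImmediateChildren(root, true,
    false, false, false);
  toRemove.clear();
  while (!toCheck.isEmpty()) {
   File dir = toCheck.pop();
   toCheck.addAll(Tools.getImmediateChildren(dir, true, false, false,
     false));
   toRemove.add(dir);
  }

  // Step 3: reverse so the deepest directories (already empty) go first.
  Collections.reverse(toRemove);
  for (File dir : toRemove)
   dir.delete();
 }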


Basically, we give it a folder, and it removes everything inside it while leaving the folder itself in place. Our general strategy is to first collect every file and simply delete it. Next we work through toCheck, which acts as a queue of directories: each directory we pop has its subdirectories queued for checking and is itself added to toRemove. This is similar to the logic we used in the last post to build our folder listing. Because this walk is breadth-first, every directory is recorded before any of its subdirectories, so reversing toRemove puts the deepest nested directories first. We then iterate again and delete them; since File.delete() only removes empty directories, this order matters, and each one is empty by the time we reach it. This should all be rather straightforward.
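
For comparison, Java 7's NIO.2 API can do the same deepest-first cleanup with Files.walkFileTree, which calls postVisitDirectory only after everything inside that directory has been visited. This is a minimal sketch, not the Tools-based method above: the class name DeleteChildren is made up for illustration, and it assumes root is a directory you really do want emptied.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class DeleteChildren {

    // Deletes everything under root but keeps root itself (hypothetical helper).
    static void removeAllChildren(final Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file); // files (and symlinks, which are not followed)
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null)
                    throw exc;
                // Called after the directory's contents were handled, so it is empty now.
                if (!dir.equals(root))
                    Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("disks");
        Path album = Files.createDirectories(root.resolve("coolband").resolve("bestalbum"));
        Files.createFile(album.resolve("bestsong.mp3"));
        removeAllChildren(root);
        try (DirectoryStream<Path> children = Files.newDirectoryStream(root)) {
            System.out.println("root kept: " + Files.isDirectory(root)
                    + ", empty: " + !children.iterator().hasNext());
        }
    }
}
```

Running main should print `root kept: true, empty: true`. The visitor does the ordering bookkeeping that the reverse-the-list trick handles by hand.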

The next method is a bit more complicated, but not too bad. We're using NIO channels to perform the actual copying of the contents.

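 // Copies fromFile into newPath/<disk>/..., keeping its path relative to
 // oldPath. Returns the number of bytes copied, or 0 if the copy fails.
 // Needs java.io.* and java.nio.channels.FileChannel; try-with-resources
 // requires Java 7 or later.
 private static long copyFile(File fromFile, short diskNum, File newPath,
   File oldPath) {

  // Two-digit disk folder name: 1 -> "01", 12 -> "12".
  String disk = String.format("%02d", diskNum);

  String path = fromFile.getAbsolutePath().replace(
    oldPath.getAbsolutePath(),
    newPath.getAbsolutePath() + File.separator + disk);

  // Create the folders that will hold the copy. getParentFile() is safer
  // than removing the file name from the path string, which would also
  // strip that text from any folder name that happens to contain it.
  File toFile = new File(path);
  File parent = toFile.getParentFile();
  if (!parent.exists())
   parent.mkdirs();

  long copied = 0;
  // try-with-resources closes whichever streams were opened (and their
  // channels), and never calls close() on a stream that failed to open.
  try (FileInputStream source = new FileInputStream(fromFile);
    FileOutputStream destination = new FileOutputStream(toFile)) {

   FileChannel sourceFileChannel = source.getChannel();
   FileChannel destinationFileChannel = destination.getChannel();

   // transferTo reports how many bytes it moved, which may be fewer than
   // requested, so keep calling it until the whole file is copied.
   long size = sourceFileChannel.size();
   while (copied < size) {
    long moved = sourceFileChannel.transferTo(copied, size - copied,
      destinationFileChannel);
    if (moved <= 0)
     break; // nothing moved (e.g. the source shrank); stop rather than spin
    copied += moved;
   }
  } catch (FileNotFoundException e) {
   // Choose an action here!
   return 0;
  } catch (IOException e) {
   // Choose an action here!
   return 0;
  }
  return copied;
 }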

This method returns a long representing the size of the copied file (0 if the copy fails). The inputs are the file being copied, a disk number (used to name the target directory), a new root path, and the old root path. These paths determine where the file is copied to. For instance, I have my music stored as /usr/home/me/music/<artist>/<album>/, so the oldPath would be /usr/home/me/music/. I want to build these disks under /mnt/<disknum>, so the newPath would be /mnt/. Everything else about the path is preserved, so if our old file was /usr/home/me/music/coolband/bestalbum/bestsong.mp3, it would be copied to /mnt/01/coolband/bestalbum/bestsong.mp3 (assuming it is destined for the first disk).
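
The same path mapping can also be expressed with java.nio.file.Path, which avoids String.replace matching the old root text somewhere else in the path. This is a sketch assuming Unix-style paths; DiskPath and targetFor are hypothetical names, and the example reuses the paths from the paragraph above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class DiskPath {

    // Maps a file under oldRoot to the same relative location under
    // newRoot/<two-digit disk number> (hypothetical helper).
    static Path targetFor(Path file, Path oldRoot, Path newRoot, int diskNum) {
        Path relative = oldRoot.relativize(file);     // e.g. coolband/bestalbum/bestsong.mp3
        String disk = String.format("%02d", diskNum); // 1 -> "01"
        return newRoot.resolve(disk).resolve(relative);
    }

    public static void main(String[] args) {
        Path song = Paths.get("/usr/home/me/music/coolband/bestalbum/bestsong.mp3");
        Path target = targetFor(song, Paths.get("/usr/home/me/music/"),
                Paths.get("/mnt/"), 1);
        System.out.println(target); // on Unix: /mnt/01/coolband/bestalbum/bestsong.mp3
    }
}
```

relativize works on path components rather than raw text, so a folder name that happens to contain the old root string cannot be rewritten by accident.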

The meat of this method starts by building the new path name and creating any necessary directories that will contain the file. From there we connect IO streams to the source and destination files and retrieve their channels for reading and writing. Once we have the size stored, the instruction that actually transfers the file is simple: we call transferTo on the source channel, passing it the starting position, the number of bytes to transfer, and the destination channel. The rest of the method is exception handling in case something goes wrong, and we finish by returning the size of the transfer. It is interesting to note that the copy itself comes down to a single transferTo call, and even with setting up the IO, it's still fewer than 10 lines. Half, if not more, of the coding work is admin work like generating path names and creating parent directories.
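
On Java 7 or later, the copy and the parent-folder bookkeeping can also be delegated to java.nio.file.Files. This is an alternative to the channel approach, not what the method above does; NioCopy and copyInto are illustrative names, and REPLACE_EXISTING is chosen on the assumption that rebuilding a disk should overwrite stale copies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

public class NioCopy {

    // Copies `from` to `to`, creating any missing parent folders first,
    // and returns the size of the new file in bytes (hypothetical helper).
    static long copyInto(Path from, Path to) throws IOException {
        Files.createDirectories(to.getParent()); // no error if the folders already exist
        Files.copy(from, to, StandardCopyOption.REPLACE_EXISTING);
        return Files.size(to);
    }

    public static void main(String[] args) throws IOException {
        Path music = Files.createTempDirectory("music");
        Path song = music.resolve("bestsong.mp3");
        Files.write(song, new byte[] {1, 2, 3, 4, 5});

        Path target = Files.createTempDirectory("mnt").resolve("01")
                .resolve("coolband").resolve("bestalbum").resolve("bestsong.mp3");
        long size = copyInto(song, target);
        System.out.println(size + " bytes, identical: "
                + Arrays.equals(Files.readAllBytes(song), Files.readAllBytes(target)));
    }
}
```

Running main should print `5 bytes, identical: true`. Unlike the channel version, this sketch lets the exception propagate, so the caller decides what a failed copy means.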

Armed with all these tools we are ready for the main function!

 private static long maxSize = 64000500000L;
 private static short maxDir = 998;
 private static int maxFiles = 20480;
 private static File rootDir = new File("/path/to/masters/");
 private static File newLocation = new File("/path/to/copy/to/");

 public static void main(String[] args) {

  long currSize = 0;
  int totalFiles = 0;
  short totalDirs = 0;
  LinkedList<LinkedList<File>> diskList = new LinkedList<LinkedList<File>>();
  LinkedList<File> disk = new LinkedList<File>();
  diskList.add(disk);

  LinkedList<File> headList = Tools.getImmediateChildren(rootDir, true,
    false, false, true);
  Collections.sort(headList);
  Iterator<File> i = headList.iterator();

  while (i.hasNext()) {

   File head = i.next();
   LinkedList<File> files = Tools.getAllChildren(head, false, true,
     true, true);
   LinkedList<File> dirs = Tools.getAllChildren(head, true, false,
     true, true);

   short numDirs = (short) dirs.size();
   int numFiles = files.size();
   long size = 0;
   for (File file : files)
    size += file.length();

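    // Start a new disk if this artist won't fit on the current one, unless
    // the current disk is still empty (which would leave a blank disk).
    if (!disk.isEmpty() && (currSize + size > maxSize
      || totalFiles + numFiles > maxFiles || totalDirs + numDirs > maxDir)) {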
    disk = new LinkedList<File>();
    diskList.add(disk);
    currSize = 0;
    totalFiles = 0;
    totalDirs = 0;
   }

   disk.addAll(files);
   currSize += size;
   totalFiles += numFiles;
   totalDirs += numDirs;
  }

  removeAllChildren(newLocation);

  for (short diskNum = 1; !diskList.isEmpty(); diskNum++) {
   i = diskList.pop().iterator();
   while (i.hasNext())
    copyFile(i.next(), diskNum, newLocation, rootDir);
  }
 }

We start out by defining some limits imposed by the hardware involved. The maxSize is the capacity of my USB thumb drives (approx. 64*10^9 bytes), and maxDir and maxFiles are limitations of my player. The rootDir is the master copy of the collection that we build our disks from. The newLocation is where we will copy our files to build our disks.

For the meat of our method we define a few counters to track the size, number of files, and number of directories on the current disk, and a diskList to hold the file listing for each disk. Note that this is a nested list: a linked list whose elements are themselves linked lists of files, one per disk, which we add as we assemble them. Next we create our first file list (representing a disk) and add it to diskList. The last step before we start to assemble our disks is to gather a list of all the artists, each of which has a separate directory under rootDir. We call this list the headList, and we sort it so that the disks are split up alphabetically by artist/band.
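
One subtlety with Collections.sort(headList): File.compareTo is case-sensitive on Unix, so an artist folder named "ZZ Top" sorts ahead of "abba". If that matters for your collection, a sketch like the following sorts by folder name while ignoring case. It assumes Java 8 or later, and ArtistSort is a hypothetical name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ArtistSort {

    // Orders artist folders by folder name only, ignoring upper/lower case.
    static void sortByName(List<File> artists) {
        artists.sort(Comparator.comparing(File::getName, String.CASE_INSENSITIVE_ORDER));
    }

    public static void main(String[] args) {
        List<File> artists = new ArrayList<>(Arrays.asList(
                new File("/music/ZZ Top"), new File("/music/abba"), new File("/music/Beatles")));
        sortByName(artists);
        for (File artist : artists)
            System.out.println(artist.getName()); // abba, then Beatles, then ZZ Top
    }
}
```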

Once this is all set up, we simply iterate through each artist/band and collect a list of their files, their directories, and the total size. If adding them would overflow our current disk, we start on a new disk; either way, we add them to the current file list and move on to the next. Once every disk's file list is assembled, we remove everything from our newLocation (this assumes that any disks built previously will be superseded by the ones about to be copied), and we copy all the files to their associated disks. Note that this last for loop could iterate through diskList instead of using pop(), but pop() on a LinkedList is a constant-time unlink of one node, so compared to the time that the copy itself takes, the difference is negligible.
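
Stripped of the file system calls, the disk-building loop is a greedy "next-fit" packing: artists stay in order, are never split, and a new disk starts whenever the next artist would exceed any limit. The sketch below isolates that logic so it can be tried with made-up numbers; the Artist class, its fields, and the tiny limits are hypothetical stand-ins for the real folder scan and the constants above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DiskPacker {

    // Hypothetical summary of one artist folder: total bytes, file count, directory count.
    static final class Artist {
        final String name;
        final long bytes;
        final int files;
        final int dirs;

        Artist(String name, long bytes, int files, int dirs) {
            this.name = name;
            this.bytes = bytes;
            this.files = files;
            this.dirs = dirs;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // Next-fit packing: artists keep their order and are never split across disks.
    static List<List<Artist>> pack(List<Artist> artists, long maxBytes, int maxFiles, int maxDirs) {
        List<List<Artist>> disks = new ArrayList<>();
        List<Artist> disk = new ArrayList<>();
        disks.add(disk);
        long bytes = 0;
        int files = 0, dirs = 0;

        for (Artist a : artists) {
            boolean overflow = bytes + a.bytes > maxBytes
                    || files + a.files > maxFiles
                    || dirs + a.dirs > maxDirs;
            // Start a new disk on overflow, but never leave an empty disk behind
            // (an artist bigger than a whole disk still gets a disk of its own).
            if (overflow && !disk.isEmpty()) {
                disk = new ArrayList<>();
                disks.add(disk);
                bytes = 0;
                files = 0;
                dirs = 0;
            }
            disk.add(a);
            bytes += a.bytes;
            files += a.files;
            dirs += a.dirs;
        }
        return disks;
    }

    public static void main(String[] args) {
        // Made-up numbers with tiny limits: 100 bytes, 10 files, 5 directories per disk.
        List<Artist> artists = Arrays.asList(
                new Artist("A", 60, 3, 1),
                new Artist("B", 30, 3, 1),
                new Artist("C", 20, 2, 1),
                new Artist("D", 50, 9, 2));
        System.out.println(pack(artists, 100, 10, 5)); // [[A, B], [C], [D]]
    }
}
```

With these limits, A and B share disk 1, C would push disk 1 past 100 bytes so it starts disk 2, and D would push disk 2 past 10 files, so it gets disk 3. Next-fit never goes back to fill earlier disks, which is exactly what keeps the artists in alphabetical order across disks.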

There are lots of other things you can do to automate maintenance on large digital collections like this. Do you have any ideas?

Handy way to negate a two's complement number: (~n)+1
