isopack

Introduction

isopack is a command line tool for filtering a list of files to pack in an ISO image to try and produce the largest image that will still fit on a given media.

This is a very hard problem (see Knapsack problem) and isopack has a variety of algorithms you can use to attempt to find a very good packing without taking too much time (at least compared to the time writing a BD-R disk takes, for instance :-).

No doubt other algorithms exists which might work better, the current collection was just what I came up with over a few days of fooling around with packing algorithms.

Note: I'm putting this out there with no license restrictions. Do what you will with it. (Especially don't ask me if you can use it: The answer is "Yes." :-).

Download

You can download the C++ source for the tool here:

isopack.tar.bz2

Usage

isopack [options...] < infile > outfile

isopack reads a modified form of the mkisofs -graft-points -path-list filename list on stdin and writes a reduced list of files in the same format to stdout. (Put simply, each line has the filename on the ISO image, an equal sign, and the matching filename on the computer to be written to the ISO image. Any equal signs or backslashes in the filenames are escaped with a leading backslash.)

In addition to the filename lines, you can group files you always want to stay together and files that must be included in the packing. Lines that consist of only an open bracket (optionally followed by an exclamation point) or only a close bracket mark the beginning and end of groups. The open bracket with the exclamation marks a group of files that must be included in the output. For example:

[!
README=/file/collection/README
]

defines a group with a single README file which will definitely be included in the resulting packing.

On the other hand:

[
Movie.avi=/file/collection/Movie.avi
Movie.srt=/file/collection/Movie.srt
MoviePoster.jpg=/file/collection/MoviePoster.jpg
]

defines a group of files which do not have to be included in the output list, but if any one of them is included, all three will be included.

Lines in the input do not have to be inside the [] group structure. Each line not included in a group is treated as being a group of one.

The group syntax is not included in the output, so the output can be fed directly to mkisofs.

Note that isopack requires individual files to be listed in this input, it will not (for instance) accept a line naming a directory and expand that into all the files in the directory. If you want directory contents included in the packing, you need to list all the individual files in those directories.

Warning: Burning software often decides to format media by default, so the size you think might fit will not actually fit. Certainly when I tried k3b using growisofs to write a BD-R.SL disk, it ran out of space at the 98% point when I used a packing that should have fit in an unformatted disc.

The dvd+rw-mediainfo tool is available on most linux distros, and it will show you all the different recommended sizes for different standard formattings of media. You may need to pick one of the smaller sector sizes instead of the unformatted size in order to use some burning software.

Options

-help Print usage information for the program. You may also pass the word help following the equal (=) in any of the options below to get detailed information about that option.

-algorithm=name Name the algorithm to use to try and find a packing. This option may be specified multiple times, and multiple algorithms will be tried.

  • simple is a fast simple heuristic that was basically devised as whatever I thought might work. There is no wonderful theory behind this algorithm.
  • brute is a brute force search through all permutations. Unless you also use the -time= options, this will probably never finish on any moderately large list of files, but it does try the permutations in an order that usually finds some good packings in the first few minutes.
  • ffd is the first fit descending algorithm. It tries to pick files to add starting with the largest and going down the list skipping over a file that makes the packing too big and trying the next smallest. This is very fast, but doesn't usually do a very good job.
  • luckN is the random Monte Carlo packing algorithm. Specify the name with an integer immediately following (i.e. luck1000). The integer value says how many times to make a new random packing. You can specify zero as the number, and this will run until the -time= limit expires (or the sectors packed gets within -close=sectors of filling the media).

-time=seconds Specify the maximum time in seconds any one algorithm is allowed to spend trying to find a packing. You can suffix the time value with m to mean minutes, or h to mean hours. Note that if you specify multiple algorithms, each one will be given this much time.

-close=sectors Specify how many sectors are close enough to be willing to use the new packing just found rather than take longer searching for a better one. This can be used to make the search finish faster than the value given on the -time option.

-waste=sectors Specify how many sectors you are willing to waste on the media. After isopack has completed all normal processing, it will check this at exit and exit with an error status if the packing wastes more sectors than this (might be useful for scripting).

-watchdog=seconds Accepts the same format time values as the -time= option, but instead of timing the entire run of the packing algirithm, acts as a watchdog timer and resets the timeout each time a better packing is found, so something like -watchdog=30 would stop looking for new packings after it spent 30 futile seconds trying to find a better one.

Note: Aside from the -time, -close, and -watchdog arguments, you can also stop the running algorithm by sending a SIGINT signal to the isopack process.

-verbose=level Specify the verbosity level as an integer value. Zero means be quiet. One updates you about new packings as they are found. Two or more prints random details about what is going on (all this output is to stderr).

-sectors=size Specify the size of ISO image to generate. The size parameter may either be a number of 2048 byte sectors, or the name of one of the predefined image sizes:

DVD+R.SL 2295104 12 cm 1 Layer DVD+R
DVD+R.DL 4173824 12 cm 2 Layer DVD+R
DVD-R.SL 2298496 12 cm 1 Layer DVD-R
DVD-R.DL 4171712 12 cm 2 Layer DVD-R
BD-R.SL 12219392 12 cm 1 Layer Blu-Ray Recordable
BD-R.DL 24438784 12 cm 2 Layer Blu-Ray Recordable
BD-R.8cm.SL 3804288 8 cm 1 Layer Blu-Ray Recordable
BD-R.8cm.DL 7608576 8 cm 2 Layer Blu-Ray Recordable
BD-XL.TL 48878592 12 cm 3 Layer Blu-Ray XL Recordable
BD-XL.QL 62500864 12 cm 4 Layer Blu-Ray XL Recordable
DVD+RW 2295104 12 cm DVD+RW
DVD-RW.F 2297888 12 cm DVD-RW formatted
DVD-RW.U 2298496 12 cm DVD-RW not formatted
BD-RE.F 11826176 12 cm 1 Layer Blu-Ray RE default format
BD-RE.Z 12219392 12 cm 1 Layer Blu-Ray RE 0 spare format

-prog=file Give the full path to the mkisofs executable to use for determining exact size of ISO images.

-args=string Give the argument string to be passed to mkisofs. Arguments that are required by isopack will be added to the beginning and end of the string like so:

mkisofs -print-size -graft-points string -path-list tempfile

The default string (only without the line wrapping) contains:

-quiet -volid TEST_IMAGE -volset '' -appid 'isopack test image' -publisher '' -preparer '' -sysid LINUX -volset-size 1 -volset-seqno 1 -rational-rock -joliet -joliet-long -no-cache-inodes -udf -full-iso9660-filenames -iso-level 3

-addarg=string Rather than setting the entire argument string, you can use this to simply add additional arguments to the end of the list (you can use this option multiple times to add multiple strings).

Note: Not all versions of mkisofs are the same, you may need to (for instance) add the -allow-limited-size argument to the above string if you are not using the version of mkisofs that comes with the cdrtools package (see references).

-infile=filename Instead of reading the path list from stdin, read it from the named file.

-outfile=filename Instead of writing the resulting path list to stdout, write it to the named file.

-unpacked=filename Write the groups from the input path list which were not included in the packing to this file (including any of the [] group specifiers). This file may then be used as input for another isopack run to generate a new packing from the leftover files.

Results

I have a collection of movies and other big files I'm been using as test data (and the mkdummy.sh script shipped with the code to generate phony sparse files with the exact same sizes). There are 69 files organized into 44 groups and the total size of all the files is 83,646,744,084 bytes. My tests try to fit as much as possible of that 84GB on a 25GB BD-R.SL disk

Running just ./isopack with no arguments, it only takes a fraction of a second to run and comes up with this packing:

Best packing found contains 25022370117 bytes of data in 57 files.
Unused sectors on media: 926 of 12219392 (99.992% full)
Packing found in 0 seconds by simple algorithm.

Trying ./isopack -algorithm=luck0 -time=4m gives:

Best packing found contains 25024414966 bytes of data in 7 files.
Unused sectors on media: 2 of 12219392 (100.000% full)
Packing found in 9 seconds by luck algorithm.

Luck, being random, found a really good packing in the first 9 seconds, then never found a better one in the remaining 3 minutes and 51 seconds. It isn't always so quick (but this is the sort of reason I added the -close option).

Finally, trying ./isopack -algorithm=brute -time=4m gives:

Best packing found contains 25024415614 bytes of data in 8 files.
Unused sectors on media: 1 of 12219392 (100.000% full)
Packing found in 59 seconds by brute algorithm.

That's a better packing than luck found, but it took longer (not that the results would always come out that way).

I also have some actual results: I was able to pack a BD-R.SL with only 2 unused sectors and write it with k3b using the cdrtools back end rather than growisofs. k3b still had some kind of silly problem trying to verify the data, but sha256sum produced matching checksums for files on the BD-R and the hard disk, so the write did seem to work perfectly and not truncate any files. I saved the log from that k3b run and I should be able to look at the commands it used and produce a mostly automated command line script to write my backups when I collect enough files to want to dump some to BD-R.

Work Flow

In the tarball with the source, I also have some sample scripts showing how I plan to use isopack (I've done this a bit at a time so far, but now I've put together scripts to automate things, and hopefully they will work when I get enough files to need a new backup).

pack-dir is a perl script that turns a directory tree into the sort of path list that can be passed to isopack as input. You can hand edit the file after it is created to define any groups you might want to add, etc. A sample run might look like:

prompt> pack-dir /disk2/Movies > packlist.txt

pack-and-write is a shell script that runs isopack, mkisofs, and cdrecord to automate the packing and media writing process. It hasn't really been tested at all, and would need extensive personalization to be suitable for use on another machine with a different burner, etc. Consider it an example that may or may not actually work as it is. After making any hand edits to the above packlist.txt file a sample run might look like:

prompt> pack-and-write packlist.txt "MOVIES-2011-12-13" 2>&1 | pack-progress

pack-check is a script intended to be run to compute checksums for the files on the mounted media after it is written and compare those checksums to the same files on the original hard disk directory. After writing the above media, putting it back in the computer, and mounting it at /media/MOVIES-2011-12-13, a verify run might look like:

prompt> pack-check /media/MOVIES-2011-12-13 /disk2/Movies

That will tell you if all the checksums match. If the backup was successful, you can now remove the files in /disk2/Movies which were backed up to the media (assuming that is the way you planned to do things - this is my work flow, it doesn't have to be your's :-).

References

  • Blu-Ray Disk Capacity FAQ This is where the predefined numbers for Blu-Ray ISO sizes come from.
  • Wikipedia DVD disk capacity comparison This is where the numbers for DVD ISO sizes come from.
  • scdbackup is another tool (that has been around a lot longer than isopack) for backing up large amounts of data to a collection of optical media.
  • Cdrtools (cdrecord) home page The documentation for mkisofs can be found on this site (as well as other places). The isopack tool uses the same -path-list format data as mkisofs so the output can be easily fed to mkisofs.
  • 1998 Matlab CD packing contest Description of the winning entry in an interesting programming contest for packing songs on a CD which is very similar to the job isopack does.
  • Cdwrite Mailing List Message This reply to my mail on the debian cdwrite mailing list seems to explain a lot of the mysterious behavior I was seeing with files that should have fit on a disc not fitting due to the way the burning software I used happened to work (in this case growisofs being used by k3b).
  • public.tmr.com has the source for the breaker program which does a similar job and dates from the 1990's (thanks to Bill Davidsen for the pointer).
  • My Home Page where other wondrous things may be found (and I might even get some political converts :-).

Revisions

Saturday, Jun 16th 2012 Removed the dev= arg from cdrecord command in pack-and-write script since my hardware changed a bit and cdrecord seems to find the device without me naming it.

Saturday, Apr 28th 2012 Fixed problems with the pack-check script and files with spaces in the name. Added the pack-progress program to use to overwrite all the progress messages from cdrecord onto a single line.

Friday, Jan 13th, 2012 Finally accumulated enough data to try the pack-and-write script, found some typos in the script and a SEGFAULT in isopack. Realized the -waste option would be useful, so I added it. (The script did work after the fixes though).

Page last modified Sat Jun 16 14:25:51 2012