If you're not confused, you're not paying attention.

After much confusion at many points, this web page documents all the things I've learned about fetching recorded programs off my TiVo and on to my desktop where I can strip the commercials and transcode the files into a smaller size that plays well on my Nexus 7. (This also gives my new 4TB disk something to do.)

Why do this? Mainly to create an infinite source of TV programs to watch while on my treadmill (I'm almost out of the Babylon 5 episodes I transcoded from my DVDs, but it seems unlikely that I'll ever run out of Doctor Who episodes :-)

Work Flow

All the work starts in a cron job which runs the fetch-who script at 1:17AM every day on my system.
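
For reference, the crontab entry is nothing fancy, just a line like this (the install path here is a placeholder):

17 1 * * * /path/to/fetch-who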

That script uses get-tivo-dir to fetch the directory web page from the TiVo and pipes it through a little utility script (add-newlines) that simplifies the hacky parsing done by the dig-episodes script, which digs up just the Doctor Who episodes and uses the get-tivo script to download the ones I don't already have. (At the moment, I'm only fetching Doctor Who - adding other shows would take more work, but it shouldn't be complicated.)
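
Boiled down, the heart of fetch-who is this one pipeline:

get-tivo-dir | add-newlines | dig-episodes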

Once the .mpg file is downloaded, the real work is started in the background using start-keeper-bg which makes an entry in a worklist directory and backgrounds the keep-working script to run the worklist entries one after another until there is nothing more to do. If there are more episodes to download, the I/O bound task of downloading the next episode can proceed while the background job is doing the CPU bound job of transcoding the last one.
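
A worklist entry is just a tiny fragment of shell code written into the worklist directory. A typical entry might look like this (the episode directory name and the queued command are illustrative guesses):

cd "/huge/vids/DoctorWho/2013-04-20-12345-TheEpisodeName"
exec > work.log 2>&1
keeper /huge/vids/DoctorWho/2013-04-20-12345-TheEpisodeName/TheEpisodeName.mpg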

This brings us to the keeper script which is the big one that does most of the work to invoke tools that will do the transcoding of the newly downloaded video file. It will run the create-keep-guess script if there is not yet a list of segments to keep from the video. It will run the generate-timestamps script if there is not yet a timestamp file.

The work of running ffmpeg to do a high quality two pass transcoding of each commercial free segment now begins. It uses the print-one-segment script to find the correct segment to transcode and adjust the start time for that segment. It then uses the multijob.c program to run up to 4 copies of ffmpeg in parallel until it has produced the individual files for each commercial-free program segment.
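
Each segment boils down to two ffmpeg runs over the same input, something like this simplified sketch (the real script computes the start and duration from the .keep file, adds a few more x264 options, and runs each segment in its own temp directory so the pass log files don't collide):

ffmpeg -i base.mpg -ss 612.3 -t 1042.0 -nostats -vf scale=1280:-1 \
   -c:v libx264 -preset slower -b:v 1500k -pass 1 -an -f avi -y /dev/null
ffmpeg -i base.mpg -ss 612.3 -t 1042.0 -nostats -vf scale=1280:-1 \
   -c:v libx264 -preset slower -b:v 1500k -pass 2 \
   -acodec libmp3lame -ab 128k -ac 2 seg-01.avi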

Finally, all the commercial-free segments are concatenated with ffmpeg to produce the final full video.
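
The join uses ffmpeg's concat demuxer reading a little list file, roughly like this (my guess at the shape of the .concat file the keeper script builds):

# base.concat names the segment files in order:
#   file 'seg-01.avi'
#   file 'seg-02.avi'
#   file 'seg-03.avi'
ffmpeg -f concat -i base.concat -c copy base.avi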

At the end of this process, I get a file that is a high quality 1280x720 video of about 500 or 600 megabytes instead of the original 5 to 6 gigabyte 1920x1080 mpeg file.

As a final step, it uses the make-airdate-link script to generate a hard link in the top level directory to the new video file named with the original airdate so the directory sorts nicely in broadcast order.
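
The link itself is just a hard link, so the same file shows up under both names without eating more disk (names here are illustrative):

ln 2013-04-20-12345-TheEpisodeName/TheEpisodeName.avi \
   2013-04-20-TheEpisodeName.avi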

After all these transcoding jobs finish, the keep-working script runs build-web to build a handy web page I can use on my local LAN to download episodes to devices like my tablet. This uses the build-db script to build a simple text file database with information about the program from various sources. It uses update-airdates to try to fetch the original airdate for episodes I haven't noticed before. It uses midentify to determine the playing time of the transcoded and original files. It does various sanity checks to guess if the original video was clipped or the automated commercial detection failed. All this info is recorded in the simple database file where it can be edited by hand or accessed by other scripts using db-read.
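
The database format is about as simple as it gets: a [basename] section header followed by key=value lines (the same format all the scripts below parse). A typical entry might look like this, with invented values:

[TheEpisodeName]
airdate=2013-04-20
avi=1
aviseconds=2581.48
avisize=524288000
description=Doctor Who: ...
edl=good
mpg=1
mpgdirname=2013-04-20-12345-TheEpisodeName
mpgseconds=3723.15
mpgsize=5905580032
recdate=2013-04-21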

The end result is something like this sample Doctor Who Episodes page (the links on that page don't work, by the way :-)

Problems

Uncooperative files and tools:

Files downloaded from the TiVo play well enough if you go sequentially from beginning to end, but for whatever reason (I can imagine cable companies inserting commercials over the original broadcast, different kinds of program streams for commercials, etc.), attempting to seek in these files using the available index information is hopeless.

Another problem I found with the TiVo .mpg files is an annoying random variation in the audio sync. It always seems to be around 300ms off, but different downloads will have different variations. This required me to write my scripts so it was easy to try remuxing the audio and video with different delays until I found a variation that seemed to generate perfect sync. (I haven't yet seen it drift during a recording, it is just always off by a fixed amount, different for each download.)
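
The remux experiments amount to feeding ffmpeg the same file twice with an offset on one input and copying both streams, along these lines (a sketch; 0.3 is just a typical first guess at the delay):

ffmpeg -i base.mpg -itsoffset 0.3 -i base.mpg \
   -map 0:v -map 1:a -c copy test-sync.mpg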

I don't understand why I don't see sync problems when watching the recordings on the TiVo, but I never see them there. (Possibly this has something to do with TiVo storing the files internally as a transport stream and only converting them to a program stream when I download them.)

My original attempts to use mencoder and a .edl (edit list) file to encode were hopeless. It could sorta kinda do a one pass encoding, but getting a high quality two pass encoding was absolutely impossible. Some schemes involving encoding twice (first to a high bitrate copy that at least had a decent index) seemed to work, but I hated to lose any quality by doing two encodings.

My quest eventually led me to ffmpeg and the obscure fact that if you put the -ss seek time argument before the input file, it will try to seek via the internal indexes, but if you put -ss after the input file, then it will seek by sequentially reading the file. This actually produced a system reliable enough to do a two pass transcoding from the original .mpg file with no intermediate file required.
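
In other words, the difference is nothing but argument order (the seek time here is arbitrary):

# Seeks via the (hopelessly broken) index - fast but unreliable:
ffmpeg -ss 1800 -i base.mpg -t 60 clip.mpg
# Reads sequentially to the seek point - slow but reliable:
ffmpeg -i base.mpg -ss 1800 -t 60 clip.mpg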

It takes a long time to do the seek (especially when you want to start near the end of the video), but the times seem to produce frame-accurate beginnings and endings of transcoded segments from the input file. In an attempt to speed things up, I tried a suggestion I found on the web: use -ss before the file to seek close to the start frame, then use -ss again after the file to get to the exact desired frame by reading to that point. Unfortunately, the index is so screwed up that this was impossible (it even made ffmpeg puke a couple of times).

Eventually, I discovered the segment muxer (or is it demuxer) in ffmpeg. Using it and my list of times for segments of the file I want to keep (versus the commercial segments I want to throw away), I can split the original file into multiple segments, each one containing one of the parts I want to keep, starting no more than about 30 seconds into the segment. Now I can do the transcoding with one of these split parts as the input file, and merely have to be careful to adjust my timestamps based on the times ffmpeg tells me it actually split the segments. Since I use the copy codec when splitting the segments, no quality is lost, and the split runs quite fast. I used this in the keeper script for a while, and it allowed the splitting and transcoding of what will be about a 40 to 45 minute episode to be done in less than an hour (on my core i7 system).
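
The split itself is a copy-codec run of the segment muxer, roughly like this (the split times come from the .keep file backed up about 30 seconds; the numbers are made up):

ffmpeg -i base.mpg -c copy -f segment \
   -segment_times 580,1290,2010,2700 base-part%02d.mpg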

Ha! Next I discovered .mpg files that were resistant to the segment muxer, but worked perfectly when reading from the beginning, so I abandoned the segment muxer (keeping the code in the script to be used under an option) and went back to reading the whole file from the beginning to get to a segment start. I also discovered that if I encode the audio at the same time, rather than trying to add it later, the sync seems to be perfect. I have no idea why the sync seems off when playing the original file, but perhaps it only gets off when I skip forward through the video.

Sometimes (no doubt due to damaged mpeg files) even this fanaticism gives me files where the audio sync is off. At this point I have to resort to the show-packets and find-glitch scripts to examine the segment where the audio goes out of sync in great detail and find a timestamp within that segment where the packets seem to be screwy (lots of audio with no video or vice versa). If I can find a glitch that seems to be at the same time the audio goes bad, I can re-encode the file, splitting the segment at that timestamp. That generally gives me two segments that are internally synced OK, but screw up at the join point when I concatenate them. I can then use the merge-segments script to extract the segment audio streams and the make-silence script to generate some short silent audio files which I can join together to build an audio stream that plays with corrected sync when muxed together with the video to make the final file. In particularly screwed up files, I may need to do this in several places till I get the whole file properly in sync.
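
The packet listing that find-glitch parses can be produced with ffprobe, something along these lines (my guess at the essentials of show-packets):

ffprobe -show_entries packet=stream_index,pts_time -of compact base.mpg
# one line per packet, e.g.:
# packet|stream_index=1|pts_time=65.230311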

Annoying TiVo web server

The brilliant programmers at TiVo seem to have introduced a time bomb in the form of an expired cookie (see this article), so when I tried to use my original version of the get-tivo script, nothing worked any longer. Fortunately some folks had figured out how to work around the problem, and I just needed to google for a fix (like the one described in the link above).

Once I started to download a lot of episodes, I discovered the TiVo web server would sometimes get hung and never talk again till the TiVo was rebooted. Putting a 15 second delay between the curl commands I use to access the TiVo seems to fix this.

Frame times

Aside from the problems with bad file indexes, I find that no two tools ever seem to actually agree on timestamps and frame numbers (how this could be, I don't know, I just go by experimental evidence).

That leads to the problem of what to do when the automated commercial detection fails, and I want to tweak the start and stop times by hand. To tweak them, I need to know the actual time that ffmpeg believes rather than the time some other tool might report.

Fortunately, I discovered the drawtext filter in ffmpeg. I can use it to generate a low resolution copy of the original file with each frame's timestamp drawn on the video. Absolutely guaranteed to be the time that ffmpeg believes in, since it produces the time. That is what the generate-timestamps script does.

FFmpeg version hell

When figuring out how to do things in these scripts, I'd read about some nifty feature of ffmpeg which I'd want to try, only to discover that the ffmpeg in the fedora/rpmfusion repos is too old to support that feature. I'd see a new version in the rawhide repos, and try it, but it just segfaults. I finally discovered on the download page of the ffmpeg web site links to static self-contained builds of ffmpeg for linux. I downloaded one of those, pointed all my scripts to it, and everything has been perfect since then.

Script Index

add-newlines

#!/usr/bin/perl
#
# Silly script that sticks newlines in front of all the table rows and table
# data in a web page passed into stdin. Used to simplify hacky parsing of
# web info without doing the full parsing of all the html elements.
#
while (<>) {
   s/\<tr/\n\n\n\<tr/g;
   s/\<td/\n\<td/g;
   print;
}

bg-this

#!/bin/bash
#
# Build background worklist entry to run whatever is given on the
# command line in whatever directory this script was run from.
#
mydir=`dirname $0`
PATH=`$mydir/echo-path`
export PATH
#
topdir="/huge/vids"
whodir="$topdir/DoctorWho"
dbdir="$whodir/.data"
workdir="$dbdir/worklist"
#
[ -d "$workdir" ] || mkdir -p "$workdir"
if [ -d "$workdir" ]
then
   curdir=`/bin/pwd`
   basefile=`basename $curdir`
   if [ -d "$curdir" ]
   then
      workfile="$workdir/$$-$basefile.work"
      ( umask 077 ; echo cd '"'$curdir'"' > "$workfile.temp" )
      ( umask 077 ; echo exec \> work.log 2\>\&1 >> "$workfile.temp" )
      ( umask 077 ; echo "$@" >> "$workfile.temp" )
      mv "$workfile.temp" "$workfile"
      nohup keep-working > /dev/null 2>&1 < /dev/null &
   else
      echo $curdir is not a directory 1>&2
      exit 2
   fi
else
   echo Unable to create $workdir 1>&2
   exit 2
fi

build-db

#!/usr/bin/perl -w
#
# Build the database of info (in plain text format so it can be easily
# edited by hand) about all the doctor who episodes.
#
# Use -f option to force updating time and size of all video files when
# doing the update.
#

use strict;

my $topdir="/huge/vids";
my $whodir="$topdir/DoctorWho";
my $dbdir="$whodir/.data";
my $dbfile="$dbdir/allinfo.txt";
my %db;
my %dirdb;

# Get PATH set to include this script's directory and other useful bits

my $newpath=`dirname $0`;
chomp($newpath);
$newpath=`$newpath/echo-path`;
chomp($newpath);
$ENV{'PATH'}=$newpath;

my $force = (scalar(@ARGV) > 0) && ($ARGV[0] eq "-f");

# Utility routine to check file1 to see if it is the same as file2
# (i.e. same dev and inode).
#
sub samefile {
   my $file1 = shift;
   my $file2 = shift;
   my @f1 = stat($file1);
   my @f2 = stat($file2);
   return ((scalar(@f1) > 1) && (scalar(@f2) > 1) && ($f1[1] == $f2[1]) &&
           ($f1[0] == $f2[0]));
}

# Utility routine to check file1 to see if it is newer than file2.
# Always returns 1 if either file does not exist.
#
sub newerthan {
   my $file1 = shift;
   my $file2 = shift;
   my @f1 = stat($file1);
   my @f2 = stat($file2);
   return (scalar(@f1) < 10) || (scalar(@f2) < 10) || ($f1[9] > $f2[9]);
}

# Utility routine to run midentify on video file and return length
# in seconds.
#
sub getvidseconds {
   my $vidfile = shift;
   my $avh;
   my $seconds;
   if (open($avh, '-|', 'midentify', "$vidfile")) {
      while (<$avh>) {
         if (/^ID_LENGTH=(.+)$/) {
            $seconds = $1;
         }
      }
      close($avh);
   }
   return $seconds;
}

# Read the info.txt file with the html table data copied from the
# original TiVo directory web page. Return the episode description
# with various formatting cleaned up.
#
sub getdescription {
   my $dirname = shift;
   my $inf;
   my $description;
   if (open($inf, '<', "$whodir/$dirname/info.txt")) {
      while (<$inf>) {
         if (/Doctor Who:/) {
            chomp;

            # Remove boring and repetitive copyright notice at end.
            s/Copyright.*\<\/td\>/\<\/td\>/;

            # Remove leading and trailing table data html tags
            s/^\s*\<td\b[^\>]*\>//;
            s/\<\/td\>//;

            # And leading and trailing spaces.
            s/^\s+//;
            s/\s+$//;

            $description = $_;
            last;
         }
      }
      close($inf);
   }
   return $description;
}

# Check for the .edl file generated by running comskip. If it exists and the
# first commercial skip starts within 2 seconds of the beginning of the mpg
# file, then it is "good", otherwise it is "bad".
#
# I now convert the .edl into a .keep, and if I've created a .keep with
# no guess- prefix, then there are no edl errors (because I manually fixed
# them in the .keep file).
#
sub edlstatus {
   my $dirname = shift;
   my $basefile = shift;
   my $inf;
   my $edl='bad';
   if (-f "$whodir/$dirname/$basefile.keep") {
      $edl = 'good';
   } else {
      if (open($inf, '<', "$whodir/$dirname/$basefile.edl")) {
         my $line = <$inf>;
         chomp($line);
         my @ed = split(' ', $line);
         if (scalar(@ed) == 3) {
            if ($ed[0] <= 2.0) {
               $edl='good';
            }
         }
      }
   }
   return $edl;
}

# What with marking some directories bad and asking to re-record
# episodes (or the TiVo just re-recording them by itself), I might
# have multiple directories with copies of the full .mpg file. This
# routine records the directory info for each directory under a
# top level hash indexed by the basename. I can sort through any
# duplicates and decide which directory has the best copy to include
# in the final database file.
#
sub accumdirinfo {
   my $basefile = shift;
   my $recdate = shift;
   my $dirname = shift;
   my $goodir = shift;
   my $topr = $dirdb{$basefile};
   if (! defined($topr)) {
      $topr = {};
      $dirdb{$basefile} = $topr;
   }
   my $r = {};
   $topr->{$dirname} = $r;
   my $mainr = $db{$basefile};
   if (! defined($mainr)) {
      $db{$basefile} = $r;
   }
   $r->{'score'} = 0;
   if (defined($mainr) &&
       exists($mainr->{'mpgdirname'}) &&
       ($mainr->{'mpgdirname'} eq $dirname) &&
       (! newerthan("$whodir/$dirname/$basefile.mpg", $dbfile))) {
      if (exists($mainr->{'edl'})) {
         $r->{'edl'} = $mainr->{'edl'};
      }
      if (exists($mainr->{'mpgseconds'})) {
         $r->{'mpgseconds'} = $mainr->{'mpgseconds'};
      }
      if (exists($mainr->{'description'})) {
         $r->{'description'} = $mainr->{'description'};
      }
      if (exists($mainr->{'mpgsize'})) {
         $r->{'mpgsize'} = $mainr->{'mpgsize'};
      }
      if (exists($mainr->{'mpg'})) {
         $r->{'mpg'} = $mainr->{'mpg'};
      }
      $r->{'score'} += 2000;
   }
   $r->{'mpgdirname'} = $dirname;
   $r->{'recdate'} = $recdate;
   if (! $goodir) {
      $r->{'score'} -= 1000;
   }
   if (! exists($r->{'edl'})) {
      $r->{'edl'} = edlstatus($dirname, $basefile);
      if ($r->{'edl'} eq 'bad') {
         $r->{'score'} -= 100;
      }
   }
   if ($force || (! exists($r->{'mpgseconds'}))) {
      my $seconds = getvidseconds("$whodir/$dirname/$basefile.mpg");
      if (defined($seconds)) {
         $r->{'mpgseconds'} = $seconds;
         if ($seconds >= (61.9*60)) {
            $r->{'score'} += 200;
         } elsif ($seconds < (60.1*60)) {
            $r->{'score'} -= 200;
         }
      } else {
         $r->{'score'} -= 500;
      }
   }
   if (! exists($r->{'description'})) {
      my $description = getdescription($dirname);
      if (defined($description)) {
         $r->{'description'} = $description;
      }
   }
   if ($force || (! exists($r->{'mpgsize'}))) {
      my @mpstat = stat("$whodir/$dirname/$basefile.mpg");
      if (scalar(@mpstat) > 7) {
         $r->{'mpgsize'} = $mpstat[7];
         $r->{'mpg'} = 1;
      }
   }
   if (! exists($r->{'mpg'})) {
      if (-f "$whodir/$dirname/$basefile.mpg") {
         $r->{'mpg'} = 1;
      }
   }
   if (defined($mainr)) {
      my $airdate = $mainr->{'airdate'};
      if (defined($airdate) && samefile("$whodir/$airdate-$basefile.avi",
                                        "$whodir/$dirname/$basefile.avi")) {
         $r->{'score'} += 1000000;
      }
   }
}

# Read in the existing database (if any) to start with known data
# (which may no longer exist anywhere else). Do not record the 'avi'
# and 'mpg' flags - need to verify the existence of such files
# again during the rebuild of the database.

my $fh;
my $r;
if (open($fh, '<', $dbfile)) {
   while (<$fh>) {
      chomp;
      if (/^\[(.+)\]$/) {
         my $basename = $1;
         $r = {};
         $db{$basename} = $r;
      } elsif (/^([A-Za-z0-9_]+)=(.+)$/) {
         if (defined($r)) {
            my $key = $1;
            my $val = $2;
            if (! (($key eq 'avi') || ($key eq 'mpg'))) {
               $r->{$key} = $val;
            }
         }
      }
   }
   close($fh);
   undef($fh);
   undef($r);
}

# Read the list of .avi files in $whodir to find any basenames and
# airdates that might not already be in the database. If the .avi file
# is newer than the database file, also update the length info in
# the database.

my $dh;
my @whodirnames;
if (opendir($dh, $whodir)) {
   @whodirnames = readdir($dh);
   closedir($dh);
   undef $dh;
}
my $n;
foreach $n (@whodirnames) {
   if ($n=~/^(\d+-\d+-\d+)-([A-Za-z0-9_]+)\.avi$/) {
      my $airdate = $1;
      my $basename = $2;
      $r = $db{$basename};
      if (! defined($r)) {
         $r = {};
         $db{$basename} = $r;
      }
      if (! exists($r->{'airdate'})) {
         $r->{'airdate'} = $airdate;
      }
      my $newavi = newerthan("$whodir/$n",$dbfile);
      if ((! exists($r->{'aviseconds'})) || $newavi || $force) {
         my $aviseconds = getvidseconds("$whodir/$n");
         if (defined($aviseconds)) {
            $r->{'aviseconds'} = $aviseconds;
         }
      }
      if ((! exists($r->{'avisize'})) || $newavi || $force) {
         $r->{'avisize'} = (stat("$whodir/$n"))[7];
      }
      $r->{'avi'} = 1;
   }
}

# Now read the directories to find any original .mpg files and other
# info stashed in the download directories.

foreach $n (@whodirnames) {
   if ($n=~/^(\d+-\d+-\d+)-\d+-([A-Za-z0-9_]+)$/) {
      accumdirinfo($2, $1, $n, 1);
   } elsif ($n=~/^bad-(\d+-\d+-\d+)-\d+-([A-Za-z0-9_]+)$/) {
      accumdirinfo($2, $1, $n, 0);
   }
}

# Go through all the info gathered from directories and pick the
# highest score directory to copy info into main database.

my $bn;
foreach $bn (keys(%dirdb)) {
   my $topr = $dirdb{$bn};
   my $dirname;
   my $topscore;
   my $topref;
   foreach $dirname (keys(%{$topr})) {
      my $r = $topr->{$dirname};
      if ((! defined($topscore)) || ($topscore < $r->{'score'})) {
         $topscore = $r->{'score'};
         $topref = $r;
      }
   }
   my $mainr = $db{$bn};
   if (defined($topref) && defined($mainr)) {
      my $key;
      foreach $key (keys(%{$topref})) {
         if ($key ne 'score') {
            my $val = $topref->{$key};
            $mainr->{$key} = $val;
         }
      }
   }
}

# Save new db (keeping backup)

sub compare_airdate {
   my $ada = $db{$a}->{'airdate'};
   my $adb = $db{$b}->{'airdate'};
   if (! defined($ada)) {
      $ada='';
   }
   if (! defined($adb)) {
      $adb='';
   }
   my $rval = $ada cmp $adb;
   if ($rval == 0) {
      $rval = $a cmp $b;
   }
   return $rval;
}

my $dbtemp="$dbfile.$$";
my $dbh;
my $missing_airdate = 0;
if (open($dbh, '>', $dbtemp)) {
   foreach $bn (sort compare_airdate keys %db) {
      $r = $db{$bn};
      if (! exists($r->{'airdate'})) {
         $missing_airdate = 1;
      }
      print $dbh "\n[$bn]\n";
      my $key;
      my $val;
      foreach $key (sort(keys(%{$r}))) {
         $val = $r->{$key};
         print $dbh "$key=$val\n";
      }
   }
   close($dbh);
   unlink("$dbfile.bak");
   link($dbfile,"$dbfile.bak");
   unlink($dbfile);
   link($dbtemp,$dbfile);
   unlink($dbtemp);
}

if ($missing_airdate) {

   # I'm missing airdates for some episodes, update the database
   # from the Doctor Who wiki page to fill in any new airdates
   # discovered since the last time I did this.

   system("update-airdates");
}

build-web

#!/usr/bin/perl -w
#
# Build the web interface for access to Doctor Who episodes I've
# accumulated...
#
# Working on this - want to output new table and header when the doctor
# changes as well as a button and description info for that doctor.
# Probably should special case the doctors revisited episodes to put
# them first for each doctor (Perhaps add a special sortdate attribute
# to override airdate for sort, but not for print).

use strict;
use Data::Dumper;

# Get PATH set to include this script's directory and other useful bits

my $newpath=`dirname $0`;
chomp($newpath);
$newpath=`$newpath/echo-path`;
chomp($newpath);
$ENV{'PATH'}=$newpath;

my $verbose = (scalar(@ARGV) == 1) && ($ARGV[0] eq "-v");

# Update the database file with latest information

system("build-db");

my $topdir="/huge/vids";
my $whodir="$topdir/DoctorWho";
my $dbdir="$whodir/.data";
my $dbfile="$dbdir/allinfo.txt";
my $doctorfile="$dbdir/doctors.txt";
my $webindex="$whodir/index.html";
my %db;
my %doctors;

# Now read in the database of all the episode information

my $fh;
my $r;
if (open($fh, '<', $dbfile)) {
   while (<$fh>) {
      chomp;
      if (/^\[(.+)\]$/) {
         my $basename = $1;
         $r = {};
         $db{$basename} = $r;
      } elsif (/^([A-Za-z0-9_]+)=(.+)$/) {
         if (defined($r)) {
            my $key = $1;
            my $val = $2;
            $r->{$key} = $val;
         }
      }
   }
   close($fh);
   undef($fh);
   undef($r);
}

# And all the doctor information

if (open($fh, '<', $doctorfile)) {
   while (<$fh>) {
      chomp;
      if (/^\[(.+)\]$/) {
         my $docname = $1;
         $r = {};
         $doctors{$docname} = $r;
      } elsif (/^([A-Za-z0-9_]+)=(.+)$/) {
         if (defined($r)) {
            my $key = $1;
            my $val = $2;
            $r->{$key} = $val;
         }
      }
   }
   close($fh);
   undef($fh);
   undef($r);
}

# Build the index.html file from the information in the database

my $tempfile="$webindex.$$";
my $htm;
open($htm, '>', $tempfile) || die "Cannot write $tempfile : $!\n";
print $htm <<'HEADER';
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">

<html>
<head>
  <meta name="generator" content=
  "HTML Tidy for Linux (vers 25 March 2009), see www.w3.org">

  <title>Doctor Who Collection</title>
  <link rel="StyleSheet" href=".data/wholist.css" type="text/css">
<script type="text/javascript">
	function cecontrol(id){
		if(document.getElementById(id).style.display != "table"){
			document.getElementById(id+"img").style.display = "none";
			document.getElementById(id).style.display = "table";
			document.getElementById(id+"btn").innerHTML="-";
		}else{
			document.getElementById(id).style.display = "none";
			document.getElementById(id+"img").style.display = "block";
			document.getElementById(id+"btn").innerHTML="+";
		}
	}
</script>
<style type="text/css">
table {
   display:none;
}
button {
   font-family:Monospace;
   font-size:100%;
   font-weight:bold;
   border-radius:50%;
   border:0px;
}
</style>
</head>

<body>
  <a href="/cgi-bin/play-next-vid"><img width="1000" height="255" style="border-style: none none none none;" src=".data/DoctorWho.png"></a>
HEADER

my $lastdoc;

sub compare_airdate {
   my $ada = $db{$a}->{'sortdate'};
   my $adb = $db{$b}->{'sortdate'};
   if (! defined($ada)) {
      $ada=$db{$a}->{'airdate'}
   }
   if (! defined($adb)) {
      $adb=$db{$b}->{'airdate'};
   }
   if (! defined($ada)) {
      $ada='';
   }
   if (! defined($adb)) {
      $adb='';
   }
   return $ada cmp $adb;
}

my @doclist;
my $n;
foreach $n (sort compare_airdate keys %db) {
   my $r = $db{$n};
   my $doc = $r->{'doctor'};
   my $avi = $r->{'avi'};
   my $mpg = $r->{'mpg'};
   next if (! $avi) && (! $mpg);
   my $nth = 0;
   if (defined($doc)) {
      $doc = $doctors{$doc};
      if (defined($doc)) {
         my $num = $doc->{'index'};
         if (defined($num)) {
            $nth = $num+0;
         }
      }
   }
   if (! defined($doclist[$nth])) {
      $doclist[$nth] = [];
   }
   push(@{$doclist[$nth]}, $n);
}

my $showcount=0;
my $showsec=0.0;
my $doc;
my $butnum = 0;
foreach $doc (@doclist) {
my $d = $db{$doc->[0]};
$d = $d->{'doctor'};
print $htm "<p><button id=\"tab${butnum}btn\" onclick=\"cecontrol('tab${butnum}')\">+</button> ";
if (! defined($d)) {
   print $htm "General Wibbly-Wobbly, Timey-Wimey, Spacey-Wacey stuff..</p>\n";
   print $htm "<p id=\"tab${butnum}img\">\&nbsp;</p>\n";
} else {
   my $dr = $doctors{$d};
   my $di = $dr->{'image'};
   my $dw = $dr->{'which'};
   print $htm "$d: The $dw Doctor.</p>\n";
   print $htm "<p id=\"tab${butnum}img\"><img src=\".data/$di\">";
   my $companions=$dr->{'companions'};
   if (defined($companions)) {
      my $cp;
      foreach $cp (split(/,/,$companions)) {
         print $htm "\&nbsp;<img src=\".data/$cp\">";
      }
   }
   print $htm "</p>\n";
}
print $htm "  <table id=\"tab${butnum}\" class=\"ts\" border=\"1\" cellpadding=\"5\">\n";
++$butnum;
print $htm <<'FIRSTROW';
    <tr class="hdr">
      <th class="dws">Original<br>
      Airdate</th>

      <th class="dws">Minutes</th>

      <th class="dws">Size</th>

      <th class="dws">Flags</th>

      <th class="dwl">Description</th>
    </tr>
FIRSTROW
foreach $n (@{$doc}) {
   my $r = $db{$n};
   my $avi = $r->{'avi'};
   my $mpg = $r->{'mpg'};
   next if (! $avi) && (! $mpg);

   my $handtuned=0;
   if (exists($r->{'handtuned'}) && ($r->{'handtuned'} == 1)) {
      $handtuned=1;
   }
   my $torrent=0;
   if (exists($r->{'torrent'}) && ($r->{'torrent'} ne '')) {
      $torrent=1;
   }
   my $warning = 0;
   if (! $avi) {
      $warning = 1;
      print STDERR "$n: Missing .avi file\n" if $verbose && (! $handtuned);
   }
   if (! $mpg) {
      $warning = 1;
      print STDERR "$n: Missing .mpg file\n" if $verbose && (! $handtuned);
   }
   my $airdate = $r->{'airdate'};
   my $minutes = $r->{'aviseconds'};
   if (defined($minutes)) {
      $showsec += $minutes;
      $minutes = int(($minutes + 30.0)/60.0);
      if ($minutes < 41) {
         $warning = 1;
         print STDERR "$n: AVI duration less than 41 minutes\n"
            if $verbose && (! $handtuned);
      }
   } else {
      $minutes = '';
      $warning = 1;
      print STDERR "$n: No AVI duration available\n"
         if $verbose && (! $handtuned);
   }
   my $edl = $r->{'edl'};
   if ((! defined($edl)) || ($edl eq 'bad')) {
      my $mpgdirname = $r->{'mpgdirname'};
      if ((! defined($mpgdirname)) || (! (-f "$whodir/$mpgdirname/$n.keep"))) {
         $warning = 1;
         print STDERR "$n: No or bad .edl file\n" if $verbose && (! $handtuned);
      }
   }
   if ($minutes ne '') {
      my $mpgminutes = $r->{'mpgseconds'};
      if (defined($mpgminutes)) {
         $mpgminutes = int(($mpgminutes + 30.0)/60.0);
         if ($mpgminutes < 62) {
            $warning = 1;
            print STDERR "$n: MPG less than 62 minutes\n"
               if $verbose && (! $handtuned);
         }
         $minutes .= " was $mpgminutes";
      } else {
         $warning = 1;
         print STDERR "$n: No MPG duration available\n"
            if $verbose && (! $handtuned);
      }
   }
   my $avisize = $r->{'avisize'};
   if (defined($avisize)) {
      $avisize = int(($avisize + (500*1000))/(1000*1000));
      $avisize = "$avisize MB";
   } else {
      $warning = 1;
      $avisize = '';
      print STDERR "$n: No AVI size available\n" if $verbose && (! $handtuned);
   }
   my $flag_new = 0;
   my $flag_very_new = 0;
   my $readme='';
   if ($mpg) {
      my $mpgdirname = $r->{'mpgdirname'};
      if (defined($mpgdirname)) {
         if (-f "$whodir/$mpgdirname/README") {
            $readme="$mpgdirname/README";
         }
         my $mpgfile = "$whodir/$mpgdirname/$n.mpg";
         if ((-f $mpgfile) && ((-M $mpgfile) < 7)) {
            if ((-M $mpgfile) < 1) {
               $flag_very_new = 1;
            } else {
               $flag_new = 1;
            }
         }

         # If there is a "guess" .keep file check out the durations of the
         # keepable segments. If any but last one is less than 4 minutes or
         # greater than 20 minutes then that is highly suspicious. Come to
         # think of it, if there are more than 8 segments or less than 4,
         # that is kinda suspicious as well.

         my $guesskeep = "$whodir/$mpgdirname/guess-$n.guess";
         my $gh;
         if (open($gh, '<', $guesskeep)) {
            my @durations;
            while (<$gh>) {
               chomp;
               my @set = split(' ', $_);
               if (scalar(@set) == 2) {
                  push(@durations, $set[1] - $set[0]);
               } else {
                  $warning = 1;
                  print STDERR "$n: Bad format .keep line\n"
                     if $verbose && (! $handtuned);
               }
            }
            close($gh);
            if ((scalar(@durations) < 4) || (scalar(@durations) > 8)) {
               $warning = 1;
               print STDERR "$n: Unlikely number of keep segs\n"
                  if $verbose && (! $handtuned);
            }
            while (scalar(@durations) > 0) {
               my $dur = shift(@durations);
               if ($dur > (20.0*60)) {
                  $warning = 1;
                  print STDERR "$n: Segment time > 20 minutes\n"
                     if $verbose && (! $handtuned);
               }
               if ((scalar(@durations) > 0) && ($dur < (4.0*60))) {
                  $warning = 1;
                  print STDERR "$n: Segment time < 4 minutes\n"
                     if $verbose && (! $handtuned);
               }
            }
         }
      }
   }
   my $description = $r->{'description'};
   if (! defined($description)) {
      print STDERR "No description for $n\n";
      $description='';
   }
   if ($avi) {
      my @pieces=split(/\&quot\;/, $description);
      if (scalar(@pieces) == 3) {
         my $mpgdirname = $r->{'mpgdirname'};
         my $html5='';
         my $vidsrc='';
         if (-f "$whodir/$mpgdirname/$n.mp4") {
            if (! (-f "$whodir/$mpgdirname/$n.jpg")) {
               system("cd $whodir/$mpgdirname ; " .
                      "make-poster $n.mp4 $n.jpg > /dev/null 2>\&1");
            }
            $vidsrc .= <<MP4SRC;
    <source src="$n.mp4"
            type="video/mp4; codecs=avc1.42E01E,mp4a.40.2">
MP4SRC
         }
         if ( -f "$whodir/$mpgdirname/$n.webm") {
            $vidsrc .= <<WEBMSRC;
    <source src="$n.webm"
            type="video/webm; codecs=vp8,vorbis">
WEBMSRC
         }
         unlink("$whodir/$mpgdirname/$n.html");
         if ($vidsrc ne '') {
            chomp($vidsrc);
            my $htm;
            if (open($htm, '>', "$whodir/$mpgdirname/$n.html")) {
               print $htm <<HTMEND;
<!DOCTYPE html>

<html>
<head>
  <title>$pieces[1]</title>
</head>

<body>
  <p><b>$pieces[1]</b><br>
  $pieces[2]</p>

  <video width="1280" height="720" controls preload="none"
         poster="$n.jpg">
$vidsrc
  </video>
</body>
</html>
HTMEND
               close($htm);
            }
         }
         if (-f "$whodir/$mpgdirname/$n.html") {
            $html5 = " <a href=\"$mpgdirname/$n.html\">HTML5</a>";
         }
         $description=$pieces[0] .
                      "<a href=\"$airdate-$n.avi\">" . $pieces[1] . "</a>" .
                      $html5 . $pieces[2];
         if (!defined($description)) {
            print STDERR "Ill-formed description for $n\n";
         }
      }
   } else {
      $description=~s/\&quot\;//g;
   }
   if ($minutes eq '') {
      $minutes = '&nbsp;';
   }
   if ($avisize eq '') {
      $avisize = '&nbsp;';
   }
   if ($handtuned) {
      $minutes = "<span class=\"tuned\">$minutes</span>";
      $warning = 0;
   }
   if ($torrent) {
      $warning = 0;
   }
   if ($warning) {
      $warning = '<img width="18" height="16" src=".data/warning.png"> ';
   } else {
      $warning = '';
   }
   if ($flag_new || $flag_very_new) {
      if ($flag_new) {
         $warning .= '<img width="16" height="16" ' .
            'title="Downloaded in last week" src=".data/red_new.png"> ';
      } else {
         $warning .=
            '<img width="16" height="16" ' .
               'title="Downloaded in last day" src=".data/animated_new.gif"> ';
      }
   }
   if (! defined($airdate)) {
      $airdate='&nbsp;';
   }
   my $notes='';
   if ((! exists($r->{'nodownload'})) || ($r->{'nodownload'} ne '1')) {
      $notes = '<img width="24" height="24" ' .
         'title="Want to download new copy" src=".data/download-icon.png">';
   }
   if ((! $handtuned) && (! $torrent)) {
      $notes .= ' ' if ($notes ne '');
      $notes .= '<img width="24" height="24" ' .
         'title="Needs hand tweaking" src=".data/tweak-icon.png">';
   }
   if ($warning ne '') {
      $notes .= ' ' if ($notes ne '');
      $notes .= $warning;
   }
   if ($readme ne '') {
      $notes .= ' ' if ($notes ne '');
      $notes .= "<a href=\"$readme\" target=\"_blank\">" .
         "<img width=\"24\" height=\"24\" src=\".data/readme-icon.png\"></a>";
   }
   if ((! exists($r->{'title'})) ||
       (! exists($r->{'airdate'}))) {
      $notes .= ' ' if ($notes ne '');
      $notes .= '<img width="24" height="24" ' .
         'title="Missing database attribute" src=".data/question.png">';
   }
   my $flags = $r->{'flags'};
   if (defined($flags)) {
      $notes .= ' ' if ($notes ne '');
      $notes .= '<img width="24" height="24" ' .
         'title="Reconstructed episode" src=".data/CobbleStone.png">';
      if ($flags eq "R") {
         $flags = "<br><em>Episode reconstructed with stills and audio.</em>";
      } elsif ($flags eq "A") {
         $flags="<br><em>Episode reconstructed with animation.</em>";
      } else {
         $flags = "<br><em>Unknown flag $flags</em>";
      }
   } else {
      $flags='';
   }
   if ($notes eq '') {
      $notes='&nbsp;';
   }
   ++$showcount;
   print $htm "<tr><td class=\"dws\">$airdate</td><td class=\"dws\">$minutes</td><td class=\"dws\">$avisize</td><td class=\"dws\">$notes</td><td class=\"dwl\">${description}${flags}</td></tr>\n";
}
print $htm <<'TRAILER';
  </table>
TRAILER
}

   print $htm "<p>$showcount Episodes recorded";
   if ($showsec > 0) {
      $showsec = int($showsec + 0.5);
      my $showmin = int($showsec / 60);
      $showsec -= ($showmin * 60);
      my $showhour = int($showmin / 60);
      $showmin -= ($showhour * 60);
      my $showday = int($showhour / 24);
      $showhour -= ($showday * 24);
      if ($showday > 0) {
         if ($showday == 1) {
            print $htm ", 1 day";
         } else {
            print $htm ", $showday days";
         }
      }
      if ($showhour > 0) {
         if ($showhour == 1) {
            print $htm ", 1 hour";
         } else {
            print $htm ", $showhour hours";
         }
      }
      if ($showmin > 0) {
         if ($showmin == 1) {
            print $htm ", 1 minute";
         } else {
            print $htm ", $showmin minutes";
         }
      }
      if ($showsec > 0) {
         if ($showsec == 1) {
            print $htm ", 1 second";
         } else {
            print $htm ", $showsec seconds";
         }
      }
      print $htm " of commercial free Doctor Who.";
   }
   print $htm "</p>\n";

   print $htm "<p>Icon legend:</p>\n";

print $htm <<'LEGEND';
  <table class="ts" border="1" cellpadding="5" style="display:table;">
    <tr class="hdr">
      <th class="dws">Flag</th>

      <th class="dwl">Description</th>
    </tr>

    <tr>
      <td class="dws"><img width="18" height="16" src=".data/warning.png"></td>
      <td>Problems detected with this video.</td>
    </tr>

    <tr>
      <td class="dws"><img width="16" height="16" src=".data/red_new.png"></td>
      <td>New recording less than a week old.</td>
    </tr>

    <tr>
      <td class="dws"><img width="16" height="16" src=".data/animated_new.gif"></td>
      <td>New recording less than 24 hours old.</td>
    </tr>

    <tr>
      <td class="dws"><img width="24" height="24" src=".data/download-icon.png"></td>
      <td>Video has glitch, would like to download a new copy.</td>
    </tr>

    <tr>
      <td class="dws"><img width="24" height="24" src=".data/tweak-icon.png"></td>
      <td>Commercial removal not yet hand tweaked in this video.</td>
    </tr>

    <tr>
      <td class="dws"><img width="24" height="24" src=".data/readme-icon.png"></td>
      <td>Link to special notes on this video.</td>
    </tr>

    <tr>
      <td class="dws"><img width="24" height="24" src=".data/question.png"></td>
      <td>Missing attributes in database entry.</td>
    </tr>

    <tr>
      <td class="dws"><img width="24" height="24" src=".data/CobbleStone.png"></td>
      <td>Reconstructed episode cobbled together.</td>
    </tr>
  </table>
  <div class="modtime">
LEGEND
   my $curtime=`date`;
   chomp($curtime);
   print $htm "Page last modified $curtime";
print $htm <<'TRAILER';
  </div>
</body>
</html>
TRAILER
close($htm);
unlink($webindex);
rename($tempfile,$webindex);

clean-all

#!/bin/bash
#
# Run clean-dir in each Doctor Who episode directory
#
mydir=`dirname $0`
PATH=`$mydir/echo-path`
export PATH
#
if [ -d "/huge/vids/DoctorWho" ]
then
   cd /huge/vids/DoctorWho
   tmpfile="/var/tmp/clean$$"
   trap  "rm -f $tmpfile" EXIT
   ls -1 | fgrep -v .avi | grep -e '^[0-9]' > "$tmpfile"
   while read dirname
   do
      if [ -d "$dirname" ]
      then
         ( 'cd' "$dirname" ; clean-dir )
      fi
   done < "$tmpfile"
else
   echo Missing the /huge/vids/DoctorWho directory 1>&2
   exit 2
fi

clean-dir

#!/bin/bash
#
# Clean up all the machine generated files in a directory, leaving only the
# hand constructed or downloaded files and the final .avi file. Do nothing
# if the directory is missing essential files.
#
mydir=`dirname $0`
PATH=`$mydir/echo-path`
export PATH
#
echo Cleaning directory `/bin/pwd`
mpgcount=`ls -1 *.mpg | wc -l`
if [ "$mpgcount" = "1" ]
then
   basefile=`basename *.mpg .mpg`
   if [ -f "$basefile.mpg" ]
   then
      :
   else
      echo \*\*\* "$basefile.mpg" is not a regular file 1>&2
      exit 2
   fi
   if [ -f "$basefile.avi" ]
   then
      :
   else
      echo \*\*\* Missing "$basefile.avi" 1>&2
      exit 2
   fi
   if [ -f "$basefile.keep" ]
   then
      :
   else
      echo \*\*\* Missing "$basefile.keep" 1>&2
      exit 2
   fi
   if [ -f "info.txt" ]
   then
      :
   else
      echo \*\*\* Missing info.txt 1>&2
      exit 2
   fi
   for i in tempdir-* "time-$basefile.avi" "hq-$basefile.avi" \
            "$basefile.concat" "$basefile.edl" "$basefile.csv" \
            "pcm-$basefile.avi" "aud-$basefile.wav" "guess-$basefile.keep" \
            work.log guess.log runseg.log gentime.log webm.out
   do
      if [ -d "$i" ]
      then
         echo Removing directory $i
         rm -rf "$i"
      else
         if [ -f "$i" ]
         then
            echo Removing file $i
            rm -f "$i"
         fi
      fi
   done
else
   echo \*\*\* Count of mpg files \($mpgcount\) is not 1 1>&2
   exit 2
fi

create-keep-guess

#!/bin/sh
#
# Script to create a .edl file with comskip and invoke edl-to-keep
# to convert it to a .keep file.
#
mydir=`dirname $0`
PATH=`$mydir/echo-path`
export PATH
#
infile="$1"
#
basefile=`basename $infile .mpg`
comskip --quiet --ini=/usr/local/etc/comskip/comskip.ini "$infile"
if [ -f "$basefile.edl" ]
then
   rm -f "$basefile.csv" "$basefile.log" "$basefile.logo.txt" "$basefile.txt"
   edl-to-keep "$basefile.edl"
else
   echo "comskip failed to create $basefile.edl" 1>&2
   exit 2
fi

db-read

#!/usr/bin/perl -w
#
# Simple utility to read database from other shell scripts.
#

use strict;

my $topdir="/huge/vids";
my $whodir="$topdir/DoctorWho";
my $dbdir="$whodir/.data";
my $dbfile="$dbdir/allinfo.txt";
my %db;

# Get PATH set to include this script's directory and other useful bits

my $newpath=`dirname $0`;
chomp($newpath);
$newpath=`$newpath/echo-path`;
chomp($newpath);
$ENV{'PATH'}=$newpath;

# Read in the existing database.

my $fh;
my $r;
if (open($fh, '<', $dbfile)) {
   while (<$fh>) {
      chomp;
      if (/^\[(.+)\]$/) {
         my $basename = $1;
         $r = {};
         $db{$basename} = $r;
      } elsif (/^([A-Za-z0-9_]+)=(.+)$/) {
         if (defined($r)) {
            my $key = $1;
            my $val = $2;
            $r->{$key} = $val;
         }
      }
   }
   close($fh);
   undef($fh);
   undef($r);
}

if (scalar(@ARGV) != 2) {
   die "Usage: db-read key valuename\n";
}

$r = $db{$ARGV[0]};
if (defined($r)) {
   my $val = $r->{$ARGV[1]};
   if (defined($val)) {
      print "$val\n";
      exit(0);
   }
}
exit(2);
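
Typical use from one of the shell scripts (episode basename invented):

airdate=`db-read TheEpisodeName airdate`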

dig-episodes

#!/usr/bin/perl -w
#
# Silly perl script that reads the web directory fetched from a tivo box
# (after filtering through add-newlines) and looks for episodes of
# Doctor Who to download.
#
# Will fail miserably if the TiVo starts formatting the web page
# differently...
#

use strict;

# Get PATH set to include this script's directory and other useful bits

my $newpath=`dirname $0`;
chomp($newpath);
$newpath=`$newpath/echo-path`;
chomp($newpath);
$ENV{'PATH'}=$newpath;

my $topdir='/huge/vids';
my $whodir="$topdir/DoctorWho";
my $dbdir="$whodir/.data";
my $dbfile="$dbdir/allinfo.txt";
my %db;

# Read in the existing database to check for already downloaded episodes
# we can skip (marked with nodownload=1 after I'm sure the copy I have
# is good enough that I'm unlikely to do better).

my $fh;
my $r;
if (open($fh, '<', $dbfile)) {
   while (<$fh>) {
      chomp;
      if (/^\[(.+)\]$/) {
         my $basename = $1;
         $r = {};
         $db{$basename} = $r;
      } elsif (/^([A-Za-z0-9_]+)=(.+)$/) {
         if (defined($r)) {
            my $key = $1;
            my $val = $2;
            $r->{$key} = $val;
         }
      }
   }
   close($fh);
   undef($fh);
   undef($r);
} else {
   print "Cannot read $dbfile : $!\n";
}

sub check_one_episode {
   my $rd = shift;
   my $episode_name;
   my $record_date;
   my $url;
   my $id;
   if ($rd->[1]=~/alt=\"BBCAHD\"/) {
      if ($rd->[2]=~/Doctor Who/) {
         $episode_name = $rd->[2];
         chomp($episode_name);
         $episode_name=~s/\&[a-zA-Z0-9_]+\;//g;
         $episode_name=~s/^\<td[^>]+\>//;
         $episode_name=~s/\<\/td\> *$//;
         $episode_name=~s/^\<b\>//;
         $episode_name=~s/\<br\>.*$//;
         $episode_name=~s/\<\/b\>.*$//;
         $episode_name=~s/([\w']+)/\u\L$1/g;
         $episode_name=~s/[^A-Za-z_0-9]//g;
         $episode_name=~s/^DoctorWho//;
         $record_date = $rd->[3];
         if ($record_date=~/(\d+)\/(\d+)/) {
            my $month=$1;
            my $day=$2;
            my @curtime = localtime(time);
            my $year = $curtime[5] + 1900;
            my $curmonth = $curtime[4] + 1;
            if ($curmonth < $month) {
               $year -= 1;
            }
            $month = sprintf("%02d",$month+0);
            $day = sprintf("%02d",$day+0);
            $record_date = "$year-$month-$day";
         }
         $url = $rd->[5];
         if ($url=~/\<a href=\"([^"]+)\"/) {
            $url = $1;
            $url=~s/\&amp\;/\&/g;
         }
         if ($url=~/id=(\d+)/) {
            $id = $1;
         }
      }
   }
   if (defined($episode_name) && ($episode_name ne '') &&
       defined($record_date) && defined($url) &&
       defined($id)) {
      $r = $db{$episode_name};
      my $nodownload;
      if (defined($r)) {
         $nodownload = $r->{'nodownload'};
      }
      if (((! defined($nodownload)) || ($nodownload ne '1')) &&
          (! -d "$whodir/$record_date-$id-$episode_name")) {
         my $tempdir="$whodir/$$";
         mkdir($tempdir);
         if (-d $tempdir) {
            chdir($tempdir);
            my $fd;
            if (open($fd, '>', "info.txt")) {
               my $i;
               foreach $i (@{$rd}) {
                  print $fd $i;
               }
               close($fd);
               sleep(15);
               system("get-tivo", $episode_name, $url);
               chdir($whodir);
               if ((! -f "$tempdir/$episode_name.mpg") ||
                   ((stat("$tempdir/$episode_name.mpg"))[7] == 0)) {
                  system("rm -rf $tempdir");
               } else {
                  rename("$$", "$record_date-$id-$episode_name");
                  system("start-keeper-bg", "$record_date-$id",
                         "$episode_name");
               }
            }
         }
      }
   }
}

my @rowdata;
while (<>) {
   if (/^\<tr/) {
      if (scalar(@rowdata) >= 6) {
         &check_one_episode(\@rowdata);
         undef @rowdata;
      }
   } elsif (/^\<td/) {
      push(@rowdata, $_);
   }
}
if (scalar(@rowdata) >= 6) {
   &check_one_episode(\@rowdata);
}

echo-path

#!/bin/sh
#
# As lots of these scripts are run from cron, adjust PATH to include things
# not naturally provided by cron; also stick the static build ffmpeg
# directory at the front of the list to get the latest ffmpeg downloaded by
# following a few links from ffmpeg.org (the ffmpeg in fedora is too old to
# support many options and filters I need).
#
thisdir=`dirname $0`
thisdir=`'cd' $thisdir ; /bin/pwd`
ffmpegdir="/zooty/downloads/ffmpeg/2014-03-23"
#
newpath=`echo $PATH | sed 's@'"$ffmpegdir"'@@g' | sed 's@::@:@g'`
newpath=`echo $newpath | sed 's@'"$thisdir"'@@g' | sed 's@::@:@g'`
newpath=`echo $newpath | sed 's@/usr/local/bin@@g' | sed 's@::@:@g'`
newpath="$ffmpegdir:$thisdir:/usr/local/bin:$newpath"
echo $newpath

edl-to-keep

#!/usr/bin/perl
#
# Find directories in database which have .edl files, but do not have
# .keep files, and convert the .edl to .keep format. Stick a guess-
# prefix on the .keep file to indicate it is just an automatically
# generated guess and not yet manually verified.
#
# If a .edl file argument is given, then just convert that .edl file,
# and don't fool with database and scanning all directories.
#
my $topdir="/huge/vids";
my $whodir="$topdir/DoctorWho";
my $dbdir="$whodir/.data";
my $dbfile="$dbdir/allinfo.txt";
my %db;

# Get PATH set to include this script's directory and other useful bits

my $newpath=`dirname $0`;
chomp($newpath);
$newpath=`$newpath/echo-path`;
chomp($newpath);
$ENV{'PATH'}=$newpath;

# Utility routine to run midentify on video file and return length
# in seconds.
#
sub getvidseconds {
   my $vidfile = shift;
   my $avh;
   my $seconds;
   if (open($avh, '-|', 'midentify', "$vidfile")) {
      while (<$avh>) {
         if (/^ID_LENGTH=(.+)$/) {
            $seconds = $1;
         }
      }
      close($avh);
   }
   return $seconds;
}

# Core converter routine
#
sub do_convert {
   my $edlfile = shift;
   my $mpegseconds = shift;
   my $keepfile = shift;
   my @times;
   my $edh;
   if (open($edh, '<', $edlfile)) {
      while (<$edh>) {
         chomp;
         my @flds = split(' ',$_);
         if ((scalar(@flds) == 3) && ($flds[2] == 0)) {
            push(@times, $flds[0], $flds[1]);
         }
      }
      close($edh);

      # We now have a list of times which, if taken in pairs, are the
      # ranges to throw away, but we want the ranges to keep, so shift
      # things around a bit.

      if ($times[0] <= 2.0) {
         # Current list says to throw away video real near beginning
         # of file, so delete that start time so we'll now have a list
         # that begins with the end of the initial chunk of commercials.
         shift(@times);
      } else {
         # Current list starts after the beginning of the recording
         # so prepend 0.0 to the list to say we want to keep the
         # beginning.
         unshift(@times,0.0);
      }

      my $lastime = $times[scalar(@times)-1];
      if (($lastime >= $mpegseconds) ||
          (($mpegseconds - $lastime) <= 2.0)) {
         # Current list says to throw away video all the way to real
         # near the end, so just pop that end time off the list.
         pop(@times);
      } else {
         # Last commercial ends before end time of video so add
         # end time of video as the end of the last range to keep.
         push(@times, $mpegseconds);
      }

      # Now I have a list of pairs that give ranges of times to keep.
      # Just make sure I do have a non-zero even number in the list before
      # writing them out in .keep file format.

      if ((scalar(@times) > 1) && ((scalar(@times) & 1) == 0)) {
         undef $edh;
         if (open($edh, '>', $keepfile)) {
            while (scalar(@times) > 0) {
               my $start = shift(@times);
               my $end = shift(@times);
               print $edh "$start\t$end\n";
            }
            close($edh);
         }
      }
   }
}

# Process argv to get arguments for do_convert
#
sub convert_one_edl {
   my $edlfile = shift;
   my $basefile = `basename "$edlfile" .edl`;
   my $edldir = `dirname "$edlfile"`;
   chomp($edldir);
   chdir($edldir);
   chomp($basefile);
   my $mpgfile = "$basefile.mpg";
   my $mpegseconds = getvidseconds($mpgfile);
   do_convert($edlfile, $mpegseconds, "guess-$basefile.keep");
}

if (scalar(@ARGV) == 1) {
   convert_one_edl($ARGV[0]);
   exit(0);
}

# Read in the existing database (if any) to start with known data
# so this update won't discard any information.

my $fh;
my $r;
if (open($fh, '<', $dbfile)) {
   while (<$fh>) {
      chomp;
      if (/^\[(.+)\]$/) {
         my $basename = $1;
         $r = {};
         $db{$basename} = $r;
      } elsif (/^([A-Za-z0-9_]+)=(.+)$/) {
         if (defined($r)) {
            my $key = $1;
            my $val = $2;
            $r->{$key} = $val;
         }
      }
   }
   close($fh);
   undef($fh);
   undef($r);
}

my $basename;
foreach $basename (keys(%db)) {
   $r = $db{$basename};
   my $dirname = $r->{'mpgdirname'};
   if (-f "$whodir/$dirname/$basename.edl") {

      # There is a .edl file in this directory.

      if ((! (-f "$whodir/$dirname/$basename.keep")) &&
          (! (-f "$whodir/$dirname/guess-$basename.keep"))) {

         # There is no .keep file, build one from the .edl.

         my $mpegseconds = $r->{'mpgseconds'};
         if (! defined($mpegseconds)) {
            $mpegseconds = getvidseconds("$whodir/$dirname/$basename.mpg");
         }
         do_convert("$whodir/$dirname/$basename.edl",
                    $mpegseconds,
                    "$whodir/$dirname/guess-$basename.keep");
      }
   }
}

find-glitch

#!/usr/bin/perl -w
#
# Utility routine for hunting defects in files. Finds places in the
# output from show-packets where a slew of audio packets with no video
# or video packets with no audio show up. These are the most likely
# places where audio sync problems appear, and it can be helpful to
# re-encode splitting the files at the timestamp where the glitch
# ends and normal alternating audio and video resume. That gives you
# a good place to do something like insert silence (see make-silence)
# to get the audio back in sync.
#
use strict;

my $last_si = 3;
my $last_count = 0;

while (<>) {
   if (/^packet\|stream_index\=(\d+)\|/) {
      my $si = $1;
      if ($si == $last_si) {
         ++$last_count;
         if ($last_count == 4) {
            print $_;
         }
      } else {
         $last_count = 1;
         $last_si = $si;
      }
   }
}

fetch-who

#!/bin/bash
#
# Run from cron or manually to fetch any new Doctor Who episodes
# from the TiVo.
#
mydir=`dirname $0`
PATH=`$mydir/echo-path`
export PATH
logfile="/huge/vids/DoctorWho/.data/fetch-who.log"
exec > $logfile 2>&1
get-tivo-dir | add-newlines | dig-episodes
#
# If nothing is being transcoded in background, then update the web page
# (otherwise it gets updated when all the background jobs are done).
#
if [ -f "/huge/vids/DoctorWho/.data/worklist/keeper.pid" ]
then
   :
else
   build-web -v
fi

generate-timestamps

#!/bin/sh
#
# Given a base.mpg file, generate a time-base.avi file scaled down to a
# small size with video timestamp added to the bottom right of the image.
# While doing this conversion, also convert the audio to mp3 so
# we can extract it later.
#
# The video timestamp helps when manually fixing a .keep file and the audio
# is sometimes helpful to catch commercials as well as the video.
#
mydir=`dirname $0`
PATH=`$mydir/echo-path`
export PATH
#
dt="fontsize=18"
dt="$dt:fontfile=/usr/share/fonts/dejavu/DejaVuSansMono-Bold.ttf"
dt="$dt:box=1"
dt="$dt:text="'%{pts}'
dt="$dt:x=(w-text_w-5)"
dt="$dt:y=(h-text_h-5)"
#
sc="320:-1"
#
vid="-c:v libx264 -preset ultrafast -threads 8 -crf 28"
#
aud="-acodec libmp3lame -ab 128k -ac 2"
#
infile="$1"
if [ -f "$infile" ]
then
   basefile=`basename $infile .mpg`
   ffmpeg -i "$infile" -nostats -vf "scale=$sc,drawtext=$dt" \
      $vid $aud "time-$basefile.avi" < /dev/null
else
   echo "Input file $infile does not exist." 1>&2
   exit 2
fi

get-tivo

#!/bin/sh
#
# First arg is base name of file to store download in (this script will
# append .mpg to the name).
#
# Second arg is the URL, probably obtained via Copy Link Location in web
# browser pointed at the tivo box (almost certainly need to put this in
# quotes).
#
# Downloads the file from the tivo to $1.mpg in current directory
# (takes about 4 or 5 minutes per gigabyte).
#
# The ultra secret media access key provided by the tivo box:
#
mediakey=nnnnnnnnnn
#
if [ $# -ne 2 ]
then
   echo usage get-tivo destination-file-name tivo-url 1>&2
   exit 2
fi
cookies=/tmp/$$cookies.txt
filename=$1
shift
if [ -f "$filename.mpg" ]
then
   echo "$filename.mpg" already exists, not downloading. 1>&2
   exit 2
fi
echo Starting file transfer, this will probably take a while...
curl -s --digest -k -u tivo:$mediakey -c $cookies --cookie "sid=abc" "$@" | \
   tivodecode --mak $mediakey -n -o "$filename.mpg" -
rm -f $cookies
ls -l "$filename.mpg"

get-tivo-dir

#!/bin/sh
#
# Fetch list of tivo recordings in html format to stdout.
#
# The ultra secret media access key provided by the tivo box:
#
mediakey=nnnnnnnnnn
#
cookies=/tmp/$$cookies.txt
curl -s --digest -k -u tivo:$mediakey -c $cookies --cookie "sid=abc" \
   'https://746-0001-902d-ac0b/nowplaying/index.html?Recurse=Yes'
rm -f $cookies

keep-working

#!/bin/bash
#
# Make sure I'm the only one running, then loop through any entries in
# the worklist directory running them till there are no more.
#
mydir=`dirname $0`
PATH=`$mydir/echo-path`
export PATH
topdir="/huge/vids"
whodir="$topdir/DoctorWho"
dbdir="$whodir/.data"
workdir="$dbdir/worklist"
if [ -d "$workdir" ]
then
   cd "$workdir"
   if [ -f "keeper.pid" ]
   then
      echo "keeper.pid" already exists 1>&2
      exit 0
   fi
   echo $$ > keeper.pid
   mypid=`cat keeper.pid`
   if [ "$$" = "$mypid" ]
   then

      # OK, it looks like I am the one who needs to really transcode
      # some files. Let's do them one after another till I run out.
      # Set KEEPER_PID env var so jobs can tell if they are running
      # under this script

      KEEPER_PID="$$"
      export KEEPER_PID
      trap "rm -f keeper.pid" EXIT
      while true
      do
         workfile=`ls -1 *.work 2>/dev/null | head -1`
         if [ -f "$workfile" ]
         then
            bash ./"$workfile"
            rm -f $workfile
         else
            # When transcoding is done, build a new web page index.
            build-web
            exit 0
         fi
      done
   fi
else
   echo "$workdir" is not a directory 1>&2
   exit 2
fi

keeper

#!/bin/bash
#
# Look for a .keep file with space separated pairs of timestamps, one pair
# per line, marking the segments to be kept from the original .mpg file;
# start parallel jobs to encode each piece, wait for them all to finish,
# then join them together.
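#
# As a concrete (made-up) illustration, a .keep file for an episode with
# three commercial-free segments might contain, one "start stop" pair
# per line, in seconds:
#
#    0.000000 714.500000
#    948.200000 1683.000000
#    1910.000000 3597.400000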
#
mydir=`dirname $0`
PATH=`$mydir/echo-path`
export PATH
#
# Set x264 encoding parameters to do a high quality 1280x720 scaled down
# recording of the original 1080i mpg file downloaded from the TiVo. Use
# -nostats so the logs don't grow to several megabytes.
#
encopts="-nostats -vf scale=1280:-1"
encopts="$encopts -c:v libx264 -preset slower"
encopts="$encopts -x264opts fast-pskip=1:rc-lookahead=6"
encopts="$encopts -b:v 1500k -threads auto"
#
# Deal with a few possible options
#
jobcount="4"
nicemax=19
nicemin=2
tryfast="no"
while getopts ':Fj:n:Hh' opt
do
   case $opt in
   F) tryfast="yes"
      ;;
   j) jobcount=$OPTARG
      ;;
   n) nicemax=$OPTARG
      ;;
   H|h) myname=`basename $0`
      cat <<EOF
usage: $myname options... infile
Options:
-F         Try faster encoding by splitting into segments
-j int     Max copies of ffmpeg to run in parallel
-n int     Largest nice value to use for background jobs
-h or -H   Print this message and exit

The infile is required and should be the name of the downloaded .mpg file
to be transcoded.

Looks for a .keep file with the commercial-free (start stop) times
in the video to be kept, if there is none, runs comskip to generate
a guess.
EOF
      exit 0
      ;;
   \?) echo Unrecognized option $OPTARG, run with -h for help 1>&2
      exit 2
      ;;
   esac
done
shift $(($OPTIND - 1))
if [ "$nicemax" -gt 19 ]
then
   nicemax="19"
fi
if [ "$nicemax" -lt "$nicemin" ]
then
   nicemax="$nicemin"
fi
#
infile="$1"
if [ -f "$infile" ]
then
   waitforit=''
   dirname=`dirname $infile`
   cd "$dirname"
   basefile=`basename $infile .mpg`

   rm -f "$basefile.csv" "$basefile-part"*.mpg
   rm -rf "tempdir-"*

   # Try to find the .keep file; if it's not there, start a run of comskip
   # in the background to create it.

   keepfile="$basefile.keep"
   if [ -f "$keepfile" ]
   then
      :
   else
      keepfile="guess-$keepfile"
   fi
   if [ -f "$keepfile" ]
   then
      :
   else
      create-keep-guess "$basefile.mpg" > guess.log 2>&1 < /dev/null &
      waitforit="$!"
   fi

   # I don't always need a time-*.avi file, but if I want to manually pick
   # off the times of commercials (which I usually get around to doing),
   # I'll need it, so go ahead and create it early; there's no need to wait
   # for it to finish.

   if [ -f "time-$basefile.avi" ]
   then
      :
   else
      generate-timestamps "$basefile.mpg" > gentime.log 2>&1 < /dev/null &
   fi

   # If I started any background processing, wait for it now.

   if [ -n "$waitforit" ]
   then
      wait "$waitforit"
   fi

   # And yet another file I need (and cannot background easily since
   # it requires the .keep file to be created first) is the .csv file
   # and all the split mpg file parts (or a dummy .csv if I'm not
   # using -F fast encoding).

   if [ -f "$basefile.csv" ]
   then
      :
   else
      if [ "$tryfast" = "yes" ]
      then
         print-segment-command "$basefile.mpg" "$keepfile" > runseg.temp
         rm -f "$basefile-part"*.mpg
         bash -x ./runseg.temp > runseg.log 2>&1 < /dev/null
      else
         # Create dummy .csv file with a single (plenty long enough) segment
         echo "$basefile.mpg",0.000000,18000.000000 > "$basefile.csv"
      fi
   fi

   if [ -f "$keepfile" ]
   then
      partnum=0
      rm -f "$basefile.concat"
      rm -f "$basefile.jobs"
      niceval="$nicemax"
      while read starttime endtime
      do
         partnum=`expr $partnum + 1`

         # Use perl script to do the complicated work of getting the
         # adjusted start time for the right part file

         print-one-segment "$basefile.csv" $starttime $endtime > oneseg.tmp
         if [ -s oneseg.tmp ]
         then
            :
         else
            echo failed: print-one-segment "$basefile.csv" $starttime $endtime
            exit 2
         fi
         read partfile adjtime duration < oneseg.tmp

         td="tempdir-$partnum"
         [ -d "$td" ] || mkdir "$td"
         rm -f "$td/run"
         echo cd "$td" >> "$td/run"

         # Uncompressed audio allows more exact manual diddling when the
         # mpg files are corrupted and I have audio/video sync problems.
         # Compress the audio at the end when joining the segments.

         echo rm -f "pcm-$basefile.avi" >> "$td/run"
         echo ffmpeg -y -i "../$partfile" $encopts -pass 1 \
                 -f avi  -ss $adjtime -t $duration /dev/null >> "$td/run"
         echo ffmpeg -i "../$partfile" $encopts -pass 2 \
                 -acodec pcm_s16le -ar 48000 -ac 2 \
                 -ss $adjtime -t $duration \
                 "pcm-$basefile.avi" >> "$td/run"

         # Save the ordered list of parts in the format the ffmpeg concat
         # demuxer needs to put them all together again.

         echo file \'$td/pcm-$basefile.avi\' >> "$basefile.concat"

         # Create job for transcoding this piece in background. Stagger the
         # niceness of each background job so we don't waste a lot of time
         # context switching when the actual cpu bound encoding is going on,
         # but we can work on multiple jobs during the seeking and pass 1
         # phases (which are not totally cpu bound).

         echo nice -$niceval bash -x "$td/run" \> "$td/log" 2\>\&1 \
            \< /dev/null >> "$basefile.jobs"
         if [ "$niceval" -gt "$nicemin" ]
         then
            niceval=`expr $niceval - 1`
         else
            niceval="$nicemax"
         fi
      done < "$keepfile"

      # Wait for all the pieces to be transcoded and join them together
      # Reverse the order of the jobs so lots of seeking happens in
      # parallel up front.

      if multijob -r $jobcount < "$basefile.jobs"
      then
         # transcoding the segments worked, put them all together and
         # compress the audio.

         rm -f "$basefile.avi" "$basefile.mp4" "$basefile.jpg" \
               "$basefile.webm" "$basefile.html" "temp-$basefile.avi"

         # I found that sometimes weirdly timestamped recordings wind up with
         # better audio sync if I leave the uncompressed audio in the
         # concat results and then recompress the whole file.

         ffmpeg -nostats -f concat -i "$basefile.concat" -vcodec copy \
                -acodec copy \
                "temp-$basefile.avi" < /dev/null

         if [ -f "temp-$basefile.avi" ]
         then

            ffmpeg -nostats -i "temp-$basefile.avi" -vcodec copy \
                   -acodec libmp3lame -ab 128k -ac 2 \
                   "$basefile.avi" < /dev/null

            # While we are here, create an mp4 with AAC audio as well to
            # be more html5 compliant (maybe switch to this entirely if
            # it turns out well, but the need for the experimental option
            # bugs me at the moment :-).

            ffmpeg -nostats -i "temp-$basefile.avi" -vcodec copy \
                   -strict experimental -c:a aac -cutoff 15000 -b:a 128k \
                   "$basefile.mp4" < /dev/null
         fi

         if [ -f "$basefile.avi" ]
         then
            make-airdate-link "$basefile.avi"
            rm -f "$basefile.concat"
            rm -f "temp-$basefile.avi"
            rm -f "$basefile.jobs"
            rm -f "$basefile-part"*.mpg
            rm -f oneseg.tmp runseg.temp
         fi
      fi
   else
      echo "Input file $keepfile does not exist." 1>&2
      exit 2
   fi
else
   echo "Input file $infile does not exist." 1>&2
   exit 2
fi

make-airdate-link

#!/usr/bin/perl -w
#
# Given the name of the .avi file just generated for a new episode, make
# the hard link to the "../<airdate>-<basename>.avi" file.

use strict;

# Get PATH set to include this script's directory and other useful bits

my $newpath=`dirname $0`;
chomp($newpath);
$newpath=`$newpath/echo-path`;
chomp($newpath);
$ENV{'PATH'}=$newpath;

my $topdir="/huge/vids";
my $whodir="$topdir/DoctorWho";
my $dbdir="$whodir/.data";
my $dbfile="$dbdir/allinfo.txt";

my $basefile=`basename $ARGV[0] .avi`;
chomp($basefile);
my $basedir=`dirname $ARGV[0]`;
chomp($basedir);
chdir($basedir);

# Read in the database of all the episode information and fetch airdate
#
sub fetch_airdate {
   my $basefile = shift;
   my $airdate;
   my %db;
   my $fh;
   my $r;
   if (open($fh, '<', $dbfile)) {
      while (<$fh>) {
         chomp;
         if (/^\[(.+)\]$/) {
            my $basename = $1;
            $r = {};
            $db{$basename} = $r;
         } elsif (/^([A-Za-z0-9_]+)=(.+)$/) {
            if (defined($r)) {
               my $key = $1;
               my $val = $2;
               $r->{$key} = $val;
            }
         }
      }
      close($fh);
      undef($fh);
      undef($r);
   }
   $r = $db{$basefile};
   if (defined($r)) {
      $airdate = $r->{'airdate'};
   }
   return $airdate;
}

my $airdate = fetch_airdate($basefile);
if (! defined($airdate)) {

   # First attempt didn't work, try to download new airdate info

   system("update-airdates");
   $airdate = fetch_airdate($basefile);
}
if (! defined($airdate)) {
   $airdate = 'YYYY-MM-DD';
}
unlink("../$airdate-$basefile.avi");
link("$basefile.avi", "../$airdate-$basefile.avi");

make-poster

#!/usr/bin/perl -w
#
# Given a video file as argument, generate a "poster" of screen shots from the
# video. Assumes the video has a valid index and I can seek in it.

my $newpath=`dirname $0`;
chomp($newpath);
$newpath=`$newpath/echo-path`;
chomp($newpath);
$ENV{'PATH'}=$newpath;

my $vidfile = $ARGV[0];
my $poster = $ARGV[1];

my $vidwidth;
my $vidheight;
my $vidsecs;

my $idh;
open($idh, '-|', "mplayer -frames 0 -identify -vo null -ao null $vidfile 2>/dev/null");
while (<$idh>) {
   if (/^ID_VIDEO_WIDTH=(\d+)$/) {
      $vidwidth = $1;
   } elsif (/^ID_VIDEO_HEIGHT=(\d+)$/) {
      $vidheight = $1;
   } elsif (/^ID_LENGTH=(\d+\.\d+)$/) {
      $vidsecs=$1;
   }
}

my $incsec = $vidsecs/17.0;
my $ss = $incsec;
my $i;
my $montage="montage";
for ($i = 0; $i < 16; ++$i) {
   my $fs = sprintf("%04.6f",$ss);
   system("mplayer -nosound -ss $fs -frames 1 -vo png:z=1:prefix=mptmp $vidfile");
   $fs = sprintf("%02d",$i);
   system("convert mptmp00000001.png -resize 25\% sstmp$fs.png");
   $montage .= " sstmp$fs.png";
   unlink("mptmp00000001.png");
   $ss += $incsec;
}
my $qh = int($vidheight / 4);
my $qw = int($vidwidth / 4);
$montage .= " -tile 4x4 -geometry ${qw}x${qh}+0+0 $poster";
system($montage);
system("rm -f mptmp*.png sstmp*.png");

make-silence

#!/bin/sh
#
# usage: make-silence outputfile.wav seconds
#
# Makes a 48KHz 2 channel 16 bit signed PCM audio file of silence of the
# given duration (in seconds - fractions OK).
# 
sox -n -r 48000 -c 2 -e signed-integer -b 16 "$1" trim 0 "$2"

merge-segments

#!/usr/bin/perl

use strict;

# This script paves the way for manual fixup of audio sync problems.
# It goes into the individual tempdir directories and extracts the
# PCM .wav files from the .avi files, adding extra silence and then
# trimming to ensure the length is as close as possible to the video
# duration as measured by reading through the video with ffprobe
# (it doesn't rely on the header, which might be wrong, especially since
# I clearly have problems in this video file or I wouldn't be resorting
# to this nonsense).
#
# Related scripts: show-packets, find-glitch, make-silence
#
# After getting videos split where glitches exist, you can use this
# to extract the audio and sox to join the audio files as well as any
# extra silence needed to get the audio to sync, and ffmpeg to mux
# the audio and video back together.

# Get PATH set to include this script's directory and other useful bits

my $newpath=`dirname $0`;
chomp($newpath);
$newpath=`$newpath/echo-path`;
chomp($newpath);
$ENV{'PATH'}=$newpath;

sub stream_duration {
   my $type = shift;
   my $vid = shift;
   my $vh;
   my %attrs;
   my $rval;
   if (open($vh, '-|', "ffprobe -count_frames -show_streams $vid 2>/dev/null")) {
      while (<$vh>) {
         if (/^\[STREAM\]$/) {
            undef %attrs;
         } elsif (/^\[\/STREAM\]$/) {
            my $codec_type = $attrs{'codec_type'};
            my $duration = $attrs{'duration'};
            if (defined($codec_type) && ($codec_type eq $type)) {
               if ($codec_type eq 'video') {
                  my $nb_read_frames=$attrs{'nb_read_frames'};
                  if (defined($nb_read_frames) && ($nb_read_frames ne 'N/A')) {
                     my $alt_duration = (($nb_read_frames + 0.0) * 1001.0)
                                        / 30000.0;
                     if ((! defined($duration)) ||
                         ($alt_duration != $duration)) {
                        print "Correcting duration from $duration to $alt_duration\n";
                        $duration = $alt_duration;
                     }
                  }
               }
               if (defined($duration)) {
                  $rval = $duration;
               }
            }
         } elsif (/^([a-zA-Z0-9_]+)=(.+)$/) {
            my $key = $1;
            my $val = $2;
            if (! (($val eq '') || ($val eq 'N/A'))) {
               $attrs{$key} = $val;
            }
         }
      }
   }
   close($vh);
   return $rval;
}

sub video_duration {
   my $vid = shift;
   return stream_duration('video', $vid);
}

sub audio_duration {
   my $vid = shift;
   return stream_duration('audio', $vid);
}

# I'm expecting to be run in a directory which has several tempdir-*
# subdirectories containing pcm-*.avi files. Find them all.

my $dh;
opendir($dh, '.') || die "Cannot read current directory\n";
my @dirents = readdir($dh);
closedir($dh);
my $sd;
my @dirnums;
foreach $sd (@dirents) {
   if (-d $sd) {
      if ($sd=~/^tempdir-(\d+)$/) {
         push(@dirnums,$1);
      }
   }
}

# Make a handy 10 seconds of silence.

unlink('silence.wav');
system('sox', '-n', '-r', '48000', '-c', '2', '-e', 'signed-integer',
       '-b', '16', 'silence.wav', 'trim', '0', '10');
if (! -f 'silence.wav') {
   die "Failed to create silence.wav\n";
}

# Now process files in each temp directory

my $dn;
foreach $dn (sort { $a <=> $b } @dirnums) {
   undef $dh;
   opendir($dh, "tempdir-$dn") || die "Cannot read directory tempdir-$dn\n";
   my @files = readdir($dh);
   closedir($dh);
   my $fn;
   foreach $fn (@files) {
      if (-f "tempdir-$dn/$fn") {
         if ($fn=~/^pcm-(.+).avi$/) {
            my $basename=$1;
            my $duration = video_duration("tempdir-$dn/pcm-$basename.avi");
            print "Extracting audio\n";
            unlink("tempdir-$dn/short.wav");
            system('ffmpeg', '-i', "tempdir-$dn/pcm-$basename.avi",
                   '-vn', "tempdir-$dn/short.wav");
            print "Padding audio\n";
            unlink("tempdir-$dn/long.wav");
            system('sox', "tempdir-$dn/short.wav", 'silence.wav',
                   "tempdir-$dn/long.wav");
            print "Clipping audio to exact length\n";
            unlink("tempdir-$dn/pcm-$basename.wav");
            system('sox', "tempdir-$dn/long.wav",
                   "tempdir-$dn/pcm-$basename.wav",
                   'trim', '0', "$duration");
            my $audlen = audio_duration("tempdir-$dn/pcm-$basename.wav");
            print "tempdir-$dn/pcm-$basename.avi duration is $duration\n";
            print "tempdir-$dn/pcm-$basename.wav duration is $audlen\n";
            unlink("tempdir-$dn/short.wav");
            unlink("tempdir-$dn/long.wav");
         }
      }
   }
}
unlink('silence.wav');
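
Once merge-segments has written the length-corrected pcm-*.wav files, muxing a fixed segment back together is a manual step. A minimal sketch with hypothetical file names (after whatever sox surgery the audio needed):

   ffmpeg -i tempdir-1/pcm-episode.avi -i tempdir-1/pcm-episode.wav \
      -map 0:v -map 1:a -c:v copy -c:a copy fixed-episode.avi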

midentify

#!/bin/sh
#
# The -vo null and -ao null are important. See whole funny story at:
# http://home.comcast.net/~tomhorsley/game/heisenbug.html
#
mydir=`dirname $0`
PATH=`$mydir/echo-path`
export PATH
#
mplayer -frames 0 -identify -vo null -ao null "$@" 2>/dev/null | grep -e '^ID_'
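
The output is mplayer's ID_ lines; illustrative values:

   ID_VIDEO_WIDTH=1280
   ID_VIDEO_HEIGHT=720
   ID_LENGTH=2673.47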

multijob.c

/* In bash I can wait for 1 specific background job, or all background jobs,
 * but I can't wait for just any old background job, so I need this little
 * C program to read in a bunch of commands to run in the background
 * and run at most N of them at once, waiting till they have all run.
 *
 * Primarily handy for running as many copies of ffmpeg as seem
 * to function well in parallel (for me, about 4 or 5 seems good).
 *
 * The -r option will execute the jobs in reverse order.
 *
 * The single argument to this program is the max number of jobs to
 * run at once, and the commands to run are fed to it on stdin
 * (one command per line which is then run via sh -c "command line").
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <sys/time.h>

#define BIGLINE 1024

#define STATE_PENDING  1
#define STATE_RUNNING  2
#define STATE_EXITED   3
#define STATE_ERROR    4

struct onejob {
   struct onejob * next;
   int             state;
   int             errnum;
   pid_t           kidpid;
   int             exit_stat;
   struct timeval  start_time;
   struct timeval  end_time;
   char            command[1];
};

struct onejob * cmdlist = NULL;
struct onejob ** endlist = &cmdlist;
long wait_count = 0;
int my_exit_code = 0;
int reverse_jobs = 0;

static void start_one_job(struct onejob * j);
static void print_one_job(struct onejob * j);

static void
start_one_job(struct onejob * j) {
   gettimeofday(&j->start_time, NULL);
   j->kidpid = fork();
   if (j->kidpid == 0) {
      /* This is the child, exec the command */
      execlp("sh", "sh", "-c", j->command, (char *)NULL);
      exit(2);
   } else if (j->kidpid == (pid_t)-1) {
      /* Hopefully, this will never happen */
      j->errnum = errno;
      j->state = STATE_ERROR;
      print_one_job(j);
   } else {
      /* This is the parent after successful fork */
      ++wait_count;
      j->state = STATE_RUNNING;
      print_one_job(j);
   }
}

static void
print_one_job(struct onejob * j) {
   printf("Job: %s\n", j->command);
   if (j->state == STATE_EXITED) {
      struct timeval duration;
      unsigned long seconds;
      unsigned long minutes;
      unsigned long hours;

      if (WIFEXITED(j->exit_stat)) {
         int exit_code = WEXITSTATUS(j->exit_stat);
         if (exit_code == 0) {
            printf("  Pid %d exited normally, ", (int)j->kidpid);
         } else {
            my_exit_code = exit_code;
            printf("  Pid %d called exit(%d), ", (int)j->kidpid, exit_code);
         }
      } else if (WIFSIGNALED(j->exit_stat)) {
         int signum = WTERMSIG(j->exit_stat);
         printf("  Died with signal %d, ", signum);
         my_exit_code = 2;
      }
      timersub(&j->end_time, &j->start_time, &duration);
      seconds = duration.tv_sec;
      minutes = seconds / 60ul;
      seconds -= (minutes * 60ul);
      hours = minutes / 60ul;
      minutes -= (hours * 60ul);
      printf("wall time: %02lu:%02lu:%02lu.%06lu\n", hours, minutes, seconds,
             (unsigned long)duration.tv_usec);
   } else if (j->state == STATE_ERROR) {
      const char * errmsg = strerror(j->errnum);
      printf("  Unable to fork, errno %d (%s)\n", j->errnum, errmsg);
   } else if (j->state == STATE_RUNNING) {
      printf("  Running pid %d\n", (int)j->kidpid);
   } else if (j->state == STATE_PENDING) {
      printf("  Pending\n");
   }
}

int
main(int argc, char ** argv) {
   char inbuf[BIGLINE];
   long maxjob;
   struct onejob * start_next;

   if (argc > 2) {
      if (strcmp(argv[1], "-r") == 0) {
         reverse_jobs = 1;
         --argc;
         ++argv;
      }
   }
   if (argc != 2) {
      fputs("usage: multijob [-r] <number>\n", stderr);
      exit(2);
   } else {
      char * endp = NULL;
      maxjob = strtol(argv[1], &endp, 0);
      if (! (*endp == '\0')) {
         fprintf(stderr, "Argument %s is not a number.\n", argv[1]);
         exit(2);
      } else if (maxjob <= 0) {
         fputs("Max job count must be greater than zero.\n", stderr);
         exit(2);
      }
   }
   while (fgets(inbuf, BIGLINE, stdin) != NULL) {
      int len = strlen(inbuf);
      if ((len > 0) && (inbuf[len-1] == '\n')) {
         --len;
         inbuf[len] = '\0';
      }
      if (len > 0) {
         struct onejob * newjob =
            (struct onejob *)malloc(sizeof(struct onejob) + len);
         newjob->next = NULL;
         newjob->state = STATE_PENDING;
         newjob->errnum = 0;
         newjob->kidpid = 0;
         newjob->exit_stat = 0;
         timerclear(&newjob->start_time);
         timerclear(&newjob->end_time);
         strcpy(newjob->command, inbuf);
         if (reverse_jobs) {
            newjob->next = cmdlist;
            cmdlist = newjob;
         } else {
            *endlist = newjob;
            endlist = &newjob->next;
         }
         /* print_one_job(newjob); */
      }
   }
   start_next = cmdlist;
   while ((start_next != NULL) && (maxjob-- > 0)) {
      start_one_job(start_next);
      start_next = start_next->next;
   }
   while (wait_count > 0) {
      int seen_status;
      pid_t seen_pid = waitpid(-1, &seen_status, 0);
      if (seen_pid > 0) {
         struct onejob * found_pid = cmdlist;
         while (found_pid != NULL) {
            if ((found_pid->state == STATE_RUNNING) &&
                (found_pid->kidpid == seen_pid)) {
               break;
            }
            found_pid = found_pid->next;
         }
         if (found_pid != NULL) {
            --wait_count;
            found_pid->state = STATE_EXITED;
            found_pid->exit_stat = seen_status;
            gettimeofday(&found_pid->end_time, NULL);
            print_one_job(found_pid);
            if (start_next != NULL) {
               start_one_job(start_next);
               start_next = start_next->next;
            }
         }
      }
   }
   return my_exit_code;
}
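
A quick way to exercise multijob by hand (the sleep commands are stand-ins for real ffmpeg jobs):

   multijob 2 <<EOF
   sleep 3
   sleep 2
   sleep 1
   EOF

It runs at most 2 of the 3 commands at once, logs each job as it starts and again with its wall time when it finishes, and exits non-zero if any job failed.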

ntsc-frame-time

#!/usr/bin/perl
#
# Quick script to print time in seconds of N ntsc frames (N is arg)
#
printf("%0.7f\n",($ARGV[0] + 0.0) / 29.97);
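
For example, 900 NTSC frames:

   $ ntsc-frame-time 900
   30.0300300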

print-one-segment

#!/usr/bin/perl -w
#
# Given a .csv file, and a start and end time as arguments, print the
# correct "part" file, the adjusted start time, and the duration to
# stdout.
#

use strict;

# Get PATH set to include this script's directory and other useful bits

my $newpath=`dirname $0`;
chomp($newpath);
$newpath=`$newpath/echo-path`;
chomp($newpath);
$ENV{'PATH'}=$newpath;

if (scalar(@ARGV) != 3) {
   die "usage: print-one-segment csvfile starttime endtime\n";
}
my $csvfile = $ARGV[0];
my $startime = $ARGV[1];
my $endtime = $ARGV[2];

my $ch;
open($ch, '<', $csvfile) || die "Cannot read $csvfile\n";
my @times;
while (<$ch>) {
   chomp;
   my @seg = split(/,/,$_);
   if (($seg[1] <= $startime) && ($seg[2] >= $endtime)) {
      my $adjtime = sprintf("%06f",$startime - $seg[1]);
      my $duration = sprintf("%06f",$endtime - $startime);
      my $partfile = $seg[0];
      print "$partfile $adjtime $duration\n";
      last;
   }
}
close($ch);
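
As a made-up illustration, given an episode.csv written by the -F segment split:

   episode-part000.mpg,0.000000,683.500000
   episode-part001.mpg,683.500000,1540.200000

asking for the piece from 714.5 to 948.2 picks the part file containing it and prints the adjusted seek offset and duration:

   $ print-one-segment episode.csv 714.5 948.2
   episode-part001.mpg 31.000000 233.700000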

print-segment-command

#!/usr/bin/perl -w
#
# Read the .keep file and print an ffmpeg command to stdout that will
# split the master .mpg about 30 seconds before each keepable segment.
# Also have it write a .csv file with the info about exactly where the
# splits happened, so later I can seek to the precise location.
#

use strict;

# Get PATH set to include this script's directory and other useful bits

my $newpath=`dirname $0`;
chomp($newpath);
$newpath=`$newpath/echo-path`;
chomp($newpath);
$ENV{'PATH'}=$newpath;

# Arg 1 should be the .mpg file to be split, and arg 2 the .keep file
# with the times of segments.

if (scalar(@ARGV) != 2) {
   die "usage: print-segment-command mpgfile keepfile\n";
}
my $keepfile = $ARGV[1];
my $infile = $ARGV[0];
my $basefile=`basename $infile .mpg`;
chomp($basefile);

# Slightly complicated here. If I have really short segments or commercial
# breaks I might get confused unless I check to see that all the segment
# times are indeed increasing and none of the segment split times fall
# inside any of the previous segments I want to keep. I want all the
# material I wish to keep to be inside one and only one segment.

my $kh;
open($kh, '<', $keepfile) || die "Cannot read $keepfile\n";
my @segs;
my @times;
my $lastime;
while (<$kh>) {
   chomp;
   my @seg = split(' ',$_);
   my $t = $seg[0] - 30.0;
   if ($t < 0.0) {
      $t = 0.0;
   }
   my $tv = $t;
   if (defined($lastime) && ($t < $lastime)) {
      $tv = -1;
   } else {
      my $s;
      foreach $s (@segs) {
         if (($tv >= $s->[1]) && ($tv < $s->[2])) {
            $tv = -1;
            last;
         }
      }
   }
   push(@segs,[$tv,$seg[0],$seg[1]]);
   if ($tv != -1) {
      $lastime = $tv;
      push(@times,$tv);
   }
}
print "ffmpeg -i \"" .
      $infile .
      "\" -codec copy -map 0 -f segment -segment_list \"" .
      "$basefile.csv" .
      "\" -segment_times " .
      join(',',@times) .
      " \"" .
      $basefile . "-part%03d.mpg\"\n";

process-dates

#!/usr/bin/perl
#
# Update the database of episodes with airdate info extracted from the
# Doctor Who wikipedia page being piped into us.
#
my $topdir="/huge/vids";
my $whodir="$topdir/DoctorWho";
my $dbdir="$whodir/.data";
my $dbfile="$dbdir/allinfo.txt";
my %db;

# Get PATH set to include this script's directory and other useful bits

my $newpath=`dirname $0`;
chomp($newpath);
$newpath=`$newpath/echo-path`;
chomp($newpath);
$ENV{'PATH'}=$newpath;

# Read in the existing database (if any) to start with known data
# so this update won't discard any information.

my $fh;
my $r;
if (open($fh, '<', $dbfile)) {
   while (<$fh>) {
      chomp;
      if (/^\[(.+)\]$/) {
         my $basename = $1;
         $r = {};
         $db{$basename} = $r;
      } elsif (/^([A-Za-z0-9_]+)=(.+)$/) {
         if (defined($r)) {
            my $key = $1;
            my $val = $2;
            $r->{$key} = $val;
         }
      }
   }
   close($fh);
   undef($fh);
   undef($r);
}

my $in_row=0;
my $in_col=0;
my @cols;
my $col_data='';

my %months;
$months{'january'}=1;
$months{'february'}=2;
$months{'march'}=3;
$months{'april'}=4;
$months{'may'}=5;
$months{'june'}=6;
$months{'july'}=7;
$months{'august'}=8;
$months{'september'}=9;
$months{'october'}=10;
$months{'november'}=11;
$months{'december'}=12;
$months{'jan'}=1;
$months{'feb'}=2;
$months{'mar'}=3;
$months{'apr'}=4;
$months{'jun'}=6;
$months{'jul'}=7;
$months{'aug'}=8;
$months{'sep'}=9;
$months{'oct'}=10;
$months{'nov'}=11;
$months{'dec'}=12;

sub process_cols {
   my $ar = shift;
   if (($ar->[0]=~/title=/) && ($ar->[5]=~/\d+\s+[A-Za-z]+\s+\d+/)) {
      my @titles = split(/\<br\b/, $ar->[0]);
      my @dates = split(/\<br\b/, $ar->[5]);
      if (scalar(@titles) == scalar(@dates)) {
         while (scalar(@titles) > 0) {
            my $t = shift(@titles);
            my $d = shift(@dates);
            if ($t=~/title=\"([^\"]+)\"/) {
               $t = $1;
               $t=~s/\(Doctor Who\)//g;
               $t=~s/Doctor Who\://g;
               $t=~s/^\s+//;
               $t=~s/\s+$//;
               $t=~s/\&[a-zA-Z0-9_]+\;//g;
               $t=~s/([\w']+)/\u\L$1/g;
               $t=~s/[^A-Za-z_0-9]//g;
               $t=~s/DoctorWhoEpisode$//;
               if ($d=~/(\d+\s+[A-Za-z]+\s+\d+)/) {
                  $d = $1;
                  @dmy=split(' ',$d);
                  my $m=$dmy[1];
                  $m=~tr/A-Z/a-z/;
                  $m=$months{$m};
                  $d=sprintf("%04d-%02d-%02d",$dmy[2],$m,$dmy[0]);
                  my $r = $db{$t};
                  if (! defined($r)) {
                     $r = {};
                     $db{$t} = $r;
                  }
                  if (! exists($r->{'airdate'})) {
                     $r->{'airdate'} = $d;
                  }
               }
            }
         }
      }
   }
}

while (<>) {
   if (/^\<tr\b/) {
      $in_row = 1;
   }
   if ($in_row) {
      if (/^\<td\b/) {
         $in_col = 1;
         $col_data='';
      }
      if ($in_col) {
         $col_data .= $_;
      }
      if (/\<\/td\b/) {
         $in_col=0;
         if ($col_data ne '') {
            push(@cols, $col_data);
         }
      }
   }
   if (/\<\/tr\b/) {
      $in_row = 0;
      if (scalar(@cols) >= 6) {
         &process_cols(\@cols);
         undef @cols;
      }
   }
}

# Save new db (keeping backup)

sub compare_airdate {
   my $ada = $db{$a}->{'airdate'};
   my $adb = $db{$b}->{'airdate'};
   if (! defined($ada)) {
      $ada='';
   }
   if (! defined($adb)) {
      $adb='';
   }
   my $rval = $ada cmp $adb;
   if ($rval == 0) {
      $rval = $a cmp $b;
   }
   return $rval;
}

my $dbtemp="$dbfile.$$";
my $dbh;
if (open($dbh, '>', $dbtemp)) {
   foreach $bn (sort compare_airdate keys %db) {
      $r = $db{$bn};
      print $dbh "\n[$bn]\n";
      my $key;
      my $val;
      foreach $key (sort(keys(%{$r}))) {
         $val = $r->{$key};
         print $dbh "$key=$val\n";
      }
   }
   close($dbh);
   unlink("$dbfile.bak");
   link($dbfile,"$dbfile.bak");
   unlink($dbfile);
   link($dbtemp,$dbfile);
   unlink($dbtemp);
}
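
The allinfo.txt database this writes is a simple sectioned text file: one [basename] header per episode followed by key=value lines. A made-up entry:

   [TheSnowmen]
   airdate=2012-12-25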

show-packets

#!/bin/sh
#
# Handy script to list the packets in a video file in compact form so
# they can be examined for glitches (see find-glitch) to find a good
# place to split the file when inserting or deleting audio to get the
# joined files back in sync.
#
mydir=`dirname $0`
PATH=`$mydir/echo-path`
export PATH
#
ffprobe -show_packets -of compact -show_entries \
   packet=stream_index,dts_time,pts_time,duration_time "$@" 2>/dev/null
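
Each packet comes out as one compact line, roughly like this (illustrative values):

   packet|stream_index=0|pts_time=1.433367|dts_time=1.400033|duration_time=0.033367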

start-keeper-bg

#!/bin/sh
#
# Add item to worklist, start the worklist processor if it is not already
# running.
#
# These worklist items will run keeper (one at a time) for each newly
# downloaded episode. (Each keeper run already does multiple background
# jobs, so it would be counterproductive to do multiple keeper runs at
# the same time.)
#
# The args I get are the recorddate-id and the episode name.
#
# Special case: with no args, extract that info from the current
# directory name.
#
mydir=`dirname $0`
PATH=`$mydir/echo-path`
export PATH
#
if [ "$#" = 0 ]
then
   t1=`/bin/pwd`
   t2=`basename $t1`
   basefile=`echo $t2 | sed -r -e 's/^[0-9]+-[0-9]+-[0-9]+-[0-9]+-//'`
   id=`echo $t2 | sed -e 's/-'$basefile'$//'`
else
   id="$1"
   basefile="$2"
fi
#
topdir="/huge/vids"
whodir="$topdir/DoctorWho"
dbdir="$whodir/.data"
workdir="$dbdir/worklist"
#
[ -d "$workdir" ] || mkdir -p "$workdir"
if [ -d "$workdir" ]
then
   mpgdir="$whodir/$id-$basefile"
   if [ -d "$mpgdir" ]
   then
      workfile="$workdir/$$-$basefile.work"
      ( umask 077 ; echo cd '"'$mpgdir'"' > "$workfile.temp" )
      ( umask 077 ; echo exec \> work.log 2\>\&1 >> "$workfile.temp" )
      ( umask 077 ; echo $topdir/scripts/keeper '"'$basefile.mpg'"' >> \
                        "$workfile.temp" )
      mv "$workfile.temp" "$workfile"
      nohup keep-working > /dev/null 2>&1 < /dev/null &
   else
      echo $mpgdir is not a directory 1>&2
      exit 2
   fi
else
   echo Unable to create $workdir 1>&2
   exit 2
fi
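
The generated .work file ends up being a three-line bash fragment, e.g. (directory name and episode hypothetical):

   cd "/huge/vids/DoctorWho/2013-05-01-12345-TheSnowmen"
   exec > work.log 2>&1
   /huge/vids/scripts/keeper "TheSnowmen.mpg"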

update-airdates

#!/bin/sh
#
mydir=`dirname $0`
PATH=`$mydir/echo-path`
export PATH
#
url='http://en.wikipedia.org/wiki/List_of_Doctor_Who_serials'
curl -s "$url" | add-newlines | process-dates

References

ffmpeg is the star of the show, doing all the hard work. I had to download one of the static builds by following links on the download page to get an ffmpeg that worked properly and had all the features I use.

comskip automates the process of skipping commercials. I used Google to find the comskip for Linux link and downloaded the source. Most of the time this program seems close to psychic, but sometimes it fails miserably and I need to modify the times by hand.

tivodecode is a small but critical piece needed to decode the files downloaded from the TiVo. It was available in the Fedora repos, so I didn't have to build it.

List of Doctor Who serials is the Wikipedia page I use to download the original airdate information that sorts the list of episodes. (Usually, the TiVo shows the original airdate in the program info as well, but then I have to copy it into my database by hand.)

mplayer is the tool I usually use to play these videos (even if I no longer use mencoder to produce them).

Future

The current scripts produce a wacky combination of H.264 video in a .avi container with an mp3 audio track.

This is a historical artifact from my experiments and is basically the first thing that seemed to work well. I should probably switch to something more HTML5 compatible like a .mp4 container and AAC audio.

Another thing to investigate is getting more control over the TiVo. Folks have produced competing Android apps for TiVo control, so the protocol the app interface speaks is known. It would be nice to automate things like deleting shows once I download them and rescheduling recordings when the original is clipped or damaged.

Obviously, it would also make sense to generalize some of the scripts and make them less Who-centric, but that can wait till there is another show I feel like saving.
