2011-09-26

Shuffling my music

This is my mp3 player:

$ cd Music
$ mpg123 "$(ls |random |tail -1)"/*

It plays an album from my local music collection at random.

The random script is the most important part:

#!/usr/bin/perl

my @input = <STDIN>;

for (my $i = 0; $i < @input; $i++)
{
    my $t = $input[$i];
    my $s = rand()*@input;
    $input[$i] = $input[$s];
    $input[$s] = $t;
}

foreach my $a (@input)
{
    chomp $a;
    print $a, "\n";
}

This implements a random shuffle of the input lines in O(n) time and O(n) memory.

I used to use this one-liner to un-sort text:

$ perl -e 'print sort { 0.5 <=> rand() } <STDIN>'

This is cute - but it is actually broken code.

So how many bugs did I manage to fit in one line of Perl?

  1. It is unpredictably slower (it could even potentially run forever).
  2. The output is bad if the final input line doesn't end in a newline.
  3. The output distribution is not really random.

Points 1 and 3 are the most fun - they both have to do with how sorting algorithms work, and will depend on the algorithm used. In my opinion, the biggest problem is that sorting algorithms generally have as an underlying assumption the idea that the value of each element stays the same. The dynamic comparison function above violates this premise, meaning that without deeper knowledge, one is no longer guaranteed that every element will be examined or (conversely) that the algorithm will ever finish running. As such I would intuitively expect algorithms which are guaranteed to make progress to be subtly biased towards leaving elements in their original positions, while algorithms without such guarantees might run for a very, very long time and still produce an uneven distribution reflecting the structure of the sorting algorithm itself.

Amusingly, when I googled for shuffling algorithms I came across this Coding Horror blog post where Jeff talks about more or less the exact same two algorithms, but prefers the one I've discarded. :-) He was set right in the comments.

But even simple problems like this can be really hard to get right - my improved algorithm above is flawed as well!

If want to know why, check out this Wikipedia article.

Of course, either of the above are more than good enough for my music player... but now that I've done my homework, I am going to have to go and fix my code.

Update: Tómas Árni points out I could use sort -R instead of my random script. That's much better!

Tags: tech


Recent posts

...