A Newsweek editor (the famous one, but I've forgotten his name and, really, it's not relevant) was commenting on the randomness of his new iPod Shuffle, or the apparent not-quite-randomness of it. He believed that that his Shuffle had a prediliction for Steely Dan, as various songs by that band came up more frequently than other artists on it, which came up significantly less frequently than he thought that They Should.
Questing for the reason, he went a-interviewing. First Steve Fucking Jobs, who assured him that the iPod Shuffle did not have its own musical preferences. Second, a couple of (thankfully) unnamed mathematicians and cryptologists (no doubt, self-declared), who told him that he was making things up and that the human brain sought out patterns in randomness. The problem here is that THESE PEOPLE ARE ALL MORONS.
No, the iPod Shuffle probably isn't even close to having a cryptographically secure level of random, and there's no reason it needs to (nor would you want to wait for the puny processor in the thing, heavy on the music-decoding bits, light on the hard-integer-math bits, to cough and choke its way through a cryptographically secure PRNG anyway)... but that's not even the point. The point is that there are two broad-side-of-a-barn-obvious explanations for this:
- There's more Steely Dan than other music in his iPod Shuffle. Therefore, there's a greater stastical probability of getting Steely Dan.
- All of these portable music players, and most in-general-purpose-OS music players, don't even pretend to do real random, because the (time) cost of doing so would grow
exponentiallylinearally (see below) with the number of tracks being randomized. Instead, they panel by something (by Artist name, by Album name, by Track name, by--as does my Rio Karma--internal hash representation within DB-not-FS storage of a given track), getting to a panel of some number reasonably large (> 64) to be inclusive of related stuff and reasonably small (< 1024) to avoid being totally tiresome and also not take too long with the un-sort algorithm (I think I've seen this done as the opposite of a Bubble Sort, which is un-sane... have we not learned our qsort lesson yet?), then they may bother interleaving panels, but probably don't.
What's more: this is actually a selling point in the advertising for the iPod Shuffle. They laud the way that it may seem to find interesting correlations in your music. If that's not just total marketing BS, it might be because they're looking at the "type of music" portion of the id3/whatever AAC uses tag as part of its paneling, and actively intermixing your trip-hop or Intelligent Dance Music (yeah... wuh? Also, greetz, 'dolph) with other like-labeled music... a practice some, who've actually seen the bastardization everybody in the fucking world but me brings on the "style" slot, would argue is woefully ill-informed, but that's neither here nor there if they do it right in iTunes, because they'll still get lucky plenty of the time.
Paneling is not a new concept, though I didn't really see the benefit of it for less-than-pseudo-randomizing until I got the Karma. Remember $FORMER_EMPLOYER (more on them in a bit, actually)? It's frequent that they want a subset of the vaguely 350 mil. individuals in the US (or the somewhat fewer than that households), and wanted to do some operation (often, merging new addresses in, and removing redundancies--people don't like getting junk mail to begin with, they really hate getting the same junk mail repeatedly) on that whole group. But this is a huge operation, one which you'd really like to split up and not run on that big Sun, with its fast I/O and slow CPU, but on those cheap Intel-driven machines with their crappy I/O but fast CPUs, so you want a way to split out the work and feed it to four or five of those machines. How do you do it? Well, often by first-digit-of-zip-code. Sure, some zips (9xxxx comes to mind) are heavier than others, but it's Close Enough most of the time.
People, computer science is really so very Not Hard. You need to understand a couple of Very Simple first principles, and you need to possess the capability to Reason. It's just not asking for much. Computers are not wacky, unpredictable, and scary. Unless broken (or obfuscated, thanks much Microsoft), they function along predictable, logical paths. Please to not be trying to read some sort of innate emotion into them. There's No There There.
February 2 2005, 03:50:25 UTC 7 years ago
February 2 2005, 03:58:37 UTC 7 years ago
Jobs... No, I don't think he does. All Jobs knows is flamboyance. He doesn't know long-term marketing, only short-term: his past is littered with broken personal computer dreams (many of which I own, so careful which you insult). As for engineering, Woz was always the brains behind Steve & Steve.
February 2 2005, 04:56:16 UTC 7 years ago
February 2 2005, 04:25:30 UTC 7 years ago
Just index the tracks by int from 0 to N.
map rand(0,1) to (0,N).
February 2 2005, 04:31:59 UTC 7 years ago
February 2 2005, 14:11:00 UTC 7 years ago
February 2 2005, 17:31:54 UTC 7 years ago
February 2 2005, 04:28:07 UTC 7 years ago
February 2 2005, 04:36:47 UTC 7 years ago
Anyway, shiny aside, I don't truck with iPods. They don't play nice with the whacko assortment of computers I've got and use regularly, and they don't play Ogg Vorbis, both of which things my Karma does do.
February 2 2005, 05:35:03 UTC 7 years ago
This is not right.
The Fisher-Yates shuffle is O(N) (certainly not O(exp(N))!) and N*log(N) in space.
A "decent" random number generator (more than enough for shuffling songs) consists of a multiply, a subtract, and a modulus. If you can't multiply numbers, you can do pretty well with two adds, a mod, and bit shifting. (source)
(I'm assuming that by "real" you mean "of a quality furnished by stdlib", not true random or crypto quality random. Even then, it's still O(N).)
February 2 2005, 05:40:15 UTC 7 years ago
I maintain, regardless, that you get Better results for the application by paneling.
February 2 2005, 05:49:13 UTC 7 years ago
I'm with you on making the randomness not quite so random. Nothing like following up some Plaid with a scratchy, monophonic Wanda Landowska Scarlatti recording. That's right, I'm too preoccupied to be bothered with playlists.
February 2 2005, 05:41:54 UTC 7 years ago
February 2 2005, 06:16:22 UTC 7 years ago
I'm not sure about that. Here's the equation for a linear congruential generator and some (very) pseudo assembly:
x(n) = a*x(n-1) + b mod m
load x(n-1)
load a
multiply them
load (b mod m) ; can be precomputed once
add those two
store result
Let's say that takes 6 cycles (it's a RISC, ehh) on a 20 MHz box. (A generation 4 iPod has an 80 MHz ARM) You can find 3 million rands a second. Moving on, we have this inner loop of Fisher-Yates (as described here:
; i is already loaded---it's a loop iterator
; m is already loaded outside of the loop
compute random number
subtract i from m
increment that
load RAND_MAX
divide RAND_MAX by above
divide random number by result
add i and put results in j
; now swap array elements
temp = array[i]
array[i] = array[j]
array[j] = temp
I count 15 ops for the inner loop. Let's add another three for incrementing i and branching if we meet the bound.
We need to make the array we're shuffling: linear indices into the song list. That's three ops for a for loop and another for storing the counter. Now, how many songs can we fit into 20 million operations?
4x + 18x = 2e7
x = just over 9e5
The 1323 songs in my collection (yeah, it's tiny) get sorted in under two hundredths of a second. Slow the algorithm down for array indexing, multiple cycles per multiply/divide, and a fancier RNG and it's still fast. Note that the above code can be improved by putting the RNG's loads outside of the loop.
February 2 2005, 06:38:39 UTC 7 years ago
My pseudocode is atrocious---I've skipped over a number of operations, mostly loads and stores. Or rather, it's specialized instructions for the CRAPS architecture (Confabulated RISC Architecture for Playing Soundfiles).
But that's not so bad, since I was off by a factor of 10 in my time estimate. The sort takes under two thousandths of a second as computed above.
February 2 2005, 14:08:14 UTC 7 years ago
My initial reaction is that you're being a little bit too much computer science and a little bit too little real-world engineering (not accounting for the goofy crap shown on the LCD eating up cycles, not account for hardware interrupts from the drive/USB/Firewire/docking bus), but I'm still coming back to it (and I corrected the above, since I hate it when other people say "expontentially" when they actually meant "linearally", so I'm not going to do it myself).
February 2 2005, 14:08:30 UTC 7 years ago