gr ([info]grumpy_sysadmin) wrote,

NPR & New-sweek get Shuffled.

I heard one of the most irritating news/interest bits I ever have on NPR (WHYY, 'round these parts) earlier today. On the way to work, I think.

A Newsweek editor (the famous one, but I've forgotten his name and, really, it's not relevant) was commenting on the randomness of his new iPod Shuffle, or the apparent not-quite-randomness of it. He believed that that his Shuffle had a prediliction for Steely Dan, as various songs by that band came up more frequently than other artists on it, which came up significantly less frequently than he thought that They Should.

Questing for the reason, he went a-interviewing. First Steve Fucking Jobs, who assured him that the iPod Shuffle did not have its own musical preferences. Second, a couple of (thankfully) unnamed mathematicians and cryptologists (no doubt, self-declared), who told him that he was making things up and that the human brain sought out patterns in randomness. The problem here is that THESE PEOPLE ARE ALL MORONS.

No, the iPod Shuffle probably isn't even close to having a cryptographically secure level of random, and there's no reason it needs to (nor would you want to wait for the puny processor in the thing, heavy on the music-decoding bits, light on the hard-integer-math bits, to cough and choke its way through a cryptographically secure PRNG anyway)... but that's not even the point. The point is that there are two broad-side-of-a-barn-obvious explanations for this:
  1. There's more Steely Dan than other music in his iPod Shuffle. Therefore, there's a greater stastical probability of getting Steely Dan.
  2. All of these portable music players, and most in-general-purpose-OS music players, don't even pretend to do real random, because the (time) cost of doing so would grow exponentially linearally (see below) with the number of tracks being randomized. Instead, they panel by something (by Artist name, by Album name, by Track name, by--as does my Rio Karma--internal hash representation within DB-not-FS storage of a given track), getting to a panel of some number reasonably large (> 64) to be inclusive of related stuff and reasonably small (< 1024) to avoid being totally tiresome and also not take too long with the un-sort algorithm (I think I've seen this done as the opposite of a Bubble Sort, which is un-sane... have we not learned our qsort lesson yet?), then they may bother interleaving panels, but probably don't.
And, you know what? It's better this way. As I mentioned, the Karma (seems to, I don't have source) panel by its internal hash (based on track length, id3-or-whatever-else tag, and the POSIX time of addition to the player for uniqueness, as near as I can tell), but I happened to dump my music onto it as albums, or directories of artists with subdirectories of albums. So I'll often hear one track by an artist and be reminded of another (not necessarily the next track, not necessarily even off the same album) during a long drive (I've been doing quite a few lately)... and not have to go searching for that track, calm in the knowledge (from common-sense observation of my music player's behavior) that that track would pop up soon enough. And, sure enough, it does.

What's more: this is actually a selling point in the advertising for the iPod Shuffle. They laud the way that it may seem to find interesting correlations in your music. If that's not just total marketing BS, it might be because they're looking at the "type of music" portion of the id3/whatever AAC uses tag as part of its paneling, and actively intermixing your trip-hop or Intelligent Dance Music (yeah... wuh? Also, greetz, 'dolph) with other like-labeled music... a practice some, who've actually seen the bastardization everybody in the fucking world but me brings on the "style" slot, would argue is woefully ill-informed, but that's neither here nor there if they do it right in iTunes, because they'll still get lucky plenty of the time.

Paneling is not a new concept, though I didn't really see the benefit of it for less-than-pseudo-randomizing until I got the Karma. Remember $FORMER_EMPLOYER (more on them in a bit, actually)? It's frequent that they want a subset of the vaguely 350 mil. individuals in the US (or the somewhat fewer than that households), and wanted to do some operation (often, merging new addresses in, and removing redundancies--people don't like getting junk mail to begin with, they really hate getting the same junk mail repeatedly) on that whole group. But this is a huge operation, one which you'd really like to split up and not run on that big Sun, with its fast I/O and slow CPU, but on those cheap Intel-driven machines with their crappy I/O but fast CPUs, so you want a way to split out the work and feed it to four or five of those machines. How do you do it? Well, often by first-digit-of-zip-code. Sure, some zips (9xxxx comes to mind) are heavier than others, but it's Close Enough most of the time.

People, computer science is really so very Not Hard. You need to understand a couple of Very Simple first principles, and you need to possess the capability to Reason. It's just not asking for much. Computers are not wacky, unpredictable, and scary. Unless broken (or obfuscated, thanks much Microsoft), they function along predictable, logical paths. Please to not be trying to read some sort of innate emotion into them. There's No There There.
Tags: apple, ipod, news, opinion

  • Post a new comment

    Error

    Your IP address will be recorded 

  • 17 comments

[info]hankshiny

February 2 2005, 03:50:25 UTC 7 years ago

But this is a person who bought in iPod shuffle in the first place. Your attempts at wisdom are futile. (Yes, the mathematicians should know better. Steve Jobs probably does know better, but wants the publicity.)

[info]grumpy_sysadmin

February 2 2005, 03:58:37 UTC 7 years ago

Hey dude, don't step. I'm buying a Mac Mini and using it for my stereo. (Of course, it'll run [info]d_m's mpy3 and the remote'll be one of my old Stinkpads in console mode, but still.)

Jobs... No, I don't think he does. All Jobs knows is flamboyance. He doesn't know long-term marketing, only short-term: his past is littered with broken personal computer dreams (many of which I own, so careful which you insult). As for engineering, Woz was always the brains behind Steve & Steve.

[info]hankshiny

February 2 2005, 04:56:16 UTC 7 years ago

I like most iPods OK. The shuffle just seems extraordinarily useless to me, or at least not useful enough to be a hyped new product.

[info]octal

February 2 2005, 04:25:30 UTC 7 years ago

Why would true-random be so expensive?

Just index the tracks by int from 0 to N.

map rand(0,1) to (0,N).

[info]grumpy_sysadmin

February 2 2005, 04:31:59 UTC 7 years ago

That kind of skips implementing rand() and, where you really get burned on cycles, map(), doesn't it?

[info]grumpy_sysadmin

February 2 2005, 14:11:00 UTC 7 years ago

But the real reason that true random is expensive is that you don't have anything that comes close to resembling a real entropy source in a hand-held device. I guess you could take the thermal noise route, but then you'd have to have an audio-in to sample (which you're probably doing because you can also record audio, which neither the iPod nor the Karma can), and keep it at full-gain when there was no mic plugged in. I don't think real random comes close to making sense here anyway.

[info]feng_huang

February 2 2005, 17:31:54 UTC 7 years ago

I can't speak for the Karma, but the iPod can indeed record audio; I myself recorded a semester's worth of Calculus 1 lectures. Not that this negates your other points or anything, of course.

[info]janet_carter

February 2 2005, 04:28:07 UTC 7 years ago

Not to mention that, computer science aside, most random shuffles of songs will have more of *something* than of other things. "Play random selections" doesn't mean the same thing as "play everything equal amounts." Which is clear to me, even as someone who is half-drawn-in by Apple's claims that it's a "tuneful fashion statement" (although not as tempted as when they were describing the iPod minis as "available in four colors, all shiny").

[info]grumpy_sysadmin

February 2 2005, 04:36:47 UTC 7 years ago

Given that iTunes (and other software, not quite so popular but arguably better) allow for "rating" songs on a 1-5 basis. So, what we really need here is a weighted shuffle (I can't take credit for this idea personally... but it comes from people you don't know, so giving you initials won't help much), so that the 5s get played a statistically accurate number of times more frequently than the 4s, and so forth.

Anyway, shiny aside, I don't truck with iPods. They don't play nice with the whacko assortment of computers I've got and use regularly, and they don't play Ogg Vorbis, both of which things my Karma does do.

[info]stepleton

February 2 2005, 05:35:03 UTC 7 years ago

All of these portable music players, and most in-general-purpose-OS music players, don't even pretend to do real random, because the (time) cost of doing so would grow exponentially with the number of tracks being randomized.

This is not right.

The Fisher-Yates shuffle is O(N) (certainly not O(exp(N))!) and N*log(N) in space.

A "decent" random number generator (more than enough for shuffling songs) consists of a multiply, a subtract, and a modulus. If you can't multiply numbers, you can do pretty well with two adds, a mod, and bit shifting. (source)

(I'm assuming that by "real" you mean "of a quality furnished by stdlib", not true random or crypto quality random. Even then, it's still O(N).)

[info]grumpy_sysadmin

February 2 2005, 05:40:15 UTC 7 years ago

I'm still wrong anyway, but note that I was concerned about the order of the algorithm in time, not in space.

I maintain, regardless, that you get Better results for the application by paneling.

[info]stepleton

February 2 2005, 05:49:13 UTC 7 years ago

Yeah, the first O(N) up there was for time. Not a factor.

I'm with you on making the randomness not quite so random. Nothing like following up some Plaid with a scratchy, monophonic Wanda Landowska Scarlatti recording. That's right, I'm too preoccupied to be bothered with playlists.

[info]grumpy_sysadmin

February 2 2005, 05:41:54 UTC 7 years ago

And, in any case, O(N) is too slow. Panel the first chunk of whatever count in a set period of time; panel the rest of them on the fly while those are playing, and your random never takes longer than it takes to randomize one panel (or several panels if you interleave).

[info]stepleton

February 2 2005, 06:16:22 UTC 7 years ago

O(N) is too slow

I'm not sure about that. Here's the equation for a linear congruential generator and some (very) pseudo assembly:

x(n) = a*x(n-1) + b mod m

load x(n-1)
load a
multiply them
load (b mod m) ; can be precomputed once
add those two
store result

Let's say that takes 6 cycles (it's a RISC, ehh) on a 20 MHz box. (A generation 4 iPod has an 80 MHz ARM) You can find 3 million rands a second. Moving on, we have this inner loop of Fisher-Yates (as described here:

; i is already loaded---it's a loop iterator
; m is already loaded outside of the loop
compute random number
subtract i from m
increment that
load RAND_MAX
divide RAND_MAX by above
divide random number by result
add i and put results in j
; now swap array elements
temp = array[i]
array[i] = array[j]
array[j] = temp

I count 15 ops for the inner loop. Let's add another three for incrementing i and branching if we meet the bound.

We need to make the array we're shuffling: linear indices into the song list. That's three ops for a for loop and another for storing the counter. Now, how many songs can we fit into 20 million operations?

4x + 18x = 2e7
x = just over 9e5

The 1323 songs in my collection (yeah, it's tiny) get sorted in under two hundredths of a second. Slow the algorithm down for array indexing, multiple cycles per multiply/divide, and a fancier RNG and it's still fast. Note that the above code can be improved by putting the RNG's loads outside of the loop.

[info]stepleton

February 2 2005, 06:38:39 UTC 7 years ago

Corrigenda:

My pseudocode is atrocious---I've skipped over a number of operations, mostly loads and stores. Or rather, it's specialized instructions for the CRAPS architecture (Confabulated RISC Architecture for Playing Soundfiles).

But that's not so bad, since I was off by a factor of 10 in my time estimate. The sort takes under two thousandths of a second as computed above.

[info]grumpy_sysadmin

February 2 2005, 14:08:14 UTC 7 years ago

I don't have the brainshare to give this the response it reserves this morning... Perhaps this evening.

My initial reaction is that you're being a little bit too much computer science and a little bit too little real-world engineering (not accounting for the goofy crap shown on the LCD eating up cycles, not account for hardware interrupts from the drive/USB/Firewire/docking bus), but I'm still coming back to it (and I corrected the above, since I hate it when other people say "expontentially" when they actually meant "linearally", so I'm not going to do it myself).

[info]grumpy_sysadmin

February 2 2005, 14:08:30 UTC 7 years ago

That's "response it deserves", duh.
Create an Account
Forgot your login or password?
Facebook Twitter More login options
English • Español • Deutsch • Русский…