Wednesday, March 13, 2013

Recommendation engines

Last night I was not having a good Pandora experience. I was sitting at my home computer, trying to come up with an idea for this week's blog post. Now, I am sure that the fact that my blog post was due in 36 hours and I didn't have a topic might have flavored my experience, but...
Darn! What should I blog about?!?!?

I was hungry for some mambo music. For whatever reason, one neuron in my head talked with one of the seventeen others, and Mama Loves Mambo started playing. Naturally, I fired up Pandora. I like Pandora [1], and I have this notion that maybe it likes my taste in music. Well, usually.

I typed in "Mama Loves Mambo" as a seed song. It gave me a song by a French guy. Heavy guitar, sixties kinda genre. The melody and treatment were (in my mind) vaguely reminiscent of Manfred Mann's version of Quinn the Eskimo. That song I never did understand - hey, Bob Dylan wrote it! But I really was not getting a mambo vibe. So, I clicked thumbs down. Pandora gave me an obsequious apology for ruining my life and a firm promise never to play this song ever again, for anyone. [2]

Next up was Johnny Horton. Pandora says that "Horton managed to infuse honky tonk with an urgent rockabilly underpinning." Now, when I get a case of the urgent underpinnings, I would like to get my tonk honked just as much as the next guy, but I wasn't interested in rocking my billy just then. Another thumbs down, but not before I added Harry Belafonte as another seed.

It took me a while, but I eventually got a few actual mambo songs out. But even today, after Pandora has had a night to sleep on it, and had a chance to head to the library to look up "mambo", it just played "The Lion Sleeps Tonight". This is not like Pandora!

How Pandora works

Pandora has hired scads of musicologists, who do nothing all day but review new songs. Each song has 400 or so characteristics associated with it, including, no doubt, the beat, tonality, dominant instruments, and rhythm, as well as cultural cues.
The staff musicologists at Paindora, a poor competitor of Pandora

The job is simple from there. When Pandora gets a request for a station, it merely has to find other songs that are somewhere near the 400-dimensional block that the seed is in. As more songs are added as seeds, or additional songs are thumbs-upped or thumbs-downed, the size, shape, and centroid of the neighborhood are adjusted. That would be my first take, anyway. I would probably throw in my ellipsoidification algorithm just because it's cool.
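
To make that "first take" concrete, here is a minimal sketch of a feature-space neighborhood, under loud assumptions: the catalog, the four feature names, and every score below are invented (Pandora's real ~400 characteristics and its method are not public here), and the thumbs-up/thumbs-down adjustment is a swapped-in textbook technique, Rocchio-style relevance feedback, not the ellipsoidification algorithm or anything Pandora is known to use. Songs are ranked by cosine similarity to a target point that starts at the seed centroid, moves toward liked songs, and moves away from disliked ones.

```python
import math

# Hypothetical toy catalog: 4 invented scores per song instead of ~400.
# Feature order (illustrative only): tempo, latin_percussion, guitar_distortion, brass
CATALOG = {
    "Mama Loves Mambo": [0.7, 0.9, 0.0, 0.8],
    "Mambo No. 5": [0.8, 0.8, 0.1, 0.9],
    "Quinn the Eskimo": [0.6, 0.1, 0.6, 0.1],
    "Honky Tonk Man": [0.7, 0.0, 0.3, 0.0],
    "Day-O": [0.5, 0.7, 0.0, 0.3],
}


def centroid(names):
    """Average feature vector of the named songs."""
    vectors = [CATALOG[n] for n in names]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def target_point(seeds, liked=(), disliked=(), beta=0.5, gamma=0.5):
    """Rocchio-style update: start at the seed centroid, move toward
    thumbs-up songs and away from thumbs-down songs."""
    target = centroid(seeds)
    if liked:
        target = [t + beta * x for t, x in zip(target, centroid(liked))]
    if disliked:
        target = [t - gamma * x for t, x in zip(target, centroid(disliked))]
    return target


def next_songs(seeds, liked=(), disliked=(), k=2):
    """Rank songs not yet heard by similarity to the adjusted target point."""
    target = target_point(seeds, liked, disliked)
    heard = set(seeds) | set(liked) | set(disliked)
    candidates = [s for s in CATALOG if s not in heard]
    candidates.sort(key=lambda s: cosine(target, CATALOG[s]), reverse=True)
    return candidates[:k]


print(next_songs(["Mama Loves Mambo"], disliked=["Quinn the Eskimo"]))
# prints ['Mambo No. 5', 'Day-O']
```

Cosine similarity is used rather than plain distance because the feedback step changes the length of the target vector; cosine only cares about its direction. The `beta` and `gamma` weights are arbitrary illustrative values.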

Pandora is not the only game in town, of course. There is also iTunes Genius, which will assemble a playlist for you from a single seed song. To do this, it needs to upload information from your own library. Presumably, Genius is crowdsourcing: Apple combines your playlists with those of every other Genius user to assemble reasonable playlists.

My own little contribution

Several years back, I had a completely different hobby job. I was a karaoke host, under the very original name "John the Revelator". I even had my own blog - go figger. For many gigs, I wound up having to spin some standard music as well as the karaoke. I was just a bit out of my element there... I don't feel I know all the music, especially newer stuff.

The first time I needed to do this, I asked the client for a few seed songs and I went into Pandora to see what it thought went with those songs. This worked out ok, but I had to listen to Pandora for two hours to come up with a two hour set.

I could have used Genius, but the problem was that it is limited to recommending songs that are in my own library, which was not all that large. I needed an application that would recommend songs for me to go out and buy. Also, Genius is limited to a single seed. If a client had two or three seed songs, I could only generate a separate playlist from each seed and then combine them.

So I decided to write my own version of Genius.

But first a bit about Netflix

As a math dude, I have read up a bit on how Netflix computes its recommendations. Between 1999 and 2005, Netflix assembled a database of ratings of nearly 18,000 movies from 480,000 users. In 2006, Netflix decided to turn this data into knowledge by announcing a $1M contest for an algorithm that could accurately guess which movies you might like. There were a total of 44,000 submissions for this prize, which was awarded in 2009. Wikipedia discusses the Netflix Prize.
Typical family, addicted to Netflix because of its awesome recommendation engine

As a gross simplification, Netflix starts with a simple idea. If Joe likes Airplane and Spaceballs, and Bill likes Airplane and Blazing Saddles, then Joe might like Blazing Saddles. And if John only watches movies that feature Pandora Peaks, then he might not want to go out drinking with either Joe or Bill. With a few people in the database, the recommendations might not be all that effective, but with 3.7 zillion ratings ...
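
The Joe/Bill/John reasoning is user-based collaborative filtering. A minimal sketch, assuming simple "liked" sets instead of star ratings and Jaccard overlap as the similarity measure (both choices are mine for illustration, not what Netflix actually shipped); John's titles are placeholders:

```python
# Hypothetical "likes" data mirroring the Joe/Bill/John example.
LIKES = {
    "Joe": {"Airplane", "Spaceballs"},
    "Bill": {"Airplane", "Blazing Saddles"},
    "John": {"Movie X", "Movie Y"},  # placeholder titles: no overlap with Joe or Bill
}


def jaccard(a, b):
    """Overlap between two users' liked sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, likes, k=3):
    """Score each unseen movie by the summed similarity of the users who liked it."""
    mine = likes[user]
    scores = {}
    for other, theirs in likes.items():
        if other == user:
            continue
        sim = jaccard(mine, theirs)
        if sim == 0.0:
            continue  # e.g. John shares nothing with Joe, so he contributes nothing
        for movie in theirs - mine:
            scores[movie] = scores.get(movie, 0.0) + sim
    # Highest score first; ties broken alphabetically so output is deterministic.
    return sorted(scores, key=lambda m: (-scores[m], m))[:k]


print(recommend("Joe", LIKES))   # ['Blazing Saddles']
print(recommend("John", LIKES))  # []
```

With three users this is trivially small; the point of "3.7 zillion ratings" is that the summed similarities become meaningful only when many overlapping users vote.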

It gets a lot more complicated, but that is the gist of it. The algorithm that finally won was a combination of a number of algorithms. To my delight, my favorite technique, singular value decomposition, was part of the winning algorithm.
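
For the curious, here is a small sketch of the SVD-flavored idea. Strictly speaking, it is the gradient-descent matrix factorization popularized during the contest (often called "SVD" or "Funk SVD"), not a textbook singular value decomposition, and it is not the winning team's code. The ratings, the number of factors `k`, the learning rate, and the regularization are all invented illustrative values. Each user and each movie gets a short vector of hidden "taste" factors, fitted so that their dot product reproduces the known ratings; the same dot product then guesses the missing ones.

```python
import math
import random

# Hypothetical toy data: (user, movie, stars) triples for 4 users and 4 movies.
# Pairs that are missing are the ones we would like to predict.
RATINGS = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 2, 1), (1, 3, 2),
    (2, 1, 1), (2, 2, 5), (2, 3, 4),
    (3, 0, 5), (3, 2, 2), (3, 3, 1),
]


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and movie i's factor vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user factors P and movie factors Q with stochastic gradient descent,
    using only the observed ratings (missing entries are simply skipped)."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus an L2 penalty on the factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(P, Q, ratings):
    """Root-mean-square error of the model on the given ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


P, Q = factorize(RATINGS, n_users=4, n_items=4)
print(round(rmse(P, Q, RATINGS), 3))   # training error on the known ratings
print(round(predict(P, Q, 1, 1), 2))   # guess for user 1 on movie 1, never rated
```

A real system would hold out some ratings to check the guesses, and would add per-user and per-movie bias terms; both are left out to keep the sketch short.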

Back to my work

I found a few websites online that were repositories of playlists. These were streaming music players that allowed people to assemble playlists, and then share them with the full community of users. With some help from my buddy Adam, I downloaded about 30,000 playlists with a total of 15,000 songs and started data mining.

My first approach was fairly simple. Each song in the database was given a list of friends: the songs that appeared in a playlist with that song. Each friend in the list carried a count of the playlists that the two songs shared.
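
The friends lists amount to a song co-occurrence count. A minimal sketch, assuming each playlist is just a list of (already cleaned) song titles; the playlists are made up, and the choice to count a song only once per playlist is my assumption:

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_friends(playlists):
    """For every song, count how many playlists it shares with each other song."""
    friends = defaultdict(Counter)
    for playlist in playlists:
        songs = set(playlist)  # a song listed twice in one playlist still counts once
        for a, b in combinations(sorted(songs), 2):
            friends[a][b] += 1
            friends[b][a] += 1
    return friends


# Hypothetical playlists, one list of song titles each.
PLAYLISTS = [
    ["Mama Loves Mambo", "Day-O", "Mambo No. 5"],
    ["Mama Loves Mambo", "Mambo No. 5"],
    ["Day-O", "Jump in the Line"],
]
friends = build_friends(PLAYLISTS)
print(friends["Mama Loves Mambo"])  # Counter({'Mambo No. 5': 2, 'Day-O': 1})
```

Storing both directions (`a -> b` and `b -> a`) doubles the memory but makes looking up any seed's friends a single dictionary access.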

When it came time to turn song seeds into a fully grown playlist, the friends lists of all the seed songs were combined. The playlist was generated by picking random songs from this combined list. The probability of a song being picked was arranged to be proportional to the friend count for the song. 
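
Here is one way the seeds-to-playlist step could look. It is a sketch under assumptions the post does not spell out: "combined" is taken to mean summing the seeds' friend counts, seeds are never re-drawn, and each song is drawn at most once (so after every pick the remaining songs keep probabilities proportional to their counts). The friends data is hypothetical.

```python
import random
from collections import Counter

# Hypothetical friends lists: song -> Counter of co-occurring songs.
FRIENDS = {
    "Mama Loves Mambo": Counter({"Mambo No. 5": 9, "Day-O": 3, "Jump in the Line": 1}),
    "Jump in the Line": Counter({"Day-O": 5, "Mama Loves Mambo": 1}),
}


def make_playlist(friends, seeds, length, rng=None):
    """Merge the seeds' friends lists, then draw songs without repeats, each with
    probability proportional to its combined friend count."""
    if rng is None:
        rng = random.Random()
    combined = Counter()
    for s in seeds:
        combined.update(friends.get(s, Counter()))  # Counter.update adds counts
    for s in seeds:
        combined.pop(s, None)  # assumption: seeds are not re-drawn
    playlist = []
    while combined and len(playlist) < length:
        songs = list(combined)
        pick = rng.choices(songs, weights=[combined[s] for s in songs], k=1)[0]
        playlist.append(pick)
        del combined[pick]  # no repeats within one playlist
    return playlist


print(make_playlist(FRIENDS, ["Mama Loves Mambo", "Jump in the Line"], 5, random.Random(42)))
```

Because the counts are summed, a song that is friends with several seeds (here "Day-O") gets a boost, which is one natural answer to the multiple-seed limitation mentioned above.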

This worked well enough to create some awesome playlists for a few parties. A few hundred hours of work to create playlists of maybe ten hours total? Well worth the effort!

Problems

I was thinking along the lines of creating an AutoDJ app that would allow someone with no knowledge of or interest in music, and no social skills, to pretend that they were a totally cool DJ, and thus be eligible for all the social advantages that come with that position in life.

Those of you who know me are probably aware that I lack the entourage of groupies that a totally cool DJ has. Since I am no longer in the karaoke host business, I even lack the more sophisticated entourage of karaoke groupies that KJs always have. But, I am never, of course, without my entourage of incredibly sophisticated groupies that follows me around because I am an applied mathematician.
You think you got the groupies?

You guessed it, I never got that killer AutoDJ app going. There were three problems that I ran into. 

The first is that it was labor intensive to clean the playlist data. “Piano Man” might be listed with the artist name “Joel, Billy” or “Billy Joel”, or “Billie Joel”, or even “Bruce Springsteen”. The title might be listed as “Piano Man”, or “Sing Us A Song”, or “Billy Joel – Piano Man”. It was quite common for the title and the artist fields to be reversed. Yada, yada. All this could be corrected in software, but it would take a bunch of time to develop that code, and I was not all that motivated to do that.
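
To give a feel for what "corrected in software" might involve, here is a minimal sketch that handles three of the cases above: "Last, First" artist names, near-miss spellings (via the standard library's `difflib.get_close_matches`), "Artist - Title" crammed into the title field, and swapped fields. The canonical artist list and the 0.8 similarity cutoff are hypothetical choices. It does nothing about a song credited to the wrong artist entirely (the "Bruce Springsteen" case) or alternate titles like "Sing Us A Song", which would need a real song database.

```python
import difflib
import re

# Hypothetical canonical artist list; a real cleaner would need a much larger one.
KNOWN_ARTISTS = ["Billy Joel", "Bruce Springsteen", "Johnny Cash"]


def normalize_artist(raw, known=KNOWN_ARTISTS, cutoff=0.8):
    """Map a messy artist string onto a canonical name when one is close enough."""
    name = raw.strip()
    if name.count(",") == 1:  # "Joel, Billy" -> "Billy Joel"
        last, first = (part.strip() for part in name.split(","))
        name = f"{first} {last}"
    match = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)  # catches "Billie Joel"
    return match[0] if match else name


def clean_entry(artist, title, known=KNOWN_ARTISTS):
    """Return a cleaned (artist, title) pair."""
    # "Billy Joel - Piano Man" in the title field: drop the embedded artist.
    parts = re.split(r"\s+[-\u2013]\s+", title.strip(), maxsplit=1)
    if len(parts) == 2 and normalize_artist(parts[0], known) in known:
        title = parts[1]
    # Swapped fields: the "title" looks like a known artist but the "artist" does not.
    if normalize_artist(title, known) in known and normalize_artist(artist, known) not in known:
        artist, title = title, artist
    return normalize_artist(artist, known), title.strip()


print(clean_entry("Piano Man", "Joel, Billy"))               # ('Billy Joel', 'Piano Man')
print(clean_entry("Billie Joel", "Billy Joel - Piano Man"))  # ('Billy Joel', 'Piano Man')
```

Even this toy version shows why the job is tedious: every rule needs a canonical list to check against, and each new kind of mess in the data calls for yet another rule.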

The second problem I found was that, however attractive this idea of crowd-sourcing is, the crowd is not always a representative sample of your audience. The central limit theorem only tells you about the average of whoever happens to be in the sample, and the average of that group is not always the goal that you want. In this case, the people who upload playlists are, I would guess, generally in their mid-teens to mid-twenties. As such, the songs that they mix with Johnny Cash are not at all songs that would go over big at a 50th birthday party for a guy who likes real country music.

The third problem was much simpler. I didn't see that it would generate much money. And why else would an applied mathematician apply math?

I could have made a website where a person enters a few songs as seeds and the website generates a fabulous playlist. I could get money from ads. I could link up with Amazon to sell mp3s. Maybe when a playlist is built, it would provide links to buy the songs on Amazon? This might make some money, but I think the whole idea is limited by the existence of Pandora and iTunes Genius. I should add, this was many, many years ago, like maybe four? Back before everyone and their neighbor's pet hamster had a smart phone and a desire to spend trifling sums of money on hundreds of apps that they will never use.

But it's still a cool idea.

---------------------------

[1] Just to be clear, I am talking about Pandora, the music listening website, and not Pandora Peaks, the ummm, model. I can neither confirm nor deny any feelings I may have or may have had for the model whose bra size is higher than my IQ.

[2] I always feel a little sad when I am obliged to tell Pandora that it is not doing a perfect job. I mean, who am I to question when it tells me what kind of music I like?!?!?
