Web Mining 2.0 presentation

IIIA - CSIC
Mining Music Social Networks for Automating Social Music Services
Claudio Baccigalupo – Enric Plaza
IIIA-CSIC – September 2007
The Goal
To automatically program the music for the channels of a Web radio,
with a selection process that emulates the knowledge of an expert (DJ)
This requires domain knowledge about musical associations
(which songs and artists should be played one after the other?)
We present how we obtain such knowledge through a data mining process
on a large collection of playlists gathered from the Web
1. The Data Set: gathering playlists from Web users
2. The Data Mining: extracting musical knowledge from playlists
3. The Evaluation: comparing with other similar measures
4. The Application: programming a social Web radio
5. Conclusions
The Data Set: why playlists?
Playlists are sequences of songs compiled by humans for some
purpose, with cultural and social aspects that cannot be found
with other sources of musical knowledge (e.g., acoustic-based)
Playlists are part of the user-created content that is increasingly
available, thanks to the social Web phenomenon
Playlists are easy to gather, analyse, store, and understand
Playlists have a sequential nature, and the ordering of songs is a
relevant feature since our goal is to programme a radio channel
The Data Set: which playlists?
We have collected 599,565 user-compiled playlists from the Web-based music community MyStrands (http://www.mystrands.com)
Users publish playlists either using a Web browser or using the MyStrands plug-in
Playlists can be obtained with the Web API called OpenStrands
Playlists have an average length of 16.8 songs
Users are 65% male and 32 years old on average
MyStrands includes more than 5M songs
The Data Mining: what to look for?
While a song X is playing on a radio channel, we wish to know
which songs are musically associated with X , and are good
candidates to be selected to play after X on the channel
We mine the playlists to learn the song association for any pair of
songs (X, Y ) and the artist association for any pair of artists (A, B)
Data Mining Process: example associations
Song X (Artist A) → Song Y (Artist B)
I Spy (Pulp) → Trash (Suede): s(X, Y) = 0.9, s′(A, B) = 0.7
I Spy (Pulp) → T.N.T. (AC/DC): s(X, Y) = 0.3, s′(A, B) = 0.2
s(X, Y) ∈ [0, 1], s′(A, B) ∈ [0, 1]
The Data Mining: what to consider?
We count the co-occurrences of pairs of songs in the playlists
I Spy (Pulp) and Trash (Suede) occur together in 4 playlists
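The counting step can be sketched as follows. This is a minimal illustration, assuming each playlist is a list of song labels; the function name and the toy playlists are hypothetical and not taken from MyStrands.

```python
from collections import Counter
from itertools import combinations

def count_cooccurrences(playlists):
    """Count, for each unordered pair of songs, the playlists containing both."""
    counts = Counter()
    for playlist in playlists:
        # A set ensures a song repeated inside one playlist is counted once.
        for a, b in combinations(sorted(set(playlist)), 2):
            counts[(a, b)] += 1
    return counts

# Toy playlists (hypothetical data).
playlists = [
    ["Song 2 (Blur)", "I Spy (Pulp)", "Trash (Suede)"],
    ["I Spy (Pulp)", "Trash (Suede)", "Wonderwall (Oasis)"],
    ["Trash (Suede)", "Uno (Muse)"],
]
counts = count_cooccurrences(playlists)
print(counts[("I Spy (Pulp)", "Trash (Suede)")])
```

Pairs are stored with the two labels in alphabetical order, so a lookup has to use the same order.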
We normalise against the popularity of the songs in the playlists
I Spy (Pulp) and Basket Case (Green Day) also co-occur 4 times, but this value is not as relevant,
since Basket Case (Green Day) occurs in 14,897 playlists, 219 times more than Trash (Suede)
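A worked version of this comparison, under a loudly stated assumption: the slides do not give the normalisation formula, so the sketch simply divides the co-occurrence count by the number of playlists containing the candidate song. The playlist count for Trash (about 68) is derived from the "219 times more" figure above.

```python
def normalised_score(cooccurrences, candidate_popularity):
    """Co-occurrence count divided by the number of playlists containing
    the candidate song (an assumed formula, not the authors' own)."""
    return cooccurrences / candidate_popularity

basket_case_playlists = 14_897                        # figure from the slide
trash_playlists = round(basket_case_playlists / 219)  # about 68, derived

score_trash = normalised_score(4, trash_playlists)
score_basket_case = normalised_score(4, basket_case_playlists)
print(f"Trash: {score_trash:.4f}  Basket Case: {score_basket_case:.6f}")
```

With the same 4 co-occurrences, the rarer song ends up with a much higher score, which is the effect the normalisation is meant to produce.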
We assign stronger associations when the distance between
songs is small and when the ordering is preserved
Playlist #1: Song 2 (Blur), I Spy (Pulp), Trash (Suede), Wonderwall (Oasis)
contiguous post-occurrence between songs (I Spy, then Trash): strong association
Playlist #2: Basket Case (Green Day), Vertigo (U2), Uno (Muse), Trouble (Coldplay), I Spy (Pulp)
distant pre-occurrence between songs (Basket Case, well before I Spy): weak association
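One way to turn distance and ordering into a weight is sketched below. The 1/distance decay, the `max_distance` cut-off and the `pre_factor` penalty are illustrative assumptions; the slides only state that small distances and preserved ordering give stronger associations.

```python
def pair_weight(playlist, x, y, max_distance=5, pre_factor=0.5):
    """Weight that one playlist contributes to the association 'play y after x'.
    Uses the first occurrence of each song in the playlist."""
    if x not in playlist or y not in playlist:
        return 0.0
    offset = playlist.index(y) - playlist.index(x)  # > 0 means y comes after x
    distance = abs(offset)
    if distance == 0 or distance > max_distance:
        return 0.0
    weight = 1.0 / distance      # closer songs weigh more
    if offset < 0:               # y precedes x: ordering not preserved
        weight *= pre_factor
    return weight

# The two playlists of the example, as read from the slide.
playlist_1 = ["Song 2 (Blur)", "I Spy (Pulp)", "Trash (Suede)", "Wonderwall (Oasis)"]
playlist_2 = ["Basket Case (Green Day)", "Vertigo (U2)", "Uno (Muse)",
              "Trouble (Coldplay)", "I Spy (Pulp)"]

strong = pair_weight(playlist_1, "I Spy (Pulp)", "Trash (Suede)")
weak = pair_weight(playlist_2, "I Spy (Pulp)", "Basket Case (Green Day)")
print(strong, weak)
```

Summing such weights over all playlists, instead of adding 1 per co-occurrence, would give a count that already reflects distance and ordering.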
The Data Mining: song associations
We filter out statistically insignificant associations, and co-occurrences between songs from the same artist
We obtain from the playlists of MyStrands a set of 112,238 songs
that have a song association degree with some other song
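A minimal sketch of the two filters, assuming associations are stored in a dictionary keyed by song pairs. The slides do not say which significance test is used, so a plain minimum co-occurrence threshold (`min_cooccurrences`, a hypothetical parameter) stands in for it; all data values are invented for illustration.

```python
def filter_associations(scores, cooccurrences, artist_of, min_cooccurrences=3):
    """Drop same-artist pairs and pairs seen together too rarely."""
    kept = {}
    for (x, y), score in scores.items():
        if artist_of[x] == artist_of[y]:
            continue  # same artist: not an informative association
        if cooccurrences.get((x, y), 0) < min_cooccurrences:
            continue  # too few co-occurrences (stand-in for a significance test)
        kept[(x, y)] = score
    return kept

# Hypothetical example data.
artist_of = {"I Spy": "Pulp", "Common People": "Pulp",
             "Trash": "Suede", "Uno": "Muse"}
scores = {("I Spy", "Trash"): 0.9,
          ("I Spy", "Common People"): 0.8,
          ("I Spy", "Uno"): 0.4}
cooccurrences = {("I Spy", "Trash"): 4,
                 ("I Spy", "Common People"): 10,
                 ("I Spy", "Uno"): 1}

kept = filter_associations(scores, cooccurrences, artist_of)
print(kept)
```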
Top associated tracks for:
Strangers In The Night (Frank Sinatra): Up, Up and Away (The 5th Dimension); Message To Michael (Dionne Warwick); Whatever happens, I Love You (Morrissey); Sugar Baby Love (Rubettes); Move It On Over (Ray Charles); It Serves You Right To Suffer (John Lee Hooker); Blue Angel (Roy Orbison)
Smoke On The Water (Deep Purple): Space Truckin’ (AA.VV.); Cold Metal (Iggy Pop); Iron Man (Black Sabbath); China Grove (The Doobie Brothers); Crossroads (Eric Clapton); Sunshine Of Your Love (Cream); Wild Thing (Jimi Hendrix)
The Data Mining: artist associations
With the same technique, we estimate the artist association
degree for 25,881 artists from the playlists of MyStrands
We count the co-occurrences of pairs of artists in the playlists,
normalise against their popularity and take their distances into account
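The artist-level count can be sketched by projecting each playlist onto its artists before counting pairs. The `artist_of` mapping and the toy playlists are hypothetical; normalisation and distance weighting would then follow as for songs.

```python
from collections import Counter
from itertools import combinations

def count_artist_cooccurrences(playlists, artist_of):
    """Count, for each unordered pair of distinct artists, the playlists
    in which both appear."""
    counts = Counter()
    for playlist in playlists:
        artists = {artist_of[song] for song in playlist}  # one entry per artist
        for a, b in combinations(sorted(artists), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical example data.
artist_of = {"I Spy": "Pulp", "Common People": "Pulp",
             "Trash": "Suede", "Uno": "Muse"}
playlists = [["I Spy", "Common People", "Trash"],
             ["Trash", "I Spy"],
             ["Uno", "Trash"]]

artist_counts = count_artist_cooccurrences(playlists, artist_of)
print(artist_counts[("Pulp", "Suede")])
```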
Top associated artists for:
Abba: Agnetha Faltskog, A-Teens, Chic, Gloria Gaynor, The 5th Dimension, Andy Gibb, Olivia Newton-John
John Williams: Meco, Danny Elfman, John Carpenter, London Theatre Orchestra, John Barry, Hollywood Studio Orchestra, Elmer Bernstein
Destiny’s Child: Kelly Rowland, City High, Ciara, Fantasia, Christina Milian, Beyoncé, Ashanti
Frank Sinatra: Dean Martin, Sammy Davis Jr., Judy Garland, Bing Crosby, The California Raisins, Tony Bennett, Louis Prima
The Evaluation: preamble
We compare the top associated tracks and artists we found with the
most similar tracks and artists proposed by different Web sites (e.g., MusicSeer)
The results are expected to differ, since we do not look for similarity
(a symmetric measure) but aim to build a good sequence of songs
(asymmetric: the ordering matters)
Still, some observations can be made
The Evaluation: song association
We assign the highest rankings to songs which are less popular
If one of these songs is contained in the radio library, it will be
played, thus the listeners will probably discover new music
Otherwise, a less associated/more popular song will be played
Top associated songs for Strangers In The Night (Frank Sinatra):
Ours (mined from playlists): Up, Up and Away (The 5th Dimension); Message To Michael (Dionne Warwick); Whatever happens, I Love You (Morrissey); Sugar Baby Love (Rubettes); Move It On Over (Ray Charles); It Serves You Right To Suffer (John Lee Hooker); Blue Angel (Roy Orbison)
Yahoo!: Mr. Tambourine Man (The Byrds); Don’t You Want Me (Human League); I’m a Believer (The Monkees); Good Vibrations (The Beach Boys); Stay (Shakespeare’s Sister); The House of The Rising Sun (The Animals); Oh Pretty Woman (Roy Orbison)
The Evaluation: artist association
Some high-ranked associations are common, although inferred
with different methods (human experts, playlists, listening habits)
We are able to rank first one of the most strongly associated artists
Top associated artists for Abba:
Ours (mined from playlists): Agnetha Faltskog, A-Teens, Chic, Gloria Gaynor, The 5th Dimension, Andy Gibb, Olivia Newton-John
Sources compared: MyStrands, AMG, Yahoo!, Last.fm, MusicSeer Playlists
Donna Summer, Madonna, Gloria Gaynor, Cyndi Lauper, Blondie, Kool & The Gang
Ace of Base
Gemini
Maywood, Bananarama, Lisa Stansfield, Gary Wright
The Bee Gees, The Carpenters, The Beatles, Foreigner, Whitney Houston
Roxette
Madonna
The Bee Gees, Madonna, Cher, Kylie Minogue, Boney M., Michael Jackson, Elton John
The Bee Gees, Blondie, Cyndi Lauper, Queen, Cat Stevens, Cher, The Beach Boys
The Application: what is Poolcasting?
The Application: song scheduling
The collection of songs (Music Pool) is open and dynamic
The music played on each channel cannot be pre-programmed;
every channel is automatically scheduled in real time
Diagram: last song played X + Song and Artist Associations + Music Pool → Retrieval → subset of candidates musically associated with X
The Application: retrieval process
The best candidates are songs either associated with X, or
associated with songs by A, or associated with songs from artists
associated with A, or whose artist is associated with A
Example:
Last song X (A): I Spy (Pulp)
Song and Artist Associations: s(X, Y), s′(A, B)
Music Pool: Cody (Mogwai), Drive (R.E.M.), Uno (Muse), Nikita (Elton John), Noon (Eric Serra), Trash (Suede), Go (Moby), T.N.T. (AC/DC), Pilgrim (Enya), Roxanne (Sting)
Candidates after Retrieval: Uno (Muse), Go (Moby), Drive (R.E.M.), Trash (Suede)
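The four retrieval criteria can be sketched as follows. This is an illustration under stated assumptions, not the Poolcasting implementation: associations are stored as nested dictionaries, a link only counts when its degree reaches a hypothetical `min_degree` threshold, and every value except s(I Spy, Trash) = 0.9, s(I Spy, T.N.T.) = 0.3, s′(Pulp, Suede) = 0.7 and s′(Pulp, AC/DC) = 0.2 is invented, as is the extra song Common People.

```python
def linked(assoc, a, b, min_degree):
    """True if the association degree from a to b reaches min_degree."""
    degree = assoc.get(a, {}).get(b, 0.0)
    return degree > 0 and degree >= min_degree

def retrieve_candidates(last_song, pool, artist_of, song_assoc, artist_assoc,
                        min_degree=0.4):
    """Return the songs of the pool that satisfy at least one criterion."""
    last_artist = artist_of[last_song]
    related_artists = {b for b in artist_assoc.get(last_artist, {})
                       if linked(artist_assoc, last_artist, b, min_degree)}
    songs_by_a = [s for s in song_assoc if artist_of[s] == last_artist]
    songs_by_related = [s for s in song_assoc if artist_of[s] in related_artists]
    candidates = []
    for y in pool:
        if y == last_song:
            continue
        if (linked(song_assoc, last_song, y, min_degree)  # associated with X
                # associated with songs by A
                or any(linked(song_assoc, s, y, min_degree) for s in songs_by_a)
                # associated with songs from artists associated with A
                or any(linked(song_assoc, s, y, min_degree) for s in songs_by_related)
                # artist of y associated with A
                or artist_of[y] in related_artists):
            candidates.append(y)
    return candidates

artist_of = {"I Spy": "Pulp", "Common People": "Pulp", "Cody": "Mogwai",
             "Drive": "R.E.M.", "Uno": "Muse", "Nikita": "Elton John",
             "Noon": "Eric Serra", "Trash": "Suede", "Go": "Moby",
             "T.N.T.": "AC/DC", "Pilgrim": "Enya", "Roxanne": "Sting"}
song_assoc = {"I Spy": {"Trash": 0.9, "T.N.T.": 0.3},
              "Common People": {"Drive": 0.6},   # hypothetical
              "Trash": {"Go": 0.5}}              # hypothetical
artist_assoc = {"Pulp": {"Suede": 0.7, "AC/DC": 0.2, "Muse": 0.5}}  # Muse: hypothetical
pool = ["Cody", "Drive", "Uno", "Nikita", "Noon", "Trash", "Go",
        "T.N.T.", "Pilgrim", "Roxanne"]

candidates = retrieve_candidates("I Spy", pool, artist_of, song_assoc, artist_assoc)
print(candidates)
```

With these invented degrees the sketch returns the same four candidates as the example, each through a different criterion, while T.N.T. falls below the threshold.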
The Application: reuse process
The best candidates are then ranked according to the music
preferences of the current listeners, and the best song is played
Listeners' preferences are inferred by analysing their music libraries
Example:
Last song X (A): I Spy (Pulp)
Retrieval (Song and Artist Associations s(X, Y), s′(A, B), over the Music Pool) → Candidates: Uno (Muse), Go (Moby), Drive (R.E.M.), Trash (Suede)
Ranking (Listeners, Preferences, Feedback) → the best ranked candidate is played next
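The ranking step can be sketched with a plain average of the listeners' preferences. The listener names, the preference values and the choice of the mean as aggregation are assumptions made for illustration only.

```python
def pick_next_song(candidates, preferences):
    """Return the candidate with the highest mean preference among listeners.
    A song missing from a listener's preferences counts as 0.0 (neutral)."""
    def mean_preference(song):
        values = [prefs.get(song, 0.0) for prefs in preferences.values()]
        return sum(values) / len(values)
    return max(candidates, key=mean_preference)

candidates = ["Uno (Muse)", "Go (Moby)", "Drive (R.E.M.)", "Trash (Suede)"]
# Hypothetical preferences of two current listeners.
preferences = {
    "ann": {"Uno (Muse)": 0.2, "Trash (Suede)": 0.8},
    "bob": {"Trash (Suede)": 0.4, "Go (Moby)": 0.6},
}
next_song = pick_next_song(candidates, preferences)
print(next_song)
```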
The Application: more details
The higher the rating and the higher the play count of a song in
a user library (iTunes), the higher the inferred listener preference
Listeners can interact via the Web interface to state their explicit
preferences for the songs played or to rate the next candidates
When listeners have diverging preferences in the same channel,
fairness is achieved by favouring at each moment those listeners
who were less satisfied by the last songs played
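The two ideas above, inferring a preference from a library entry and favouring less satisfied listeners, can be sketched as follows. The equal weighting of rating and play count, the 0 to 5 star scale, the satisfaction values in [0, 1] and all names are assumptions; the slides only state the direction of each effect.

```python
def inferred_preference(rating, play_count, max_play_count):
    """Map a star rating (0-5) and a play count to a preference in [0, 1];
    both a higher rating and a higher play count raise the result."""
    rating_part = rating / 5.0
    plays_part = play_count / max_play_count if max_play_count else 0.0
    return 0.5 * rating_part + 0.5 * plays_part

def fair_score(song, preferences, satisfaction):
    """Weighted mean preference where less satisfied listeners weigh more."""
    weights = {user: 1.0 - satisfaction[user] for user in preferences}
    total = sum(weights.values())
    if total == 0:  # everyone fully satisfied: fall back to a plain mean
        weights = {user: 1.0 for user in preferences}
        total = float(len(preferences))
    return sum(weights[user] * preferences[user].get(song, 0.0)
               for user in preferences) / total

# Hypothetical listeners: ann enjoyed the last songs, bob did not.
preferences = {"ann": {"Trash": 0.8, "Go": 0.1},
               "bob": {"Trash": 0.2, "Go": 0.9}}
satisfaction = {"ann": 0.9, "bob": 0.2}

score_trash = fair_score("Trash", preferences, satisfaction)
score_go = fair_score("Go", preferences, satisfaction)
print(score_trash, score_go)
```

Both songs have the same plain average here, but the fairness weighting tips the choice towards the song preferred by the less satisfied listener.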
Conclusions
We use knowledge discovered from a Web-based music
community to provide a group-customised Web service
Domain knowledge about which songs and artists are musically
associated originates from the data mining of patterns of songs
in a large set of playlists compiled by MyStrands users
The result is a social Web radio where channels are
automatically programmed in real time to match both musical
association criteria and the preferences of the current listeners
Future work: evaluate the quality of the associations, and extend
the data mining process to include patterns of three or more songs
Any questions?