DRHA 2014-v1b
Transcription
DRHA ‘14
Songs in a Key of Fandom: Studying Fanmixes Through the 8tracks API
Automated indexing

● Fanmixes – a very brief (re)introduction
● Challenges
● Method
● Findings
● Future work

Fanmixes
● og:description content="Fanmix: A fan made collection of songs, or an album reflecting on a tv show, movie, character etc..."
  – Meta tag from drwhofanmixes.tumblr.com
● May also relate to a narrative (soundtrack to a fanfiction, etc)

Architecture
● Streaming service (if configured). Examples: YouTube, 8tracks, Grooveshark
● Storage service (if uploaded). Examples: Mediafire, Box.net, Sendspace
  (optional elements)
● Blog post ‘advert’. Examples: Tumblr, AO3, LiveJournal
  – May include cover art
● Broadcast or syndication of post (Tumblr reblogs, reposts, archives...)

Challenges
● Spans multiple sites and APIs
● Query mechanisms for streaming sites can be:
  – Slow, inflexible
  – Dedicated to one conception of task (i.e. supporting streaming)
● Example: 8tracks – limited track skips, no ‘album content’ view
● Time taken to review 5000 playlists on 8tracks:
  – ~6 months non-stop!

Method
● Approx 20% of fanmix posts provide no written playlist – just links
● But this method is slow.
● Some fan blogs are closed/set to private (13-30% of Tumblr samples)
● So: query multiple data sources
● Use tags and aggregators as sources.
● Note: Not ALWAYS conventional media fandoms (fanmixes for driving, mermaids, gaming and zombies) – will publish stats for normalised sample once available

Requires robust indexer to identify and extract song references from unstructured text
● Solution... tokenisation + binary classification via machine learning*
● AKA...
*could also use named-entity recognition – brittle, but safe...

Example
● Bayes’ (mathematical) insight: If it looks like a duck and quacks like a duck, it’s a duck.
● Before Bayes...
  <b>03. The Damned - Ignite</b><br /><i>Hell for leather in my scheme of things tonight
  → HTML DIGITS TEXT SEP TEXT HTML TEXT
● With Bayes (trained on the FreeDB database)
  → HTML DIGITS ARTIST SEP TITLE HTML TEXT
● Bayes is fallible but talkative about uncertainty/error

Envelopes and messages
● Variation: Linguistic, Stylistic, content (lyrics; fanfic; tertiary resources)
● 8tracks+tumblr or AO3 - differing expressions of the same work
  Artist – Track <3 Artist2 – Track2 <3..
  Track by Artist // Track2 by Artist2 //..
  CD-style
  SWALK

Archive Of Our Own (AO3)
● ~500 substantive items tagged ‘fanmix’
● ~65 ‘fan soundtracks’
● ~50 tagged ‘playlist’ (which tend to be embedded within a larger work)
● ~40 tagged ‘mixtape’ (often a narrative theme)

Tumblr
● Popular base for aggregation/reblogging

Sharing sites in use (AO3)
[Chart legend] Dark green: Harry Potter; Green: Avengers; Pink: Sherlock; Purple: Supernatural; Light blue: Star Trek; Red: Teen Wolf

Shared between fandoms
● Ratings (ubiquitous)
● Certain genres (e.g. ‘angst’)
● The occasional cross-over
● And provenance metadata (‘challenges’, etc)

What about music?
Music connectivity (Teen Wolf)

Conclusion
Informally*:
● Avengers fanmixes tend to use songs by Vienna Teng, My Chemical Romance, Kesha
● Sherlock → The Mountain Goats, Pink Floyd and Aimee Mann
● Teen Wolf → Coldplay, IAMX
● Supernatural → David Bowie, Lemolo
● Proxy for viewer demographic?
● Has been proposed as basis for collaborative filtering
● Ongoing: working with larger sample of ~6000 fanmixes
● Preservation: up to 30% of origin posts gone in some samples, LJ purges, a lingering sense of loss
* health warning: statistical significance