DRHA 2014-v1b

DRHA ‘14
Songs in a Key of Fandom: Studying Fanmixes Through the 8tracks API
Automated indexing
● Fanmixes – a very brief (re)introduction
● Challenges
● Method
● Findings
● Future work
Fanmixes
● “Fanmix: A fan made collection of songs, or an album reflecting on a tv show, movie, character etc...”
– og:description meta tag from drwhofanmixes.tumblr.com
● May also relate to a narrative (soundtrack to a fanfiction, etc)
Architecture
● Blog post ‘advert’ (may include cover art)
– Examples: Tumblr, AO3, LiveJournal
● Streaming service (optional element, if configured)
– Examples: YouTube, 8tracks, Grooveshark
● Storage service (optional element, if uploaded)
– Examples: Mediafire, Box.net, Sendspace
● Broadcast or syndication of post (Tumblr reblogs, reposts, archives...)
Challenges
● Spans multiple sites and APIs
● Query mechanisms for streaming sites can be:
– Slow, inflexible
– Dedicated to one conception of task (i.e. supporting streaming)
● Example: 8tracks – limited track skips, no ‘album content’ view
● Time taken to review 5000 playlists on 8tracks:
– ~6 months non-stop!
Method
● Approx 20% of fanmix posts provide no written playlist – just links
● But this method is slow.
● Some fan blogs are closed/set to private (13-30% of Tumblr samples)
● So: query multiple data sources
● Use tags and aggregators as sources. Note: Not ALWAYS conventional media fandoms (fanmixes for driving, mermaids, gaming and zombies) – will publish stats for normalised sample once available
● Requires robust indexer to identify and extract song references from unstructured text
● Solution... tokenisation + binary classification via machine learning*
● AKA...
* could also use named-entity recognition – brittle, but safe...
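The tokenisation step can be illustrated with a minimal sketch. It assumes token classes named HTML, DIGITS, SEP and TEXT; the regular expressions, the function name `tokenise` and the sample string are illustrative assumptions, not the indexer used in the study.

```python
import re

# Token classes and patterns are illustrative assumptions.
TOKEN_RE = re.compile(
    r"(?P<HTML>(?:<[^>]+>\s*)+)"      # adjacent tags collapse into one HTML token
    r"|(?P<DIGITS>\d+)"               # track numbers and other digit runs
    r"|(?P<SEP>\s[-–]\s)"             # a dash with whitespace on both sides
    r"|(?P<TEXT>[A-Za-z][^<]*?(?=\s[-–]\s|<|$))"  # text up to a separator, tag or end
)

def tokenise(fragment):
    """Return (token_class, matched_text) pairs; unmatched punctuation is skipped."""
    return [(m.lastgroup, m.group().strip()) for m in TOKEN_RE.finditer(fragment)]

# Hypothetical playlist fragment in the style of a blog post.
SAMPLE = "<b>01. Artist Name - Song Title</b><br /><i>a line of lyrics"

for kind, text in tokenise(SAMPLE):
    print(kind, repr(text))
```

At this stage the artist, the title and the lyric are all plain TEXT; telling them apart is left to the classification step.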
Example
● Bayes’ (mathematical) insight: If it looks like a duck and quacks like a duck, it’s a duck.
● Before Bayes...
<b>03. The Damned - Ignite</b><br /><i>Hell for leather in my scheme of things tonight
→ HTML DIGITS TEXT SEP TEXT HTML TEXT
● With Bayes (trained on the FreeDB database)
→ HTML DIGITS ARTIST SEP TITLE HTML TEXT
● Bayes is fallible but talkative about uncertainty/error
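A toy version of the Bayesian step might look like the sketch below. It is a word-level naive Bayes classifier with add-one smoothing, written from scratch; the class name, the two labels and the eight hand-picked training strings (standing in for the FreeDB database) are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Word-level naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def posterior(self, text):
        """Return {label: probability}, so the uncertainty stays visible."""
        words = text.lower().split()
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, n in self.label_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())           # subtract max for numerical stability
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}

# Tiny hand-picked training set; a real run would use a large track database.
clf = NaiveBayes()
clf.train([
    ("The Damned", "ARTIST"), ("Pink Floyd", "ARTIST"),
    ("David Bowie", "ARTIST"), ("The Mountain Goats", "ARTIST"),
    ("New Rose", "TITLE"), ("Wish You Were Here", "TITLE"),
    ("Heroes", "TITLE"), ("This Year", "TITLE"),
])

print(clf.posterior("The Damned"))  # leans towards ARTIST
print(clf.posterior("Ignite"))      # unseen word: the classifier admits a 50/50 split
```

Returning the full posterior rather than a single label is what makes the classifier "talkative": low-confidence tokens can be flagged for review instead of being silently mislabelled.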
Envelopes and messages
● Variation: Linguistic, Stylistic, content (lyrics; fanfic; tertiary resources)
● 8tracks+tumblr or AO3 - differing expressions of the same work
Artist – Track <3 Artist2 – Track2 <3..
Track by Artist // Track2 by Artist2 //..
CD-style
SWALK
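The two inline styles shown above can be normalised to the same (artist, track) pairs. The sketch below is a rule-based illustration only; the function name and patterns are assumptions, and it will misparse titles that contain " by " or artists that contain a spaced dash, which is part of why a statistical classifier is attractive.

```python
import re

# Entry separators seen in the two example styles: "<3" and "//".
ENTRY_SEP = re.compile(r"\s*(?:<3|//)\s*")
# "Artist – Track" (hyphen or en dash with spaces on both sides).
DASH_STYLE = re.compile(r"^(?P<artist>.+?)\s+[-–]\s+(?P<track>.+)$")
# "Track by Artist".
BY_STYLE = re.compile(r"^(?P<track>.+?)\s+by\s+(?P<artist>.+)$", re.IGNORECASE)

def parse_playlist(text):
    """Split a one-line playlist into (artist, track) pairs; unparsable entries are dropped."""
    pairs = []
    for entry in ENTRY_SEP.split(text):
        entry = entry.strip(" .")          # discard trailing ".." continuation marks
        if not entry:
            continue
        match = DASH_STYLE.match(entry) or BY_STYLE.match(entry)
        if match:
            pairs.append((match.group("artist"), match.group("track")))
    return pairs

print(parse_playlist("Artist – Track <3 Artist2 – Track2 <3.."))
print(parse_playlist("Track by Artist // Track2 by Artist2 //.."))
```

Both calls yield the same pairs, which is the point: the envelope differs while the message stays the same.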
Archive Of Our Own (AO3)
● ~500 substantive items tagged ‘fanmix’
● ~65 ‘fan soundtracks’
● ~50 tagged ‘playlist’ (which tend to be embedded within a larger work)
● ~40 tagged ‘mixtape’ (often a narrative theme)
Tumblr
● Popular base for aggregation/reblogging
Sharing sites in use
(AO3)
Dark green: Harry Potter; Green: Avengers; Pink: Sherlock;
Purple: Supernatural; Light blue: Star Trek; Red: Teen Wolf
Shared between fandoms
● Ratings (ubiquitous)
● Certain genres (e.g. ‘angst’)
● The occasional cross-over
● And provenance metadata (‘challenges’, etc)
What about music?
Music connectivity (Teen Wolf)
Conclusion
Informally*:
● Avengers fanmixes tend to use songs by Vienna Teng, My Chemical Romance, Kesha
● Sherlock → The Mountain Goats, Pink Floyd and Aimee Mann
● Teen Wolf → Coldplay, IAMX
● Supernatural → David Bowie, Lemolo
● Proxy for viewer demographic?
● Has been proposed as basis for collaborative filtering
● Ongoing: working with larger sample of ~6000 fanmixes
● Preservation: up to 30% of origin posts gone in some samples, LJ purges, a lingering sense of loss
* health warning: statistical significance
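Per-fandom artist tendencies of the kind listed above could be tallied along the following lines once song references have been extracted. This is a sketch only: the record layout, the function name and the placeholder fandoms and artists are assumptions, and raw counts like these carry the same health warning about statistical significance.

```python
from collections import Counter, defaultdict

def top_artists(fanmixes, n=3):
    """Return the n most frequent artists per fandom, counting each artist once per mix."""
    counts = defaultdict(Counter)
    for mix in fanmixes:
        # De-duplicate within a mix so one long playlist cannot dominate the tally.
        for artist in set(mix["artists"]):
            counts[mix["fandom"]][artist] += 1
    return {
        fandom: [artist for artist, _ in tally.most_common(n)]
        for fandom, tally in counts.items()
    }

# Placeholder records; real input would come from the indexed fanmixes.
toy_mixes = [
    {"fandom": "Fandom X", "artists": ["Artist A", "Artist B"]},
    {"fandom": "Fandom X", "artists": ["Artist A"]},
    {"fandom": "Fandom X", "artists": ["Artist A", "Artist B", "Artist C"]},
    {"fandom": "Fandom Y", "artists": ["Artist C", "Artist C"]},
    {"fandom": "Fandom Y", "artists": ["Artist C", "Artist D"]},
]

print(top_artists(toy_mixes, n=2))
```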