What’s that song?
Automated music recognition
technologies for live music and DJs
Teosto research report 1/2013
Helsinki, 28.3.2013
This report details the results of Teosto’s research project Music copyright in the 2010s
(January 2012 – March 2013) that focused on automated music recognition and broadcast
monitoring technologies, and how they can be applied to the monitoring and reporting
processes of a music performance rights society. The project was partly funded by the Finnish
Ministry of Education and Culture.
Teosto wants to thank everyone who participated in the project for their contributions: the
artists, bands, songwriters, performers and their organisations (PMMP, Nightwish, Notkea
Rotta, Darude, K-System, Orkidea, Riku Kokkonen); the Ministry of Education and Culture;
Alex Loscos, Johannes Lyda and the team at BMAT; Karim Fanous and Leonardo Toyama
at Music Ally; Teppo Ahonen; Anne Kosonen at Taloustutkimus; Erno Kulmala at YLE;
Provinssirock; and everyone involved in the project at Teosto.
Ano Sirppiniemi, Head of Research, Teosto
Turo Pekari, Researcher, Teosto
About Teosto
Finnish Composers’ Copyright Society Teosto is a non-profit organisation founded in 1928 by
composers and music publishers to administrate and protect their rights. Teosto represents
approximately 27,000 Finnish and almost three million foreign composers, lyric writers,
arrangers and music publishers.
Teosto’s research activities include market research with clients and research partners,
looking into new technologies and their applications, partnering with companies and
research partners for joint research projects, and working with Teosto’s extensive data on
music use in Finland.
Contact: Ano Sirppiniemi, Head of Research
([email protected], tel. +358 9 6810 1287, mobile: +358 50 325 6530)
More information: www.teosto.fi/en
Cover photo: Turo Pekari
Contents
1. Executive summary
2. Project outline and structure
   2.1 Project description
   2.2 Partners
   2.3 Project timeline
   2.4 Project deliverables
   2.5 Communication of project results
   2.6 Structure of the report
3. The global music recognition and broadcast monitoring market
   3.1 Automated music recognition
   3.2 Business landscape
   3.3 Business to business (B2B)
   3.4 Business to consumer (B2C)
   3.5 Companies developing technologies for their own use
4. Live music identification pilot in Provinssirock
   4.1 BMAT system description
   4.2 Pilot scenario
   4.3 Identification process
   4.4 Test case results
      4.4.1 PMMP
      4.4.2 Nightwish
      4.4.3 Notkea Rotta
   4.5 Evaluation of the results
   4.6 Possible use scenarios
5. DJ/Club monitoring pilot
   5.1 Identification technology
   5.2 Pilot setup
   5.3 Test case results
      5.3.1 K-System
      5.3.2 Orkidea
      5.3.3 Riku Kokkonen
      5.3.4 Darude
   5.4 Evaluation of the results
   5.5 Possible use scenarios
6. Consumer survey
   6.1 Background
   6.2 Crowdsourcing potential
   6.3 Consumer interest in an interactive online setlist service
   6.4 Conclusions
7. State of the art in Music Information Retrieval: what could be applied for copyright management? (Teppo Ahonen)
   7.1 Music Information Retrieval background
   7.2 State of the research
   7.3 Current trends
   7.4 Audio music retrieval
   7.5 Cover song identification
   7.6 Conclusions
   7.7 Bibliography
8. Summary of project results
9. Conclusions
10. References
1. Executive summary
Teosto’s research project Music copyright in the 2010s
(January 2012 – March 2013) aimed to shed light on
the effects of new and emerging technologies on the
administration of music copyrights. The main focus
of the project was on music recognition and broadcast
monitoring technologies: the global market for these
technologies and how they could be used in new areas
such as automated live music identification. During the
project we also reviewed relevant academic research on
the subject. The project was partly funded by the Finnish
Ministry of Education and Culture.
The project consisted of four parts:
1) A market study on the European and global music recognition and broadcast monitoring market
2) Background research on recent and relevant academic studies in the field of Music Information Retrieval (MIR), a field of research that frequently provides the technology innovations for the music recognition market
3) Two separate technology pilots: one on automated live music identification at a Finnish rock festival, and another on monitoring music use in a DJ/club setting
4) A consumer survey on the potential of crowdsourcing live concert set list information from audience members of live shows

These technologies are already successfully being used for monitoring music use in radio, broadcast TV, and a number of digital music and video services, by broadcast companies and service providers, music publishers, record companies, artists and collective management organizations. There are a number of international companies operating in the music monitoring field that provide monitoring services to businesses and copyright societies. Also, companies such as Google (YouTube) and Last.fm have proprietary content recognition systems in place for their own use that they don’t license to others.

In addition to the growing business-to-business market for automated music recognition solutions, there are also a few notable consumer applications based on the same technologies, such as Shazam and SoundHound, mobile music identification apps that both have over 100 million users and claim to be efficient in converting the “tagging” of songs by consumers into actual music download sales for their partnering online music stores and services.

The focus of this research was not on broadcast monitoring, but on two new application areas for automated music recognition systems: clubs and DJs, and live music. Both these domains are technologically more challenging than broadcast music monitoring, and remain relevant research problems for academic researchers as well. The computational complexity present in trying to match tracks played and manipulated by a DJ to an original recording, or trying to match a live version of a song to an original recorded version of that song, is well beyond that of identifying and reporting broadcast music. Compared to a recorded version of a song, a live version can be in another key or tempo, might have a totally different instrumentation or song structure, and correct identification can be further complicated by things like audio quality and audience noise.

From academic research, we were aware that algorithms and solutions exist for matching live (or cover) versions of songs to a reference database including recorded versions of the songs. However, there are to our knowledge no commercial services available for live or cover version identification. Thus our aim in this project was to test some of the existing solutions in a real live event setting in order to evaluate their quality, to gather data about the process, and to prepare potential use cases or scenarios for live music identification systems from the point of view of a music performance rights society.

The two technology pilots were successful in providing proof that automated music recognition services can, in addition to broadcast music monitoring, already be used in a club environment for identifying and reporting music played by DJs, and in 1-2 years’ time possibly also in a live music setting for automated set list creation. Evaluating the pilot results, Teosto also identified three potential use cases for the piloted music identification technologies, which were developed by the Spanish music technology company BMAT.

The technology pilots, while successful from a proof-of-concept point of view, did also point out a number of challenges and limitations that need to be solved before adopting the technologies for large-scale use. The main challenge for all automated music recognition systems is twofold: in order to work in an efficient way, they need a representative reference audio database that is constantly updated, and there also needs to be a reliable way to automatically match the identification results to relevant metadata – in the case of performance rights societies like Teosto, to the relevant author and publisher information for each musical work. In addition, the tested live music identification and club/DJ monitoring systems also had certain technical limitations that need to be improved upon to ensure reliable results.

We approached the technology pilots from the point of view of a music performance rights society, but the general results are applicable for other users, such as live event organizers, club owners, publishers, and artist organizations. These organizations could come up with other use scenarios for the automatically generated club or live show set list data than what was devised in this project, such as services based on real-time track data from live shows and/or clubs.

The consumer survey showed that among active Finnish music fans who frequently attend live shows, there is interest in using interactive services built around gig set lists. However, the potential user base is very small, and from the point of view of Teosto, using crowdsourcing to collect set list information from audience members on a large scale is currently not a viable alternative to manual reporting and/or automated reporting technologies. However, using information gathered from fans and audience members to verify automatically generated set lists could be a possibility for improving the accuracy of automated set list creation in the future.

There is a growing market for automated music recognition technologies, and a huge potential in employing them for managing music rights. In the coming years we will see the adoption of these technologies also in new areas and domains, such as live music and the club/DJ scene, in addition to already established markets such as broadcast music monitoring. While there are a number of technological challenges to overcome, in order to improve the quality of the core identification technologies, and more importantly, in order to ensure a smooth exchange of information and metadata between technology providers and users, automated music recognition technologies have a strong potential for providing means through which to monitor the entirety of the music performed in any given territory, across radio and television broadcasts, live shows, as well as night clubs and other public environments.
2. Project outline and structure
2.1 Project description
This report details the results of Teosto’s research project
Music copyright in the 2010s (January 2012 – March
2013) that focused on music recognition and broadcast
monitoring technologies, and how they can be applied
to the monitoring and reporting processes of a music
performance rights society. The project was partly funded
by the Finnish Ministry of Education and Culture.
The project consisted of four parts:
1) A market study on the European and global music
recognition and broadcast monitoring market
2) Background research on recent and relevant academic
studies in the field of music information retrieval (MIR), a
field of research that frequently provides the technology
innovations for the music recognition market
3) Two separate technology pilots: one on automated
live music identification at a Finnish rock festival, and
another on monitoring music use in a DJ/club setting
4) A consumer survey on the potential of crowdsourcing
setlist information from audience members of live shows
2.2 Partners
The technology pilots were made possible by the
participating artists, performers, songwriters, crew
members and artist organizations that gave us their
consent for participating in the research: PMMP,
Nightwish, Notkea Rotta, Darude, K-System, Orkidea
and Riku Kokkonen.
Work on different parts of this project was carried out
on Teosto’s behalf by three organizations specialized
in technology and research: BMAT, Music Ally and
Taloustutkimus. Researcher Teppo Ahonen provided
the project with a review of relevant academic research
on the subject area.
BMAT (Spain) is a music technology company operating
globally since 2006. The company specializes in providing
music monitoring services, servicing more than 30
performing rights organizations and collecting societies.
BMAT’s monitoring network, present in more than 50
countries, listens to more than 2,000 radio and TV channels every day. BMAT also provides singing rating
and music recommendation technologies to companies
such as Samsung, Yamaha, Intel and Movistar. (http://
bmat.com)
Music Ally (UK) is a digital music business information and strategy company that has been providing publications, consulting, research, events, and training to the music and technology industries since 2001. (http://www.musically.com)

Taloustutkimus Oy (Finland), established in 1971, is a privately owned market research company, and currently the second largest market research company in Finland.

Teppo Ahonen (M.Sc.) is currently finishing his Ph.D. in computer science at the University of Helsinki. In his work, he focuses on measuring tonal similarity with information theory based metrics.

We also want to thank Provinssirock (http://www.provinssi.fi) and YLE / Erno Kulmala for their cooperation and help during the project.

2.3 Project timeline
The project started in January 2012 and ended in March 2013. The main project activities are outlined in Table 1.
Project task | Start | End | Partners
Project administration | Jan 2012 | Mar 2013 | Teosto
Technology pilot 1: Live music identification | Feb 2012 | Sep 2012 | BMAT, Teosto, Provinssirock, YLE, PMMP, Nightwish, Notkea Rotta
Technology pilot 2: Clubs/DJs | Oct 2012 | Mar 2013 | BMAT, Teosto, Orkidea, Darude, K-System, Riku Kokkonen
Consumer survey: Crowdsourcing of set lists | Oct 2012 | Jan 2013 | Taloustutkimus Oy, Teosto
Market study: Music recognition and broadcast monitoring market | Sep 2012 | Dec 2012 | Music Ally, Teosto
Background research: relevant academic research | Aug 2012 | Dec 2012 | Teppo Ahonen (University of Helsinki)
Table 1. Project timeline.
2.4 Project deliverables
The deliverables of this project include (in addition
to this project report) the results reports of the two
technology pilots by BMAT, market research reports
by Taloustutkimus and Music Ally, as well as two
background articles on relevant academic research and
evaluation of the live music pilot by Teppo Ahonen. The
project deliverables are listed in the table below.
Deliverable | Date | Type | Author
Teosto – BMAT Vericast Covers Pilot Report | 29.8.2012 | Technical pilot results report | BMAT
Teosto – BMAT Vericast Clubs Pilot Report | 21.3.2013 | Technical pilot results report | BMAT
Analysis of the Automatic Live Music Detection Experiment | 17.12.2012 | Research article | Teppo Ahonen
State Of The Art In Music Information Retrieval: What Could Be Applied For Copyright Management | 17.12.2012 | Research article | Teppo Ahonen
Music recognition and broadcast monitoring market research | 18.12.2012 | Market research report | Music Ally
Teosto ry – biisilistapalvelututkimus (set list service survey) | 28.1.2013 | Market research report | Taloustutkimus
Project final report | 28.3.2013 | Project final report | Teosto
Table 2. List of project deliverables.
2.5 Communication of project results
The project results have been presented in two research
seminars for Finnish music industry professionals in
January and March 2013. The seminar presentation
materials are available online on the project seminar
website and Teosto’s Slideshare account (Teosto
presentations). This final report will also be distributed in
pdf form on project/final seminar website in April 2013.
In addition to the above, Teosto has presented the findings
of the live music identification pilot in a Finnish music
industry event MARS (http://www.marsfestivaali.fi) in
Seinäjoki in February 2013, and published a press release
on the live music identification pilot in January 2013.
The two project seminars were arranged in January and
March. The first seminar focused on the technology pilot
results and the second seminar was the final results
seminar for the project. The two invited keynote speakers
for the final seminar were Karim Fanous (Head of
research, Music Ally) and Alex Loscos (CEO, BMAT).
Seminar | Date | Type | Location
Musiikin tekijänoikeudet 2010-luvulla. Ennakkoinfo projektin tuloksista. (Music copyright in the 2010s: preview of the project results) | 31.1.2013 | Pilot results seminar | Erottajan Kasino, Helsinki
Technology, Music Rights, Licensing | 21.3.2013 | Project final seminar | Finlandia Hall, Helsinki
Table 3. List of project seminars.
Seminar presentations and other material related to the project are listed in the table below.
Presentation/other | Date | Type | Author
Teosto and BMAT carry out a pioneering live music identification pilot in Finland | 26.1.2013 | Press release | Teosto
Teosto ja BMAT kehittävät ensimmäisenä maailmassa livekeikkojen automaattista musiikintunnistusta (Finnish-language version of the above) | 26.1.2013 | Press release | Teosto
Musiikintunnistuspalvelut – markkinakatsaus (Music recognition services: a market overview) | 31.1.2013 | Presentation | Ano Sirppiniemi
Mikä biisi tää on? Musiikintunnistusta livekeikoilla. Livepilotin toteutus ja tulokset (What’s that song? Music recognition at live gigs: implementation and results of the live pilot) | 31.1.2013 | Presentation | Turo Pekari
Biisilistoja yleisöltä? Crowdsourcingin mahdollisuudet Suomessa. Kyselytutkimuksen tulokset (Set lists from the audience? Crowdsourcing potential in Finland: survey results) | 31.1.2013 | Presentation | Turo Pekari
Mikä biisi tää on? Musiikintunnistuspilotti Provinssissa (What’s that song? Music recognition pilot at Provinssi) | 8.2.2013 | Presentation | Ano Sirppiniemi, Turo Pekari
Emerging Technologies: Teosto’s Live Music Recognition and DJ Club Monitoring Pilots | 21.3.2013 | Presentation | Turo Pekari, Alex Loscos (BMAT)
State Of The Art In Music Information Retrieval Research: Applications For Copyright | 21.3.2013 | Presentation | Teppo Ahonen (University of Helsinki)
Majority Report: Visions On the Music Monitoring Landscape | 21.3.2013 | Presentation | Alex Loscos (BMAT)
Keynote: Issues In the Music Rights Value Chain | 21.3.2013 | Presentation | Karim Fanous (Music Ally)
Table 4. List of project presentations and press releases.
2.6 Structure of the report
This report details the findings of the research project.
Chapter 3 focuses on the automated music recognition
and broadcast monitoring market, looking at the
market structure and key companies operating in the
market. Chapters 4 and 5 detail the results of the two
technology pilots on live music identification and club/
DJ monitoring, as well as the identified use scenarios for
both piloted technologies. Chapter 6 lists the key findings
of the consumer survey.
Chapter 7 is a background article on academic MIR
research, State Of The Art In Music Information Retrieval:
What Could Be Applied For Copyright Management by Teppo
Ahonen.
Chapter 8 summarizes the key project results, and
conclusions about the results are presented in Chapter 9.
Chapter 10 is a list of references.
This report summarizes and extends on the results
material listed above in Table 2 (Project deliverables).
Sources and references for each chapter are listed at the
end of each chapter.
3. The global music recognition and broadcast monitoring market
Automated music recognition services are already widely
used in different fields of the music industry, including
author societies and performance rights organizations,
for tasks such as monitoring broadcast content in
order to carry out music reporting. As the traditional
process of reporting broadcast music to the performance rights organizations (with producers and broadcasters providing cue sheets to the organizations) is known to have difficulties with accuracy, speed, and sometimes also the amount of manual work required, automated music identification technologies could be a solution for making royalty payments more efficient. At the same time, they could give societies an advantage in the increasing global competition between performing rights societies.

The music identification business is expected to grow within the next couple of years, as more and more performance rights organisations start to adopt new technologies for gathering music usage data. The growing competition in this field of business is making the use of new technology more cost-effective and attractive. Interestingly, despite the fact that most music recognition technologies operate in similar ways (based on so-called acoustic fingerprinting), most of the companies that offer music recognition services have set up their operations in ways that differentiate them from the competition.
To identify the content in music recognition services,
metadata must be attached to the digital content. In
consumer services like Shazam, metadata is focused on
the recording (e.g. track title, artist, album title and cover
image). Companies providing services to performance
rights organisations must be able to set up a more
complex metadata scheme, including or providing ways
to include information on the composers and publishers
of the musical works. This requires partnerships and
close collaboration between service providers and the
organisations.
The London based research company Music Ally carried
out a music recognition technology market research
for Teosto in December 2012, and the following market
overview is based on the findings of their report.
3.1 Automated music recognition
The advent of digital media, machine readable databases,
high-bandwidth computer processing power and
internet communications is widely considered to have
an enormous potential for improving the reporting of
music usage in the broadcasting sector, both in terms of
accuracy and speed.
One of the earlier digital technologies developed for
the purpose of music recognition is watermarking. The
system involves embedding an identification tag in the
inaudible spectrum of digital music files, for them to be
later identified through specialised software monitoring
those specific frequencies. This system did not prosper,
as it failed to identify re-encoded files, and proved too
complicated to implement in an ecosystem with a wide
range of codecs and compression settings in place.
Nevertheless, research in alternative music recognition
methods continued, and now several companies provide
services in this space. While their range of products,
clients and partnerships vary greatly, the core mechanics
of their technologies are actually very similar, revolving
around the concept of digital acoustic fingerprint matching.
The following are the main elements involved in the
process:
1) Acoustic fingerprinting
By means of an algorithm (or a combination of algorithms), a computer program generates a condensed digital summary of an audio signal. In order for the system to work, the program must be capable of distinguishing between any two different pieces of content, generating a unique digital picture for each and every one of them – this being the reason why they are called ‘fingerprints’. A fingerprint covers the whole duration of the audio signal, so that a piece of content can later be identified from any given sample.
2) Data collection
A database of acoustic fingerprints is stored by a company
or body for the purpose of comparing existing and new
fingerprints. Each fingerprint has a unique code assigned
and linked to metadata describing the content. In general
terms, the bigger the database is, the wider the repertoire
the system will be capable of identifying. More often than
not, in the case of music recognition, acoustic fingerprint
databases are stored in an online server.
3) Recognition
A computer program takes a fingerprint of an audio
signal’s sample, which is then compared against
a fingerprint database. Upon finding a match, the
corresponding metadata is provided in order to identify
the content.
While most music identification services are based on
these principles, there are some differences in their
capabilities, the most notable being the capacity for
background recognition. This involves being able to
identify a piece of music within a signal overlaid with
redundant audio. Examples of this include a radio DJ
speaking over the song, or bar noise mixed with music
coming out of a speaker.
One of the most intricate complexities in the development
of music recognition programs is the fact that the system
must be robust enough to accurately pinpoint any piece
of music amidst tens of millions of songs, yet also flexible
enough to associate the same fingerprint to very different
samples of the same music work. This is so because most
audio compression formats will generate very different
digital files of the very same recording, and a fingerprint
algorithm that is too strict in its identification would not
associate them with the same music work.
Despite these complexities, several fingerprint systems
have been successfully developed, and the general
consensus is that the overall field is mature enough, with
most technologies available today providing rather high
degrees of accuracy.
4) Metadata
In general terms, and for the purpose of this research,
metadata can be defined as a small piece of data attached
to digital content in order to describe said content. The
type of metadata attached to the identified song varies
according to the end-user of the music identification
technology.
In the case of services such as Shazam, the metadata
focuses on the recording (i.e. track title, artist, album
title and cover image). This can be somewhat more
complicated when the identification is provided for the
music publishing sector, as performance rights societies
require information regarding the songs’ composers
and publishers. In turn, this means that for music
identification companies to provide their services to performance rights societies, they first need to collaborate in setting up the set of rules required of the metadata.
5) Broadcast monitoring
The basic music recognition system described can
see a fifth component added in the space of broadcast
monitoring, consisting of the means to input
transmissions into the fingerprinting system. From a
purely computational point of view, this is a much more
elemental mechanism, consisting at its most basic of an
aerial receiver in the case of over-the-air broadcasts,
and digital conversion software in the case of webcasts,
both of which feed the transmissions into the fingerprint
algorithm.
Depending on the monitored market, the logistics of the
over-the-air system can vary from having a radio receiver
plugged to a computer system running the identification
software, to a whole network of standalone receivers
spread throughout a territory and feeding a centralised
server which, in turn, analyses all of the collected
broadcasts.
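To make the fingerprint-and-match pipeline described in elements 1) to 3) above more concrete, the following sketch shows the overall flow in Python. It is only an illustration of the concept, not any vendor’s implementation: the frame hashing used here is a toy placeholder (real acoustic fingerprints hash spectral features that survive re-encoding and background noise), and the data structures and function names are invented for this example.

import hashlib

def extract_fingerprint(samples, frame_size=4096):
    """Condense an audio signal (a list of integer PCM samples) into a list
    of short frame hashes. Toy placeholder: real systems hash robust
    spectral features, not raw sample bytes."""
    prints = []
    for start in range(0, len(samples) - frame_size, frame_size):
        frame = bytes(abs(s) % 256 for s in samples[start:start + frame_size])
        prints.append(hashlib.md5(frame).hexdigest()[:8])
    return prints

class FingerprintDatabase:
    """Stores reference fingerprints together with the metadata needed to
    report the identified content (track, artist, work/author codes, ...)."""
    def __init__(self):
        self.index = {}  # frame hash -> metadata record

    def add_reference(self, samples, metadata):
        for h in extract_fingerprint(samples):
            self.index[h] = metadata

    def identify(self, samples, min_hits=3):
        """Fingerprint an unknown sample, look the frames up in the database
        and return the metadata of the best-matching track, if any."""
        hits = {}
        for h in extract_fingerprint(samples):
            meta = self.index.get(h)
            if meta is not None:
                hits[meta["track"]] = hits.get(meta["track"], 0) + 1
        if not hits:
            return None
        best_track, count = max(hits.items(), key=lambda kv: kv[1])
        if count < min_hits:
            return None
        return next(m for m in self.index.values() if m["track"] == best_track)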
3.2 Business landscape
Interestingly, in the music recognition and broadcast
monitoring business, most companies operating in this
area have carved their respective niches separate from
each other, despite the fact that their core technologies
operate in very similar ways.
In order to better understand the market’s segmentation, the market can be separated into business-to-consumer (B2C) and business-to-business (B2B) segments. Apart from these, other, more prominent corporations also operate to a certain extent in the music identification space. Acoustic fingerprinting is not at the core of their business and/or they do not license their music recognition technology to third parties. This is the case, for instance, of Last.fm and Google.

Figure 1. Segments of the music recognition and broadcast monitoring market: B2B, B2C, and companies developing technologies for their own use.

Each of these segments is briefly outlined below.
3.3 Business to business (B2B)
The focus here will be on business-to-business services that license their technology to third parties, since these services are the logical partners for performance rights organisations. Music Ally identified 13 prominent companies in this category that also offer their services in Finland:

Audible Magic (USA), BMAT (Spain), Civolution (Netherlands), DJ Monitor (Netherlands), The Echo Nest (USA), Gracenote (USA), Kollector (Belgium), mufin (Germany), Nielsen (USA), Rovi (USA), Soundaware (Netherlands), Soundmouse (UK), TuneSat (USA)

Table 5. Companies that operate in the music recognition and broadcast monitoring market in Europe and offer their services to third parties.
The business-to-business segment can be further divided
into four different fields. Each of the B2B companies can
operate in one or several of them at the same time. These
categories will be shortly described here, with examples
of relevant companies /service providers.
1) Music identification and broadcast
monitoring for authors and publishers
Service providers in this space are paid by authors,
composers and publishers to quantify the use of their
works by broadcasters. Clients would supply their
contents to the service provider in order to ensure that its
fingerprint database contains all the works that need to
be identified.
Examples: TuneSat, Kollector
2) Music monitoring and broadcast
monitoring for PROs
Service providers in this space are paid by performing
rights organisations to quantify the use of the works of
their members by broadcasters. Typically, the performing
rights organisations would supply their contents to
the service provider in order to ensure that the latter’s
fingerprint database contains all the works that need to
be identified. The service providers supply reports to the
performing rights organisation in the agreed format (for
example composition code, author code, rights owner)
and frequency (for example daily, monthly, annually).
Examples: BMAT, mufin, Nielsen and Soundmouse
3) Identification and cue sheeting for broadcasters
Service providers in this space are paid by broadcasters
to identify the music used on their shows, and generate
the corresponding cue sheets for the broadcaster to
deliver to performing rights organisations, as per the
latter’s guidelines. Service providers need to have robust
fingerprint databases, which is why they often approach
rights holders in order to secure the data directly from
these owners.
Examples: Soundmouse, TuneSat
4) Providing identification technologies /
technology platforms
Companies in this space develop acoustic fingerprinting
algorithms and/or fingerprint databases, in order to
license them to third parties who, in turn, use them for developing their own B2B or B2C services.
Examples: The Echo Nest, Gracenote
For most practical purposes, performing rights
organisations use music identification companies in
a similar way to authors and publishers (categories 1 and 2 above). Focusing on service providers for these two fields, probably the most widely known company that offers services for both is the US company Nielsen (founded in 1923), one of the world’s biggest research corporations with 5.5 billion USD in annual revenues. Nielsen currently offers TV, radio and internet monitoring services under its Nielsen Music and Nielsen BDS brands. Apart from publishers, authors and PROs, its key clients also include radio and TV networks and labels. Nielsen outsources its
fingerprinting technology and the current technology
provider has not been publicly announced.
Another significant service provider for authors,
publishers and PROs is the Barcelona-based BMAT. The
company was established under the umbrella of the
Music Technology Group of the Universitat Pompeu
Fabra in 2006, and its core product is Vericast airplay
monitoring and acoustic fingerprinting service. BMAT
monitors over 2000 radio and TV channels worldwide,
and offers an end-to-end service to its clients with all
technology developed in-house. Other products include
the music personalization service Ella. BMAT was
Teosto’s research partner for this research, piloting an
automated cover version recognition technology at a live
rock festival in summer 2012. However, this technology is
not in production yet.
Broadcasting companies also buy identification services for generating cue sheets for PROs (category 3 above). For example, London-based Soundmouse offers this kind of service to the BBC, from monitoring to delivering the cue sheets to PROs. Soundmouse is a privately owned company and uses in-house technology.
Some music identification companies also outsource
technology from partners. A number of companies are
specialized in licensing technologies to third parties,
who then develop their own B2B or B2C services. The US
company Rovi is one of the leading technology providers
in this field with 691 million USD in revenues (2011). Its
clients include manufacturers, service providers and
app developers. For music identification, Rovi has both
a music metadata service with 3 million albums and 30
million tracks, as well as a media recognition service.
Many consumer services, including SoundHound, use
Rovi’s metadata to provide cover images and recording
details to the end users. It is also notable that Rovi outsources some of its fingerprinting technology from another important technology provider, the US company Audible Magic.
Another example of a metadata /music recognition
analytics platform is The Echo Nest, established in 2005.
Echo Nest is one of the big metadata technology providers
for app developers, with over 350 music apps built on
their platform. The Echo Nest database is accessible through an open API that provides tools for playlisting, taste profiling and so on, and it contains 34 million songs and 1.12 trillion data points. The company provides the software for free and makes money by providing support for it. Because of its open nature, The Echo Nest is used more actively by third-party developers than any other platform in the field, but its strengths may lie more in its developer-friendly approach than in its fingerprinting technology.
3.4 Business to consumer (B2C)
The widespread adoption of smartphones with 3G
connectivity and the growth of mobile application
ecosystems –such as Apple’s App Store and Google’s
Play Store- have had a huge impact across the hardware,
software and content industries since the launch of
Apple’s iPhone in 2007. This has in turn also driven
innovations in the field of music recognition.
Increasing mobile computational power and online
connectivity means that now a smartphone user can take
a sample (or fingerprint) of an acoustic audio signal (such
as music being played at a bar or a store), upload it to a
fingerprint database server for matching, and instantly
receive metadata with details of the identified content.
The popularised term for this process is ‘tagging’.
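As a purely illustrative aside, the tagging flow described above can be sketched as a simple client request: the app computes a fingerprint locally, sends it to the provider’s matching server and receives the recording metadata in response. The endpoint URL, field names and response format below are hypothetical and do not correspond to Shazam’s or SoundHound’s actual, proprietary APIs.

import json
import urllib.request

TAGGING_ENDPOINT = "https://example.com/api/identify"  # hypothetical endpoint

def tag_sample(fingerprint_bytes):
    """Upload a locally computed fingerprint and return the metadata of the
    identified recording (e.g. track title, artist, album, cover image)."""
    request = urllib.request.Request(
        TAGGING_ENDPOINT,
        data=fingerprint_bytes,
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

# Example use (assuming a fingerprint has already been extracted on the device):
# metadata = tag_sample(fingerprint_bytes)
# print(metadata["track"], metadata["artist"])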
The best known companies operating in the B2C music
recognition area are Shazam and SoundHound. They both
offer mobile applications available on the iOS, Android,
Blackberry and Windows Phone platforms. The typical
business models of consumer music recognition services
include mobile software licensing, service subscription
fees, a commission on the sale of linked music services, as
well as advertising.
Shazam is originally a US based company, formed in 1999.
Shazam was the first B2C company providing music
recognition services to mobile phone users, and has been
downloaded to 250 million mobile devices since then. The
American performing rights society BMI bought Shazam’s
technology in 2005 to set up its own music identification
service (Landmark Digital Services). Since then, Shazam
has re-acquired the technology back from BMI, but
the society remains a shareholder in the company. The
company has headquarters in London and is privately
held.
Shazam’s products are two different mobile apps with unlimited tagging: a free version with ads and a paid version, depending on the ecosystem. The Shazam apps offer click-to-buy and streaming links, lyrics display and 30-second streamed previews of the identified songs. Being the first mass-market app in this field has benefited Shazam, although its competitor SoundHound’s technology is considered to be more innovative.
Since late 2010, Shazam has been expanding beyond
its core music business, and into developing a range of
second-screen implementations for its audio recognition
technology. The move has seen the company seeking
to boost advertisers, brands and video content owners’
engagement with audiences, delivering promotional
information, coupons and other materials to viewers
using the Shazam mobile app.
The company has repeatedly expressed its interest in
cementing its position in these new areas, and currently
describes itself as “the best way to discover, explore and
share more music, TV shows and ads you love”. However,
Shazam’s music business is still its largest source of
revenues.
While Shazam has actively sought and publicised content
and advertising partnerships, the company has also
aimed to keep its technology development in house.
Shazam outsources metadata from Rovi.
Shazam benefits from its widespread adoption, resulting
more from being the first mass-market smartphone
product of its kind, than from any purely technological
superiority. Another factor which has driven Shazam’s
leading position in the B2C market has been the
company’s expansion towards the second-screen
space, maintaining its own brand, rather than merely
licensing its technology. This has not only helped its
association with music through partnerships with the
likes of American Idol and the Grammy Awards, but
also furthered its novelty profile by providing a new
interactive experience in general TV and advertising
viewing.
The other popular B2C company, California-based
SoundHound, started as the music recognition service
Midomi in 2007. Midomi was later rebranded to
SoundHound in 2009. SoundHound claims to have more
than 100 million app downloads.
SoundHound offers three different applications: free with
ads, ad-free (free or paid, depending on the ecosystem)
and a free version called Hound, which enables voice search for songs. Like Shazam, the apps offer click-to-buy
and streaming links to the identified songs.
SoundHound’s technology differs from Shazam’s in its
capability to identify songs not only from recordings, but
also from humming or singing. This feature and faster
identification process are the reasons its technology is
believed to be superior to Shazam’s, although Shazam is
more widely used. Like Shazam, SoundHound outsources
its metadata from technology provider Rovi.
3.5 Companies developing technologies for their own use
Some well-known companies are big players in music identification, although it is not their core business. Google uses fingerprinting technology for monitoring user-generated content (UGC) on YouTube and has its own identification widget in the Google Play music store. The streaming radio service Last.fm added an identification feature to its Audioscrobbler technology in 2007. Audioscrobbler monitors music consumption in media players, online stores and streaming services.
Unlike companies in other segments, Google and Last.fm
do not licence their technology to others and they do not
offer services for PROs or broadcasters.
This chapter is based on the report “Music recognition
and broadcast monitoring market research” by Music Ally
(2012).
4. Live music identification pilot in Provinssirock
The first of the two technology pilots was carried out in June 2012 at the Provinssirock (http://www.provinssi.fi) rock festival in Seinäjoki, Finland. The purpose of this
pilot study was to test and evaluate BMAT’s live music
identification technology Vericast Covers in a live setting,
with three Finnish bands from three different genres.
The live shows were recorded in two versions: one from
the mixing desk and one from the audience, in order to
be able to determine whether audio quality would have
an effect on the identification results. The live recordings
were compared to a reference audio set that consisted of
the whole recorded catalogue of each band.
4.1 BMAT system description
An automatic cover song identification application is a system that takes a piece of music as input and determines whether the piece is a cover version of one of the songs included in the reference database. Here, the
term “cover” has a broad meaning: cover versions could
include also live versions, remixes, variations, or other
different renditions of a composition. Although the task
is often easy for a human listener, it is very difficult for a
computer, as different arrangements, tempos, rhythmics,
languages of the lyrics, and other features of the music
might vary significantly. Thus, the challenge is to extract
meaningful features from the audio data and compare
them in a robust way.
The identification application used by BMAT is based
on the best performing method developed for cover
song identification to date. In the international MIREX
Evaluation for applications in music information
retrieval, the method has so far the highest-ever
performance in cover song identification (at MIREX
2009). BMAT’s Vericast Covers adds several new features
to the original method to make it more suitable for setlist
identification.
4.2 Pilot scenario
Teosto provided BMAT with a total of 231 reference
tracks (in mp3 format) from the three bands (PMMP,
Nightwish and Notkea Rotta), and a total of six audio
recordings from the three live performances of the
bands at the Provinssirock festival (recorded on June 15–17, 2012). The reference audio set included the whole
recorded catalogue of each band.
Each live performance was recorded twice, one version
recorded directly from the mixing desk (in wav format),
and another from the audience near/in front of the
mixing desk (in wav format), using a basic handheld
wave/mp3 recorder (Roland R-05).
Based on the material, the pilot was separated into three test categories (one per band), with two test cases (mixing desk and audience recording) for each band.
4.3 Identification process
On the BMAT server, the recordings were analysed using
the available fingerprints of the reference song collection
for each band. In general, the cover identification is
processed on a song vs. song basis and returns a distance
between the two input songs, a grade of similarity.
In a first step, the algorithm extracts a Harmonic Pitch
Class Profile (HPCP) for each song. The Harmonic Pitch
Class Profile is a technology that overlaps or folds the
audible spectrum into a one octave space and represents
the relative intensity for each of the 12 semitones of the
Western music chromatic scale. Next, the similarity of the
HPCPs is computed and returned as a similarity distance
(a value between 0 and 100).
In this test case, the live performance recordings were
segmented into audio stream segments that lasted 30
seconds each, and the segments were compared to the
reference audio database. The result of the analysis was a
similarity matrix containing the distances between each
audio stream segment and each reference track.
In a final step, a report about all matches for the live
performance audio stream is extracted from the
similarity matrix. There might exist several candidates
for one 30 second segment, in which case the matches
are marked as conflicting. The conflicting matches are
resolved by looking at consecutive audio segments and
the similarity distances of the matches and applying a
threshold for an acceptable distance.
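As a rough illustration of the segment-versus-reference comparison described above, the sketch below computes an HPCP-like chroma summary for 30-second live segments and for each reference track, builds a similarity (distance) matrix and reports all candidates under a distance threshold. It is not BMAT’s Vericast Covers implementation: the chroma extraction (using the librosa library, an assumed dependency), the cosine-based distance and the threshold value are simplified placeholders.

import numpy as np
import librosa  # assumed third-party dependency for chroma (HPCP-like) features

SEGMENT_SECONDS = 30
DISTANCE_THRESHOLD = 40.0  # hypothetical acceptance threshold on a 0-100 scale

def chroma_profile(path, sr=22050):
    """Average 12-bin chroma over a whole reference track: a crude HPCP-like summary."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)

def segment_profiles(path, sr=22050):
    """Split a live recording into 30-second segments and summarise each one."""
    y, sr = librosa.load(path, sr=sr)
    hop = SEGMENT_SECONDS * sr
    return [librosa.feature.chroma_cqt(y=y[i:i + hop], sr=sr).mean(axis=1)
            for i in range(0, len(y) - hop, hop)]

def distance(a, b):
    """Map cosine similarity of two chroma vectors onto a 0-100 distance."""
    sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    return (1.0 - sim) * 100.0

def build_similarity_matrix(live_path, reference_paths):
    """Rows: live 30-second segments, columns: reference tracks."""
    segments = segment_profiles(live_path)
    refs = [chroma_profile(p) for p in reference_paths]
    return [[distance(seg, ref) for ref in refs] for seg in segments]

def report_matches(matrix, reference_paths):
    """For each segment, list every reference under the threshold, best first.
    More than one candidate for a segment means a conflicting match."""
    report = []
    for row in matrix:
        candidates = [(reference_paths[j], d) for j, d in enumerate(row)
                      if d <= DISTANCE_THRESHOLD]
        report.append(sorted(candidates, key=lambda c: c[1]))
    return report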
4.4 Test case results
The audio setlist test was conducted by using live
performance material of three Finnish groups: PMMP,
Nightwish, and Notkea Rotta. The groups represent the
genres of pop, heavy metal, and hip hop, respectively.
The object of the experiment was to successfully
determine the setlists using only audio recordings of the
performances as queries and the back catalogues of the
groups as targets. Teosto provided BMAT both the query
and the target audio data.
BMAT reported the results in a report describing
the (possibly conflicting groups of) identified tracks
with their durations and timestamps, along with the
minimum, average, and maximum distances of the
composite segment matches. The matrices of distances
between each performance segment query and reference
data target were provided as electronic appendices.
The report also discusses the time consumption of
performing the test cases. In all cases, the machine-based
setlist approximation performed far more efficiently
than browsing through the performance recordings
manually, suggesting that the system could be taken into production without too much concern about the sufficiency of computational resources.
4.4.1 PMMP
The reference audio set for PMMP included 68 tracks,
for which BMAT extracted digital fingerprints. The
band’s live performance at the Provinssirock main stage
(on Saturday, June 16, 2012, lasting 90 minutes), was
recorded from the mixing desk by PMMP’s FOH sound
engineer, and the audience recording was recorded using
a handheld digital wave/mp3 recorder (Roland R-05) from
the audience, stage center, from directly in front of the
mixing desk.
Both test cases for the PMMP live performance (mixing
desk and audience recording) resulted in very good
identification results. All 19 tracks included in the PMMP
set list were correctly identified from both recorded
versions, resulting in a perfect 100% accuracy.
Photo: Turo Pekari
Datetime | Duration (s) | Track | Artist | Conflicting
2012-06-16 23:31:00 | 210 | Suojelusenkeli | PMMP | C
2012-06-16 23:32:00 | 180 | Korkeasaari | PMMP | C
2012-06-16 23:32:00 | 180 | Etkö ymmärrä | PMMP | C
2012-06-16 23:35:00 | 270 | Heliumpallo | PMMP | C
2012-06-16 23:36:30 | 90 | Kesä -95 | PMMP | C
2012-06-16 23:40:30 | 180 | Matkalaulu | PMMP |
2012-06-16 23:43:30 | 390 | Pikku Lauri | PMMP | C
2012-06-16 23:44:00 | 270 | Merimiehen vaimo | PMMP | C
2012-06-16 23:50:00 | 150 | Matoja | PMMP |
2012-06-16 23:53:00 | 270 | Jeesus Ei Tule Oletko Valmis | PMMP | C
2012-06-16 23:55:00 | 120 | Tässä elämä on | PMMP | C
2012-06-16 23:56:00 | 150 | Etkö ymmärrä | PMMP | C
2012-06-16 23:58:00 | 270 | Rakkaalleni | PMMP | C
2012-06-17 00:02:30 | 240 | Kesäkaverit | PMMP |
2012-06-17 00:06:30 | 210 | Päät soittaa | PMMP |
2012-06-17 00:11:00 | 240 | Lautturi | PMMP | C
2012-06-17 00:11:30 | 210 | Kesä -95 | PMMP | C
2012-06-17 00:15:00 | 240 | Pariterapiaa | PMMP |
2012-06-17 00:19:00 | 270 | Tytöt | PMMP | C
2012-06-17 00:20:00 | 180 | Etkö ymmärrä | PMMP | C
2012-06-17 00:24:00 | 240 | Joku raja | PMMP |
2012-06-17 00:29:30 | 210 | Viimeinen valitusvirsi | PMMP | C
2012-06-17 00:30:00 | 150 | Suojelusenkeli | PMMP | C
2012-06-17 00:34:00 | 270 | Joutsenet | PMMP | C
2012-06-17 00:34:00 | 180 | Pikku Lauri | PMMP | C
2012-06-17 00:36:00 | 90 | Suojelusenkeli | PMMP | C
2012-06-17 00:39:30 | 210 | Toivo | PMMP | C
2012-06-17 00:42:30 | 300 | Suojelusenkeli | PMMP | C
2012-06-17 00:43:30 | 240 | Tässä elämä on | PMMP | C
2012-06-17 00:43:30 | 60 | Pikkuveli | PMMP | C
2012-06-17 00:44:00 | 60 | Mummola | PMMP | C
2012-06-17 00:48:30 | 180 | Koko Show | PMMP | C
2012-06-17 00:48:30 | 150 | Pikku Lauri | PMMP |
2012-06-17 00:52:00 | 210 | Kohkausrock | PMMP |
Table 6. Identification report for PMMP mixing desk recording from Provinssirock, 16.6.2012, including conflicting matches (resolved matches in bold).
Figure 2. PMMP similarity matrix distribution for the mixing desk recording. The blue lines represent matches. Source: BMAT
PMMP setlist, Provinssirock, Seinäjoki (16.6.2012) | Identified correctly (Y/N)
1. Korkeasaari | Y
2. Heliumpallo | Y
3. Matkalaulu | Y
4. Merimiehen vaimo | Y
5. Matoja | Y
6. Jeesus ei tule oletko valmis | Y
7. Rakkaalleni | Y
8. Kesäkaverit | Y
9. Päät soittaa | Y
10. Lautturi | Y
11. Pariterapiaa | Y
12. Tytöt | Y
13. Joku raja | Y
14. Viimeinen valitusvirsi | Y
15. Joutsenet | Y
16. Toivo | Y
17. Tässä elämä on | Y
18. Koko show | Y
19. Kohkausrock | Y
Table 7. Final setlist of 19 songs from the PMMP live performance at Provinssi (16.6.2012)
However, it should be noted that out of the identified 19
pieces 13 were included in conflicting groups in the desk
recording and 12 similarly in the audience recording.
Using the minimum and average distance, all conflicting
groups could be resolved correctly.
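For illustration, a conflicting group like those marked ‘C’ in Table 6 could be resolved in code roughly as described above: each candidate carries the minimum and average distance of its composite segment match, and the candidate with the lowest average distance (ties broken by the minimum distance) is kept. This is a sketch of the idea, not the actual resolution logic used in the pilot, and the example distances are invented.

def resolve_conflicting_group(candidates):
    """candidates: list of (track, min_distance, avg_distance) tuples for one
    conflicting group; returns the chosen (resolved) candidate."""
    return min(candidates, key=lambda c: (c[2], c[1]))

# Hypothetical example for one conflicting group:
group = [("Suojelusenkeli", 12.4, 18.0),
         ("Etkö ymmärrä", 25.1, 31.5)]
print(resolve_conflicting_group(group)[0])  # -> Suojelusenkeli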
Based on the results, we can assume that the PMMP
live versions were somewhat closer to the original
performances than those of the other groups in the
experiment. It is also likely that the parameters and threshold of the system were tuned with music from the pop genre, which may have caused slight parameter overfitting in the PMMP case. Nevertheless, the perfect
setlist identification is a very good result for the tested
cover song identification system.
4.4.2 Nightwish
The reference audio set for Nightwish included 101 tracks,
for which BMAT extracted digital fingerprints. The band’s
live performance at Provinssirock main stage (on Friday,
June 15, 2012, lasting 89 minutes), was recorded from the
mixing desk by Nightwish’s FOH sound engineer, and the
audience recording was recorded using a handheld digital
wave/mp3 recorder (Roland R-05) from the audience,
stage center, from directly in front of the mixing desk.
For the Nightwish live performance, a total of 13 songs
out of the 16 performed were correctly identified from
the mixing desk recording, resulting in an accuracy of 81%.
Out of the three unidentified songs, one (“Finlandia” by
Sibelius) was not included in the reference audio set and
thus was impossible to identify in the test scenario. The
other two songs, “Come Cover Me” and “Over The Hills
And Far Away” were included in the reference audio set,
but were not correctly identified by the system.
Figure 3. Nightwish similarity matrix distribution for the mixing desk recording. The blue lines represent matches. Source: BMAT
In the Nightwish case, several false positive songs were also repeatedly detected as match candidates, including “Instrumental (Crimson Tide / Deep Blue Sea)”, “Lappi (Lapland)”, “Taikatalvi” and “Sleepwalker”.

“Come Cover Me” was identified as a candidate match, but the matching failed because the candidate matches did not exceed the minimum threshold. Both unidentified songs were recorded with a different lead singer in 2006, and a possible reason for the failed identification is the resulting greater distance between the live version and the recorded version.
From the Nightwish audience recording, 9 out of 16 songs
were correctly identified. Analysis showed that a further
five songs were correctly identified as match candidates,
but failed because of the set minimum distance threshold.
Datetime | Duration (s) | Track | Artist | Conflicting
2012-06-15 23:30:30 | 60 | Taikatalvi | Nightwish | C
2012-06-15 23:31:00 | 210 | Lappi (Lapland) II: Witchdrums [*] | Nightwish | C
2012-06-15 23:31:30 | 120 | Sleepwalker | Nightwish | C
2012-06-15 23:35:00 | 150 | Storytime | Nightwish | C
2012-06-15 23:35:30 | 120 | Lappi (Lapland) III: This Moment Is Eternity [*] | Nightwish | C
2012-06-15 23:35:30 | 120 | Etiäinen | Nightwish | C
2012-06-15 23:38:30 | 90 | Storytime | Nightwish | C
2012-06-15 23:38:30 | 60 | Lappi (Lapland) IV: Etiäinen [*] | Nightwish | C
2012-06-15 23:38:30 | 60 | Etiäinen | Nightwish | C
2012-06-15 23:38:30 | 90 | Lappi (Lapland) III: This Moment Is Eternity [*] | Nightwish | C
2012-06-15 23:40:00 | 210 | Wish I Had An Angel | Nightwish | C
2012-06-15 23:40:30 | 60 | Instrumental (Crimson Tide / Deep Blue Sea) | Nightwish | C
2012-06-15 23:41:00 | 120 | Lappi (Lapland) III: This Moment Is Eternity [*] | Nightwish | C
2012-06-15 23:41:30 | 120 | Sleepwalker | Nightwish | C
2012-06-15 23:42:00 | 90 | Bare Grace Misery | Nightwish | C
2012-06-15 23:43:00 | 90 | Taikatalvi | Nightwish | C
2012-06-15 23:43:30 | 270 | Amaranth | Nightwish | C
2012-06-15 23:44:30 | 210 | Instrumental (Crimson Tide / Deep Blue Sea) | Nightwish | C
2012-06-15 23:49:00 | 120 | Scaretale | Nightwish |
2012-06-15 23:52:00 | 60 | Scaretale | Nightwish |
2012-06-15 23:54:30 | 120 | Instrumental (Crimson Tide / Deep Blue Sea) | Nightwish |
2012-06-15 23:56:30 | 210 | Dead To The World | Nightwish | C
2012-06-15 23:58:00 | 180 | Instrumental (Crimson Tide / Deep Blue Sea) | Nightwish | C
2012-06-16 00:00:30 | 300 | I Want My Tears Back | Nightwish | C
2012-06-16 00:04:30 | 60 | Sleepwalker | Nightwish | C
2012-06-16 00:05:00 | 510 | Instrumental (Crimson Tide / Deep Blue Sea) | Nightwish | C
2012-06-16 00:07:00 | 150 | Sleepwalker | Nightwish | C
2012-06-16 00:07:30 | 90 | Etiäinen | Nightwish | C
2012-06-16 00:08:00 | 240 | Lappi (Lapland) III: This Moment Is Eternity [*] | Nightwish | C
2012-06-16 00:10:30 | 60 | Sleepwalker | Nightwish | C
2012-06-16 00:11:30 | 270 | Last of the Wilds | Nightwish | C
2012-06-16 00:18:00 | 180 | Planet Hell | Nightwish | C
2012-06-16 00:19:00 | 210 | Instrumental (Crimson Tide / Deep Blue Sea) | Nightwish | C
2012-06-16 00:22:00 | 270 | Ghost River | Nightwish | C
2012-06-16 00:28:00 | 150 | Nemo | Nightwish |
2012-06-16 00:33:00 | 480 | Lappi (Lapland) III: This Moment Is Eternity [*] | Nightwish | C
2012-06-16 00:34:30 | 60 | Lappi (Lapland) IV: Etiäinen [*] | Nightwish | C
2012-06-16 00:39:30 | 270 | Song Of Myself | Nightwish | C
2012-06-16 00:47:00 | 360 | Lappi (Lapland) III: This Moment Is Eternity [*] | Nightwish | C
2012-06-16 00:47:00 | 240 | Last Ride Of The Day | Nightwish | C
2012-06-16 00:48:00 | 300 | Lappi (Lapland) IV: Etiäinen [*] | Nightwish | C
2012-06-16 00:49:00 | 240 | Taikatalvi | Nightwish | C
2012-06-16 00:52:00 | 60 | Forever Yours | Nightwish | C
2012-06-16 00:54:00 | 210 | Imaginaerum | Nightwish | C
2012-06-16 00:54:30 | 60 | Arabesque | Nightwish | C
Table 8. Identification report for Nightwish mixing desk recording from Provinssirock, 15.6.2012, including conflicting matches (resolved matches in bold).
Nightwish setlist, Provinssirock, Seinäjoki (15.6.2012) | Identified correctly (Y/N)
1. Finlandia | -
2. Storytime | Y
3. Wish I Had an Angel | Y
4. Amaranth | Y
5. Scaretale | Y
6. Dead To the World | Y
7. I Want My Tears Back | Y
8. Come Cover Me | N
9. Last Of The Wilds | Y
10. Planet Hell | Y
11. Ghost River | Y
12. Nemo | Y
13. Over the Hills and Far Away | N
14. Song of Myself | Y
15. Last Ride of the Day | Y
16. Imaginaerum | Y
Table 9. Final setlist of 16 songs from the Nightwish live performance at Provinssi (15.6.2012)
Photo: Turo Pekari
4.4.3 Notkea Rotta
The reference audio set for Notkea Rotta included 62 tracks, for which BMAT extracted digital fingerprints. The band’s live performance at the Provinssirock YleX stage (on Sunday, June 17, 2012, lasting 60 minutes) was recorded by YLE for a TV broadcast, and the audience recording was made using a handheld digital wave/mp3 recorder (Roland R-05) from the audience, stage center, directly in front of the mixing desk.
The Notkea Rotta live performance analysis resulted
in only three composite matches for the mixing desk
recording and four for the audience recording, all
of which were found to be false positives. Thus the
identification rate for Notkea Rotta was 0%.
Figure 4. Notkea Rotta similarity matrix distribution for the mixing desk recording. The blue lines represent matches. Source: BMAT
Two main reasons were found for the failure of the system to correctly identify the Notkea Rotta live performance. The first relates to how well the tested identification method suits the genre in question, broadly speaking hip hop. The songs performed contain much less harmonic variation than
songs performed by the pop and metal bands in the
other test cases, making it more difficult to differentiate
songs based on their harmonic structure. Second, the
differences in tempo, instrumentation and structure
compared to the original/reference recordings were
considerably greater in the Notkea Rotta case. Whereas
in the Nightwish case the test results could be improved
by tuning the system parameters (e.g. the minimum
distance threshold), in the Notkea Rotta case this would
not make the results any better.
Photo: Turo Pekari
4.5 Evaluation of the results
The algorithm tested in the live music identification
pilot detected 100% of the performed songs for both the
mixing desk and the audience recordings for the PMMP
live performance. It also showed good results for the desk
recording for the Nightwish performance (81%). On the
other hand, the Notkea Rotta live performance resulted
in few and only false positive matches, resulting in 0%
accuracy in identification.
Looking at the results, there was no marked difference between the mixing desk recordings and the audience recordings. The audience recordings did, however, show lower distances and a lower time precision than the desk recordings. This has two reasons: in the audience recording, the actual live performance is only one part of the audio signal, the other part being the audience itself; in addition, effects such as echoes on drums and vocals, which depend on the position of the recording equipment, occur between the PA loudspeakers and the recording microphone.
The setlist detection was completed in a reasonable amount of computation time, easily surpassing the efficiency of the manual labor needed for the task.
The reference audio set in this test was limited to the recorded catalogue of each artist. The tested cover identification technology cannot currently be scaled to work in a similar manner as broadcast monitoring systems, where each track can be compared to a reference database of millions or tens of millions of tracks. One reason for this is the computational complexity needed for the cover identification technology, but another reason is related to the nature of the cover version identification task itself: most Western pop music tracks are, in relative terms, so similar to each other in their harmonic, structural and melodic choices that comparing one song to millions of other songs will always produce a large number of positive matches – songs that are in this sense related to each other.
However, this limitation can be overcome by limiting the size of the reference dataset, as was done in this pilot research. For practical use scenarios, this would mean that the system would need the name of the band or other information before the analysis that can be used to narrow down the search.
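As an illustration of such a pre-filtering step, the sketch below narrows a reference catalogue to a single performer before matching. The data layout and the match_setlist() call are hypothetical placeholders, not part of the tested system.

# Illustrative sketch only: narrowing the reference set to one artist's
# recorded catalogue before running cover identification. The catalogue
# layout and match_setlist() are hypothetical, not the piloted system.

def narrow_reference_set(catalogue, artist_name):
    """Keep only the reference tracks credited to the given performer."""
    wanted = artist_name.strip().lower()
    return [track for track in catalogue if track["artist"].strip().lower() == wanted]

# catalogue: list of dicts such as {"artist": "...", "title": "...", "fingerprint": ...}
# references = narrow_reference_set(catalogue, "Nightwish")
# setlist = match_setlist(live_recording, references)  # hypothetical matcher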
The uneven performance of the system in the three cases makes it difficult to give a definitive opinion of the system, but several positive observations and ideas for future work emerged from the experiment. Based on the results obtained, the system could be put into operation, but only with music from certain genres. Manual verification of the results will probably also be needed, as unsolvable groups of conflicting identifications seem to be inevitable.
While a completely reliable automatic setlist detection system is probably impossible to build, a system that needs manual assistance only with conflicting identifications and other unclear cases could well be constructed from the tested system, based on the results and the report.
4.6 Possible use scenarios
From the live music identification pilot research, Teosto
identified three possible use scenarios for the technology
from the point of view of a music performance rights
society: large music festivals, active live music clubs, and
artist or tour specific uses.
All the use scenarios take as a starting point that the technological limitations of the tested system and other process-related factors (such as automated matching of the identification results to Teosto’s database of authors, publishers and works, which was not tested in this pilot research) could be solved, and that the total costs of the system would be comparable to the costs of the live music reporting scheme currently in place.
Large music festivals could be the most cost-effective use scenario, as the ratio of the number of identified performances or identified songs to the performance royalties collected and distributed would be favourable. The automated identification system would
replace the manual sending of setlists and/or the manual
inputting of setlists by artists to Teosto’s systems. In
music festivals, the data from the identification system
could possibly also be used as input for other services for
festival visitors and other consumers.
Active live music venues, such as large clubs, ski resorts,
and cruise ships could be another use scenario that could
be cost-effective because of the large number of live
shows per year. The live music identification system could
be integrated into the venue’s music system, and live
music identification could be combined with automated
identification and reporting of mechanical music
(background music, dance music, DJs).
A third use scenario could be artist and/or tour specific
use of the live music identification service, where the
artist would use the service to replace manual inputting
or sending in of set lists to Teosto.
In all the use scenarios listed above, the quality of the
identification results could be improved by introducing a
verification step, where the automatically generated set
list would be verified by the artist before acceptance.
This chapter is based on the articles “Analysis of the
Automatic Live Music Detection Experiment” by Teppo
Ahonen (2012) and “Teosto – BMAT Vericast Covers Pilot
Report” by BMAT (2012).
Photo: Turo Pekari
5. DJ/Club monitoring pilot
Teosto and BMAT tested the Vericast audio identification
technology on DJ club performances for automatic setlist
creation. The three-month pilot started in December
2012 and four Finnish DJs took part in the pilot: Darude,
K-System, Orkidea and Riku Kokkonen. The DJ sets were
recorded in late 2012.
5.1 Identification technology
For the audio identification BMAT applied their Vericast
audio fingerprint technology, which is currently used by
over 30 performance rights organizations for worldwide
music broadcast monitoring, serving as a tool for
royalty distribution and market statistic calculation.
The technology scales to multi-million reference song
databases as well as real-time audio recording input.
For the matching process, an audio fingerprint – a simplified data representation in the frequency domain – is extracted for each reference audio song and for the audio recording. This process is supported by a hash function, similar to cryptographic hash functions, that converts the large amount of bits from the input audio signal into a significantly smaller but representative amount of bits. An important constraint for this hash function is that perceptually similar audio objects result in similar fingerprints and, on the other hand, that dissimilar audio objects result in dissimilar fingerprints.
The reference fingerprints are collected in a database optimized for querying input fingerprint fragments. The algorithm is optimized for the identification of reference audio with a minimum playtime of 3 seconds. In the configuration used for this pilot, a fingerprint fragment represents an audio segment of 6 seconds, which results in a duration precision of +/- 3 seconds.
In the identification process, the user provides the reference audio by uploading the content or by providing access to an online stream. Each recording is matched against the reference collection, after which the resulting identifications are enriched with audio content metadata (e.g. artist, title, ISRC, ISWC, album) and an identification report is prepared.
The performance of the identification of a recording against a song database is influenced by three different factors.
1) The size of the database has a small impact on searching possible candidates for fingerprint fragments, as the data structure is optimized for this kind of query.
2) The merging of the results of two neighboring fingerprint segments and the elimination of conflicting candidates has a much bigger impact on the computation effort and performance of the algorithm. A larger database creates more possible false positives that have to be resolved.
3) The last influence is the duration of the recording, which has a linear impact on the search performance in the database and can have an even bigger influence on the resolution, depending on the coverage and constellation of the musical content inside the recording.
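As a rough illustration of the kind of frequency-domain fingerprint and hash function described above (a generic sketch, not BMAT’s Vericast implementation), the following reduces each spectrogram frame to a short bit vector, so that perceptually similar audio yields bit vectors with a small Hamming distance. The band split, frame sizes and bit rule are arbitrary choices for the sketch.

# Generic illustration of a frequency-domain audio fingerprint; parameters
# are arbitrary and are not taken from the Vericast system.

import numpy as np
from scipy.signal import spectrogram

def fingerprint(samples, sample_rate, n_bands=16):
    """One bit vector per frame: a bit is 1 where the band-to-band energy
    difference increases from the previous frame to the current one."""
    _, _, spec = spectrogram(samples, fs=sample_rate, nperseg=2048, noverlap=1024)
    bands = np.array_split(spec, n_bands, axis=0)              # coarse frequency bands
    energy = np.stack([band.sum(axis=0) for band in bands])    # (n_bands, n_frames)
    band_diff = np.diff(energy, axis=0)                        # differences across bands
    bits = np.diff(band_diff, axis=1) > 0                      # change across time
    return bits.T.astype(np.uint8)                             # (n_frames - 1, n_bands - 1)

def hamming_distance(fp_a, fp_b):
    """Smaller distance means perceptually more similar segments."""
    n = min(len(fp_a), len(fp_b))
    return int(np.sum(fp_a[:n] != fp_b[:n]))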
BMAT’s fingerprint solution is robust and resistant to channel distortion and background noise. It is optimized and tuned to have a detection rate of 99% with no false positives, and over 90% detection for background music. The solution is restricted to the reference collection of the user: if a recording contains an unavailable reference, or a newly recorded version of a reference (e.g. a live performance or a time-stretched version), that content is a dissimilar audio object with regard to the reference collection, and the algorithm will not be able to match the song occurrence in the recording.
The identification technology is currently operating in
a production platform that is constantly analyzing over
2,000 radio and TV channels against a music database of
nearly 20 million reference songs.
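To make the segment-level output concrete, the sketch below merges neighbouring 6-second segment matches into track-level detections with start and end times of the kind reported later in section 5.3. The 6-second segment length follows the pilot configuration described above; the merging rule itself is a simplified assumption for illustration, not BMAT’s actual logic.

# Simplified sketch of merging neighbouring fingerprint-segment matches
# (factor 2 above) into track-level detections with start/end times.
# The 6 s segment length follows the pilot configuration; the merging rule
# is an assumption, not BMAT's implementation.

SEGMENT_SECONDS = 6

def merge_segment_matches(segment_matches, max_gap=1):
    """segment_matches: list of (segment_index, track_id) pairs for segments
    that were identified. Consecutive (or nearly consecutive) segments
    matching the same track become one detection."""
    detections = []
    for index, track in sorted(segment_matches):
        last = detections[-1] if detections else None
        if last and last["track"] == track and index - last["last_index"] <= max_gap:
            last["end"] = (index + 1) * SEGMENT_SECONDS
            last["last_index"] = index
        else:
            detections.append({"track": track,
                               "start": index * SEGMENT_SECONDS,
                               "end": (index + 1) * SEGMENT_SECONDS,
                               "last_index": index})
    return [{"track": d["track"], "start": d["start"], "end": d["end"]}
            for d in detections]

# merge_segment_matches([(0, "A"), (1, "A"), (2, "A"), (50, "B"), (51, "B")])
# -> [{'track': 'A', 'start': 0, 'end': 18}, {'track': 'B', 'start': 300, 'end': 312}]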
5.2 Pilot setup
For the pilot, BMAT received five recordings of DJ performances from TEOSTO, of which four were recorded from the line-out signal of the DJ set and one was recorded from a microphone placed at the DJ desk. Four of the recordings have a duration of around one hour (playtime) and one recording has a length of 30 minutes.
Test | DJ | Type | Recording | Duration
1 | K-System | line-out | k_system.mp3 | 01:00:00.16
2 | Orkidea | line-out | orkidea.mp3 | 01:00:11.03
3 | Riku Kokkonen | line-out | riku_kokkonen.mp3 | 00:28:06.17
4 | Darude | line-out | darude_radio.mp3 | 01:00:05.49
5 | Darude | microphone | darude_seattle.mp3 | 01:00:00.16
Table 10. Delivered recordings of DJ sets for the pilot. Source: BMAT
Reference collections
For each DJ performance, TEOSTO prepared a detailed
setlist with the artist and song title as well as the order
of appearance. BMAT created the following reference
collections which were used for matching against the
recordings:
ORIGINAL: In addition to the recordings, BMAT received
56% (40 references) of the played content from TEOSTO
and could retrieve another 21% (15 references) from
BMAT internal music database. This content is used as
the ORIGINAL reference collection to match against the
recordings and has an overall coverage of 76% (55 of 72 references) of the played music in the recordings.
VERSION: Apart from the original references, BMAT
retrieved 166 songs from the internal music database
that were not the played version of the song, but another
release or remix from the artist (e.g. original, extended
version, club, mix, radio edit, featuring...). This collection
of songs was called VERSION dataset and was used to
bias the results and investigate the relation between the
different versions.
BIGDATA: This third collection was used for performance
estimations and validation and it contains 10K of
popular club music references, including the ORIGINAL
references that were played in the delivered recordings.
Photo: Turo Pekari
5.3 Test case results
5.3.1 K-System
K-System has been a Finnish DJ and producer since 1999. K-System’s recorded DJ set (length: 60 minutes) contained an electronic and house music mix. For the pilot, 13 of the 14 performed tracks were available to be used as reference audio.
All 13 available songs could be directly identified with the ORIGINAL database. Time detection is precise to 3 seconds, and overlaps of two songs playing at the same time were detected in several cases for up to 16 seconds. In some cases the outros of the songs do not coincide with the reference songs, because they contain remixes or heavy filters applied by the DJ.
Table 11 lists the identifications against the ORIGINAL collection; the availability and detection status of each song is shown in the Comment column. Table 12 shows the matches with the VERSION collection, including matches that were missed in the identification against the ORIGINAL collection but could be identified with a similar version of the reference song in the VERSION collection.
# | Start (s) | End (s) | Artist | Title | Comment
1 | 0 | 303 | Deniz Koyu | Tung! (Original Mix) | CORRECT
2 | 297 | 559 | Cedric Gervais | Molly (Original Mix) | CORRECT
3 | 553 | 846 | Swedish House Mafia | Greyhound | OTHER VERSION
4 | 824 | 1107 | Hard Rock Sofa, Swanky Tunes | Here We Go (Original Mix) | CORRECT
5 | 1039 | 1378 | HeavyWeight | Butterknife | CORRECT
6 | – | – | K-System & Probaker | Lingerie Demo 1 | NOT AVAILABLE
7 | 1731 | 2008 | Sebastian Ingrosso & Alesso | Calling - Original Instrumental Mix | OTHER VERSION
8 | 1956 | 2141 | Chocolate Puma, Firebeatz | Just One More Time Baby (Original Mix) | OTHER VERSION
9 | 2135 | 2438 | Pyero | Ole (Original Mix) | CORRECT
10 | 2473 | 2699 | Quintino, Sandro Silva | Epic (Original Mix) | CORRECT
11 | 2683 | 2920 | Nari & Milani | Atom (Original Mix) | CORRECT
12 | 2913 | 3130 | Daddy’s Groove | Turn The Light Down (David Guetta Re Work) | CORRECT
13 | 3123 | 3370 | Basto | I Rave You (Original Mix) | CORRECT
14 | 3354 | 3647 | Firebeatz, Schella | Dear New York (Original Mix) | CORRECT
Table 11. Identification for K-System DJ Set against ORIGINAL collection (Source: BMAT)
# | Start (s) | End (s) | Artist | Title | Comment
1 | 0 | 165 | Deniz Koyu | Tung! (Edit) | CORRECT
2 | 297 | 559 | Cedric Gervais | Molly (Club Radio Edit) | CORRECT
3 | 553 | 841 | Swedish House Mafia | Greyhound (Radio Edit) | OTHER VERSION
4 | 824 | 1107 | Hard Rock Sofa, Swanky Tunes | Here We Go | CORRECT
7 | 1725 | 2008 | Alesso, Sebastian Ingrosso | Calling (Lose My Mind) Feat. Ryan Tedder (Extended Club Mix) | OTHER VERSION
10 | 2437 | 2694 | Quintino, Sandro Silva | Epic | CORRECT
11 | 2616 | 2925 | Nari - Milani | Atom | CORRECT
13 | 3123 | 3360 | Basto | I Rave You | CORRECT
Table 12. Identification for K-System DJ Set against VERSION collection (Source: BMAT)
5.3.2 Orkidea
The Finnish electronic music artist DJ Orkidea performed the second test case of this pilot. The recording was taken from the line out of the DJ desk (length: 60 minutes) and contains mainly techno and trance music.
# | Start (s) | End (s) | Artist | Title | Comment
1 | 0 | 88 | Tiësto | Ten Seconds Before Sunrise | CORRECT
2 | 87 | 529 | Lowland | Cheap Shrink | CORRECT
3 | – | – | Attractive Deep Sound pres. Little Movement | The Anthem! | NOT AVAILABLE
4 | – | – | Full Tilt | Class War | TIME STRETCH
5 | – | – | Steve Brian | Vueltas (Thomas Datt Instrumental) | TIME STRETCH
6 | 1485 | 1788 | Orkidea | Liberateon (Mystery Islands Remix) | CORRECT
7 | – | – | Paul Oakenfold | Southern Sun (Orkidea’s Tribute Mix) | NOT AVAILABLE
8 | 2207 | 2525 | Solarstone & Giuseppe Ottaviani | Falcons (Giuseppe Ottaviani OnAir Mix) | CORRECT
9 | 2596 | 2920 | Alex M.O.R.P.H. | Eternal Flame (Alex M.O.R.P.H.’s Reach Out For The Stars Mix) | CORRECT
10 | – | – | Mark Sherry | My Love (Outburst Vocal Mix) | NOT AVAILABLE
11 | – | – | Omnia | Infina | TIME STRETCH
Table 13. Identification for Orkidea DJ Set against ORIGINAL collection.
There were no identification results with the VERSION database for this test case; on the other hand, in contrast to the other test cases, only 2 of the 166 references in the VERSION database are related to the Orkidea DJ set.
The algorithm detected 5 of the 8 available songs against the ORIGINAL database. All 3 false negative cases can be related to the fact that the algorithm is not resistant to timescale-pitch modification. The report includes 3 audio pairs, each containing an audio sample of the reference and of the recording, which illustrate the time stretching in songs #4, #5 and #11. For these songs a correct reference phonogram was effectively not available, and the algorithm detected 100% of the specified, detectable cases. The total detection rate for this test case was 62.5%.
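The sensitivity to timescale-pitch modification can be probed by generating stretched test versions of a reference track. The sketch below uses the librosa library and a placeholder file name; it illustrates the two kinds of modification, not the pilot’s actual test material.

# Illustrative only: simulating the kind of timescale-pitch modification a DJ
# may introduce, in order to test fingerprint robustness. "reference.wav" is
# a placeholder file name.

import librosa

y, sr = librosa.load("reference.wav", sr=None, mono=True)

# Tempo-only change (pitch preserved), as sync software might apply:
y_stretched = librosa.effects.time_stretch(y, rate=1.04)

# Classic pitch-fader change (tempo and pitch move together): playing 4%
# faster is equivalent to resampling and playing back at the original rate.
y_fader = librosa.resample(y, orig_sr=sr, target_sr=int(round(sr / 1.04)))

# After a 4% change, a 6-second reference segment corresponds to roughly
# 6 / 1.04 = 5.77 seconds of played audio, so exactly aligned fingerprint
# frames drift further apart the longer the track plays.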
5.3.3 Riku Kokkonen
Riku Kokkonen is a Finnish DJ playing electronic and house music. The delivered DJ set contained 18 songs, of which 11 were available as references for the pilot. With a length of 28 minutes, this test case has the shortest recording of the pilot and the largest number of different reference songs.
# | Start (s) | End (s) | Artist | Title | Comment
1 | – | – | Avicii | Levels (Acapella) | NOT AVAILABLE
2 | – | – | Taio Cruz | Hangover (Acapella) | NOT AVAILABLE
3 | – | – | Dada Life | Kick out the epic motherfucker (Original Mix) | NOT AVAILABLE
4 | 164 | 221 | Basement Jaxx | Where’s Your Head At | CORRECT
5 | – | – | Nari & Milani | Up (Acapella) | NOT AVAILABLE
6 | 282 | 349 | Showtek & Justin Prime | Cannonball (Original Mix) | CORRECT
7 | 343 | 534 | Swedish House Mafia | Greyhound | CORRECT
8 | 538 | 605 | Daniel Portman & Stanley Ross | Sampdoria (Original Mix) | CORRECT
9 | – | – | Fatboy Slim | Right Here, Right Now (Acapella) | NOT AVAILABLE
10 | 635 | 754 | Plastik Funk, Tujamo | WHO (Original Mix) | CORRECT
11 | 758 | 831 | Chuckie | Who Is Ready To Jump (Dzeko & Torres Remix) | CORRECT
12 | 819 | 892 | Firebeatz & Schella | Dear New York (Original Mix) | CORRECT
13 | – | – | House Of Pain | Jump Around | OTHER MIX
14 | – | – | Jay-Z & Kanye West | Otis (A Skillz Remix) | NOT AVAILABLE
15 | – | – | TJR | Funky Vodka | NOT AVAILABLE
16 | 1208 | 1358 | Oliver Twizt | Love Trip (David Jones Remix) | CORRECT
17 | 1347 | 1594 | Steve Angello & Third Party | Lights (Original Mix) | CORRECT
18 | 1567 | 1685 | The Aston Shuffle & Tommy Trash | Sunrise (Won’t Get Lost) | CORRECT
Table 14. Identification for Riku Kokkonen DJ Set against ORIGINAL collection.
# | Start (s) | End (s) | Artist | Title | Comment
3 | 0 | 170 | Dada Life | Kick Out The Epic Motherf**ker | NOT AVAILABLE
4 | 164 | 216 | Basement Jaxx | Where’s Your Head At (Radio Edit) | CORRECT
7 | 338 | 534 | Swedish House Mafia | Greyhound (Radio Edit) | CORRECT
10 | 625 | 744 | Plastik Funk, Tujamo | Who | CORRECT
11 | 794 | 831 | Chuckie | Who Is Ready To Jump (Ryan Riback Remix) | CORRECT
13 | 850 | 1005 | House Of Pain | Jump Around (Deadmau5 Edit) | OTHER MIX
16 | 1213 | 1358 | Oliver Twizt | Love Trip | CORRECT
17 | 1347 | 1594 | Steve Angello & Third Party | Lights | CORRECT
Table 15. Identification for Riku Kokkonen DJ Set against VERSION collection.
Ten of the 11 available songs could be detected against the ORIGINAL database. Overlaps between songs were identified correctly by the algorithm for up to 27 seconds (see Table 14, #17 and #18). In some cases the end of a song could not be detected because it was remixed by the DJ. Song #13 was annotated incorrectly in the setlist: the played track is not the original phonogram but a variation, i.e. a different phonogram. In the VERSION database the correct phonogram was available and was detected by the algorithm. The same applies to song #3, which is played at the beginning of the DJ set and was not available in its annotated version for the pilot, but could be detected correctly with the VERSION database. The system detected all tracks that were available for detection. The overall detection rate for this test case was 90.9%.
In this test case we can see the linear impact of the recording length on performance: compared to the prior test cases, the computation time is nearly cut in half. The relationship is not perfectly linear, because the same number of different references is searched in the same search space, and a large part of the computation is used to resolve possible overlaps and to merge neighboring partial matches.
Against the BIGDATA database the algorithm computes the results at 487 times real time relative to the recording length.
5.3.4 Darude
Darude Radio DJ Set
Darude is a Finnish trance producer and DJ. The recording of this test case is 60 minutes long and contains 14 songs from the electronic and trance music genres. The recording was carried out during a studio performance for a radio show.
# | Start (s) | End (s) | Artist | Title | Comment
1 | 36 | 344 | Morgan Page, Andy Caldwell & Jonathan Mendelsohn | Where Did You Go (Tom Fall Remix) | CORRECT
2 | 313 | 580 | Ashley Wallbridge | Grenade (Original Mix) | CORRECT
3 | 527 | 769 | Marco V | GOHF (Kris O’Neil Remix) | CORRECT
4 | – | – | Cosmic Gate & JES | Flying Blind (TwisteDDiskO Club Mix) | NOT AVAILABLE
5 | 968 | 1286 | Dada Life | Rolling Stones T-Shirt (Original Mix) | CORRECT
6 | – | – | Ferry Corsten | Radio Crash (Progressive Mix) | NOT AVAILABLE
7 | – | – | Above & Beyond feat. Zoe Johnston | Love Is Not Enough (Maor Levi & Bluestone Club Mix) | TIME STRETCH
8 | – | – | Nitrous Oxide | Tiburon (Sunny Lax Remix) | TIME STRETCH
9 | 1956 | 2238 | Above & Beyond feat. Andy Moor | Air For Life (Norin & Rad Remix) | CORRECT
10 | 2263 | 2300 | Philip Aelis & Tiff Lacey | Heart In Blazing Night (David Kane Remix) | CORRECT
11 | 2483 | 2838 | Majai | Emotion Flash (Incognet Vocal Mix) | CORRECT
12 | 2806 | 3129 | Jonathan Gering | Let You Go (Original Mix) | CORRECT
13 | 3113 | 3365 | Nick Wolanski | I Love Mandy (Original Mix) | CORRECT
14 | 3349 | 3599 | Ercola vs. Heikki L. | Deep At Night (Adam K & Soha Remix) | CORRECT
Table 16. Identification for Darude Radio DJ Set against ORIGINAL collection.
For the pilot, 12 songs were available and 10 songs could be detected in the ORIGINAL database. The two false negative cases are related to a slight time stretch introduced by the DJ during his performance. For the detectable songs the detection rate was 100%, and the total detection rate for this test case was 83.3%.
# | Start (s) | End (s) | Artist | Title | Comment
4 | 870 | 907 | Cosmic Gate & JES | Flying Blind | NOT AVAILABLE
5 | 968 | 1286 | Dada Life | Rolling Stones T-Shirt | CORRECT
Table 17. Identification for Darude Radio DJ Set against VERSION collection.
In the VERSION database, song #4, which was not available in the ORIGINAL database, could be detected correctly with a different reference (see Table 17). This shows again that the algorithm is robust to mixes that do not destroy the timescale in the frequency domain. The observations on performance are similar to the test cases above, and the recording could be analyzed and matched in a comparable amount of time.
Darude Seattle DJ Set
The material for this test case was recorded by the same
DJ as in the previous test, Darude, and is a mix of a
microphone and DJ set recording. The recording is 60
minutes long and contains 15 songs. The device used for
the recording was a Zoom H4N recorder, with a 120°
stereo microphone directed from the DJ desk towards the
audience.
The audience and room were recorded on a single stereo
track and another track was recorded from the mixer
line. Both tracks were mixed together using Logic Pro
software. The final mix was compressed, and the signals were adjusted and equalized, aiming for a balanced
ambiance. The final mix is approximately 90% of the
clean mixer signal, with the audience track on top to
create atmosphere.
In this case, BMAT’s identification technology could detect only 5 of the 11 available songs within the ORIGINAL database. The analysis against the VERSION database did not resolve any further matches. The results indicate very low similarity and confidence. A deeper analysis of the frequency spectrum of the references and the recording revealed significant differences, which cause the current implementation and configuration of the fingerprint extraction algorithm, and the subsequent search, to fail for this material.
These differences are probably caused by the post-processing of the recording (e.g. compression, equalizing and ambiance balancing) as well as by the mixing of the two signals. For the detection of this audio material, a different extraction of the fingerprints – both for the recording and for the references – would be necessary. The total detection rate for this test case was 45.5%.
# | Start (s) | End (s) | Artist | Title | Comment
1 | 36 | 344 | Morgan Page, Andy Caldwell & Jonathan Mendelsohn | Where Did You Go (Tom Fall Remix) | CORRECT
2 | 313 | 580 | Ashley Wallbridge | Grenade (Original Mix) | CORRECT
3 | 527 | 769 | Marco V | GOHF (Kris O’Neil Remix) | CORRECT
4 | – | – | Cosmic Gate & JES | Flying Blind (TwisteDDiskO Club Mix) | NOT AVAILABLE
5 | 968 | 1286 | Dada Life | Rolling Stones T-Shirt (Original Mix) | CORRECT
6 | – | – | Ferry Corsten | Radio Crash (Progressive Mix) | NOT AVAILABLE
7 | – | – | Above & Beyond feat. Zoe Johnston | Love Is Not Enough (Maor Levi & Bluestone Club Mix) | TIME STRETCH
8 | – | – | Nitrous Oxide | Tiburon (Sunny Lax Remix) | TIME STRETCH
9 | 1956 | 2238 | Above & Beyond feat. Andy Moor | Air For Life (Norin & Rad Remix) | CORRECT
10 | 2263 | 2300 | Philip Aelis & Tiff Lacey | Heart In Blazing Night (David Kane Remix) | CORRECT
11 | 2483 | 2838 | Majai | Emotion Flash (Incognet Vocal Mix) | CORRECT
12 | 2806 | 3129 | Jonathan Gering | Let You Go (Original Mix) | CORRECT
13 | 3113 | 3365 | Nick Wolanski | I Love Mandy (Original Mix) | CORRECT
14 | 3349 | 3599 | Ercola vs. Heikki L. | Deep At Night (Adam K & Soha Remix) | CORRECT
Table 18. Identification for Darude Seattle DJ Set against ORIGINAL collection.
The performance of the algorithm for the ORIGINAL and VERSION databases is very similar to the other test cases. A difference can be found with the BIGDATA database: as the fingerprints from the recording and the references are not compatible, the number of possible candidates is very low, which has a more significant impact on larger databases. Nevertheless, the recording could be processed at 657x real time.
5.4 Evaluation of the results
In In Table 19, we can see the global detection results
of the pilot. We separated the results into #Available
(number of available songs) #Detectable, number of
songs that should be detectable from the technology (e.g.
excluding cases of timescalepitch), #Detected (number
of songs that were detected by BMAT) #Version (number
of songs that were detected with another version than
indicated) and #Total, the resulting total of detection by
BMAT including detected versions.
RESULTS | # Available | # Detectable | # Detected | % Detected | # Version | # Total | % Total
K-System | 13 | 13 | 13 | 100% | 0 | 13 | 100%
Orkidea | 8 | 5 | 5 | 62.50% | 0 | 5 | 100%
Riku Kokkonen | 11 | 11 | 10 | 90.91% | 1 | 11 | 100%
Darude Radio | 12 | 10 | 10 | 83.33% | 0 | 10 | 100%
Darude Seattle | 11 | 11 | 5 | 45.45% | 0 | 5 | 45.45%
Table 19. Overall results of the algorithm for the complete pilot.
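The percentages in Table 19 are consistent with a simple derivation, shown here for the Orkidea row as a worked check: % Detected relates the detections to the available songs, while % Total relates the detections (including other versions) to the detectable songs. This reading is inferred from the table itself rather than stated by BMAT.

# Worked check of the Table 19 figures, using the Orkidea row:
available, detectable, detected, other_version = 8, 5, 5, 0

pct_detected = detected / available              # 5 / 8 = 0.625 -> 62.50%
total = detected + other_version                 # 5
pct_total = total / detectable                   # 5 / 5 = 1.0   -> 100%

print(f"{pct_detected:.2%} detected, {pct_total:.2%} total")  # 62.50% detected, 100.00% total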
During the pilot, we could detect 5 cases of time stretching in the delivered material. These changes were introduced by the DJs and could not be detected by the algorithm because of the resulting structural changes in the songs. Excluding these cases, we see that the technology could detect 100% of the songs in all recordings that were taken directly from the mixer. In the case of the mixed recording – which contained the mixer signal and a microphone-recorded audience signal – the current configuration of the algorithm shows a poor detection rate. The reason is that the algorithm is currently optimized for radio and broadcast recordings, which do not normally contain signal manipulation of this magnitude.
5.5 Possible use scenarios
The technology used in the DJ pilot is already in use for broadcast monitoring by a number of clients around the world, which means that it is more mature than the solution tested in the live music identification pilot. For the intended use – monitoring music use in clubs and by DJs – the technology has some limitations, most notably that the use of time stretching by the DJs in the pilot resulted in tracks not being identified.
If the limitations recognized in this pilot can be overcome, and the costs of setting up and running this type of service prove economically viable, Teosto sees three possible use scenarios for a club/DJ automated music monitoring system. The use scenarios are similar to those for a live music identification system (see chapter 4.6): festivals, venues and artist/tour specific uses.
For the club and DJ monitoring technology, the venue
based use scenario could be the most interesting, as
permanent installation of monitoring systems/services
to active venues (including e.g. clubs, ski resorts, cruise
ships) would also allow for monitoring of background
music usage, and in time, also installation of possible live
music identification services.
The live festival and artist/tour specific use scenarios
would be similar to the scenarios presented in chapter
4.6. We are aware that for example in the Netherlands this
type of a monitoring service has already been tested on
DJ/EDM tours, and experiences from these trials should
be taken into consideration in further planning of these
use scenarios.
This chapter is based on “Teosto – BMAT Vericast Clubs
Pilot Report” by BMAT (2012).
6. Consumer survey
In addition to the two technology pilots, the project
included a consumer survey on the possibilities and
potential of using crowdsourcing methods for gathering
information about live gig setlists directly from the
audience. The web survey was carried out in the web
consumer panel of the Finnish market research company
Taloustutkimus Oy (http://www.taloustutkimus.fi) in December 2012 – January 2013. The 639 survey
respondents were persons who visit live music events at
least once a month.
6.1 Background
The respondents were 15 to 40 years of age, and were all
active concert goers (gigs, clubs, concerts, festivals). Out
of a total of 3920 initial respondents, 16% belonged to the
target group (visits live events at least once a month). The
most active age group in the survey was the 31 to 35 year
olds, of whom almost one in four respondents (23%) go to
live events every week or several times a month.
A majority of the survey respondents could be described as
fans. 81% of the survey respondents could name a favorite
artist (one or several), and 24% of these respondents
say they are members of a fan club or otherwise active
in online fan communities of one or several artists. The
artists that were mentioned most often were Finnish
metal artists, such as Stam1na, Mokoma, Nightwish and
Amorphis; international bands such as Metallica and
Muse, and Finnish pop acts like PMMP and Chisu.
Three out of four respondents that have a favorite artist
have also bought artist merchandise. Bands and artists
that were most often mentioned for merchandise
purchases were Metallica, Mokoma, Stam1na and Iron
Maiden. Male respondents and respondents aged 26
to 35 are more active when it comes to buying artist
merchandise.
The respondents seek information about upcoming live
events online (87%), usually buy their tickets beforehand
(84%), and try to go to every local gig by their favorite
artist(s) (64%).
6.2 Crowdsourcing potential
78% of respondents say they can “name most of the
songs played by their favorite artist in their live
show”, an encouraging result from the point of view of
crowdsourcing potential. Further, the survey target group
is already actively engaged in social media and a majority
of them are also active smartphone users. In fact, 41%
of respondents say they have (at least once) “posted an
update to a social media service directly from a live gig”.
Fans are also interested in following up on a gig by their
favorite artist by searching for gig reviews (45%) and
other information (photos, setlists, fan reviews, etc.)
about the gig (54%) online after the gig.
There’s a marked difference between fans searching
for material posted online by other fans, and actively
contributing (producing, uploading) material themselves:
whereas 54% search for updates made by other fans, only
8% say they write gig reviews themselves, and 9% say
they contribute set lists to online fan communities and/or
social media. However, the respondents who are fan club
members of one or several artists, or who are engaged
in online fan communities, are also more active in this
regard than other respondents.
6.3 Consumer interest in an interactive online setlist service
One main purpose of the consumer survey was to try
and estimate how interested active concert goers and
fans would be in a service that would provide gig set lists
(lists of songs performed at a gig) and would enable the
uploading of set lists by the fans themselves. The interest
was measured by asking the respondents how interested
they would be in using such a service, and how interested
they would be in uploading material into such a service
themselves.
As a baseline and for comparison, the respondents
were also asked about their current use of several live
music related online services: Finnish online ticketing
services (lippupalvelu.fi, tiketti.fi, lippu.fi), live music
information services (meteli.net, songkick.com), services
that already focus on setlists (setlist.fm), and one mobile
fan engagement service/platform (Mobile Backstage).
The ticketing services were used actively, with over
75% of respondents using (having used or tried the service) all three major Finnish ticketing sites. 41% had used or tried the Finnish live music information service Meteli.net, and 9% had used or tried Setlist.fm. Songkick and
Mobile Backstage (which is often branded for each
artist and thus probably not known to fans as a separate
service) received only a few mentions – artist fan sites
and Facebook were mentioned more often as information
sources on live music events.
In the end, a total of 30% of all respondents surveyed
showed interest in a service that would enable viewing
set lists, and posting set lists by consumers and fans.
Respondents who could name one or several favorite
artists were slightly more interested (33%), and fan club
members even more clearly so (47%). Out of different age
groups, respondents aged 15 to 20 showed more interest
in this type of service than older age groups.
A smaller share of respondents (12%) said they would also be interested in posting material to a setlist service themselves. Again, fan club members showed the most interest: 26% of them said they would be interested in posting set lists to this type of service themselves.
6.4 Conclusions
It has to be noted that converting this type of general
interest shown by consumers in a web survey into actual
users for a service will not be easy or straightforward.
Nevertheless, for the purposes of the present research
project, the aim of the survey was to identify potential
target groups for these types of services, and from this
point of view the survey results are interesting. First,
we can see that the total number of consumers that
could be the target group for an online set list service
in Finland is small. This will probably rule out any large
scale implementation of crowdsourcing methodologies
for gathering set list information by a performance rights
society like Teosto, at least for the time being.
On the other hand, there are certain groups of consumers
– specifically, fan club members and persons actively
engaged in artist fan communities – that could be
genuinely interested in using and posting set list
information for their own favorite artists. This could leave
room for crowdsourcing solutions that could be used for
verifying setlists provided by automated solutions. The
quality of automated reports could be verified by both
fans and artists themselves in order to add a layer of
quality control. From the artists’ point of view this could
also potentially be a way to further engage active fans.
References: Teosto ry – Biisilistapalvelututkimus.
Taloustutkimus Oy, 28.1.2013
7. State of the art in Music Information Retrieval:
what could be applied for copyright management?
Teppo Ahonen
The distribution and consumption of music is undergoing
a drastic change. Whereas music used to be distributed
in physical media such as vinyl albums, cassettes,
and compact discs, the current trend favors online
distribution, in either download stores such as iTunes,
or streaming services such as Spotify. Although physical
albums are still manufactured and sold, music is
nowadays more and more stored in various hard drives,
from large servers to personal computers and handheld
devices. Such vast amounts of music data have created a
demand for efficient, reliable, and innovative methods for
accessing music.
Music information retrieval (MIR) [13, 37] is a relatively
young area of research that studies how information can
be extracted, retrieved, and analyzed from large amounts
of music data. MIR is an interdisciplinary area of research
that combines studies from at least computer science,
musicology, mathematics, acoustics, music psychology,
and library sciences.
The target groups of MIR studies fall into various
categories. Firstly, MIR research provides tools and
information for music scholars. Another example of a
MIR target group is the consumer, a person who wants
to find and access music online. More importantly, in recent years the music industry has also shown a growing interest in MIR research. For example,
a search engine for content-based music retrieval would
clearly have a wide user base. In a similar manner,
the technological discoveries may prove useful for
organizations directly related to the music industry, such
as music copyright societies. The innovations of MIR can
be, and to some extent already have been, applied for
detecting the use of copyrighted material.
The purpose of this article is to provide a review of the
current state of the art in MIR, and offer insight on
methodologies that could be applied to different tasks in
the area of copyright management.
7.1 Music Information Retrieval background
Pinpointing the origins of MIR research is difficult.
Since the early days of information retrieval research,
there has been an interest to study whether the same
methodologies could also be applied for different data,
including music. Several suggestions on the very first
MIR systems date back to the 1960s, with ideas of representing music as a programming language. Clearly,
the computational capabilities back then were too limited
for the algorithms and data representations required for
efficient music retrieval.
In the decades that followed, the progress in
computational and storage resources led to the development of real-world applications for information
retrieval of musical data. At first the focus was on the
conventional methodologies of textual information
retrieval, combined with music. This can be deemed
rather limiting for a phenomenon as diverse as music
[15], and since the beginning of the 21st century, more and
more studies have focused on the music content itself.
The vast majority of MIR studies still focus on
information retrieval, but a growing number of studies
focus on other topics that could be defined as “informatics
in music”, for lack of a better term. It should be noted that
studies of automatic composition and other similar areas
of computational creativity are not deemed to be MIR
studies in the strict sense of the word.
7.2 State of the research
In the past ten years, the research area of MIR has
grown into a vast field of interdisciplinary study. This
has happened in conjunction with the rise of digital
music distribution and the changes in the habits of
music consumption. Also, success stories of applying
MIR research to consumer and business-to-business
technologies and services have already surfaced; the
applications in the query by example domain discussed in
this article are a fine example of this.
Though there are at the moment no scientific journals
that focus solely on MIR, the discoveries of MIR studies
have been reported in various journals of computer
science, computational musicology, mathematics,
and other related areas. Several textbooks and other monographs covering topics of MIR studies have emerged as well.
In addition to scientific journals, a significant amount
of MIR studies are published in conference proceedings.
Arguably the most important MIR conference is ISMIR [10, 15], abbreviated from the International Society for Music Information Retrieval. ISMIR has
been held annually since 2000, and it has expanded
into a large-scale forum for discussion of recent
discoveries in MIR studies. In addition to ISMIR, many
other conferences related to music and multimedia
frequently publish studies from the field of MIR - some
of the most important ones including ACM Multimedia
(ACM-MM), International Computer Music Conference
(ICMC), International Symposium on Computer Music
Modeling and Retrieval (CMMR), and IEEE International
Conference on Multimedia & Expo (ICME). Also several
smaller workshops have already established themselves,
the most notable ones including the International
Workshop on Advances in Music Information Research
(AdMIRe), International Workshop on Machine Learning
and Music (MML), and International Workshop on
Content-Based Multimedia Indexing (CBMI).
A highly important factor in the recent years of
MIR studies has been the introduction of the Music
Information Retrieval Evaluation eXchange (MIREX)
[14], held in conjunction with ISMIR since 2005.
Borrowing the concept of the textual information retrieval evaluation TREC, MIREX is a community-based
effort for objective and comparative evaluations of MIR
applications. MIREX provides researchers and groups
with a possibility to evaluate their applications with large
sets of music data and compare the performance rate
against other submitted state-of-the-art approaches, all
without the risk of committing a copyright infringement
by distributing material used in evaluations. The success
of MIREX is noteworthy; for example, in 2012 a total of
205 evaluations were run in the MIREX session.
In addition to the research, MIR-related topics are
currently taught in various institutions around the world,
for both graduate and undergraduate students.
7.3 Current trends
The work in MIR can roughly be divided into three categories: symbolic music (such as MIDI data, MusicXML, and other representations with high semantic information), audio (dealing with raw data of time-amplitude representations, in practice often applying methods of signal processing for extracting more in-depth information about the music), and metadata (such as tags and other user-generated information, but also including lyrics). The first two are commonly referred to as content-based MIR, because they rely solely on the information contained in the music itself, without the aid of metadata information. Although the categories are diverse, several studies incorporate discoveries from various categories and combine them into novel and improved systems; for example, combining audio classification methods with metadata information such as lyrics can be beneficial for mood classification (e.g. [8]).
Music retrieval. The basis of MIR research altogether;
retrieving music in a similar manner as how textual or
other information can be retrieved. The task of music
retrieval is a combination of several problems: what
features should be extracted from the music, how the
music or the features should be represented, how the
matching algorithm should be constructed to be both
accurate and robust at the same time, and how the data
should be indexed to allow efficient matching. We will
discuss a subfield of content-based music retrieval called
query by example in more detail later in this article.
Music identification. One of the core problems in MIR
is measuring similarity between pieces of music and
determining whether they can be identified or otherwise
deemed to be highly similar. Cover song identification
[45], a task that is definitely difficult but also highly
applicable when performed successfully, is a key example
of the problem. Approaches and applications of cover
song identification are also discussed in more detail in
this article.
Music categorization. A plethora of applications
applying methodologies of similarity measuring in
combination with machine learning techniques, in order
to classify or cluster sets of music data, have emerged. In
classification, a set of data is used to train a classifier
to label unknown pieces, whereas in clustering the
set of unknown pieces is divided into smaller clusters
according to their similarities. One of the most well-studied problems is the task of genre classification, with
various methodologies (see e.g. [43] for a tutorial) existing
and high success rates achieved in the related MIREX
task. However, it has also been discussed whether the
rather subjective definition of genre should be used for
criteria in automatic music classification [33].
Outside of genre classification, various different tasks
exist. Recently, classifying music according to the mood it
represents has been studied extensively.
Feature extraction and processing. In both the
symbolic and the audio domains of MIR research, a
crucial factor is to extract meaningful content from the
music into representations that can be used for similarity
measuring. This includes different methods, ranging from
low-level signal processing techniques to methods where
the task is to provide a so-called mid-level representation
of the music: a representation that captures desired
features of the music in an efficiently computable and robust way, without attempting to extract any high-level semantic information. The features extracted from
the signal can be processed into representations that
include descriptors of melodic [32], harmonic [7], and
rhythmic [21] content. Such features are also applied in
various tasks from key (e.g. [41]) and chord sequence (e.g.
[40]) estimation to audio thumbnailing and structure
discovery (e.g. [5]). Also instrument detection (e.g. [18])
and signal decomposition (e.g. [48]) can be included as
tasks of this category.
Automatic transcription. The idea of automatic
transcription is to estimate notes, chords, beats, keys,
and other musical descriptors from the audio signal to
provide a more semantically rich high-level description
of the content of the audio signal than the methodologies
described in the previous paragraph; however, these
methodologies are to some extent applied also in
automatic transcription. At its most successful, an
automatic transcription system could be applied as a tool
for producing sheet music representations from pieces of
music in audio format. Requiring methods such as signal
decomposition, fundamental tone estimation, beat detection,
and harmony approximation, automatic transcription in its
current form is anything but a solved task.
Music synchronization. Not to be confused with the
music business usage of the same term: using feature
extraction methods and representations, the task of
music synchronization attempts to synchronize music
from different sources, be it different audio files, or a
combination of music from audio and symbolic domains.
One of the most common tasks of music synchronization
is known as score following [29], where the goal is to
synchronize the sheet music representation to the
corresponding audio file, thus requiring both creating
robust representations for the music to be synced
and synchronizing them with sequence alignment
algorithms.
Optical Music Recognition. Abbreviated OMR, or
occasionally Music OCR, optical music recognition has
been for years an active area of study, where the goal is
to provide a computer-understandable representation
from a piece of music described in sheet music format;
see for example [4] for a tutorial. To some extent, OMR
is the reverse of automatic transcription,
described previously. Nowadays OMR is often considered
as one of the subfields of MIR, and successful applications
of OMR could be further applied in various MIR tasks.
The modern western music notation system is arguably
one of the most difficult writing systems ever developed
by mankind, consequently making the task of OMR
greatly more difficult than conventional optical character
recognition (which itself is already a highly challenging
task). Different OMR applications each have their own particular weaknesses, and a higher accuracy
in OMR could be achieved by combining multiple
OMR systems in order to overcome their individual
shortcomings.
User-based applications and interfaces. End-user-based applications have always been an essential goal
of MIR research, and in recent studies the focus has
especially been on creating innovative interfaces for
accessing music, including novel methodologies for
visualizing music information [39], and techniques
of automatic music recommendation [38]. Automatic
playlist generation (e.g. [3]) has also gained interest from
the research community. In this task, the challenge is to
learn connective features from pieces of music, in order
to produce lists of music that could be deemed enjoyable
and coherent by a human listener. The task is difficult, as
it is based on remarkably subjective qualities of music,
and thus is usually based on applying metadata (e.g.[25]),
collected by mining the web, for obtaining features such
as large sets of tags from services like Last.fm, which
describe collective impressions from pieces of music.
Also, the subjective nature requires that the evaluations
need to be conducted by human listeners, adding more
challenge to the task and making comparison between
systems difficult.
Lyrics. In recent years, including lyrical content in MIR
tasks has been widely accepted, because lyrics can be
seen as a readily available powerful form of metadata;
different lyricists have their individual styles, different
genres use similar vocabularies, and most notably, the
content of the lyrics often correlates with the mood of
the music, thus providing a multimodal approach for
the challenging task of mood classification. Because of
this, using lyrical descriptors in conjunction with audio
features has been widely implemented especially in mood
classification (e.g. [27]).
Non-western music. As MIR is practically an offspring
of research conducted in western institutes by western
scholars, within the field there has always been a slight
bias towards western music. As the data used are
pieces of western music (often gathered from personal
collections or public domain repositories), the applied
features and tools developed for MIR are more or less
developed from a western point of view. As such, they
might not be trivially applicable for different music
cultures from all over the world, since the scales,
rhythms, timbres, and other relevant musical features
often differ significantly. Interest in MIR for non-western music has increased in the last few years [12],
and MIR methods have been developed for music from
cultures such as India (e.g. [49]), China (e.g. [22]), Turkey
(e.g. [17]), and Africa (e.g. [34]), to name but a few.
Legal, business, and philosophical issues. Music is
not just an aesthetic phenomenon; it has a significant
importance for our daily lives, and has gradually become
a relevant area of economic and juridical matters. These
aspects need to be considered when conducting MIR
research.
7.4 Audio music retrieval
Query by example (QbE) means retrieving data by
matching example input to the database. In MIR, QbE is
a task where the goal is to retrieve music from a database
by using an input query that is an example of music. The
example could be either a complete song, or just a short
sample section of the song. A common case for a QbE
end-user would be to use the system to identify a piece of
music played on the radio, in order to discover the name
of the unidentified piece.
Audio music retrieval also includes related concepts with
slightly different approaches. Query by humming (QbH), or
query by singing (QbS), means retrieving music data using
an input query that is obtained as a sung or hummed
version of the prominent musical cue (usually the lead
melody of the piece), which is then matched against the
pieces included in the database. Also query by tapping
(QbT) (e.g. [19]) has been introduced as an alternative
method of providing audio queries that are strongly based
on the prominent rhythmic qualities of the music.
Successful query by example techniques can be applied
for various tasks, from accessing music innovatively and
musically, to identifying music through the comparison
of audio fingerprints. One of the key challenges for query
by example systems is the management of the target
database. To be useful in practice, the database
needs to contain very large amounts of music data, also
requiring efficient indexing techniques and fast matching
processes.
Features
As it is rather difficult to directly match differently
produced sounds, systems applying query by humming,
singing, or tapping (QbH, QbS, QbT) need to reduce the
audio input query into a symbolic representation in
order to match them against the pieces in the database.
Similarly, the database needs to be converted into a
similar representation before the matching. Starting from signal processing with the Fourier transform, the
audio signal is converted into a symbolic representation
that allows fast and robust pattern matching techniques
to be applied. However, these methods are clearly prone
to various errors that can occur in the conversion process.
Instead of this, query by example (QbE) systems usually
rely on techniques of audio fingerprinting and detecting
similar segments of music, without processing the audio
into an oversimplified representation. As the query by
example techniques search for identical matches they do
not need to extract any mid- or higher-level semantic
information from the signal.
With query by humming (QbH) or query by singing
(QbS), the matching should be key-invariant; unless the
input is provided by a trained singer or with the aid of a
reference pitch, there is no guarantee that the melody is
in the same key as the target pieces in the database, thus
making matching based on exact note pitches completely
unreliable. Another kind of robustness is also needed.
Using the terminology of Lemström and Wiggins [30],
a matching process needs to be both time-scale invariant
(allowing temporal fluctuation) and pitch-scale invariant
(allowing invariances in tone heights). Different systems
solve these problems with different methods.
Similar demands for robustness apply for query by
example (QbE) systems, too, although the key or rhythm
will not cause problems. Here, the imperfect nature
of the input is not caused by the possibly musically
unprofessional user, but instead by noise and distortion
caused by the sampling process; for example, the quality
of the recording equipment, constant or temporary
background noise, or encoding of the input sample for the
transmission process.
Applications
The query by example (QbE) methodologies have been
fruitfully processed into two commercially successful
consumer applications.
Shazam (http://www.shazam.com) [53, 52] was one of
the first successful public applications utilizing MIR
techniques, originally launched in 2002 in the United
Kingdom. Shazam is software for mobile devices that
allows users to record and submit 15 second samples of
audio which are then matched against a large database,
and a list of the nearest matches for the query is
returned. In the case of two or more audio files playing
simultaneously, Shazam is usually capable of returning
a list of the pieces played [52]. Occasionally the list of
false positives might include pieces of music that were
sampled on the query [52]; as such, techniques of Shazam
could be applied for detecting (possibly unauthorized)
sampling.
The features used by Shazam are audio fingerprints
constructed from the spectrogram peaks of the signal.
The spectrogram peaks allow constructing hash
representations from the signal, and the matching
algorithm based on comparing the hashes is fast,
although requiring far more storage space. Because it uses the spectrogram as its starting point, Shazam is a purely query by example (QbE) based application, and thus it compares only recorded performances. Due to this, cover or live version detection, or QbS, is not possible using Shazam.
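A compact sketch in the spirit of the published constellation-map approach [53] is given below; the peak picking and hash parameters are arbitrary illustrations, not Shazam’s actual values.

# Sketch of spectrogram-peak ("constellation") hashing in the spirit of the
# published Shazam papers [53, 52]; every parameter here is an arbitrary
# illustration, not a value from the production system.

import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import maximum_filter

def landmark_hashes(samples, sample_rate, fan_out=5, max_dt=50):
    _, _, spec = spectrogram(samples, fs=sample_rate, nperseg=1024, noverlap=512)
    spec = np.log1p(spec)
    # local maxima of the (log) spectrogram serve as landmark points
    peaks = np.argwhere((spec == maximum_filter(spec, size=15)) & (spec > spec.mean()))
    peaks = peaks[np.argsort(peaks[:, 1])]              # order landmarks by time frame
    hashes = []
    for i, (f1, t1) in enumerate(peaks):
        for f2, t2 in peaks[i + 1:i + 1 + fan_out]:     # pair each peak with a few later ones
            dt = t2 - t1
            if 0 < dt <= max_dt:
                hashes.append((hash((int(f1), int(f2), int(dt))), int(t1)))
    return hashes   # (hash value, anchor time frame) pairs to index or look up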
Another query by example (QbE) application that has
recently gained both a fair amount of users and acclaim
from the research community is known as SoundHound
(http://www.soundhound.com). The services offered
by SoundHound are more versatile than the query by
example scheme would suggest; SoundHound also allows
users to input queries by singing or humming, making it
a hybrid of QbE and QbS technologies. SoundHound uses the backend of the midomi (http://midomi.com) query by singing service, and utilizes its database of user-generated renditions as the target data. The technology behind SoundHound is known as Sound2Sound, and it is explained on the SoundHound web page in high-level, unscientific terms, describing “audio crystals” as the chosen representation. We are unaware of any publications on the technology behind SoundHound, but a patent application [35] by SoundHound Inc. exists. The application explains the feature extraction and matching process in a more accurate manner: the audio signal is converted frame by frame into a sequence of pitch values and pauses, and the matching is conducted using a technique entitled Dynamic Transition Matching, which is a dynamic programming technique similar to the Needleman-Wunsch algorithm.
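As an illustration of that family of dynamic-programming matchers (and not of SoundHound’s actual algorithm), a textbook Needleman-Wunsch global alignment score for two pitch sequences can be written as follows.

# Textbook Needleman-Wunsch global alignment of two pitch sequences, shown
# only to illustrate the family of dynamic-programming matchers referred to
# above; it is not SoundHound's algorithm.

def alignment_score(query, reference, gap=-1, match=2, mismatch=-1):
    m, n = len(query), len(reference)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for j in range(1, n + 1):
        score[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = score[i - 1][j - 1] + (match if query[i - 1] == reference[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[m][n]

# alignment_score([60, 62, 64, 65], [60, 62, 63, 64, 65]) == 7
# (four matches and one gap), so an extra sung note is tolerated.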
7.5 Cover song identification
Instead of discovering the title of the piece of music as
in the query by example case, it would occasionally be
more appropriate to discover whether a piece of music
has similarities to other pieces; in practice, if the query
piece is either a cover version, a piece of plagiarism, or
just otherwise a highly similar composition. This task is
commonly known as cover song identification, although
it should be noted that the term “cover” is used in a rather
broad sense here: a cover version could be anything
from a remix to a live version, or from a variation to an
alternative arrangement, such as an “unplugged” version.
Whereas the task is somewhat trivial for a human
listener, it is far more challenging for a computer; such
features as arrangements, tempos, keys, and lyrics may
change between the versions, and thus cannot be relied
on in the similarity measuring process. Because of this,
cover song identification requires methods that are at the
same time both robust for the differences between the
versions but also able to capture the essential similarity
in the music.
Because cover song identification is clearly an objective
task (a song either is a cover version or not), it provides
a reliable way to evaluate how well similarity measuring
algorithms perform. It also yields important information
on what similarity in music is, and how such similarity
can be represented and measured. Ultimately, successful
cover song identification systems can be applied for
various tasks, most notably plagiarism detection.
The cover version identification task has been a MIREX
challenge since 2006. The setup of the MIREX evaluation
is as follows. A dataset of 1000 files includes 30 cover song
sets, that is, sets of one original performance and 10 cover
versions of the piece. This totals up to 330 pieces that are
used as queries. The remaining 670 pieces are irrelevant
pieces of music, so-called “noise tracks” to make the
task more challenging. Each of the 330 pieces is used as a
query, and for each query, pairwise distances between the
query and each piece in the database is calculated, and a
distance matrix based on these is returned.
The performance is evaluated by several measures, mean
of average precisions (MAP) being probably the most
relevant measure; it describes how well the queries are
answered, taking into account the order of the answers
(i.e. do the correct answers have the smallest distances).
So far, the best-performing algorithm has achieved a MAP
value of 0.75, with 1 meaning a perfect identification.
Based on this, the cover song identification is definitely
not a solved task yet. For a more thorough review of cover
song identification technologies, we refer to a survey by
Joan Serra et al. [45].
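As an illustration of how this measure is computed, the following Python sketch derives the mean of average precisions from a distance matrix of the kind returned in the MIREX evaluation. The function name and array layout are our own assumptions, not the official evaluation code.

```python
import numpy as np

def mean_average_precision(distances, relevance):
    """distances: (n_queries, n_candidates) array of pairwise distances.
    relevance: boolean array of the same shape, True where the candidate
    is a cover of the query. Returns the mean of per-query average precisions."""
    aps = []
    for dist, rel in zip(distances, relevance):
        order = np.argsort(dist)          # smallest distance ranked first
        rel_sorted = rel[order]
        if not rel_sorted.any():
            continue                      # no covers for this query
        hits = np.cumsum(rel_sorted)      # number of correct answers so far
        ranks = np.arange(1, len(rel_sorted) + 1)
        # Precision at each rank where a correct answer appears.
        precision_at_hits = hits[rel_sorted] / ranks[rel_sorted]
        aps.append(precision_at_hits.mean())
    return float(np.mean(aps))
```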
Features
Considering that cover versions often feature different
timbral characteristics, using common acoustic
fingerprinting matching is out of the question. Instead,
a robust representation that captures the essential tonal
information – melodies and harmonies – is required. Also,
similarity measuring cannot be based on short segments
of the pieces; with different structures and possibly
highly diverse parts between the versions, it is commonly
considered that cover song identification must be based
on complete pieces of music.
The most frequently used feature that is applied in cover
song identification is known as a chromagram. Also
known as pitch class profile, a chromagram is a sequence
of vectors obtained from the audio signal with short-time
Fourier transform and several steps of post-processing.
Each vector of the sequence describes a small portion
of the piece, usually under one second long, and is
commonly 12-dimensional, thus corresponding to the 12
pitch classes of the western tonal scale (occasionally, a
more fine-grained representation of 24 or 36 dimensions
is used). The continuous vector bin values describe
the relative energy of the pitch classes in the frame,
meaning that for a segment of music where a C major
chord is played, the vector bins corresponding to pitch
classes c, e, and g, have the highest values. For two pieces
sharing common tonal features, the chromagrams are
likely to have similar characteristics, and thus in cover
song identification the task is to measure the similarity
between two chromagrams.
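As a concrete illustration, a chromagram of this kind can be extracted with an open-source audio library; the sketch below uses librosa, which is our own choice of tooling rather than anything used by the systems reviewed here, and a hypothetical input file.

```python
import numpy as np
import librosa  # open-source audio analysis library (assumed tool choice)

# Load ~30 seconds of audio and compute a 12-dimensional chromagram.
y, sr = librosa.load("example_recording.wav", duration=30.0)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)   # shape: (12, n_frames)

# Each column is one short frame; normalising it shows the relative energy
# of the 12 pitch classes. For a frame where a C major chord is played,
# the bins for C, E and G should dominate.
frame = chroma[:, 100] / (chroma[:, 100].sum() + 1e-9)
pitch_classes = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]
print(sorted(zip(pitch_classes, frame), key=lambda p: -p[1])[:3])
```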
Various methods for measuring the similarity
between chromagrams exist. Several methods apply a
discretization process to turn the continuous-valued
chroma vectors into a symbolic representation and then
use various techniques of pattern matching (e.g. edit
distance [6], dynamic time warping [28], and normalized
compression distance [2, 1]).
Other techniques include calculating cross-correlation
[16] and Euclidean distance [20] between the
chromagrams without attempting to discretize the
chromagram data. Some of the highest-performing cover
song identification methods (e.g. [47]) produce a binary
similarity matrix between the chromagrams, and calculate
the longest path in the matrix, thus using the longest
match between the chromagram sequences as the value
of similarity between the pieces.
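The following sketch illustrates one of these strategies: a minimal dynamic time warping alignment over two chromagrams using frame-wise cosine distance. It is a simplified illustration of the general idea, written by us for this report, not a reimplementation of any of the cited systems.

```python
import numpy as np

def chroma_dtw_distance(chroma_a, chroma_b):
    """Minimal dynamic time warping between two chromagrams of shape
    (12, n_frames), using cosine distance between frames."""
    a = chroma_a / (np.linalg.norm(chroma_a, axis=0, keepdims=True) + 1e-9)
    b = chroma_b / (np.linalg.norm(chroma_b, axis=0, keepdims=True) + 1e-9)
    cost = 1.0 - a.T @ b                      # (n_a, n_b) cosine distances
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)     # accumulated alignment cost
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    # Normalise by path length so longer pieces are not penalised.
    return acc[n, m] / (n + m)
```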
The chromagram is robust against changes in
instrumentation and timbre, but two features that might
affect identification are key and tempo. If a piece is
transposed to a different key, it will have a chromagram
with the same values as the original, but in different pitch
class bins. This would make even highly similar pieces of
music seem different.
In order to measure key-invariant similarity, several
methods are applied. One is to calculate the similarity
between the original and each of the 12 transpositions of
the cover version chromagram, and select the best of these
(the highest similarity, or equivalently the smallest
distance) as the final value. A more sophisticated method is to transpose
the chromagrams to a common key, either by key
estimation or with methods such as optimal transposition
index [44]. The third possibility is to produce a
representation that describes the relative changes in the
chromagram and measure the similarity between such
representations.
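The first, brute-force strategy is straightforward to illustrate: rotate the pitch-class bins of one chromagram through all 12 transpositions and keep the best score. The sketch below is our own illustration and assumes a distance function such as the DTW example above.

```python
import numpy as np

def key_invariant_distance(chroma_a, chroma_b, distance_fn):
    """Try all 12 circular pitch-class shifts of the second chromagram and
    return the smallest distance. This is the simple brute-force strategy;
    optimal transposition index methods estimate the shift directly instead."""
    return min(distance_fn(chroma_a, np.roll(chroma_b, shift, axis=0))
               for shift in range(12))
```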
Tempo changes may not have such a drastic effect on
the chroma profiles, but several techniques are also
applied to obtain tempo invariant similarity measuring.
One is to estimate the tempo with beat tracking and
filter the chromagram representations according to the
estimated beats. More commonly, similarity measuring
based on dynamic programming can overcome the
tempo differences in the similarity measuring. Some
methodologies ignore tempo invariance altogether and
suggest that the obtained results are actually better
without tempo invariance than with such methods, as
unreliable beat estimation might be a weak link in the
similarity measuring process (e.g. [1, 46]).
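The beat-tracking approach can be sketched as follows; librosa is again our assumed tool for the illustration, and aggregating by median is only one possible choice.

```python
import numpy as np
import librosa  # assumed tooling for this illustration

def beat_synchronous_chroma(path):
    """Estimate beats and aggregate the chroma frames within each beat,
    yielding a roughly tempo-invariant representation (one vector per beat)."""
    y, sr = librosa.load(path)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    # Median-aggregate the chroma columns between consecutive beats.
    return librosa.util.sync(chroma, beat_frames, aggregate=np.median)
```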
Occasionally, studies applying melody-based cover
version identification appear. It is difficult to evaluate
the performances between melody- and chromagram-based methodologies, as almost all approaches use
different data sets. It should be noted, though, that the
chromagram also captures the melodic information,
so a successful chromagram-based approach is likely
to include, at least indirectly, some of the melodic
information similarity measuring in the process.
Applications
To our knowledge, there are currently very few
commercial applications that actually utilize cover song
identification techniques. One that we are aware of is
known as BMAT Vericast Covers (http://www.bmat.com).
BMAT Vericast is an application developed
for detection of music played in radio streams; that is, to
provide real-time query by example audio fingerprint
matching in order to distribute royalties over music
played in commercial radios. The Covers version adds to
the technology the cover song identification methods by
Serra et al. [46, 47], and as such provides a method that
could be applied to detect different renditions of a piece
of music from a stream.
One example application of such technology could be automatic setlist
identification; the application would examine a recorded
performance and compare the segments against the back
catalog of the artist, providing an estimation of the setlist
performed at the concert, which could again be used for
distributing public performance royalties.
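Purely as a sketch of how such a setlist estimation might be structured, the outline below slides a window over a beat-synchronous chromagram of the concert recording and scores each segment against the artist's back catalogue with a cover-similarity distance (for instance the DTW sketch above). All names, the window length and the threshold are hypothetical; BMAT's actual Vericast Covers pipeline is not public.

```python
def estimate_setlist(concert_chroma, catalogue, distance_fn,
                     window_beats=120, hop_beats=30, threshold=0.35):
    """Hypothetical outline of automatic setlist estimation.

    concert_chroma: beat-synchronous chromagram of the whole concert, (12, n).
    catalogue: dict mapping song title -> beat-synchronous reference chromagram.
    distance_fn: a key-invariant cover-similarity distance (smaller = more similar).
    Returns a list of (beat_index, best_matching_title) detections.
    """
    detections = []
    n = concert_chroma.shape[1]
    for start in range(0, max(1, n - window_beats), hop_beats):
        segment = concert_chroma[:, start:start + window_beats]
        scores = {title: distance_fn(segment, ref)
                  for title, ref in catalogue.items()}
        title, best = min(scores.items(), key=lambda kv: kv[1])
        if best < threshold:            # only report sufficiently close matches
            detections.append((start, title))
    return detections
```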
7.6 Conclusions
In this article, we have introduced the basics of music
information retrieval, reviewed the history of the area
of research and depicted some of the current trends and
challenges in the area. In addition, the methodologies
behind two commonly used MIR technologies of music
retrieval and identification were explained in closer
detail, with examples of applications utilizing these
technologies.
The task of audio query by example retrieval requires
methods that match similarities between recordings by
first using audio fingerprinting to represent the spectral
information contained in the music, and then efficiently
measuring the similarity using techniques that are robust
against the noise that might be added to the example
during the recording process. The methodologies of query
by example retrieval have successfully been adapted
in commercial applications, most notably Shazam and
SoundHound, the latter of which incorporates query by
singing/humming techniques in the identification
process.
The task of cover song identification, on the other hand,
requires methods that do not attempt to measure
similarities in the signals, but instead extract features
that describe the tonal contents of the pieces, and
then measure similarities between these feature
representations. As the cover versions often vary
intentionally, the methodologies must be robust
by allowing even drastic changes in arrangements,
structures, keys, and tempos in the cover version while
still capturing the essential tonal similarity; that is, the
features that make a piece of music a cover version, most
notably melodic cues and harmonic structures.
Both methods were chosen to be presented in this article
for their potential practical applications for copyright
management organizations such as Teosto
(http://www.teosto.fi) in Finland. Both methods could be applied for
different tasks of managing the distribution of collected
royalties. Query by example techniques are applicable
for monitoring the music of radio broadcasts; there
are already implementations of such systems that have
been put into operation. Cover song identification
has so far not been applied on a large scale, but such
systems are likely to appear in the near future. Also,
copyright infringement detection could possibly benefit
from introduction of MIR techniques. With query by
example, this would most likely mean detection of
usage of unauthorized sampled material. With cover
song identification, methodologies could be applied for
plagiarism detection.
At the same time, it should be noted that the research
problems of MIR are anything but solved. Although
several methodologies and technologies produce high-level performance in real-world tasks, such as the
SoundHound application, there is always room for
improvement, and for most tasks, very few large-scale
implementations even exist. In most cases, the best way
to objectively compare the performance of state-of-the-art systems with large data sets depends on whether
they have been submitted to the MIREX evaluation or not.
In addition, considering the amount of published music,
the methodologies for several tasks in MIR still need to
demonstrate their capabilities for managing extremely
large amounts of data in order to enable producing
practical applications and reliable solutions. Also, the
question of a possible glass ceiling of performance is valid
in several tasks.
7.7 Bibliography
[1] Teppo E. Ahonen. Combining chroma features for cover
version identification. In Proceedings of the 11th International Society
for Music Information Retrieval Conference, pages 165–170, 2010.
[2] Teppo E. Ahonen and Kjell Lemstrom. Cover song
identification using normalized compression distance. In
Proceedings of the International Workshop on Machine Learning and
Music, 2008.
[3] Jean-Julien Aucouturier and Francois Pachet. Scaling up
music playlist generation. In IEEE International Conference on
Multimedia and Expo 2002, pages 105–108, 2002.
[4] David Bainbridge and Tim Bell. The challenge of optical
music recognition. Computers and the Humanities, 35:95–121, 2001.
[5] Mark A. Bartsch and Gregory H. Wakefield. To catch
a chorus: using chroma-based representations for audio
thumbnailing. In Proceedings of the 2001 IEEE Workshop on
the Applications of Signal Processing to Audio and Acoustics, pages
15–18, 2001.
[6] Juan P. Bello. Audio-based cover song retrieval using
approximate chord sequences: testing shifts, gaps, swaps and
beats. In Proceedings of the 8th International Conference on Music
Information Retrieval, 2007.
[7] Juan P. Bello and Jeremy Pickens. A robust mid-level
representation for harmonic content in music signals.
In Proceedings of the 6th International Conference on Music
Information Retrieval, pages 304–311, 2005.
[8] Kerstin Bischoff, Claudiu S. Firan, Raluca Paiu, Wolfgang
Nejdl, Cyril Laurier, and Mohammed Sordo. Music mood
and theme classification – a hybrid approach. In Proceedings
of the 10th International Society for Music Information Retrieval
Conference, pages 657–662, 2009.
[9] Donald Byrd and Tim Crawford. Problems of music
information retrieval in the real world. In Information Processing
and Management, pages 249–272, 2002.
[10] Donald Byrd and Michael Fingerhut. The history of ISMIR –
a short happy tale. D-Lib Magazine, 8(11), 2002.
[11] Donald Byrd and Megan Schindele. Prospects for improving
optical music recognition with multiple recognizers. In
Proceedings of the 7th International Conference on Music Information
Retrieval, pages 41–46, 2006.
[12] Olmo Cornelis, Micheline Lesaffre, Dirk Moelants, and Marc
Leman. Access to ethnic music: Advances and perspectives in
content-based music information retrieval. Signal Processing,
90(4):1008 – 1031, 2010.
[13] J. Stephen Downie. Music information retrieval. Annual
Review of Information Science and Technology, 37:295–340, 2003.
[14] J. Stephen Downie. The music information retrieval
evaluation exchange (2005–2007): A window into music
information retrieval research. Acoustical Science and Technology,
29(4):247–255, 2008.
[15] J. Stephen Downie, Donald Byrd, and Tim Crawford. Ten
years of ISMIR: Reflections on challenges and opportunities. In
Proceedings of the 10th International Society for Music Information
Retrieval Conference, pages 13–18, 2009.
[16] Daniel P.W. Ellis and Graham E. Poliner. Identifying ’cover
songs’ with chroma features and dynamic programming beat
tracking. In IEEE Conference on Acoustics, Speech, and Signal
Processing, 2007.
[17] Ali C. Gedik and Barış Bozkurt. Pitch-frequency histogram-based music information retrieval for Turkish music.
Signal Processing, 90(4):1049–1063, 2010.
[18] Perfecto Herrera-Boyer, Anssi Klapuri, and Manuel Davy.
Automatic classification of pitched musical instrument sounds.
In Signal Processing Methods for Music Transcription, pages
163–200. Springer US, 2006.
[19] Jyh-Shing Roger Jang, Hong-Ru Lee, and Chia-Hui Yeh.
Query by tapping: A new paradigm for content-based music
retrieval from acoustic input. In Advances in Multimedia
Information Processing, volume 2195 of Lecture Notes in Computer
Science, pages 590–597. Springer Berlin Heidelberg, 2001.
[20] Jesper Hojvang Jensen, Mads G. Christensen, and
Soren Holdt Jensen. A chroma-based tempo-insensitive
distance measure for cover song identification using the 2d
autocorrelation function. In Proceedings of the Music Information
Retrieval Evaluation eXchange 2008, 2008.
[21] Kristoffer Jensen. A causal rhythm grouping. In Proceedings
of the Second International Conference on Computer Music Modeling
and Retrieval, pages 83–95, 2004.
[22] Kristoffer Jensen, Jieping Xu, and Martin Zachariasen.
Rhythm-based segmentation of popular chinese music.
In Proceedings of the 6th International Conference on Music
Information Retrieval, pages 374–380, 2005.
[23] Michael Kassler. Toward musical information retrieval.
Perspectives of New Music, 4(2):59–67, 1966.
[24] Anssi Klapuri and Manuel Davy, editors. Signal Processing
Methods for Music Transcription. Springer, New York, 2006.
[25] Peter Knees, Tim Pohle, Markus Schedl, and Gerhard
Widmer. Combining audio-based similarity with web-based
data to accelerate automatic music playlist generation. In
Proceedings of the 8th ACM International Workshop on Multimedia
Information Retrieval, pages 147–154, 2006.
[26] Cyril Laurier. Automatic Classification of Musical Mood by
Content Based Analysis. PhD thesis, Universitat Pompeu Fabra, 2011.
[27] Cyril Laurier, Jens Grivolla, and Perfecto Herrera.
Multimodal music mood classification using audio and lyrics. In
Proceedings of the 7th International Conference on Machine Learning
and Applications, pages 688–693, 2008.
[28] Kyogu Lee. Identifying cover songs from audio using
harmonic representation. In Proceedings of the Music Information
Retrieval Evaluation eXchange 2006, 2006.
[29] Serge Lemouton, Diemo Schwarz, and Nicola Orio. Score
following: State of the art and beyond. In Proceedings of the
Conference on New Instruments for Musical Expression, 2003.
[30] Kjell Lemstrom and Geraint A. Wiggins. Formalizing
invariances for content-based music retrieval. In Proceedings
of the 10th International Society for Music Information Retrieval
Conference, pages 591–596, 2009.
[31] Tao Li, Mitsunori Ogihara, and George Tzanetakis, editors.
Music Data Mining. CRC Press, 2012.
[32] Matija Marolt. A mid-level representation for melody-based
retrieval in audio collections. IEEE Transactions on Multimedia,
10(8):1617–1625, December 2008.
[33] Cory McKay and Ichiro Fujinaga. Musical genre
classification: Is it worth pursuing and how can it be improved?
In Proceedings of the 7th International Conference on Music
Information Retrieval, 2006.
[34] Dick Moelants, Olmo Cornelis, and Marc Leman. Exploring
african tone scales. In Proceedings of the 10th International Society
for Music Information Retrieval Conference, pages 489–494, 2009.
[35] Keyvan Mohajer, Majid Emami, Michal Grabowski, and
James M. Hom. System and method for storing and retrieving
non-text-based information. United States Patent 8041734 B2,
2011.
[36] Meinard Muller. Information Retrieval for Music and Motion.
Springer Verlag, 2007.
[37] Nicola Orio. Music retrieval: A tutorial and review. Foundations
and Trends in Information Retrieval, 1(1):1–90, 2006.
[38] Oscar Celma and Paul Lamere. If you like the Beatles you
might like...: a tutorial on music recommendation. In Proceedings
of the 16th ACM International Conference on Multimedia, pages
1157–1158, 2008.
[39] Elias Pampalk, Andreas Rauber, and Dieter Merkl. Content-based
organization and visualization of music archives. In Proceedings
of the 10th ACM International Conference on Multimedia, pages
570–579, 2002.
[40] Helene Papadopoulos and Geoffroy Peeters. Large-scale study of
chord estimation algorithms based on chroma representation and
HMM. In Proceedings of the 5th International Conference on
Content-Based Multimedia Indexing, 2007.
[41] Geoffroy Peeters. Chroma-based estimation of musical key from
audio-signal analysis. In Proceedings of the 7th International
Conference on Music Information Retrieval, 2006.
[42] Zbigniew W. Ras and Alicja A. Wieczorkowska, editors. Advances
in Music Information Retrieval. Springer, 2010.
[43] Nicolas Scaringella, Giorgio Zoia, and Daniel Mlynek. Automatic
genre classification of music content: a survey. IEEE Signal
Processing Magazine, 23(2):133–141, 2006.
[44] Joan Serra, Emilia Gomez, and Perfecto Herrera. Transposing
chroma representations to a common key. In Proceedings of the
IEEE CS Conference on The Use of Symbols to Represent Music and
Multimedia Objects, pages 45–48, 2008.
[45] Joan Serra, Emilia Gomez, and Perfecto Herrera. Audio Cover
Song Identification and Similarity: Background, Approaches,
Evaluation, and Beyond, volume 274 of Studies in Computational
Intelligence, chapter 14, pages 307–332. Springer-Verlag Berlin /
Heidelberg, 2010.
[46] Joan Serra, Emilia Gomez, Perfecto Herrera, and Xavier Serra.
Chroma binary similarity and local alignment applied to cover song
identification. IEEE Transactions on Audio, Speech and Language
Processing, 16:1138–1151, 2008.
[47] Joan Serra, Xavier Serra, and Ralph G. Andrzejak. Cross
recurrence quantification for cover song identification. New
Journal of Physics, 11:093017, 2009.
[48] Paris Smaragdis and Judith C. Brown. Non-negative matrix
factorization for polyphonic music transcription. In Proceedings of
the IEEE Workshop on Applications of Signal Processing to Audio and
Acoustics, 2003.
[49] Rajeswari Sridhar and T. V. Geetha. Raga identification of
Carnatic music for music information retrieval. International
Journal of Recent Trends in Engineering, 1(1):571–574, 2009.
[50] Wei-Ho Tsai, Hung-Ming Yu, and Hsin-Min Wang. Using the
similarity of main melodies to identify cover versions of popular
songs for music document retrieval. Journal of Information Science
and Engineering, 24(6):1669–1687, 2008.
[51] Rainer Typke, Frans Wiering, and Remco C. Veltkamp. A survey
of music information retrieval systems. In Proceedings of the 6th
International Conference on Music Information Retrieval, pages
153–160, 2005.
[52] Avery Wang. An industrial-strength audio search algorithm. In
Proceedings of the 4th International Conference on Music Information
Retrieval, 2003.
[53] Avery Wang. The Shazam music recognition service.
Communications of the ACM, 49(8):44–48, August 2006.
8. Summary of project results
Music recognition and broadcast monitoring market
According to the market study carried out by Music Ally,
acoustic fingerprinting music recognition technologies
have strong potential to provide a means of monitoring
the entirety of the music performed
in any given territory, across radio and television
broadcasts, as well as night clubs and other public
environments.
While new open-source acoustic fingerprinting services
mean that it is now easier for organisations to develop
automated music recognition systems in-house (or
commission them from software developers), this
might not necessarily lead to the most proficient and
cost-effective solutions. To balance this, the increasing
availability of end-to-end monitoring services suggests
that competition is increasing, with the potential to drive
prices down, as well as to lead to the development of
improved technologies.
For performance rights societies, this means that these
technologies could have a very positive impact on
their operations, enabling them to further enhance the
accuracy with which they distribute their collections
amongst their members, while simplifying the logistics of
monitoring music usage.
Nevertheless, partnerships between performance rights
societies and service providers in this space require a
deep level of collaboration, in order to ensure that the
latter have a robust, up-to-date set of fingerprints in their
databases, and that the former are provided with reports
that meet the depth of data their operations require.
In Europe, the push for the reduction of rights
management fragmentation is expected to increase
competition between different countries’ performing
rights societies. Consequently, having music recognition
and monitoring systems in place could be one of the ways
in which organizations might seek to gain competitive
advantages over the others.
Live music identification pilot
In this research pilot, in June 2012, BMAT’s music
identification technology was applied for the automated
reporting of musical works performed at a live music
event. The cover music identification technology
currently being developed by BMAT was for the first time
put to the test in a real life festival setting, to provide
input for a new live music reporting concept currently
being developed at Teosto.
The pilot was carried out with three Finnish bands
(Nightwish, PMMP and Notkea Rotta) at Provinssirock,
one of Finland’s largest rock festivals. The shows were
recorded, analysed by BMAT and the results were
evaluated by Teosto.
The results of the technology pilot were promising: the
piloted technology provided very good results for two out
of the three pilot shows, and worked especially well for
works in the mainstream pop/rock genre. While certain
limitations remain, it is likely that music identification
technologies can in the near future provide a reliable
way for music copyright organizations and other music
industry players to detect and verify setlist data from live
music events.
While automated music broadcast monitoring for online,
radio and TV is already an established market, the
automated identification of musical works performed at
live shows remains a technological challenge. Changes in
tempo, key, instrumentation and song structure, together
with audience sounds and the acoustic characteristics of
a live venue, make it difficult to accurately match a live
version to a studio recording of the same song. The piloted
technology compared the live audio to a reference set of
studio recordings and provided a list of matching song
pairs.
Evaluating the pilot results, Teosto also recognized
three possible use scenarios for an automated live music
identification service: use in large music festivals, use
at active music venues such as clubs and cruise ships
(integrated with a general music monitoring system
for club use), and artist or tour specific use (to replace
manual/online reporting).
DJ club monitoring pilot
The pilot was carried out in December 2012 – March 2013.
Five recorded DJ live sets from four Finnish DJs were
submitted to BMAT by Teosto, and tested against BMAT’s
commercial broadcast monitoring service Vericast,
which is already in use by more than 30 performance
rights organizations, for worldwide music broadcast
monitoring.
The DJ sets were tested against three different reference
audio sets. One reference set was provided by Teosto,
based on the setlists provided by the DJs, the second was
put together by BMAT (including e.g. alternate versions
of the reference songs), and a third, larger reference
set (also by BMAT) was also used, mainly for testing the
performance of the algorithm.
The results of the DJ pilot were positive: out of all
detectable tracks (excluding tracks with time stretching,
see below), the BMAT Vericast system correctly recognized
and reported all of the tracks (100%). Issues that
need to be solved, however, include the effects of time
stretching and other filtering commonly used by DJs that
in the pilot led to five tracks (out of a total number of 44
tracks, 11%) not being identified by the piloted system.
The recognized potential use cases for the piloted DJ
club monitoring system are similar to those identified for
the live music identification technology, with integrated club systems
and artist/tour specific uses seen as more prominent for
the DJ genre.
Consumer survey
In addition to the two technology pilots, the project
included a consumer survey on the possibilities and
potential of using crowdsourcing methods for gathering
information about live gig setlists directly from the
audience. The web survey was carried out in the web
consumer panel of the Finnish market research company
Taloustutkimus Oy (http://www.taloustutkimus.
fi) in December 2012 – January 2013. The 639 survey
respondents were persons who visit live music events at
least once a month.
The consumer survey showed that among active Finnish
music fans who frequently attend live shows, there is
an interest in using interactive services focused
around gig set lists. However, this will probably only be a
niche market at best, and from the point of view of Teosto,
crowdsourcing set lists from audience members on a
large scale is currently not a viable alternative to manual
reporting and/or automated reporting technologies.
However, verifying automatically generated set lists by
fans and audience members could be a possibility for
improving the accuracy of automated set list creation.
9. Conclusions
The starting point for this research project was to
analyze some of the effects that new and emerging
digital technologies will have on the music industry,
and more specifically from the point of view of a music
performance rights society, on the managing of different
music related copyrights.
As mapping out all the possible effects of different new
technologies on the management of music rights would
be a task beyond the scope of a single research project,
the focus of this project was narrowed down to a more
manageable size. We decided to focus on one area of
interest that has emerged during the last decade and
will have a big impact on the way music usage data is
gathered and processed, within the music industry or
collective management organizations: automated music
recognition.
In the music industry, different types of automated
content recognition technologies are already widely
used for tasks such as the monitoring of broadcast music,
or classifying and screening content in different ways
in online music and video services. While in use, these
technologies are probably not yet employed to their full
potential, for a number of reasons – the most important
of which is the lack of universal databases that would
link metadata from recorded and broadcast music to
the metadata provided by copyright organizations and
publishers on the authors of musical works.
A new and emerging application area for these
technologies is using automated music recognition
systems for identifying live music and cover versions –
i.e. identifying songs, or musical works, instead of finding
and reporting identical matches of recorded versions
broadcast on radio or TV. Computationally, the task
of developing reliable commercial music recognition
services for live music is a much more difficult challenge
than the monitoring and reporting of radio and TV music
use. While academic research on the subject has been
active, in 2012 there were, to our knowledge, no commercial
live music content recognition services available on the
market.
Having followed some of the academic research on the
subject, and on the other hand, having been in talks with
providers of different broadcast monitoring services,
Teosto wanted to test – if possible – one or several
available solutions to the problem of live music content
recognition in practice.
So in the end, the focus of the current research project
was threefold: first, if possible, to carry out pilot research
on live music recognition in an actual live music setting,
to evaluate the technology and to create a concept for the
use of the technology for music copyright organizations
such as Teosto. Second, to evaluate the current music
recognition and broadcast monitoring market in Europe
and globally. And third, to survey the current academic
research on the subject of automated music recognition
systems (mainly carried out within the interdisciplinary
research field of Music Information Retrieval), and try to
draw conclusions about the possible applications of that
research for the management of music rights and within
the broader music industry.
In the course of the project, two independent research
pilots and a separate consumer survey were carried
out. The goal of the technology pilots was to test new
music identification systems in a real life setting in
order to gather data about the system itself, and to create
an understanding of the requirements that need to be
in place should a collecting society like Teosto adopt
these types of technologies as a part of our reporting
process. The focus was mainly on building a proof-of-concept for the technology; business implications such
as investments, costs of running the system, cost/benefit
analyses etc. were outside the scope of the present
project, as the technology evaluated was not a released
commercial product and from the outset probably not
mature enough to be implemented in its current state.
The first pilot was carried out in June 2012 in the
Provinssirock rock festival in Seinäjoki, Finland. The
purpose of the festival pilot was to test and evaluate
BMAT’s live music identification technology Vericast
Covers in a live setting, with three Finnish bands from
three different genres. The live shows were recorded in
two versions: one from the mixing desk and one from the
audience, in order to be able to determine whether audio
quality would have an effect on the identification results.
The second technology pilot focused on DJ sets, performed
in a club environment. Five DJ sets from four Finnish
DJs were recorded and tested against BMAT’s Vericast
broadcast monitoring setup. From the results of the
pilots, three usage scenarios/concepts were formulated
from the point of view of a collection society.
The two technology pilots were successful in providing
proof that automated music recognition services can,
in addition to broadcast music monitoring, already be
used in a club environment for identifying and reporting
music played by DJs, and in 1–2 years' time possibly also
in a live music setting for automated set list creation.
The pilots also pointed out a number of challenges
and limitations that need to be solved before adopting
the technologies for large scale use. The main challenge
for all automated music recognition systems to work is
twofold: in order to work in an efficient way, they need a
representative reference audio database that is constantly
updated, and there also needs to be a reliable way to
automatically match the identification results to relevant
metadata – in the case of performance rights societies like
Teosto, to the relevant author and publisher information.
In addition, the tested live music identification and
club/DJ monitoring systems also had certain technical
limitations that need to be improved upon to ensure
reliable results.
The consumer survey showed that among active Finnish
music fans who frequently attend live shows, there is
an interest in using interactive services focused
around gig set lists. However, the potential user base is
very small, and from the point of view of Teosto, using
crowdsourcing to collect set list information from
audience members on a large scale is currently not a
viable alternative to manual reporting and/or automated
reporting technologies. However, using information
gathered from fans and audience members to verify
automatically generated set lists could be a possibility for
improving the accuracy of automated set list creation in
the future.
10. References
Project deliverables (title | date | type | author)

Teosto – BMAT Vericast Covers Pilot Report | 29.8.2012 | Technical pilot results report | BMAT
Teosto – BMAT Vericast Clubs Pilot Report | 21.3.2013 | Technical pilot results report | BMAT
Analysis of the Automatic Live Music Detection Experiment | 17.12.2012 | Research article | Teppo Ahonen
State Of The Art In Music Information Retrieval: What Could Be Applied For Copyright Management | 17.12.2012 | Research article | Teppo Ahonen
Music recognition and broadcast monitoring market research | 18.12.2012 | Market research report | Music Ally
Teosto ry – biisilistapalvelututkimus (setlist service consumer survey) | 28.1.2013 | Market research report | Taloustutkimus
Project final report | 28.3.2013 | Project final report | Teosto

Seminars (title | date | type | location)

Musiikin tekijänoikeudet 2010-luvulla. Ennakkoinfo projektin tuloksista. (Music copyright in the 2010s. Preview of the project results.) | 31.1.2013 | Pilot results seminar | Erottajan Kasino, Helsinki
Technology, Music Rights, Licensing | 21.3.2013 | Project final seminar | Finlandia Hall, Helsinki

Presentations and other (title | date | type | author)

Teosto and BMAT carry out a pioneering live music identification pilot in Finland | 26.1.2013 | Press release | Teosto
Teosto ja BMAT kehittävät ensimmäisenä maailmassa livekeikkojen automaattista musiikintunnistusta (Teosto and BMAT are the first in the world to develop automated music recognition for live gigs) | 26.1.2013 | Press release | Teosto
Musiikintunnistuspalvelut – markkinakatsaus (Music recognition services – a market overview) | 31.1.2013 | Presentation | Ano Sirppiniemi
Mikä biisi tää on? Musiikintunnistusta livekeikoilla. Livepilotin toteutus ja tulokset (What's that song? Music recognition at live gigs. Implementation and results of the live pilot) | 31.1.2013 | Presentation | Turo Pekari
Biisilistoja yleisöltä? Crowdsourcingin mahdollisuudet Suomessa. Kyselytutkimuksen tulokset (Setlists from the audience? The potential of crowdsourcing in Finland. Results of the consumer survey) | 31.1.2013 | Presentation | Turo Pekari
Mikä biisi tää on? Musiikintunnistuspilotti Provinssissa (What's that song? The music recognition pilot at Provinssirock) | 8.2.2013 | Presentation | Ano Sirppiniemi, Turo Pekari
Emerging Technologies: Teosto's Live Music Recognition and DJ Club Monitoring Pilots | 21.3.2013 | Presentation | Turo Pekari, Alex Loscos (BMAT)
State Of The Art In Music Information Retrieval Research: Applications For Copyright | 21.3.2013 | Presentation | Teppo Ahonen (University of Helsinki)
Majority Report: Visions On the Music Monitoring Landscape | 21.3.2013 | Presentation | Alex Loscos (BMAT)
Keynote: Issues In the Music Rights Value Chain | 21.3.2013 | Presentation | Karim Fanous (Music Ally)
Finnish Composers’ Copyright Society Teosto
Lauttasaarentie 1, 00200 Helsinki, Finland
Tel. +358 9 681 011,
[email protected]
www.teosto.fi