What's that song? Automated music recognition technologies for live music and DJs

Teosto research report 1/2013
Helsinki, 28.3.2013

This report details the results of Teosto's research project Music copyright in the 2010s (January 2012 – March 2013) that focused on automated music recognition and broadcast monitoring technologies, and how they can be applied to the monitoring and reporting processes of a music performance rights society. The project was partly funded by the Finnish Ministry of Education and Culture.

Teosto wants to thank everyone that participated in the project for their contributions: the artists, bands, songwriters, performers and their organisations (PMMP, Nightwish, Notkea Rotta, Darude, K-System, Orkidea, Riku Kokkonen); the Ministry of Education and Culture; Alex Loscos, Johannes Lyda and the team at BMAT; Karim Fanous and Leonardo Toyama at Music Ally; Teppo Ahonen; Anne Kosonen at Taloustutkimus; Erno Kulmala at YLE; Provinssirock; and everyone involved in the project at Teosto.

Ano Sirppiniemi, Head of Research, Teosto
Turo Pekari, Researcher, Teosto

About Teosto

Finnish Composers' Copyright Society Teosto is a non-profit organisation founded in 1928 by composers and music publishers to administrate and protect their rights. Teosto represents approximately 27,000 Finnish and almost three million foreign composers, lyric writers, arrangers and music publishers. Teosto's research activities include market research with clients and research partners, looking into new technologies and their applications, partnering with companies and research partners for joint research projects, and working with Teosto's extensive data on music use in Finland.

Contact: Ano Sirppiniemi, Head of Research ([email protected], tel. +358 9 6810 1287, mobile: +358 50 325 6530)
More information: www.teosto.fi/en
Cover photo: Turo Pekari

Contents

1. Executive summary
2. Project outline and structure
2.1 Project description
2.2 Partners
2.3 Project timeline
2.4 Project deliverables
2.5 Communication of project results
2.6 Structure of the report
3. The global music recognition and broadcast monitoring market
3.1 Automated music recognition
3.2 Business landscape
3.3 Business to business (B2B)
3.4 Business to consumer (B2C)
3.5 Companies developing technologies for their own use
4. Live music identification pilot in Provinssirock
4.1 BMAT system description
4.2 Pilot scenario
4.3 Identification process
4.4 Test case results
4.4.1 PMMP
4.4.2 Nightwish
4.4.3 Notkea Rotta
4.5 Evaluation of the results
4.6 Possible use scenarios
5. DJ/Club monitoring pilot
5.1 Identification technology
5.2 Pilot setup
5.3 Test case results
5.3.1 K-System
5.3.2 Orkidea
5.3.3 Riku Kokkonen
5.3.4 Darude
5.4 Evaluation of the results
5.5 Possible use scenarios
6. Consumer survey
6.1 Background
6.2 Crowdsourcing potential
6.3 Consumer interest in an interactive online setlist service
6.4 Conclusions
7. State of the art in Music Information Retrieval: what could be applied for copyright management? (Teppo Ahonen)
7.1 Music Information Retrieval background
7.2 State of the research
7.3 Current trends
7.4 Audio music retrieval
7.5 Cover song identification
7.6 Conclusions
7.7 Bibliography
8. Summary of project results
9. Conclusions
10. References
1. Executive summary

Teosto's research project Music copyright in the 2010s (January 2012 – March 2013) aimed to shed light on the effects of new and emerging technologies on the administration of music copyrights. The main focus of the project was on music recognition and broadcast monitoring technologies: the global market for these technologies and how they could be used in new areas such as automated live music identification. During the project we also reviewed relevant academic research on the subject. The project was partly funded by the Finnish Ministry of Education and Culture.

The project consisted of four parts:
1. A market study on the European and global music recognition and broadcast monitoring market
2. Background research on recent and relevant academic studies in the field of Music Information Retrieval (MIR), a field of research that frequently provides the technology innovations for the music recognition market
3. Two separate technology pilots: one on automated live music identification at a Finnish rock festival, and another on monitoring music use in a DJ/club setting
4. A consumer survey on the potential of crowdsourcing live concert set list information from audience members of live shows

These technologies are already successfully being used for monitoring music use in radio, broadcast TV, and a number of digital music and video services, by broadcast companies and service providers, music publishers, record companies, artists and collective management organizations. A number of international companies operating in the music monitoring field provide monitoring services to businesses and copyright societies. In addition, companies such as Google (YouTube) and Last.fm have proprietary content recognition systems in place for their own use that they do not license to others.

In addition to the growing business-to-business market for automated music recognition solutions, there are also a few notable consumer applications based on the same technologies, such as Shazam and SoundHound: mobile music identification apps that both have over 100 million users, and that claim to be efficient in converting the "tagging" of songs by consumers into actual music download sales for their partnering online music stores and services.

There is a growing market for automated music recognition technologies, and a huge potential in employing them for managing music rights. In the coming years we will see the adoption of these technologies in new areas and domains, such as live music and the club/DJ scene, in addition to already established markets such as broadcast music monitoring. A number of technological challenges still need to be overcome, both to improve the quality of the core identification technologies and, more importantly, to ensure a smooth exchange of information and metadata between technology providers and users. Even so, automated music recognition technologies have a strong potential for providing the means to monitor the entirety of the music performed in any given territory, across radio and television broadcasts, live shows, night clubs and other public environments.

The focus of this research was not on broadcast monitoring, but on two new application areas for automated music recognition systems: clubs and DJs, and live music. Both of these domains are technologically more challenging than broadcast music monitoring, and remain open research problems for academic researchers as well. The computational complexity involved in trying to match tracks played and manipulated by a DJ to an original recording, or in trying to match a live version of a song to an original recorded version of that song, is well beyond that of identifying and reporting broadcast music. Compared to a recorded version of a song, a live version can be in another key or tempo, might have a totally different instrumentation or song structure, and correct identification can be further complicated by factors such as audio quality and audience noise.

From academic research, we were aware that algorithms and solutions exist for matching live (or cover) versions of songs to a reference database containing recorded versions of the songs. However, to our knowledge there are no commercial services available for live or cover version identification. Thus our aim in this project was to test some of the existing solutions in a real live event setting in order to evaluate their quality, to gather data about the process, and to prepare possible and potential use cases or scenarios for live music identification systems from the point of view of a music performance rights society.

The two technology pilots were successful in providing proof that automated music recognition services can, in addition to broadcast music monitoring, already be used in a club environment for identifying and reporting music played by DJs, and in 1–2 years' time possibly also in a live music setting for automated set list creation. Evaluating the pilot results, Teosto also identified three potential use cases for the piloted music identification technologies, which were developed by the Spanish music technology company BMAT.

The technology pilots, while successful from a proof-of-concept point of view, also pointed out a number of challenges and limitations that need to be solved before adopting the technologies for large scale use. The main challenge for all automated music recognition systems is twofold: in order to work in an efficient way, they need a representative reference audio database that is constantly updated, and there also needs to be a reliable way to automatically match the identification results to relevant metadata – in the case of performance rights societies like Teosto, to the relevant author and publisher information for each musical work. In addition, the tested live music identification and club/DJ monitoring systems had certain technical limitations that need to be improved upon to ensure reliable results.

We approached the technology pilots from the point of view of a music performance rights society, but the general results are applicable to other users, such as live event organisers, club owners, publishers, and artist organisations. These organisations could come up with other use scenarios for the automatically generated club or live show set list data than what was devised in this project, such as services based on real time track data from live shows and/or clubs.

The consumer survey showed that among active Finnish music fans who frequently attend live shows, there is an interest in using interactive services focused around gig set lists. However, the potential user base is very small, and from the point of view of Teosto, using crowdsourcing to collect set list information from audience members on a large scale is currently not a viable alternative to manual reporting and/or automated reporting technologies. That said, using information gathered from fans and audience members to verify automatically generated set lists could be a way of improving the accuracy of automated set list creation in the future.

2. Project outline and structure

2.1 Project description

This report details the results of Teosto's research project Music copyright in the 2010s (January 2012 – March 2013) that focused on music recognition and broadcast monitoring technologies, and how they can be applied to the monitoring and reporting processes of a music performance rights society. The project was partly funded by the Finnish Ministry of Education and Culture.

The project consisted of four parts:
1) A market study on the European and global music recognition and broadcast monitoring market
2) Background research on recent and relevant academic studies in the field of music information retrieval (MIR), a field of research that frequently provides the technology innovations for the music recognition market
3) Two separate technology pilots: one on automated live music identification at a Finnish rock festival, and another on monitoring music use in a DJ/club setting
4) A consumer survey on the potential of crowdsourcing setlist information from audience members of live shows

2.2 Partners

The technology pilots were made possible by the participating artists, performers, songwriters, crew members and artist organisations that gave us their consent for participating in the research: PMMP, Nightwish, Notkea Rotta, Darude, K-System, Orkidea and Riku Kokkonen.

Work on different parts of this project was carried out on Teosto's behalf by three organisations specialised in technology and research: BMAT, Music Ally and Taloustutkimus. Researcher Teppo Ahonen provided the project with a review of relevant academic research on the subject area.

BMAT (Spain) is a music technology company operating globally since 2006. The company specialises in providing music monitoring services, servicing more than 30 performing rights organisations and collecting societies. BMAT's monitoring network, present in more than 50 countries, listens to more than 2,000 radio and TV channels every day. BMAT also provides singing rating and music recommendation technologies to companies such as Samsung, Yamaha, Intel and Movistar. (http://bmat.com)

Music Ally (UK) is a digital music business information and strategy company that has been providing publications, consulting, research, events, and training to the music and technology industries since 2001. (http://www.musically.com)

Taloustutkimus Oy (Finland), established in 1971, is a privately owned market research company, and currently the second largest market research company in Finland.

Teppo Ahonen (M.Sc.) is currently finishing his Ph.D. in computer science at the University of Helsinki. In his work, he focuses on measuring tonal similarity with information theory based metrics.

We also want to thank Provinssirock (http://www.provinssi.fi) and YLE/Erno Kulmala for their cooperation and help during the project.

2.3 Project timeline

The project started in January 2012 and ended in March 2013. The main project activities are outlined in Table 1.
Project task | Start | End | Partners
Project administration | Jan 2012 | Mar 2013 | Teosto
Technology pilot 1: Live music identification | Feb 2012 | Sep 2012 | BMAT, Teosto, Provinssirock, YLE, PMMP, Nightwish, Notkea Rotta
Technology pilot 2: Clubs/DJs | Oct 2012 | Mar 2013 | BMAT, Teosto, Orkidea, Darude, K-System, Riku Kokkonen
Consumer survey: Crowdsourcing of set lists | Oct 2012 | Jan 2013 | Taloustutkimus Oy, Teosto
Market study: Music recognition and broadcast monitoring market | Sep 2012 | Dec 2012 | Music Ally, Teosto
Background research: relevant academic research | Aug 2012 | Dec 2012 | Teppo Ahonen (University of Helsinki)

Table 1. Project timeline.

2.4 Project deliverables

The deliverables of this project include (in addition to this project report) the results reports of the two technology pilots by BMAT, market research reports by Taloustutkimus and Music Ally, as well as two background articles on relevant academic research and an evaluation of the live music pilot by Teppo Ahonen. The project deliverables are listed in the table below.

Deliverable | Date | Type | Author
Teosto – BMAT Vericast Covers Pilot Report | 29.8.2012 | Technical pilot results report | BMAT
Teosto – BMAT Vericast Clubs Pilot Report | 21.3.2013 | Technical pilot results report | BMAT
Analysis of the Automatic Live Music Detection Experiment | 17.12.2012 | Research article | Teppo Ahonen
State Of The Art In Music Information Retrieval: What Could Be Applied For Copyright Management | 17.12.2012 | Research article | Teppo Ahonen
Music recognition and broadcast monitoring market research | 18.12.2012 | Market research report | Music Ally
Teosto ry – biisilistapalvelututkimus | 28.1.2013 | Market research report | Taloustutkimus
Project final report | 28.3.2013 | Project final report | Teosto

Table 2. List of project deliverables.

2.5 Communication of project results

The project results have been presented in two research seminars for Finnish music industry professionals in January and March 2013. The first seminar focused on the technology pilot results and the second was the final results seminar for the project. The two invited keynote speakers for the final seminar were Karim Fanous (Head of Research, Music Ally) and Alex Loscos (CEO, BMAT). The seminar presentation materials are available online on the project seminar website and on Teosto's Slideshare account (Teosto presentations). This final report will also be distributed in pdf form on the project/final seminar website in April 2013.

In addition to the above, Teosto has presented the findings of the live music identification pilot at the Finnish music industry event MARS (http://www.marsfestivaali.fi) in Seinäjoki in February 2013, and published a press release on the live music identification pilot in January 2013.

Seminar | Date | Type | Location
Musiikin tekijänoikeudet 2010-luvulla. Ennakkoinfo projektin tuloksista. | 31.1.2013 | Pilot results seminar | Erottajan Kasino, Helsinki
Technology, Music Rights, Licensing | 21.3.2013 | Project final seminar | Finlandia Hall, Helsinki

Table 3. List of project seminars.

Seminar presentations and other material related to the project are listed in the table below.

Presentation/other | Date | Type | Author
Teosto and BMAT carry out a pioneering live music identification pilot in Finland | 26.1.2013 | Press release | Teosto
Teosto ja BMAT kehittävät ensimmäisenä maailmassa livekeikkojen automaattista musiikintunnistusta | 26.1.2013 | Press release | Teosto
Musiikintunnistuspalvelut – markkinakatsaus | 31.1.2013 | Presentation | Ano Sirppiniemi
Mikä biisi tää on? Musiikintunnistusta livekeikoilla. Livepilotin toteutus ja tulokset | 31.1.2013 | Presentation | Turo Pekari
Biisilistoja yleisöltä? Crowdsourcingin mahdollisuudet Suomessa. Kyselytutkimuksen tulokset | 31.1.2013 | Presentation | Turo Pekari
Mikä biisi tää on? Musiikintunnistuspilotti Provinssissa | 8.2.2013 | Presentation | Ano Sirppiniemi, Turo Pekari
Emerging Technologies: Teosto's Live Music Recognition and DJ Club Monitoring Pilots | 21.3.2013 | Presentation | Turo Pekari, Alex Loscos (BMAT)
State Of The Art In Music Information Retrieval Research: Applications For Copyright | 21.3.2013 | Presentation | Teppo Ahonen (University of Helsinki)
Majority Report: Visions On the Music Monitoring Landscape | 21.3.2013 | Presentation | Alex Loscos (BMAT)
Keynote: Issues In the Music Rights Value Chain | 21.3.2013 | Presentation | Karim Fanous (Music Ally)

Table 4. List of project presentations and press releases.

2.6 Structure of the report

This report details the findings of the research project. Chapter 3 focuses on the automated music recognition and broadcast monitoring market, looking at the market structure and key companies operating in the market. Chapters 4 and 5 detail the results of the two technology pilots on live music identification and club/DJ monitoring, as well as the identified use scenarios for both piloted technologies. Chapter 6 lists the key findings of the consumer survey. Chapter 7 is a background article on academic MIR research, State Of The Art In Music Information Retrieval: What Could Be Applied For Copyright Management, by Teppo Ahonen. Chapter 8 summarizes the key project results, and conclusions about the results are presented in Chapter 9. Chapter 10 is a list of references.

This report summarizes and extends on the results material listed above in Table 2 (Project deliverables). Sources and references for each chapter are listed at the end of each chapter.

3. The global music recognition and broadcast monitoring market

Automated music recognition services are already widely used in different fields of the music industry, including author societies and performance rights organizations, for tasks such as monitoring broadcast content in order to carry out music reporting. As the traditional process of reporting broadcast music to the performance rights organizations – with producers and broadcasters providing cue sheets to the organizations – is known to have difficulties with accuracy, speed, and sometimes also the amount of manual work required, automated music identification technologies could be a solution for making royalty payments more efficient. At the same time they could give societies an advantage in the increasing global competition between performing rights societies.

The music identification business is expected to grow within the next couple of years, as more and more performance rights organisations start to adopt new technologies for gathering music usage data. The growing competition in this field of business is making the use of new technology more cost effective and attractive. Interestingly, despite the fact that most music recognition technologies operate in similar ways (based on so-called acoustic fingerprinting), most of the companies that offer music recognition services have set up their operations in a way that differentiates them from the competition.

To identify content, music recognition services must attach metadata to the digital content. In consumer services like Shazam, the metadata focuses on the recording (e.g. track title, artist, album title and cover image).
Companies providing services to performance rights organisations must be able to set up a more complex metadata scheme, including or providing ways to include information on the composers and publishers of the musical works. This requires partnerships and close collaboration between the service providers and the organisations.

The London-based research company Music Ally carried out market research on music recognition technology for Teosto in December 2012, and the following market overview is based on the findings of their report.

3.1 Automated music recognition

The advent of digital media, machine-readable databases, high-bandwidth computer processing power and internet communications is widely considered to have enormous potential for improving the reporting of music usage in the broadcasting sector, both in terms of accuracy and speed.

One of the earlier digital technologies developed for the purpose of music recognition is watermarking. The system involves embedding an identification tag in the inaudible spectrum of digital music files, so that they can later be identified by specialised software monitoring those specific frequencies. This system did not prosper, as it failed to identify re-encoded files and proved too complicated to implement in an ecosystem with a wide range of codecs and compression settings in place.

Nevertheless, research into alternative music recognition methods continued, and now several companies provide services in this space. While their ranges of products, clients and partnerships vary greatly, the core mechanics of their technologies are actually very similar, revolving around the concept of digital acoustic fingerprint matching. The following are the main elements involved in the process:

1) Acoustic fingerprinting

By means of an algorithm (or a combination of algorithms), a computer program generates a condensed digital summary of an audio signal. In order for the system to work, the program must be capable of distinguishing between any two different pieces of content, generating a unique digital picture for each and every one of them – this being the reason why the summaries are called 'fingerprints'. A fingerprint covers the whole duration of the audio signal, so that a piece of content can later be identified from any given sample of it.

2) Data collection

A database of acoustic fingerprints is stored by a company or body for the purpose of comparing existing and new fingerprints. Each fingerprint has a unique code assigned to it and linked to metadata describing the content. In general terms, the bigger the database, the wider the repertoire the system will be capable of identifying. More often than not, in the case of music recognition, acoustic fingerprint databases are stored on an online server.

3) Recognition

A computer program takes a fingerprint of a sample of an audio signal, which is then compared against the fingerprint database. Upon finding a match, the corresponding metadata is provided in order to identify the content.

While most music identification services are based on these principles, there are some differences in their capabilities, the most notable being the capacity for background recognition. This involves being able to identify a piece of music within a signal overlaid with redundant audio. Examples of this include a radio DJ speaking over the song, or bar noise mixed with music coming out of a speaker.
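Before turning to the remaining two elements (metadata and broadcast monitoring), the fingerprint-store-recognise loop of elements 1–3 can be made concrete with the following minimal Python sketch. It is not the algorithm of any vendor discussed in this report: real systems use far more robust spectral landmarks and indexing, but the division of labour between the three steps is the same. Only numpy is assumed, with audio represented as a mono sample array.

```python
import numpy as np

def fingerprint(audio, frame=1024, hop=512):
    """1) Acoustic fingerprinting: reduce the signal to a set of hash codes.

    Each frame contributes its strongest frequency bin; consecutive bins are
    paired so the codes do not depend on where in the track a sample starts.
    """
    peaks = []
    for start in range(0, len(audio) - frame, hop):
        window = audio[start:start + frame] * np.hanning(frame)
        spectrum = np.abs(np.fft.rfft(window))
        peaks.append(int(np.argmax(spectrum)))
    return {(a, b) for a, b in zip(peaks, peaks[1:])}

# 2) Data collection: reference fingerprints stored against a unique work code.
reference_db = {}

def register(work_id, audio):
    reference_db[work_id] = fingerprint(audio)

# 3) Recognition: fingerprint a sample and return the best-overlapping reference.
def identify(sample, min_overlap=0.2):
    query = fingerprint(sample)
    best_id, best_score = None, 0.0
    for work_id, codes in reference_db.items():
        score = len(query & codes) / max(len(query), 1)
        if score > best_score:
            best_id, best_score = work_id, score
    return (best_id, best_score) if best_score >= min_overlap else (None, best_score)
```

The `min_overlap` threshold and the peak-pairing scheme are illustrative choices only; production systems tune equivalent parameters to balance robustness against false positives.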
One of the most intricate complexities in the development of music recognition programs is the fact that the system must be robust enough to accurately pinpoint any piece of music amidst tens of millions of songs, yet also flexible enough to associate the same fingerprint with very different samples of the same musical work. This is because most audio compression formats will generate very different digital files of the very same recording, and a fingerprint algorithm that is too strict in its identification would not associate them with the same musical work. Despite these complexities, several fingerprint systems have been successfully developed, and the general consensus is that the overall field is mature enough, with most technologies available today providing rather high degrees of accuracy.

4) Metadata

In general terms, and for the purpose of this research, metadata can be defined as a small piece of data attached to digital content in order to describe said content. The type of metadata attached to the identified song varies according to the end user of the music identification technology. In the case of services such as Shazam, the metadata focuses on the recording (i.e. track title, artist, album title and cover image). This can be somewhat more complicated when the identification is provided for the music publishing sector, as performance rights societies require information regarding the songs' composers and publishers. In turn, this means that for music identification companies to provide their services to performance rights societies, they first need to collaborate in setting up the set of rules required of the metadata (an illustrative example follows after this list).

5) Broadcast monitoring

The basic music recognition system described above gains a fifth component in the space of broadcast monitoring: the means to input transmissions into the fingerprinting system. From a purely computational point of view, this is a much more elemental mechanism, consisting at its most basic of an aerial receiver in the case of over-the-air broadcasts, and digital conversion software in the case of webcasts, both of which feed the transmissions into the fingerprint algorithm. Depending on the monitored market, the logistics of the over-the-air system can vary from having a radio receiver plugged into a computer system running the identification software, to a whole network of standalone receivers spread throughout a territory and feeding a centralised server which, in turn, analyses all of the collected broadcasts.
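As an illustration of point 4 above, the two records below contrast the recording-level metadata a consumer app displays with the work-level metadata a performance rights society needs before an identification can feed royalty distribution. All field names, identifiers and shares are invented placeholders, not Teosto's, BMAT's or any other organisation's actual schema.

```python
# Hypothetical examples only: titles, codes and shares are dummy placeholders.
consumer_metadata = {
    "track_title": "Example Song",
    "artist": "Example Artist",
    "album": "Example Album",
    "cover_image_url": "https://example.org/cover.jpg",
}

pro_metadata = {
    "work_title": "Example Song",
    "iswc": "T-000.000.001-0",                      # work identifier (dummy format)
    "composers": [
        {"name": "Example Composer", "ipi": "00000000100", "share": 0.50},
        {"name": "Example Lyricist", "ipi": "00000000101", "share": 0.25},
    ],
    "publishers": [
        {"name": "Example Publisher", "ipi": "00000000102", "share": 0.25},
    ],
    "recordings": [
        {"isrc": "FI-XXX-00-00001", "title": "Example Song", "artist": "Example Artist"},
    ],
}
```

The key difference is that the work-level record links one identified recording to several rights holders and their shares, which is exactly the mapping that requires the collaboration between service providers and societies described above.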
3.2 Business landscape

Interestingly, in the music recognition and broadcast monitoring business, most companies operating in this area have carved their respective niches separate from each other, despite the fact that their core technologies operate in very similar ways. In order to better understand the market's segmentation, the market can be separated into business-to-consumer (B2C) and business-to-business (B2B) segments. Apart from these, other, more prominent corporations also operate to a certain extent in the music identification space; acoustic fingerprinting is not at the core of their business and/or they do not license their music recognition technology to third parties. This is the case, for instance, with Last.fm and Google.

Figure 1. Segments of the music recognition and broadcast monitoring market: business to consumer (B2C), business to business (B2B), and companies developing technologies for their own use.

Each of these segments is briefly outlined below.

3.3 Business to business (B2B)

The focus here will be on business-to-business services that license their technology to third parties, since these services are the logical partners for performance rights organisations. Music Ally identified 13 prominent companies in this category that also offer their services in Finland; they are listed in Table 5 below.

The business-to-business segment can be further divided into four different fields. Each of the B2B companies can operate in one or several of them at the same time. These categories are briefly described here, with examples of relevant companies/service providers.

1) Music identification and broadcast monitoring for authors and publishers

Service providers in this space are paid by authors, composers and publishers to quantify the use of their works by broadcasters. Clients would supply their content to the service provider in order to ensure that its fingerprint database contains all the works that need to be identified. Examples: TuneSat, Kollector

2) Music monitoring and broadcast monitoring for PROs

Service providers in this space are paid by performing rights organisations to quantify the use of the works of their members by broadcasters. Typically, the performing rights organisations would supply their content to the service provider in order to ensure that the latter's fingerprint database contains all the works that need to be identified. The service providers supply reports to the performing rights organisation in the agreed format (for example composition code, author code, rights owner) and frequency (for example daily, monthly, annually). Examples: BMAT, mufin, Nielsen and Soundmouse

3) Identification and cue sheeting for broadcasters

Service providers in this space are paid by broadcasters to identify the music used on their shows, and to generate the corresponding cue sheets for the broadcaster to deliver to performing rights organisations, as per the latter's guidelines. Service providers need to have robust fingerprint databases, which is why they often approach rights holders in order to secure the data directly from these owners. Examples: Soundmouse, TuneSat

4) Providing identification technologies / technology platforms

Companies in this space develop acoustic fingerprinting algorithms and/or fingerprint databases in order to license them to third parties who, in turn, use them for developing their own B2B or B2C services. Examples: The Echo Nest, Gracenote

Audible Magic (USA)
BMAT (Spain)
Civolution (Netherlands)
DJ Monitor (Netherlands)
The Echo Nest (USA)
Gracenote (USA)
Kollector (Belgium)
mufin (Germany)
Nielsen (USA)
Rovi (USA)
Soundaware (Netherlands)
Soundmouse (UK)
Tunesat (USA)

Table 5. A list of companies that operate in the music recognition and broadcast monitoring market in Europe and also offer their services to third parties.

For most practical purposes, performing rights organisations use music identification companies in a similar way to authors and publishers (fields 1 and 2). Focusing on service providers in these two fields, probably the most widely known company that offers services for both is the US company Nielsen (founded in 1923), one of the world's biggest research corporations with 5.5 billion USD in annual revenues. Nielsen currently offers TV, radio and internet monitoring services under its Nielsen Music and Nielsen BDS brands. Apart from publishers, authors and PROs, its key clients also include radio and TV networks and record labels.
Nielsen outsources its fingerprinting technology, and the current technology provider has not been publicly announced.

Another significant service provider for authors, publishers and PROs is the Barcelona-based BMAT. The company was established under the umbrella of the Music Technology Group of the Universitat Pompeu Fabra in 2006, and its core product is the Vericast airplay monitoring and acoustic fingerprinting service. BMAT monitors over 2,000 radio and TV channels worldwide, and offers an end-to-end service to its clients with all technology developed in-house. Other products include the music personalization service Ella. BMAT was Teosto's research partner for this research, piloting an automated cover version recognition technology at a live rock festival in summer 2012. However, this technology is not yet in production.

Broadcasting companies buy identification services for generating cue sheets for PROs (field 3). For example, the London-based Soundmouse offers this kind of service to the BBC, covering everything from monitoring to delivering cue sheets to PROs. Soundmouse is a privately owned company and uses in-house technology.

Some music identification companies also outsource technology from partners. A number of companies specialize in licensing technologies to third parties, who then develop their own B2B or B2C services. The US company Rovi is one of the leading technology providers in this field, with 691 million USD in revenues (2011). Its clients include manufacturers, service providers and app developers. For music identification, Rovi has both a music metadata service covering 3 million albums and 30 million tracks, and a media recognition service. Many consumer services, including SoundHound, use Rovi's metadata to provide cover images and recording details to end users. It is also notable that Rovi outsources some of its fingerprinting technology from another important technology provider, the US company Audible Magic.

Another example of a metadata/music recognition analytics platform is The Echo Nest, established in 2005. The Echo Nest is one of the big metadata technology providers for app developers, with over 350 music apps built on its platform. The Echo Nest database is accessible through an open API that provides tools for playlisting, taste profiling and so on, and contains 34 million songs and 1.12 trillion data points. The company provides the software for free and makes money by providing support for it. Because of its open nature, The Echo Nest is used more actively by third-party developers than any other platform in the field, but its strengths may lie more in its developer-friendly approach than in its fingerprinting technology.

3.4 Business to consumer (B2C)

The widespread adoption of smartphones with 3G connectivity and the growth of mobile application ecosystems – such as Apple's App Store and Google's Play Store – have had a huge impact across the hardware, software and content industries since the launch of Apple's iPhone in 2007. This has in turn also driven innovations in the field of music recognition. Increasing mobile computational power and online connectivity mean that a smartphone user can now take a sample (or fingerprint) of an acoustic audio signal (such as music being played at a bar or a store), upload it to a fingerprint database server for matching, and instantly receive metadata with details of the identified content. The popularised term for this process is 'tagging'.
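The round trip behind 'tagging' can be sketched from the client side as follows. This is a schematic example only: the endpoint URL, request format and response fields are hypothetical stand-ins, not the actual protocol of Shazam, SoundHound or any other service mentioned in this chapter.

```python
import json
import urllib.request

def tag(sample_fingerprint, server="https://fingerprint.example.org/identify"):
    """Send a locally computed fingerprint to a (hypothetical) matching server
    and return the metadata of the identified content, if any."""
    payload = json.dumps({"fingerprint": sample_fingerprint}).encode("utf-8")
    request = urllib.request.Request(
        server, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        # e.g. {"track_title": ..., "artist": ..., "album": ..., "buy_link": ...}
        return json.load(response)
```

The point of the sketch is the division of labour: the fingerprint is computed on the phone, while the multi-million-track matching and the metadata lookup happen on the service provider's servers.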
The best known companies operating in the B2C music recognition area are Shazam and SoundHound. Both offer mobile applications on the iOS, Android, BlackBerry and Windows Phone platforms. The typical business models of consumer music recognition services include mobile software licensing, service subscription fees, a commission on the sale of linked music services, and advertising.

Shazam is originally a US-based company, formed in 1999. Shazam was the first B2C company providing music recognition services to mobile phone users, and its app has been downloaded to 250 million mobile devices since then. The American performing rights society BMI bought Shazam's technology in 2005 to set up its own music identification service (Landmark Digital Services). Since then, Shazam has re-acquired the technology from BMI, but the society remains a shareholder in the company. The company has headquarters in London and is privately held.

Shazam's products are two different mobile apps with unlimited tagging: a free version with ads and a paid version, depending on the ecosystem. The Shazam apps offer click-to-buy and streaming links, lyrics display and 30-second streamed previews of the identified songs. Becoming the first mass-market app in this field has been a benefit for Shazam, although its competitor SoundHound's technology is considered to be more innovative.

Since late 2010, Shazam has been expanding beyond its core music business into developing a range of second-screen implementations for its audio recognition technology. The move has seen the company seeking to boost advertisers', brands' and video content owners' engagement with audiences, delivering promotional information, coupons and other materials to viewers using the Shazam mobile app. The company has repeatedly expressed its interest in cementing its position in these new areas, and currently describes itself as "the best way to discover, explore and share more music, TV shows and ads you love". However, Shazam's music business is still its largest source of revenues. While Shazam has actively sought and publicised content and advertising partnerships, the company has also aimed to keep its technology development in-house. Shazam outsources metadata from Rovi.

Shazam benefits from its widespread adoption, which results more from being the first mass-market smartphone product of its kind than from any purely technological superiority. Another factor that has driven Shazam's leading position in the B2C market has been the company's expansion towards the second-screen space while maintaining its own brand, rather than merely licensing its technology. This has not only helped its association with music through partnerships with the likes of American Idol and the Grammy Awards, but also furthered its novelty profile by providing a new interactive experience in general TV and advertising viewing.

The other popular B2C company, the California-based SoundHound, started as the music recognition service Midomi in 2007, and was rebranded as SoundHound in 2009. SoundHound claims to have more than 100 million app downloads. SoundHound offers three different applications: free with ads, ad-free (free or paid, depending on the ecosystem) and a free version called Hound, which enables voice search for songs. Like Shazam's, the apps offer click-to-buy and streaming links to the identified songs. SoundHound's technology differs from Shazam's in its capability to identify songs not only from recordings, but also from humming or singing. This feature and a faster identification process are the reasons its technology is believed to be superior to Shazam's, although Shazam is more widely used. Like Shazam, SoundHound outsources its metadata from the technology provider Rovi.

3.5 Companies developing technologies for their own use

Some well-known companies are big players in music identification, but not as their core business. Google uses fingerprinting technology for monitoring user-generated content on YouTube and has its own identification widget in the Google Play music store. The streaming radio service Last.fm added an identification feature to its Audioscrobbler technology in 2007; Audioscrobbler monitors music consumption in media players, online stores and streaming services. Unlike companies in the other segments, Google and Last.fm do not license their technology to others, and they do not offer services to PROs or broadcasters.

This chapter is based on the report "Music recognition and broadcast monitoring market research" by Music Ally (2012).

4. Live music identification pilot in Provinssirock

The first of the two technology pilots was carried out in June 2012 at the Provinssirock rock festival (http://www.provinssi.fi) in Seinäjoki, Finland. The purpose of this pilot study was to test and evaluate BMAT's live music identification technology Vericast Covers in a live setting, with three Finnish bands from three different genres. The live shows were recorded in two versions, one from the mixing desk and one from the audience, in order to determine whether audio quality would have an effect on the identification results. The live recordings were compared to a reference audio set that consisted of the whole recorded catalogue of each band.

4.1 BMAT system description

An automatic cover song identification application is a system that takes a piece of music as input and determines whether the piece is a cover version of one of the songs included in a reference database. Here, the term "cover" has a broad meaning: cover versions can also include live versions, remixes, variations, or other renditions of a composition. Although the task is often easy for a human listener, it is very difficult for a computer, as arrangements, tempos, rhythms, languages of the lyrics, and other features of the music might vary significantly. Thus, the challenge is to extract meaningful features from the audio data and compare them in a robust way.

The identification application used by BMAT is based on the best performing method developed for cover song identification to date. In the international MIREX evaluation for music information retrieval applications, the method has so far achieved the highest-ever performance in cover song identification (at MIREX 2009). BMAT's Vericast Covers adds several new features to the original method to make it more suitable for setlist identification.

4.2 Pilot scenario

Teosto provided BMAT with a total of 231 reference tracks (in mp3 format) from the three bands (PMMP, Nightwish and Notkea Rotta), and a total of six audio recordings from the three live performances of the bands at the Provinssirock festival (recorded on June 15–17, 2012). The reference audio set included the whole recorded catalogue of each band. Each live performance was recorded twice: one version was recorded directly from the mixing desk (in wav format), and another from the audience near/in front of the mixing desk (in wav format), using a basic handheld wave/mp3 recorder (Roland R-05).
Based on the material, the pilot was separated into three test categories (one for each band), with two test cases (mixing desk recording and audience recording) for each band.

4.3 Identification process

On the BMAT server, the recordings were analysed using the available fingerprints of the reference song collection for each band. In general, cover identification is processed on a song-versus-song basis and returns a distance between the two input songs, i.e. a grade of similarity. In a first step, the algorithm extracts a Harmonic Pitch Class Profile (HPCP) for each song. The Harmonic Pitch Class Profile is a technique that overlaps or folds the audible spectrum into a one-octave space and represents the relative intensity of each of the 12 semitones of the Western chromatic scale. Next, the similarity of the HPCPs is computed and returned as a similarity distance (a value between 0 and 100).

In this test case, the live performance recordings were segmented into audio stream segments lasting 30 seconds each, and the segments were compared to the reference audio database. The result of the analysis was a similarity matrix containing the distances between each audio stream segment and each reference track. In a final step, a report of all matches for the live performance audio stream is extracted from the similarity matrix. There may be several candidates for one 30-second segment, in which case the matches are marked as conflicting. The conflicting matches are resolved by looking at consecutive audio segments and the similarity distances of the matches, and by applying a threshold for an acceptable distance.
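The computation described above can be sketched as follows. This is a heavily simplified stand-in, assuming numpy only: a bare 12-bin pitch class profile replaces BMAT's Harmonic Pitch Class Profile, and a simple rotation-tolerant score replaces the actual Vericast Covers distance. Only the overall shape of the process matches this section: fold the spectrum of each 30-second segment into 12 semitone bins, then fill a segment-by-reference similarity matrix with 0–100 distances.

```python
import numpy as np

def pitch_class_profile(audio, sr=22050, frame=4096, hop=2048, fmin=55.0):
    """Fold the magnitude spectrum of each frame into 12 semitone bins."""
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    usable = freqs >= fmin
    classes = np.round(12 * np.log2(freqs[usable] / fmin)).astype(int) % 12
    pcp = np.zeros(12)
    for start in range(0, len(audio) - frame, hop):
        spectrum = np.abs(np.fft.rfft(audio[start:start + frame] * np.hanning(frame)))
        np.add.at(pcp, classes, spectrum[usable])
    return pcp / (np.linalg.norm(pcp) + 1e-9)

def distance(query_pcp, reference_pcp):
    """0-100 distance; trying all 12 rotations tolerates a live transposition."""
    best = min(1.0 - float(np.dot(np.roll(query_pcp, k), reference_pcp)) for k in range(12))
    return 100.0 * best

def similarity_matrix(segments, references):
    """Rows = 30-second live segments, columns = reference tracks."""
    return np.array([[distance(pitch_class_profile(s), pitch_class_profile(r))
                      for r in references] for s in segments])
```

A setlist report is then read off the matrix by keeping, for each stretch of consecutive segments, the reference tracks whose distances fall under an acceptance threshold.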
4.4 Test case results

The audio setlist test was conducted using live performance material from three Finnish groups: PMMP, Nightwish, and Notkea Rotta. The groups represent the genres of pop, heavy metal, and hip hop, respectively. The object of the experiment was to determine the setlists using only audio recordings of the performances as queries and the back catalogues of the groups as targets. Teosto provided BMAT with both the query and the target audio data.

BMAT reported the results in a report describing the (possibly conflicting groups of) identified tracks with their durations and timestamps, along with the minimum, average, and maximum distances of the composite segment matches. The matrices of distances between each performance segment query and each reference data target were provided as electronic appendices. The report also discusses the time consumed in performing the test cases. In all cases, the machine-based setlist approximation performed far more efficiently than browsing through the performance recordings manually, suggesting that the system can be taken into production without too many concerns about the sufficiency of computational resources.

4.4.1 PMMP

The reference audio set for PMMP included 68 tracks, for which BMAT extracted digital fingerprints. The band's live performance on the Provinssirock main stage (on Saturday, June 16, 2012, lasting 90 minutes) was recorded from the mixing desk by PMMP's FOH sound engineer, and the audience recording was made using a handheld digital wave/mp3 recorder (Roland R-05) from the audience, stage centre, directly in front of the mixing desk.

Both test cases for the PMMP live performance (mixing desk and audience recording) gave very good identification results. All 19 tracks included in the PMMP set list were correctly identified from both recorded versions, resulting in a perfect 100% accuracy.

Photo: Turo Pekari

Datetime | Duration | Track | Artist | Conflicting
2012-06-16 23:31:00 | 210 | Suojelusenkeli | PMMP | C
2012-06-16 23:32:00 | 180 | Korkeasaari | PMMP | C
2012-06-16 23:32:00 | 180 | Etkö ymmärrä | PMMP | C
2012-06-16 23:35:00 | 270 | Heliumpallo | PMMP | C
2012-06-16 23:36:30 | 90 | Kesä -95 | PMMP | C
2012-06-16 23:40:30 | 180 | Matkalaulu | PMMP |
2012-06-16 23:43:30 | 390 | Pikku Lauri | PMMP | C
2012-06-16 23:44:00 | 270 | Merimiehen vaimo | PMMP | C
2012-06-16 23:50:00 | 150 | Matoja | PMMP |
2012-06-16 23:53:00 | 270 | Jeesus Ei Tule Oletko Valmis | PMMP | C
2012-06-16 23:55:00 | 120 | Tässä elämä on | PMMP | C
2012-06-16 23:56:00 | 150 | Etkö ymmärrä | PMMP | C
2012-06-16 23:58:00 | 270 | Rakkaalleni | PMMP | C
2012-06-17 00:02:30 | 240 | Kesäkaverit | PMMP |
2012-06-17 00:06:30 | 210 | Päät soittaa | PMMP |
2012-06-17 00:11:00 | 240 | Lautturi | PMMP | C
2012-06-17 00:11:30 | 210 | Kesä -95 | PMMP | C
2012-06-17 00:15:00 | 240 | Pariterapiaa | PMMP |
2012-06-17 00:19:00 | 270 | Tytöt | PMMP | C
2012-06-17 00:20:00 | 180 | Etkö ymmärrä | PMMP | C
2012-06-17 00:24:00 | 240 | Joku raja | PMMP |
2012-06-17 00:29:30 | 210 | Viimeinen valitusvirsi | PMMP | C
2012-06-17 00:30:00 | 150 | Suojelusenkeli | PMMP | C
2012-06-17 00:34:00 | 270 | Joutsenet | PMMP | C
2012-06-17 00:34:00 | 180 | Pikku Lauri | PMMP | C
2012-06-17 00:36:00 | 90 | Suojelusenkeli | PMMP | C
2012-06-17 00:39:30 | 210 | Toivo | PMMP | C
2012-06-17 00:42:30 | 300 | Suojelusenkeli | PMMP | C
2012-06-17 00:43:30 | 240 | Tässä elämä on | PMMP | C
2012-06-17 00:43:30 | 60 | Pikkuveli | PMMP | C
2012-06-17 00:44:00 | 60 | Mummola | PMMP | C
2012-06-17 00:48:30 | 180 | Koko Show | PMMP | C
2012-06-17 00:48:30 | 150 | Pikku Lauri | PMMP | C
2012-06-17 00:52:00 | 210 | Kohkausrock | PMMP |

Table 6. Identification report for the PMMP mixing desk recording from Provinssirock, 16.6.2012, including conflicting matches (marked C; in the original report, resolved matches are shown in bold).

Figure 2. PMMP similarity matrix distribution for the mixing desk recording. The blue lines represent matches. Source: BMAT

PMMP setlist, Provinssirock, Seinäjoki (16.6.2012) | Identified correctly (Y/N)
1. Korkeasaari | Y
2. Heliumpallo | Y
3. Matkalaulu | Y
4. Merimiehen vaimo | Y
5. Matoja | Y
6. Jeesus ei tule oletko valmis | Y
7. Rakkaalleni | Y
8. Kesäkaverit | Y
9. Päät soittaa | Y
10. Lautturi | Y
11. Pariterapiaa | Y
12. Tytöt | Y
13. Joku raja | Y
14. Viimeinen valitusvirsi | Y
15. Joutsenet | Y
16. Toivo | Y
17. Tässä elämä on | Y
18. Koko show | Y
19. Kohkausrock | Y

Table 7. Final setlist of 19 songs from the PMMP live performance at Provinssi (16.6.2012).

However, it should be noted that out of the 19 identified pieces, 13 were included in conflicting groups in the desk recording and 12 in the audience recording. Using the minimum and average distances, all conflicting groups could be resolved correctly. Based on the results, we can assume that the PMMP live versions were somewhat closer to the original recorded performances than those of the other groups in the experiment. It is also likely that the parameters and threshold of the system were trained with music of the pop genre, allowing a chance of slight parameter overfitting in the PMMP case. Nevertheless, the perfect setlist identification is a very good result for the tested cover song identification system.
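The resolution of conflicting groups, as described in section 4.3 and applied in the PMMP case above, can be illustrated with the following sketch. The data layout, the tie-breaking order and the threshold value are illustrative assumptions only, not BMAT's production parameters.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    track: str
    segments: list      # indices of the consecutive 30-second segments that matched
    distances: list     # similarity distance (0-100) for each matched segment

def resolve_conflict(candidates, max_distance=35.0):
    """Pick one track per conflicting group, or None if nothing is acceptable.

    Prefer the candidate with the lowest minimum distance, then the lowest
    average distance, then the longest run of consecutive matched segments.
    """
    def key(c):
        return (min(c.distances), sum(c.distances) / len(c.distances), -len(c.segments))
    best = min(candidates, key=key)
    return best.track if min(best.distances) <= max_distance else None

# Example: three candidates competing for the same stretch of the live recording.
group = [
    Candidate("Track X", [41, 42, 43], [18.2, 22.5, 20.1]),
    Candidate("Track Y", [42, 43],     [33.0, 36.4]),
    Candidate("Track Z", [41],         [38.9]),
]
print(resolve_conflict(group))   # -> "Track X"
```

A step like this is also where manual verification can be hooked in: groups whose best candidate stays above the threshold would be flagged for a human reviewer instead of being resolved automatically.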
4.4.2 Nightwish

The reference audio set for Nightwish included 101 tracks, for which BMAT extracted digital fingerprints. The band's live performance on the Provinssirock main stage (on Friday, June 15, 2012, lasting 89 minutes) was recorded from the mixing desk by Nightwish's FOH sound engineer, and the audience recording was made using a handheld digital wave/mp3 recorder (Roland R-05) from the audience, stage centre, directly in front of the mixing desk.

For the Nightwish live performance, a total of 13 songs out of the 16 performed were correctly identified from the mixing desk recording, resulting in an accuracy of 81%. Of the three unidentified songs, one ("Finlandia" by Sibelius) was not included in the reference audio set and was thus impossible to identify in the test scenario. The other two songs, "Come Cover Me" and "Over The Hills And Far Away", were included in the reference audio set but were not correctly identified by the system.

Figure 3. Nightwish similarity matrix distribution for the mixing desk recording. The blue lines represent matches. Source: BMAT

In the Nightwish case, several false positive songs were also repeatedly detected as match candidates, including "Instrumental (Crimson Tide / Deep Blue Sea)", "Lappi (Lapland)", "Taikatalvi" and "Sleepwalker". "Come Cover Me" was identified as a candidate match, but the match was rejected because it did not pass the minimum distance threshold. Both unidentified songs were originally recorded with a different lead singer in 2006, and a possible reason for the failed identification is the correspondingly greater distance between the live version and the recorded version.

From the Nightwish audience recording, 9 out of 16 songs were correctly identified. Analysis showed that a further five songs were correctly identified as match candidates, but were rejected because of the set minimum distance threshold.

Datetime | Duration | Track | Artist | Conflicting
2012-06-15 23:30:30 | 60 | Taikatalvi | Nightwish | C
2012-06-15 23:31:00 | 210 | Lappi (Lapland) II: Witchdrums [*] | Nightwish | C
2012-06-15 23:31:30 | 120 | Sleepwalker | Nightwish | C
2012-06-15 23:35:00 | 150 | Storytime | Nightwish | C
2012-06-15 23:35:30 | 120 | Lappi (Lapland) III: This Moment Is Eternity [*] | Nightwish | C
2012-06-15 23:35:30 | 120 | Etiäinen | Nightwish | C
2012-06-15 23:38:30 | 90 | Storytime | Nightwish | C
2012-06-15 23:38:30 | 60 | Lappi (Lapland) IV: Etiäinen [*] | Nightwish | C
2012-06-15 23:38:30 | 60 | Etiäinen | Nightwish | C
2012-06-15 23:38:30 | 90 | Lappi (Lapland) III: This Moment Is Eternity [*] | Nightwish | C
2012-06-15 23:40:00 | 210 | Wish I Had An Angel | Nightwish | C
2012-06-15 23:40:30 | 60 | Instrumental (Crimson Tide / Deep Blue Sea) | Nightwish | C
2012-06-15 23:41:00 | 120 | Lappi (Lapland) III: This Moment Is Eternity [*] | Nightwish | C
2012-06-15 23:41:30 | 120 | Sleepwalker | Nightwish | C
2012-06-15 23:42:00 | 90 | Bare Grace Misery | Nightwish | C
2012-06-15 23:43:00 | 90 | Taikatalvi | Nightwish | C
2012-06-15 23:43:30 | 270 | Amaranth | Nightwish | C
2012-06-15 23:44:30 | 210 | Instrumental (Crimson Tide / Deep Blue Sea) | Nightwish | C
2012-06-15 23:49:00 | 120 | Scaretale | Nightwish |
2012-06-15 23:52:00 | 60 | Scaretale | Nightwish |
2012-06-15 23:54:30 | 120 | Instrumental (Crimson Tide / Deep Blue Sea) | Nightwish |
2012-06-15 23:56:30 | 210 | Dead To The World | Nightwish | C
2012-06-15 23:58:00 | 180 | Instrumental (Crimson Tide / Deep Blue Sea) | Nightwish | C
2012-06-16 00:00:30 | 300 | I Want My Tears Back | Nightwish | C
2012-06-16 00:04:30 | 60 | Sleepwalker | Nightwish | C
2012-06-16 00:05:00 | 510 | Instrumental (Crimson Tide / Deep Blue Sea) | Nightwish | C
2012-06-16 00:07:00 | 150 | Sleepwalker | Nightwish | C
2012-06-16 00:07:30 | 90 | Etiäinen | Nightwish | C
2012-06-16 00:08:00 | 240 | Lappi (Lapland) III: This Moment Is Eternity [*] | Nightwish | C
2012-06-16 00:10:30 | 60 | Sleepwalker | Nightwish | C
2012-06-16 00:11:30 | 270 | Last of the Wilds | Nightwish | C
2012-06-16 00:18:00 | 180 | Planet Hell | Nightwish | C
2012-06-16 00:19:00 | 210 | Instrumental (Crimson Tide / Deep Blue Sea) | Nightwish | C
2012-06-16 00:22:00 | 270 | Ghost River | Nightwish | C
2012-06-16 00:28:00 | 150 | Nemo | Nightwish |
2012-06-16 00:33:00 | 480 | Lappi (Lapland) III: This Moment Is Eternity [*] | Nightwish | C
2012-06-16 00:34:30 | 60 | Lappi (Lapland) IV: Etiäinen [*] | Nightwish | C
2012-06-16 00:39:30 | 270 | Song Of Myself | Nightwish | C
2012-06-16 00:47:00 | 360 | Lappi (Lapland) III: This Moment Is Eternity [*] | Nightwish | C
2012-06-16 00:47:00 | 240 | Last Ride Of The Day | Nightwish | C
2012-06-16 00:48:00 | 300 | Lappi (Lapland) IV: Etiäinen [*] | Nightwish | C
2012-06-16 00:49:00 | 240 | Taikatalvi | Nightwish | C
2012-06-16 00:52:00 | 60 | Forever Yours | Nightwish | C
2012-06-16 00:54:00 | 210 | Imaginaerum | Nightwish | C
2012-06-16 00:54:30 | 60 | Arabesque | Nightwish | C

Table 8. Identification report for the Nightwish mixing desk recording from Provinssirock, 15.6.2012, including conflicting matches (marked C; in the original report, resolved matches are shown in bold).

Nightwish setlist, Provinssirock, Seinäjoki (15.6.2012) | Identified correctly (Y/N)
1. Finlandia | -
2. Storytime | Y
3. Wish I Had an Angel | Y
4. Amaranth | Y
5. Scaretale | Y
6. Dead To the World | Y
7. I Want My Tears Back | Y
8. Come Cover Me | N
9. Last Of The Wilds | Y
10. Planet Hell | Y
11. Ghost River | Y
12. Nemo | Y
13. Over the Hills and Far Away | N
14. Song of Myself | Y
15. Last Ride of the Day | Y
16. Imaginaerum | Y

Table 9. Final setlist of 16 songs from the Nightwish live performance at Provinssi (15.6.2012).

Photo: Turo Pekari

4.4.3 Notkea Rotta

The reference audio set for Notkea Rotta included 62 tracks, for which BMAT extracted digital fingerprints. The band's live performance on the Provinssirock YleX stage (on Sunday, June 17, 2012, lasting 60 minutes) was recorded by YLE for a TV broadcast, and the audience recording was made using a handheld digital wave/mp3 recorder (Roland R-05) from the audience, stage centre, directly in front of the mixing desk.

The Notkea Rotta live performance analysis resulted in only three composite matches for the mixing desk recording and four for the audience recording, all of which were found to be false positives. Thus the identification rate for Notkea Rotta was 0%.

Figure 4. Notkea Rotta similarity matrix distribution for the mixing desk recording. The blue lines represent matches. Source: BMAT

Two main reasons were found for the failure of the system to correctly identify the Notkea Rotta live performance. The first relates to how well the tested identification method suits the genre in question – broadly speaking, hip hop. The songs performed contain much less harmonic variation than the songs performed by the pop and metal bands in the other test cases, making it more difficult to differentiate songs based on their harmonic structure. Second, the differences in tempo, instrumentation and structure compared to the original reference recordings were considerably greater in the Notkea Rotta case. Whereas in the Nightwish case the test results could be improved by tuning the system parameters (e.g. the minimum distance threshold), in the Notkea Rotta case this would not make the results any better.

Photo: Turo Pekari

4.5 Evaluation of the results

The algorithm tested in the live music identification pilot detected 100% of the performed songs from both the mixing desk and the audience recordings of the PMMP live performance. It also showed good results for the desk recording of the Nightwish performance (81%). On the other hand, the Notkea Rotta live performance produced few matches, all of them false positives, resulting in 0% identification accuracy.
Looking at the results, there was no marked difference between the mixing desk recordings and the audience recordings. The audience recordings did, however, show lower distances and a lower time precision than the desk recordings. There are two reasons for this: one is that in the audience recording, the actual live performance is only one part of the audio signal, the other part being the audience itself. The other reason is effects such as echoes on drums and vocals, which depend on the position of the recording equipment and occur between the PA loudspeakers and the recording microphone.

The setlist detection was completed in a reasonable amount of computational time, easily surpassing the efficiency of the manual labour needed for the task.

The reference audio set in this test was limited to the recorded catalogue of each artist. The tested cover identification technology cannot currently be scaled to work in the same manner as broadcast monitoring systems, where each track can be compared to a reference database of millions or tens of millions of tracks. One reason for this is the computational complexity needed for the cover identification technology, but another reason is related to the nature of the cover version identification task itself: most Western pop music tracks are in relative terms so similar to each other, in terms of their harmonic, structural and melodic choices, that a comparison of one song to millions of other songs will always produce a large number of positive matches – songs that are in this sense related to each other. However, this limitation can be overcome by limiting the size of the reference dataset, as was done in this pilot research. For practical use scenarios, this would mean that the system would need the name of the band or other information that can be used to narrow down the search before the analysis.

The uneven performance of the system in the three cases makes it difficult to give a definitive opinion of the system, but several positive notions and ideas for future work can be drawn from the experiment. Based on the results obtained, the system could be put into operation, but only with music from certain genres. Also, manual verification of the results will probably be needed, as unsolvable conflicting groups seem to be inevitable. While a completely reliable automatic setlist detection system is more or less impossible to build, a system that needs manual assistance only for conflicting identifications and other possible obscurities could well be constructed from the tested system, based on the results and the report.

4.6 Possible use scenarios

From the live music identification pilot research, Teosto identified three possible use scenarios for the technology from the point of view of a music performance rights society: large music festivals, active live music clubs, and artist or tour specific uses. All the use scenarios take as a starting point that the technological limitations of the tested system and other process related factors (such as automatically matching the identification results to Teosto's database of authors, publishers and works, which was not tested in this pilot research) could be solved, and that the total costs of the system would be comparable to the costs of the live music reporting scheme currently in place.
Large music festivals could be the most cost-effective use scenario, as the ratio of the number of identified performances or identified songs to the performance royalties collected and distributed would be favourable. The automated identification system would replace the manual sending of setlists and/or the manual inputting of setlists by artists into Teosto's systems. At music festivals, the data from the identification system could possibly also be used as input for other services for festival visitors and other consumers.

Active live music venues, such as large clubs, ski resorts and cruise ships, could be another cost-effective use scenario because of the large number of live shows per year. The live music identification system could be integrated into the venue's music system, and live music identification could be combined with automated identification and reporting of mechanical music (background music, dance music, DJs).

A third use scenario could be artist and/or tour specific use of the live music identification service, where the artist would use the service to replace the manual inputting or sending in of setlists to Teosto.

In all the use scenarios listed above, the quality of the identification results could be improved by introducing a verification step, where the automatically generated setlist would be verified by the artist before acceptance.

This chapter is based on the articles "Analysis of the Automatic Live Music Detection Experiment" by Teppo Ahonen (2012) and "Teosto – BMAT Vericast Covers Pilot Report" by BMAT (2012).

Photo: Turo Pekari

5. DJ/Club monitoring pilot

Teosto and BMAT tested the Vericast audio identification technology on DJ club performances for automatic setlist creation. The three-month pilot started in December 2012, and four Finnish DJs took part in it: Darude, K-System, Orkidea and Riku Kokkonen. The DJ sets were recorded in late 2012.

5.1 Identification technology

For the audio identification, BMAT applied their Vericast audio fingerprint technology, which is currently used by over 30 performance rights organizations for worldwide music broadcast monitoring, serving as a tool for royalty distribution and the calculation of market statistics. The technology scales to multi-million reference song databases as well as real-time audio recording input.

In the identification process, the user provides the reference audio by uploading the content or by providing access to an online stream. Each recording is matched against the reference collection, after which the resulting identifications are enriched with audio content metadata (e.g. artist, title, ISRC, ISWC, album) and an identification report is prepared.

For the matching process, an audio fingerprint – a simplified data representation in the frequency domain – is extracted for each reference song and for the audio recording. This process is supported by a hash function, similar to cryptographic hash functions, that converts the large number of bits in the input audio signal into a small but representative number of bits. An important constraint for this hash function is that perceptually similar audio objects result in similar fingerprints and, conversely, that dissimilar audio objects result in dissimilar fingerprints. The reference fingerprints are collected in a database optimized for querying input fingerprint fragments. The algorithm is optimized for identifying reference audio with a minimum playtime of 3 seconds. In the configuration used for this pilot, a fingerprint fragment represents an audio segment of 6 seconds, which results in a duration precision of +/- 3 seconds.
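As an illustration of the kind of frame-wise fingerprint hashing described above, the following sketch derives one hash value per frame from the signs of band-energy differences, in the spirit of published fingerprinting schemes such as Haitsma and Kalker's. It is a toy example only; BMAT's Vericast algorithm is proprietary, and the frame sizes, band counts and bit layout here are assumptions.

```python
import numpy as np

def fingerprint(audio, frame_len=2048, hop=1024, n_bands=17):
    """Toy audio fingerprint: one integer hash per frame, built from the sign
    of energy differences between neighbouring frequency bands and frames.

    audio: 1-D array of samples. The intent mirrors the constraint described
    above: perceptually similar audio should yield similar bit patterns,
    dissimilar audio should not.
    """
    hashes = []
    prev_energies = None
    for start in range(0, len(audio) - frame_len, hop):
        spectrum = np.abs(np.fft.rfft(audio[start:start + frame_len]))
        # Collapse the spectrum into a small number of coarse bands.
        bands = np.array_split(spectrum, n_bands)
        energies = np.array([band.sum() for band in bands])
        if prev_energies is not None:
            # One bit per band pair: did the inter-band energy difference
            # grow compared to the previous frame?
            diff = np.diff(energies) - np.diff(prev_energies)
            bits = (diff > 0).astype(int)
            hashes.append(int("".join(map(str, bits)), 2))
        prev_energies = energies
    return hashes
```

Matching then reduces to looking up sequences of such hash values, which is why the reference fingerprints are stored in a database optimized for exactly this kind of fragment query.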
The performance of identifying a recording against a song database is influenced by three different factors. 1) The size of the database has only a small impact on the search for possible candidates for fingerprint fragments, as the data structure is optimized for this kind of query. 2) The merging of the results of two neighbouring fingerprint segments and the elimination of conflicting candidates has a much bigger impact on the computation effort and performance of the algorithm; a larger database creates more possible false positives that have to be resolved. 3) The last influence is the duration of the recording, which has a linear impact on the search performance in the database and can have an even bigger influence on the resolution, depending on the coverage and constellation of the musical content inside the recording.

The BMAT fingerprint solution is robust and resistant to channel distortion and background noise. It is optimized and tuned to have a detection rate of 99% with no false positives, and over 90% detection for background music. The solution is restricted to the reference collection of the user: if a recording contains an unavailable reference, or a newly recorded version of a reference (e.g. a live performance or a time-stretched version), it is a dissimilar audio object with respect to the reference collection, and the algorithm will not be able to match that song occurrence in the recording. The identification technology currently operates on a production platform that constantly analyses over 2,000 radio and TV channels against a music database of nearly 20 million reference songs.

5.2 Pilot setup

For the pilot, BMAT received recordings of five DJ performances from Teosto, of which four were recorded from the line-out signal of the DJ set and one from a microphone placed at the DJ desk. Four of the recordings have a duration of around one hour (playtime) and one recording has a length of 30 minutes.

Test | DJ | Type | Recording | Duration
1 | K-System | line-out | k_system.mp3 | 01:00:00.16
2 | Orkidea | line-out | orkidea.mp3 | 01:00:11.03
3 | Riku Kokkonen | line-out | riku_kokkonen.mp3 | 00:28:06.17
4 | Darude | line-out | darude_radio.mp3 | 01:00:05.49
5 | Darude | microphone | darude_seattle.mp3 | 01:00:00.16

Table 10. Delivered recordings of DJ sets for the pilot. Source: BMAT

Reference collections

For each DJ performance, Teosto prepared a detailed setlist with the artist and song title as well as the order of appearance. BMAT created the following reference collections, which were used for matching against the recordings:

ORIGINAL: In addition to the recordings, BMAT received 56% (40 references) of the played content from Teosto and could retrieve another 21% (15 references) from BMAT's internal music database. This content was used as the ORIGINAL reference collection to match against the recordings and has an overall coverage of 76% (55 of 72 references) of the music played in the recordings.

VERSION: Apart from the original references, BMAT retrieved 166 songs from the internal music database that were not the played version of the song but another release or remix by the artist (e.g. original, extended version, club mix, radio edit, featuring...). This collection of songs was called the VERSION dataset and was used to bias the results and to investigate the relation between the different versions.
BIGDATA: This third collection was used for performance estimation and validation; it contains 10,000 popular club music references, including the ORIGINAL references that were played in the delivered recordings.

Photo: Turo Pekari

5.3 Test case results

5.3.1 K-System

K-System is a Finnish DJ and producer who has been active since 1999. K-System's recorded DJ set (length: 60 minutes) contained an electronic and house music mix. For the pilot, 13 of the 14 performed tracks were available to be used as reference audio, and all 13 available songs could be directly identified with the ORIGINAL database. Time detection is precise to within 3 seconds, and overlaps of two songs playing at the same time were detected in several cases, for up to 16 seconds. In some cases the outros of the songs do not coincide with the reference songs because the DJ applied remixing or heavy filtering to them.

Table 11 shows the identifications against the ORIGINAL collection; unavailable songs are marked yellow and false negative detections are coloured red. Table 12 shows the matches with the VERSION collection; the rows marked green are matches that were missed in the identification against the ORIGINAL collection but could be identified with a similar version of the reference song in the VERSION collection.

# | Start (s) | End (s) | Artist | Title | Comment
1 | 0 | 303 | Deniz Koyu | Tung! (Original Mix) | CORRECT
2 | 297 | 559 | Cedric Gervais | Molly (Original Mix) | CORRECT
3 | 553 | 846 | Swedish House Mafia | Greyhound | OTHER VERSION
4 | 824 | 1107 | Hard Rock Sofa, Swanky Tunes | Here We Go (Original Mix) | CORRECT
5 | 1039 | 1378 | HeavyWeight | Butterknife | CORRECT
6 | | | K-System & Probaker | Lingerie Demo 1 | NOT AVAILABLE
7 | 1731 | 2008 | Sebastian Ingrosso & Alesso | Calling - Original Instrumental Mix | OTHER VERSION
8 | 1956 | 2141 | Chocolate Puma, Firebeatz | Just One More Time Baby (Original Mix) | OTHER VERSION
9 | 2135 | 2438 | Pyero | Ole (Original Mix) | CORRECT
10 | 2473 | 2699 | Quintino, Sandro Silva | Epic (Original Mix) | CORRECT
11 | 2683 | 2920 | Nari & Milani | Atom (Original Mix) | CORRECT
12 | 2913 | 3130 | Daddy's Groove | Turn The Light Down (David Guetta Re Work) | CORRECT
13 | 3123 | 3370 | Basto | I Rave You (Original Mix) | CORRECT
14 | 3354 | 3647 | Firebeatz, Schella | Dear New York (Original Mix) | CORRECT

Table 11. Identification for K-System DJ Set against ORIGINAL collection (Source: BMAT)

# | Start (s) | End (s) | Artist | Title | Comment
1 | 0 | 165 | Deniz Koyu | Tung! (Edit) | CORRECT
2 | 297 | 559 | Cedric Gervais | Molly (Club Radio Edit) | CORRECT
3 | 553 | 841 | Swedish House Mafia | Greyhound (Radio Edit) | OTHER VERSION
4 | 824 | 1107 | Hard Rock Sofa, Swanky Tunes | Here We Go | CORRECT
7 | 1725 | 2008 | Alesso, Sebastian Ingrosso | Calling (Lose My Mind) Feat. Ryan Tedder (Extended Club Mix) | OTHER VERSION
10 | 2437 | 2694 | Quintino, Sandro Silva | Epic | CORRECT
11 | 2616 | 2925 | Nari - Milani | Atom | CORRECT
13 | 3123 | 3360 | Basto | I Rave You | CORRECT

Table 12. Identification for K-System DJ Set against VERSION collection (Source: BMAT)

5.3.2 Orkidea

The Finnish electronic music artist DJ Orkidea performed the second test case of this pilot. The recording was taken from the line out of the DJ desk (length: 60 minutes) and contains mainly techno and trance music.

# | Start (s) | End (s) | Artist | Title | Comment
1 | 0 | 88 | Tiësto | Ten Seconds Before Sunrise | CORRECT
2 | 87 | 529 | Lowland | Cheap Shrink | CORRECT
3 | | | Attractive Deep Sound pres. Little Movement | The Anthem! | NOT AVAILABLE
4 | | | Full Tilt | Class War | TIME STRETCH
5 | | | Steve Brian | Vueltas (Thomas Datt Instrumental) | TIME STRETCH
6 | 1485 | 1788 | Orkidea | Liberateon (Mystery Islands Remix) | CORRECT
7 | | | Paul Oakenfold | Southern Sun (Orkidea's Tribute Mix) | NOT AVAILABLE
8 | 2207 | 2525 | Solarstone & Giuseppe Ottaviani | Falcons (Giuseppe Ottaviani OnAir Mix) | CORRECT
9 | 2596 | 2920 | Alex M.O.R.P.H. | Eternal Flame (Alex M.O.R.P.H.'s Reach Out For The Stars Mix) | CORRECT
10 | | | Mark Sherry | My Love (Outburst Vocal Mix) | NOT AVAILABLE
11 | | | Omnia | Infina | TIME STRETCH

Table 13. Identification for Orkidea DJ Set against ORIGINAL collection.

There were no identification results with the VERSION database for this test case; on the other hand, in comparison to the other test cases only 2 of the 166 references in the VERSION database are related to the Orkidea DJ set. The algorithm detected 5 of the 8 available songs against the ORIGINAL database. All 3 false negatives can be related to the fact that the algorithm is not resistant to timescale/pitch modification. The report includes 3 audio pairs, each containing an audio sample of the reference and of the recording, which illustrate the timescale changes for songs #4, #5 and #11. In effect, a correct reference phonogram was not available for these songs, and the algorithm detected 100% of the specified and detectable cases. The total detection rate for this test case was 62,5%.

5.3.3 Riku Kokkonen

Riku Kokkonen is a Finnish DJ playing electronic and house music. The delivered DJ set contained 18 songs, of which 11 were available as references in the database for the pilot. With a length of 28 minutes, this test case has the shortest recording of the pilot but the largest number of different reference songs.

# | Start (s) | End (s) | Artist | Title | Comment
1 | | | Avicii | Levels (Acapella) | NOT AVAILABLE
2 | | | Taio Cruz | Hangover (Acapella) | NOT AVAILABLE
3 | | | Dada Life | Kick Out The Epic Motherfucker (Original Mix) | NOT AVAILABLE
4 | 164 | 221 | Basement Jaxx | Where's Your Head At | CORRECT
5 | | | Nari & Milani | Up (Acapella) | NOT AVAILABLE
6 | 282 | 349 | Showtek & Justin Prime | Cannonball (Original Mix) | CORRECT
7 | 343 | 534 | Swedish House Mafia | Greyhound | CORRECT
8 | 538 | 605 | Daniel Portman & Stanley Ross | Sampdoria (Original Mix) | CORRECT
9 | | | Fatboy Slim | Right Here, Right Now (Acapella) | NOT AVAILABLE
10 | 635 | 754 | Plastik Funk, Tujamo | WHO (Original Mix) | CORRECT
11 | 758 | 831 | Chuckie | Who Is Ready To Jump (Dzeko & Torres Remix) | CORRECT
12 | 819 | 892 | Firebeatz & Schella | Dear New York (Original Mix) | CORRECT
13 | | | House Of Pain | Jump Around | OTHER MIX
14 | | | Jay-Z & Kanye West | Otis (A Skillz Remix) | NOT AVAILABLE
15 | | | TJR | Funky Vodka | NOT AVAILABLE
16 | 1208 | 1358 | Oliver Twizt | Love Trip (David Jones Remix) | CORRECT
17 | 1347 | 1594 | Steve Angello & Third Party | Lights (Original Mix) | CORRECT
18 | 1567 | 1685 | The Aston Shuffle & Tommy Trash | Sunrise (Won't Get Lost) | CORRECT

Table 14. Identification for Riku Kokkonen DJ Set against ORIGINAL collection.

# | Start (s) | End (s) | Artist | Title | Comment
3 | 0 | 170 | Dada Life | Kick Out The Epic Motherf**ker | NOT AVAILABLE
4 | 164 | 216 | Basement Jaxx | Where's Your Head At (Radio Edit) | CORRECT
7 | 338 | 534 | Swedish House Mafia | Greyhound (Radio Edit) | CORRECT
10 | 625 | 744 | Plastik Funk, Tujamo | Who | CORRECT
11 | 794 | 831 | Chuckie | Who Is Ready To Jump (Ryan Riback Remix) | CORRECT
13 | 850 | 1005 | House Of Pain | Jump Around (Deadmau5 Edit) | OTHER MIX
16 | 1213 | 1358 | Oliver Twizt | Love Trip | CORRECT
17 | 1347 | 1594 | Steve Angello & Third Party | Lights | CORRECT

Table 15. Identification for Riku Kokkonen DJ Set against VERSION collection.
Overlaps between songs were identified correctly by the algorithm for up to 27 seconds (see Table 14, #17 and #18). In several cases the end of a song could not be detected because the DJ had remixed it. Song #13 was annotated incorrectly in the setlist: the version played is not the original phonogram but a variation, i.e. a different phonogram. The correct phonogram was available in the VERSION database and was detected by the algorithm. The same applies to song #3, which is played at the beginning of the DJ set and was not available in its annotated version for the pilot, but could be detected correctly with the VERSION database. The system detected all tracks that were available for detection. The overall detection rate for this test case was 90,9%.

In this test case, we can also see the linear impact of the recording length on performance: in comparison to the prior test cases, the computation time is nearly cut in half. The scaling is not perfectly linear relative to the other test cases, because the same number of different references is searched in the same search space, and a large part of the computation is used to resolve possible overlaps and merge neighbouring partial matches. For the BIGDATA database, the algorithm computes the results 487 times faster than real time, relative to the recording length.

5.3.4 Darude

Darude Radio DJ Set

Darude is a Finnish trance producer and DJ. The recording of this test case is 60 minutes long and contains 14 songs from the electronic and trance genres. The recording was carried out during a studio performance for a radio show.

# | Start (s) | End (s) | Artist | Title | Comment
1 | 36 | 344 | Morgan Page, Andy Caldwell & Jonathan Mendelsohn | Where Did You Go (Tom Fall Remix) | CORRECT
2 | 313 | 580 | Ashley Wallbridge | Grenade (Original Mix) | CORRECT
3 | 527 | 769 | Marco V | GOHF (Kris O'Neil Remix) | CORRECT
4 | | | Cosmic Gate & JES | Flying Blind (TwisteDDiskO Club Mix) | NOT AVAILABLE
5 | 968 | 1286 | Dada Life | Rolling Stones T-Shirt (Original Mix) | CORRECT
6 | | | Ferry Corsten | Radio Crash (Progressive Mix) | NOT AVAILABLE
7 | | | Above & Beyond feat. Zoe Johnston | Love Is Not Enough (Maor Levi & Bluestone Club Mix) | TIME STRETCH
8 | | | Nitrous Oxide | Tiburon (Sunny Lax Remix) | TIME STRETCH
9 | 1956 | 2238 | Above & Beyond feat. Andy Moor | Air For Life (Norin & Rad Remix) | CORRECT
10 | 2263 | 2300 | Philip Aelis & Tiff Lacey | Heart In Blazing Night (David Kane Remix) | CORRECT
11 | 2483 | 2838 | Majai | Emotion Flash (Incognet Vocal Mix) | CORRECT
12 | 2806 | 3129 | Jonathan Gering | Let You Go (Original Mix) | CORRECT
13 | 3113 | 3365 | Nick Wolanski | I Love Mandy (Original Mix) | CORRECT
14 | 3349 | 3599 | Ercola vs. Heikki L. | Deep At Night (Adam K & Soha Remix) | CORRECT

Table 16. Identification for Darude Radio DJ Set against ORIGINAL collection.

For the pilot, 12 songs were available and 10 songs could be detected in the ORIGINAL database. The 2 false negative cases are related to a slight time stretch introduced by the DJ during his performance. For the detectable songs we achieve a detection rate of 100%, and the total detection rate for this test case was 83,3%.

# | Start (s) | End (s) | Artist | Title | Comment
4 | 870 | 907 | Cosmic Gate & JES | Flying Blind | NOT AVAILABLE
5 | 968 | 1286 | Dada Life | Rolling Stones T-Shirt | CORRECT

Table 17. Identification for Darude Radio DJ Set against VERSION collection.

In the VERSION database, song #4, which was not available in the ORIGINAL database, could be detected correctly with a different reference (see Table 17). This shows again that the algorithm is robust to mixes that do not destroy the timescale in the frequency domain.
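The overlap figures and the "times real time" factors quoted in these test cases can be derived directly from the identification reports. The sketch below shows the arithmetic on a hypothetical list of identified segments; the data layout is an assumption for illustration, not the format of BMAT's reports.

```python
def consecutive_overlaps(segments):
    """Overlap in seconds between consecutive identified songs.

    segments: list of (start, end, title) tuples in playing order,
    e.g. the kind of cross-fades reported above (up to 27 s for #17/#18).
    """
    return [
        (prev[2], cur[2], max(0, prev[1] - cur[0]))
        for prev, cur in zip(segments, segments[1:])
    ]

def realtime_factor(recording_seconds, processing_seconds):
    """How many times faster than real time the analysis ran
    (such as the 487x real time reported above for the BIGDATA run)."""
    return recording_seconds / processing_seconds
```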
The observations on performance are similar to those for the test cases above, and the recording could be analysed and matched in a comparable time.

Darude Seattle DJ Set

The material for this test case was recorded by the same DJ as in the previous test, Darude, and is a mix of a microphone recording and a DJ set recording. The recording is 60 minutes long and contains 15 songs. The device used for the recording was a Zoom H4N recorder, with a 120° stereo microphone directed from the DJ desk towards the audience. The audience and room were recorded on a single stereo track, and another track was recorded from the mixer line. Both tracks were mixed together using Logic Pro software. The final mix was compressed, and the signals were adjusted and equalized, aiming for a balanced ambiance. The final mix is approximately 90% clean mixer signal, with the audience track on top to create atmosphere.

In this case, BMAT's identification technology could detect only 5 of the 11 available songs with the ORIGINAL database. The analysis against the VERSION database did not resolve any further matches. The results indicate very low similarity and confidence. After a deeper analysis of the frequency spectrum of the references and the recording, we could find significant differences that cause the current implementation and configuration of the fingerprint extraction algorithm, and the subsequent search, to fail for this material. These differences are probably caused by the post-processing of the recording (e.g. compression, equalizing and ambiance balancing) as well as the mixing of the two signals. Detecting this kind of audio material would require a different extraction of the fingerprints, both for the recordings and for the references. The total detection rate for this test case was 45,5%.

# | Start (s) | End (s) | Artist | Title | Comment
1 | 36 | 344 | Morgan Page, Andy Caldwell & Jonathan Mendelsohn | Where Did You Go (Tom Fall Remix) | CORRECT
2 | 313 | 580 | Ashley Wallbridge | Grenade (Original Mix) | CORRECT
3 | 527 | 769 | Marco V | GOHF (Kris O'Neil Remix) | CORRECT
4 | | | Cosmic Gate & JES | Flying Blind (TwisteDDiskO Club Mix) | NOT AVAILABLE
5 | 968 | 1286 | Dada Life | Rolling Stones T-Shirt (Original Mix) | CORRECT
6 | | | Ferry Corsten | Radio Crash (Progressive Mix) | NOT AVAILABLE
7 | | | Above & Beyond feat. Zoe Johnston | Love Is Not Enough (Maor Levi & Bluestone Club Mix) | TIME STRETCH
8 | | | Nitrous Oxide | Tiburon (Sunny Lax Remix) | TIME STRETCH
9 | 1956 | 2238 | Above & Beyond feat. Andy Moor | Air For Life (Norin & Rad Remix) | CORRECT
10 | 2263 | 2300 | Philip Aelis & Tiff Lacey | Heart In Blazing Night (David Kane Remix) | CORRECT
11 | 2483 | 2838 | Majai | Emotion Flash (Incognet Vocal Mix) | CORRECT
12 | 2806 | 3129 | Jonathan Gering | Let You Go (Original Mix) | CORRECT
13 | 3113 | 3365 | Nick Wolanski | I Love Mandy (Original Mix) | CORRECT
14 | 3349 | 3599 | Ercola vs. Heikki L. | Deep At Night (Adam K & Soha Remix) | CORRECT

Table 18. Identification for Darude Seattle DJ Set against ORIGINAL collection.

The performance of the algorithm for the ORIGINAL and VERSION databases is very similar to the other test cases. A difference can be found with the BIGDATA database: as the fingerprints from the recording and from the references are not compatible, the number of possible candidates is very low, which has a more significant impact on larger databases. Nevertheless, the recording could be processed at 657 times real time.

5.4 Evaluation of the results

In Table 19, we can see the global detection results of the pilot.
We separated the results into #Available (the number of available songs), #Detectable (the number of songs that should be detectable by the technology, i.e. excluding cases of timescale/pitch modification), #Detected (the number of songs detected by BMAT), #Version (the number of songs detected with another version than the one indicated) and #Total (the resulting total of detections by BMAT, including detected versions).

RESULTS | # Available | # Detectable | # Detected | % Detected | # Version | # Total | % Total
K-System | 13 | 13 | 13 | 100% | 0 | 13 | 100%
Orkidea | 8 | 5 | 5 | 62,50% | 0 | 5 | 100%
Riku Kokkonen | 11 | 11 | 10 | 90,91% | 1 | 11 | 100%
Darude Radio | 12 | 10 | 10 | 83,33% | 0 | 10 | 100%
Darude Seattle | 11 | 11 | 5 | 45,45% | 0 | 5 | 45,45%

Table 19. Overall results of the algorithm for the complete pilot.

During the pilot, we could detect 5 cases of time stretching in the delivered material. These changes were introduced by the DJs, and because of the resulting structural changes in the songs the algorithm could not detect them. Excluding these cases, we see that the technology could detect 100% of the songs in all recordings taken directly from the mixer. In the case of the mixed recording – which contained both the mixer signal and a microphone-recorded audience signal – the current configuration of the algorithm showed a poor detection rate. The reason is that the algorithm is currently optimized for radio and broadcast recordings, which do not normally contain signal manipulation of this magnitude.

5.5 Possible use scenarios

The technology used in the DJ pilot is already in use for broadcast monitoring by a number of clients around the world, which means that it is more mature than the solution tested in the live music identification pilot. For the intended use – monitoring music use in clubs and by DJs – the technology has some limitations, most notably that the use of time-stretching by DJs in the pilot resulted in tracks not being identified.

If the limitations recognized in this pilot can be overcome, and the costs of setting up and running this type of service prove economically viable, Teosto sees three possible use scenarios for a club/DJ automated music monitoring system. The use scenarios are similar to those for a live music identification system (see chapter 4.6): festivals, venues and artist/tour specific uses. For the club and DJ monitoring technology, the venue-based use scenario could be the most interesting, as permanent installation of monitoring systems/services in active venues (including e.g. clubs, ski resorts and cruise ships) would also allow for the monitoring of background music usage and, in time, the installation of possible live music identification services. The live festival and artist/tour specific use scenarios would be similar to the scenarios presented in chapter 4.6. We are aware that, for example, in the Netherlands this type of monitoring service has already been tested on DJ/EDM tours, and experiences from these trials should be taken into consideration in further planning of these use scenarios.

This chapter is based on "Teosto – BMAT Vericast Clubs Pilot Report" by BMAT (2012).

6. Consumer survey

In addition to the two technology pilots, the project included a consumer survey on the possibilities and potential of using crowdsourcing methods for gathering information about live gig setlists directly from the audience. The web survey was carried out in the web consumer panel of the Finnish market research company Taloustutkimus Oy (http://www.taloustutkimus.fi) in December 2012 – January 2013.
The 639 survey respondents were persons who visit live music events at least once a month.

6.1 Background

The respondents were 15 to 40 years of age, and all were active concert goers (gigs, clubs, concerts, festivals). Out of a total of 3,920 initial respondents, 16% belonged to the target group (visiting live events at least once a month). The most active age group in the survey was the 31 to 35 year olds, of whom almost one in four (23%) go to live events every week or several times a month.

The majority of the survey respondents could be described as fans. 81% of the survey respondents could name a favorite artist (one or several), and 24% of these respondents say they are members of a fan club or otherwise active in the online fan communities of one or several artists. The artists mentioned most often were Finnish metal artists such as Stam1na, Mokoma, Nightwish and Amorphis; international bands such as Metallica and Muse; and Finnish pop acts like PMMP and Chisu. Three out of four respondents that have a favorite artist have also bought artist merchandise. The bands and artists most often mentioned for merchandise purchases were Metallica, Mokoma, Stam1na and Iron Maiden. Male respondents and respondents aged 26 to 35 are more active when it comes to buying artist merchandise. The respondents seek information about upcoming live events online (87%), usually buy their tickets beforehand (84%), and try to go to every local gig by their favorite artist(s) (64%).

6.2 Crowdsourcing potential

78% of respondents say they can "name most of the songs played by their favorite artist in their live show", an encouraging result from the point of view of crowdsourcing potential. Further, the survey target group is already actively engaged in social media, and a majority of them are also active smartphone users. In fact, 41% of respondents say they have (at least once) "posted an update to a social media service directly from a live gig". Fans are also interested in following up on a gig by their favorite artist by searching online for gig reviews (45%) and other information about the gig (photos, setlists, fan reviews, etc.) (54%) after the show.

There is a marked difference between fans searching for material posted online by other fans and fans actively contributing (producing, uploading) material themselves: whereas 54% search for updates made by other fans, only 8% say they write gig reviews themselves, and 9% say they contribute setlists to online fan communities and/or social media. However, the respondents who are fan club members of one or several artists, or who are engaged in online fan communities, are also more active in this regard than other respondents.

6.3 Consumer interest in an interactive online setlist service

One main purpose of the consumer survey was to estimate how interested active concert goers and fans would be in a service that would provide gig setlists (lists of songs performed at a gig) and would enable the uploading of setlists by the fans themselves. Interest was measured by asking the respondents how interested they would be in using such a service, and how interested they would be in uploading material to such a service themselves.
As a baseline and for comparison, the respondents were also asked about their current use of several live music related online services: Finnish online ticketing services (lippupalvelu.fi, tiketti.fi, lippu.fi), live music information services (meteli.net, songkick.com), services that already focus on setlists (setlist.fm), and one mobile fan engagement service/platform (Mobile Backstage). The ticketing services were used actively, with over 75% of respondents having used or tried all three major Finnish ticketing sites. 41% had used or tried the Finnish live music information service Meteli.net, and 9% had used or tried Setlist.fm. Songkick and Mobile Backstage (which is often branded for each artist and is thus probably not known to fans as a separate service) received only a few mentions – artist fan sites and Facebook were mentioned more often as information sources on live music events.

In the end, a total of 30% of all respondents surveyed showed interest in a service that would enable viewing setlists and the posting of setlists by consumers and fans. Respondents who could name one or several favorite artists were slightly more interested (33%), and fan club members even more clearly so (47%). Of the different age groups, respondents aged 15 to 20 showed more interest in this type of service than older age groups. A smaller share of respondents (12%) said they would also be interested in posting material to a setlist service themselves. Again, the fan club members showed the greatest interest, with 26% of them saying they would be interested in posting setlists to this type of service themselves.

6.4 Conclusions

It has to be noted that converting this type of general interest shown by consumers in a web survey into actual users of a service will not be easy or straightforward. Nevertheless, for the purposes of the present research project, the aim of the survey was to identify potential target groups for these types of services, and from this point of view the survey results are interesting.

First, we can see that the total number of consumers that could form the target group for an online setlist service in Finland is small. This will probably rule out any large-scale implementation of crowdsourcing methodologies for gathering setlist information by a performance rights society like Teosto, at least for the time being. On the other hand, there are certain groups of consumers – specifically, fan club members and persons actively engaged in artist fan communities – that could be genuinely interested in using and posting setlist information for their own favorite artists. This could leave room for crowdsourcing solutions that could be used for verifying setlists provided by automated solutions. The quality of automated reports could be verified by both fans and the artists themselves in order to add a layer of quality control. From the artists' point of view, this could also potentially be a way to further engage active fans.

References: Teosto ry – Biisilistapalvelututkimus. Taloustutkimus Oy, 28.1.2013

7. State of the art in Music Information Retrieval: what could be applied for copyright management?

Teppo Ahonen

The distribution and consumption of music is undergoing a drastic change. Whereas music used to be distributed on physical media such as vinyl albums, cassettes, and compact discs, the current trend favors online distribution, either in download stores such as iTunes or in streaming services such as Spotify.
Although physical albums are still manufactured and sold, music is nowadays more and more stored on various hard drives, from large servers to personal computers and handheld devices. Such vast amounts of music data have created a demand for efficient, reliable, and innovative methods for accessing music. Music information retrieval (MIR) [13, 37] is a relatively young area of research that studies how information can be extracted, retrieved, and analyzed from large amounts of music data. MIR is an interdisciplinary area of research that combines studies from at least computer science, musicology, mathematics, acoustics, music psychology, and library sciences.

The target groups of MIR studies fall into various categories. Firstly, MIR research provides tools and information for music scholars. Another example of a MIR target group is the consumer, a person who wants to find and access music online. More importantly, in recent years the music industry has also shown a growing attraction towards MIR research. For example, a search engine for content-based music retrieval would clearly have a wide user base. In a similar manner, the technological discoveries may prove useful for organizations directly related to the music industry, such as music copyright societies. The innovations of MIR can be, and to some extent already have been, applied to detecting the use of copyrighted material. The purpose of this article is to provide a review of the current state of the art in MIR, and to offer insight on methodologies that could be applied to different tasks in the area of copyright management.

7.1 Music Information Retrieval background

Pinpointing the origins of MIR research is difficult. Since the early days of information retrieval research, there has been an interest in studying whether the same methodologies could also be applied to different data, including music. Several suggestions for the very first MIR systems date back to the 1960s, with ideas of representing music as a programming language. Clearly, the computational capabilities back then were too limited for the algorithms and data representations required for efficient music retrieval. In the decades that followed, the progress in computational and storage resources led to the development of real-world applications for information retrieval of musical data. At first the focus was on the conventional methodologies of textual information retrieval, combined with music. This can be deemed rather limiting for a phenomenon as diverse as music [15], and since the beginning of the 21st century, more and more studies have focused on the music content itself. The vast majority of MIR studies still focus on information retrieval, but a growing number of studies focus on other topics that could be defined as "informatics in music", for lack of a better term. It should be noted that studies of automatic composition and other similar areas of computational creativity are not deemed to be MIR studies in the strict sense of the word.

7.2 State of the research

In the past ten years, the research area of MIR has grown into a vast field of interdisciplinary study. This has happened in conjunction with the rise of digital music distribution and the changes in the habits of music consumption. Also, success stories of applying MIR research to consumer and business-to-business technologies and services have already surfaced; the applications in the query by example domain discussed in this article are a fine example of this.
Though there are at the moment no scientific journals that focus solely on MIR, the discoveries of MIR studies have been reported in various journals of computer science, computational musicology, mathematics, and other related areas. Several textbooks and other monographs considering topics of MIR studies have emerged as well. In addition to scientific journals, a significant amount of MIR studies are published in conference proceedings. Arguably the most important conference of MIR is known as ISMIR [10, 15], abbreviated from International Society for Music Information Retrieval. ISMIR has been held annually since 2000, and it has expanded into a large-scale forum for discussion of recent discoveries in MIR studies. In addition to ISMIR, many other conferences related to music and multimedia frequently publish studies from the field of MIR, some of the most important ones being ACM Multimedia (ACM-MM), the International Computer Music Conference (ICMC), the International Symposium on Computer Music Modeling and Retrieval (CMMR), and the IEEE International Conference on Multimedia & Expo (ICME). Several smaller workshops have also established themselves, the most notable ones including the International Workshop on Advances in Music Information Research (AdMIRe), the International Workshop on Machine Learning and Music (MML), and the International Workshop on Content-Based Multimedia Indexing (CBMI).

A highly important factor in the recent years of MIR studies has been the introduction of the Music Information Retrieval Evaluation eXchange (MIREX) [14], held in conjunction with ISMIR since 2005. Borrowing the concept of the textual information retrieval evaluation TREC, MIREX is a community-based effort for objective and comparative evaluations of MIR applications. MIREX provides researchers and groups with the possibility of evaluating their applications on large sets of music data and comparing their performance against other submitted state-of-the-art approaches, all without the risk of committing a copyright infringement by distributing the material used in the evaluations. The success of MIREX is noteworthy; for example, in 2012 a total of 205 evaluations were run in the MIREX session. In addition to the research, MIR-related topics are currently taught in various institutions around the world, to both graduate and undergraduate students.

7.3 Current trends

The work in MIR can roughly be divided into three categories: symbolic music (such as MIDI data, MusicXML, and other representations with high semantic information), audio (dealing with raw data of time-amplitude representations, in practice often applying methods of signal processing for extracting more in-depth information from the music), and metadata (such as tags and other user-generated information, but also including lyrics). The first two are commonly referred to as content-based MIR, because they rely solely on the information contained in the music itself, without the aid of metadata information. Although the categories are diverse, several studies incorporate discoveries from various categories and combine them into novel and improved systems; for example, combining audio classification methods with metadata information such as lyrics can be beneficial for mood classification (e.g. [8]).

Music retrieval. The basis of MIR research altogether: retrieving music in a similar manner to how textual or other information can be retrieved.
The task of music retrieval is a combination of several problems: what features should be extracted from the music, how the music or the features should be represented, how the matching algorithm should be constructed to be both accurate and robust at the same time, and how the data should be indexed to allow efficient matching. We will discuss a subfield of content-based music retrieval called query by example in more detail later in this article.

Music identification. One of the core problems in MIR is measuring similarity between pieces of music and determining whether they can be identified or otherwise deemed to be highly similar. Cover song identification [45], a task that is definitely difficult but also highly applicable when performed successfully, is a key example of the problem. Approaches and applications of cover song identification are also discussed in more detail in this article.

Music categorization. A plethora of applications applying methodologies of similarity measuring in combination with machine learning techniques, in order to classify or cluster sets of music data, have emerged. In classification, a set of data is used to train a classifier to label unknown pieces, whereas in clustering a set of unknown pieces is divided into smaller clusters according to their similarities. One of the most well-studied problems is the task of genre classification, with various methodologies existing (see e.g. [43] for a tutorial) and high success rates achieved in the related MIREX task. However, it has also been discussed whether the rather subjective definition of genre should be used as a criterion in automatic music classification [33]. Outside of genre classification, various other tasks exist. Recently, classifying music according to the mood it represents has been studied extensively.

Feature extraction and processing. In both the symbolic and the audio domains of MIR research, a crucial factor is to extract meaningful content from the music into representations that can be used for similarity measuring. This includes different methods, ranging from low-level signal processing techniques to methods where the task is to provide a so-called mid-level representation of the music: a representation that captures desired features of the music in a way that is both efficiently computable and robust, without attempting to extract any high-level semantic information. The features extracted from the signal can be processed into representations that include descriptors of melodic [32], harmonic [7], and rhythmic [21] content. Such features are also applied in various tasks, from key (e.g. [41]) and chord sequence (e.g. [40]) estimation to audio thumbnailing and structure discovery (e.g. [5]). Also instrument detection (e.g. [18]) and signal decomposition (e.g. [48]) can be included as tasks of this category.

Automatic transcription. The idea of automatic transcription is to estimate notes, chords, beats, keys, and other musical descriptors from the audio signal to provide a semantically richer high-level description of the content of the audio signal than the methodologies described in the previous paragraph; however, those methodologies are to some extent applied also in automatic transcription. At its most successful, an automatic transcription system could be applied as a tool for producing sheet music representations from pieces of music in audio format.
Requiring methods such as signal decomposition, fundamental tone estimation, beat detection, and harmony approximation, automatic transcription in its current form is anything but a solved task.

Music synchronization. Not to be confused with the music business usage of the same term: using feature extraction methods and representations, the task of music synchronization attempts to synchronize music from different sources, be it different audio files or a combination of music from the audio and symbolic domains. One of the most common tasks of music synchronization is known as score following [29], where the goal is to synchronize a sheet music representation to the corresponding audio file, thus requiring both creating robust representations for the music to be synced and synchronizing them with sequence alignment algorithms.

Optical Music Recognition. Abbreviated OMR, or occasionally Music OCR, optical music recognition has for years been an active area of study, where the goal is to provide a computer-understandable representation of a piece of music described in sheet music format; see for example [4] for a tutorial. To some extent, OMR is the reverse process of automatic transcription, described previously. Nowadays OMR is often considered one of the subfields of MIR, and successful applications of OMR could be further applied in various MIR tasks. The modern western music notation system is arguably one of the most difficult writing systems ever developed by mankind, consequently making the task of OMR considerably more difficult than conventional optical character recognition (which itself is already a highly challenging task). The different applications for OMR each have their own particular weaknesses, and a higher accuracy in OMR could be achieved by combining multiple OMR systems in order to overcome their individual shortcomings.

User-based applications and interfaces. End-user applications have always been an essential goal of MIR research, and in recent studies the focus has especially been on creating innovative interfaces for accessing music, including novel methodologies for visualizing music information [39] and techniques of automatic music recommendation [38]. Automatic playlist generation (e.g. [3]) has also gained interest from the research community. In this task, the challenge is to learn connective features from pieces of music in order to produce lists of music that could be deemed enjoyable and coherent by a human listener. The task is difficult, as it is based on remarkably subjective qualities of music, and it is thus usually based on applying metadata (e.g. [25]), collected by mining the web, to obtain features such as large sets of tags from services like Last.fm, which describe collective impressions of pieces of music. The subjective nature also requires that the evaluations be conducted by human listeners, adding more challenge to the task and making comparison between systems difficult.

Lyrics. In recent years, including lyrical content in MIR tasks has been widely accepted, because lyrics can be seen as a readily available and powerful form of metadata; different lyricists have their individual styles, different genres use similar vocabularies, and most notably, the content of the lyrics often correlates with the mood of the music, thus providing a multimodal approach to the challenging task of mood classification.
Because of this, using lyrical descriptors in conjunction with audio features has been widely implemented especially in mood classification (e.g. [27]).

Non-western music. As MIR is practically an offspring of research conducted in western institutes by western scholars, within the field there has always been a slight bias towards western music. As the data used are pieces of western music (often gathered from personal collections or public domain repositories), the applied features and tools developed for MIR are more or less developed from a western point of view. As such, they might not be trivially applicable to different music cultures from all over the world, since the scales, rhythms, timbres, and other relevant musical features often differ significantly. Interest towards MIR for non-western music has increased in the last few years [12], and MIR methods have been developed for music from cultures such as India (e.g. [49]), China (e.g. [22]), Turkey (e.g. [17]), and Africa (e.g. [34]), to name but a few.

Legal, business, and philosophical issues. Music is not just an aesthetic phenomenon; it has a significant importance in our daily lives, and it has gradually become a relevant area of economic and juridical matters. These aspects need to be considered when conducting MIR research.

7.4 Audio music retrieval

Query by example (QbE) means retrieving data by matching example input against the database. In MIR, QbE is a task where the goal is to retrieve music from a database by using an input query that is an example of music. The example could be either a complete song or just a short sample section of the song. A common case for a QbE end user would be to use the system to identify a piece of music played on the radio, in order to discover the name of the unidentified piece. Audio music retrieval also includes related concepts with slightly different approaches. Query by humming (QbH), or query by singing (QbS), means retrieving music data using an input query that is obtained as a sung or hummed version of a prominent musical cue (usually the lead melody of the piece), which is then matched against the pieces included in the database. Also query by tapping (QbT) (e.g. [19]) has been introduced as an alternative method of providing audio queries that are strongly based on the prominent rhythmic qualities of the music. Successful query by example techniques can be applied to various tasks, from accessing music innovatively and musically to identifying music through the comparison of audio fingerprints. One of the key challenges for query by example systems is the management of the target database. In order to be of practical use, the database needs to contain very large amounts of music data, which also requires efficient indexing techniques and fast matching processes.

Features

As it is rather difficult to directly match differently produced sounds, systems applying query by humming, singing, or tapping (QbH, QbS, QbT) need to reduce the audio input query into a symbolic representation in order to match it against the pieces in the database. Similarly, the database needs to be converted into a similar representation before the matching. Starting with signal processing such as the Fourier transform, the audio signal is converted into a symbolic representation that allows fast and robust pattern matching techniques to be applied. However, these methods are clearly prone to various errors that can occur in the conversion process.
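A minimal sketch of such a conversion step for a hummed or sung query is given below: a frame-wise fundamental-frequency track is reduced to a sequence of semitone intervals, which is one simple way to obtain a symbolic, key-independent representation. The segmentation is deliberately naive and the function names are illustrative, not taken from any particular QbH system.

```python
import numpy as np

def pitches_to_intervals(f0_track, silence_hz=20.0):
    """Reduce a frame-wise fundamental-frequency track (in Hz) to a sequence
    of semitone intervals between successive note events.

    Using intervals rather than absolute pitches removes the dependence on
    the key the user happened to hum in.
    """
    # Convert voiced frames to (fractional) MIDI note numbers.
    midi = [69 + 12 * np.log2(f / 440.0) for f in f0_track if f > silence_hz]
    if not midi:
        return []
    # Collapse runs of the same rounded note into single note events.
    notes = [round(midi[0])]
    for value in midi[1:]:
        if round(value) != notes[-1]:
            notes.append(round(value))
    # The interval sequence is what would be matched against the database.
    return [b - a for a, b in zip(notes, notes[1:])]
```

The reference melodies in the database would be reduced to the same representation before matching, as described above.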
Instead of this, query by example (QbE) systems usually rely on techniques of audio fingerprinting and detecting similar segments of music, without processing the audio into an oversimplified representation. As query by example techniques search for identical matches, they do not need to extract any mid- or higher-level semantic information from the signal.

With query by humming (QbH) or query by singing (QbS), the matching should be key-invariant; unless the input is provided by a trained singer or with the aid of a reference pitch, there is no guarantee that the melody is in the same key as the target pieces in the database, thus making matching based on exact note pitches completely unreliable. Another kind of robustness is also needed. Using the terminology of Lemström and Wiggins [30], a matching process needs to be both time-scale invariant (allowing temporal fluctuation) and pitch-scale invariant (allowing invariances in tone heights). Different systems solve these problems with different methods. Similar demands for robustness apply to query by example (QbE) systems, too, although the key or rhythm will not cause problems. Here, the imperfect nature of the input is not caused by a possibly musically unprofessional user, but instead by noise and distortion caused by the sampling process; for example, the quality of the recording equipment, constant or temporary background noise, or the encoding of the input sample for the transmission process.

Applications

The query by example (QbE) methodologies have been fruitfully developed into two commercially successful consumer applications. Shazam (http://www.shazam.com) [53, 52] was one of the first successful public applications utilizing MIR techniques, originally launched in 2002 in the United Kingdom. Shazam is an application for mobile devices that allows users to record and submit 15-second samples of audio, which are then matched against a large database, and a list of the nearest matches for the query is returned. In the case of two or more audio files playing simultaneously, Shazam is usually capable of returning a list of the pieces played [52]. Occasionally the list of false positives might include pieces of music that were sampled in the query [52]; as such, the techniques of Shazam could be applied to detecting (possibly unauthorized) sampling. The features used by Shazam are audio fingerprints constructed from the spectrogram peaks of the signal. The spectrogram peaks allow constructing hash representations from the signal, and the matching algorithm based on comparing the hashes is fast, although it requires far more storage space. Because it uses the spectrogram as its starting point, Shazam is a completely query by example (QbE) based application, and thus it compares only recorded performances. Due to this, cover or live version detection, or QbS, is not possible using Shazam.

Another query by example (QbE) application that has recently gained both a fair amount of users and acclaim from the research community is known as SoundHound (http://www.soundhound.com). The services offered by SoundHound are more versatile than the query by example scheme would suggest; SoundHound also allows users to input queries by singing or humming, making it a hybrid of QbE and QbS technologies. SoundHound uses the backend of the midomi (http://midomi.com) query by singing service, and utilizes its database of user-generated renditions as the target data.
The technology behind SoundHound is known as Sound2Sound, and it is explained on the SoundHound web page in high-level, unscientific terms, describing "audio crystals" as the chosen representation. We are unaware of any publications on the technology behind SoundHound, but a patent application [35] by SoundHound Inc. exists. The application explains the feature extraction and matching process in a more accurate manner. The audio signal is converted frame-wise into a sequence of pitch values and pauses. The matching process is conducted using a technique entitled Dynamic Transition Matching, which is a dynamic programming technique similar to the Needleman-Wunsch algorithm.

7.5 Cover song identification

Instead of discovering the title of a piece of music, as in the query by example case, it would occasionally be more appropriate to discover whether a piece of music has similarities to other pieces; in practice, whether the query piece is a cover version, a piece of plagiarism, or just otherwise a highly similar composition. This task is commonly known as cover song identification, although it should be noted that the term "cover" is used in a rather broad sense here: a cover version could be anything from a remix to a live version, or from a variation to an alternative arrangement, such as an "unplugged" version. Whereas the task is somewhat trivial for a human listener, it is far more challenging for a computer; features such as arrangements, tempos, keys, and lyrics may change between the versions, and thus cannot be relied on in the similarity measuring process. Because of this, cover song identification requires methods that are at the same time robust to the differences between the versions but also able to capture the essential similarity in the music.

Because cover song identification is clearly an objective task (a song either is a cover version or not), it provides a reliable way to evaluate how well similarity measuring algorithms perform. It also yields important information on what similarity in music is, and how such similarity can be represented and measured. Ultimately, successful cover song identification systems can be applied to various tasks, most notably plagiarism detection.

The cover version identification task has been a MIREX challenge since 2006. The setup of the MIREX evaluation is as follows. A dataset of 1000 files includes 30 cover song sets, that is, sets of one original performance and 10 cover versions of the piece. This totals 330 pieces that are used as queries. The remaining 670 pieces are irrelevant pieces of music, so-called "noise tracks", included to make the task more challenging. Each of the 330 pieces is used as a query; for each query, pairwise distances between the query and each piece in the database are calculated, and a distance matrix based on these is returned. The performance is evaluated by several measures, the mean of average precisions (MAP) being probably the most relevant; it describes how well the queries are answered, taking into account the order of the answers (i.e. whether the correct answers have the smallest distances). So far, the best-performing algorithm has achieved a MAP value of 0.75, with 1 meaning perfect identification. Based on this, cover song identification is definitely not a solved task yet. For a more thorough review of cover song identification technologies, we refer to the survey by Joan Serra et al. [45].
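The mean of average precisions used in this MIREX task can be computed in a few lines of code. The sketch below assumes that each query returns a full ranking of the database (smallest distance first) and that the set of true cover versions for each query is known; it is a generic MAP calculation, not MIREX's exact evaluation script.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query: the mean of the precision values at
    the ranks where a true cover version is retrieved."""
    relevant = set(relevant_ids)
    hits, precisions = 0, []
    for rank, candidate in enumerate(ranked_ids, start=1):
        if candidate in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(per_query_results):
    """per_query_results: list of (ranked_ids, relevant_ids) pairs.
    A perfect system scores 1.0; the best submissions so far reach about 0.75."""
    return sum(average_precision(r, rel) for r, rel in per_query_results) / len(per_query_results)
```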
Features

Considering that cover versions often feature different timbral characteristics, using common acoustic fingerprint matching is out of the question. Instead, a robust representation that captures the essential tonal information – melodies and harmonies – is required. Also, similarity measuring cannot be based on short segments of the pieces; with different structures and possibly highly diverse parts between the versions, it is commonly considered that cover song identification must be based on complete pieces of music.

The most frequently used feature applied in cover song identification is known as a chromagram. Also known as a pitch class profile, a chromagram is a sequence of vectors obtained from the audio signal with the short-time Fourier transform and several steps of post-processing. Each vector of the sequence describes a small portion of the piece, usually under one second long, and is commonly 12-dimensional, thus corresponding to the 12 pitch classes of the western tonal scale (occasionally, a more fine-grained representation of 24 or 36 dimensions is used). The continuous vector bin values describe the relative energy of the pitch classes in the frame, meaning that for a segment of music where a C major chord is played, the vector bins corresponding to the pitch classes c, e, and g have the highest values. For two pieces sharing common tonal features, the chromagrams are likely to have similar characteristics, and thus in cover song identification the task is to measure the similarity between two chromagrams.

Various methods for measuring the similarity between chromagrams exist. Several methods apply a discretization process to turn the continuous-valued chroma vectors into a symbolic representation and then use various techniques of pattern matching (e.g. edit distance [6], dynamic time warping [28], and normalized compression distance [2, 1]). Other techniques include calculating the cross-correlation [16] and the Euclidean distance [20] between the chromagrams without attempting to discretize the chromagram data. Some of the highest-performing cover song identification methods (e.g. [47]) produce a binary similarity matrix between the chromagrams and calculate the longest path in the matrix, thus using the longest match between the chromagram sequences as the value of similarity between the pieces.

The chromagram is robust against changes in instrumentation and timbre, but two features that might affect identification are key and tempo. If a piece is transposed to a different key, it will have a chromagram with the same values as the original, but in different pitch class bins. This would make even highly similar pieces of music appear dissimilar. In order to measure key-invariant similarity, several methods are applied. One is to calculate the similarity between the original and each of the 12 transpositions of the cover version chromagram, and select the best value as the distance (a minimal sketch of this strategy is given below). A more sophisticated method is to transpose the chromagrams to a common key, either by key estimation or with methods such as the optimal transposition index [44]. A third possibility is to produce a representation that describes the relative changes in the chromagram and measure the similarity between such representations.
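The sketch below illustrates the first of these strategies, assuming chromagrams stored as arrays of shape [frames, 12]. For clarity it compares time-averaged chroma vectors only; real systems compare the full sequences with the alignment techniques mentioned above.

```python
import numpy as np

def chroma_similarity(query, candidate):
    """Cosine similarity between time-averaged 12-bin chromagrams."""
    q = query.mean(axis=0)
    c = candidate.mean(axis=0)
    return float(np.dot(q, c) / (np.linalg.norm(q) * np.linalg.norm(c) + 1e-9))

def key_invariant_similarity(query, candidate):
    """Compare the query against all 12 circular transpositions of the
    candidate chromagram and keep the best score."""
    return max(
        chroma_similarity(query, np.roll(candidate, shift, axis=1))
        for shift in range(12)
    )
```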
Tempo changes may not have such a drastic effect on the chroma profiles, but several techniques are applied to achieve tempo-invariant similarity measuring as well. One is to estimate the tempo with beat tracking and filter the chromagram representations according to the estimated beats. More commonly, similarity measures based on dynamic programming can handle tempo differences directly. Some methodologies ignore tempo invariance altogether and report that the results are actually better without it, as unreliable beat estimation may be a weak link in the similarity measuring process (e.g. [1, 46]).

Occasionally, studies applying melody-based cover version identification appear. It is difficult to compare the performance of melody- and chromagram-based methodologies, as almost all approaches use different data sets. It should be noted, though, that the chromagram also captures melodic information, so a successful chromagram-based approach is likely to include, at least indirectly, some melodic similarity in the process.

Applications

To our knowledge, there are currently very few commercial applications that actually utilize cover song identification techniques. One that we are aware of is BMAT Vericast Covers (http://www.bmat.com). BMAT Vericast is an application developed for detecting music played in radio streams; that is, it provides real-time query by example audio fingerprint matching in order to distribute royalties for music played on commercial radio. The Covers version adds to this technology the cover song identification methods by Serrà et al. [46, 47], and as such provides a method that could be applied to detect different renditions of a piece of music in a stream. An example of such technology could be automatic setlist identification: the application would examine a recorded performance and compare its segments against the back catalog of the artist, providing an estimate of the setlist performed at the concert, which could in turn be used for distributing public performance royalties.

7.6 Conclusions

In this article, we have introduced the basics of music information retrieval, reviewed the history of the research area, and outlined some of its current trends and challenges. In addition, the methodologies behind two commonly used MIR technologies for music retrieval and identification were explained in closer detail, with examples of applications utilizing these technologies.

The task of audio query by example retrieval requires methods that match recordings by first using audio fingerprinting to represent the spectral information contained in the music, and then efficiently measuring similarity with techniques that are robust against the noise that may be added to the example during the recording process. The methodologies of query by example retrieval have successfully been adapted in commercial applications, most notably Shazam and SoundHound, the latter of which incorporates query by singing/humming techniques in the identification process. The task of cover song identification, on the other hand, requires methods that do not attempt to measure similarities in the signals themselves, but instead extract features that describe the tonal content of the pieces, and then measure similarities between these feature representations.
As cover versions often differ intentionally from the original, the methodologies must be robust enough to allow even drastic changes in arrangement, structure, key, and tempo in the cover version while still capturing the essential tonal similarity; that is, the features that make a piece of music a cover version, most notably melodic cues and harmonic structures.

Both methods were chosen for presentation in this article because of their potential practical applications for copyright management organizations such as Teosto (http://www.teosto.fi) in Finland. Both methods could be applied to different tasks in managing the distribution of collected royalties. Query by example techniques are applicable to monitoring the music of radio broadcasts; such systems have already been put into operation. Cover song identification has so far not been applied on a large scale, but such systems are likely to appear in the near future. Also, copyright infringement detection could benefit from the introduction of MIR techniques: with query by example, this would most likely mean detecting the use of unauthorized sampled material, while cover song identification methodologies could be applied to plagiarism detection.

At the same time, it should be noted that the research problems of MIR are anything but solved. Although several methodologies and technologies, such as the SoundHound application, perform well in real-world tasks, there is always room for improvement, and for most tasks very few large-scale implementations even exist. In most cases, the only way to objectively compare the performance of state-of-the-art systems on large data sets is the MIREX evaluation, and even that covers only the systems that have been submitted to it. In addition, considering the amount of published music, the methodologies for several MIR tasks still need to demonstrate that they can manage extremely large amounts of data before practical applications and reliable solutions can be built on them. Also, the question of a possible glass ceiling in performance remains open for several tasks.

7.7 Bibliography

[1] Teppo E. Ahonen. Combining chroma features for cover version identification. In Proceedings of the 11th International Society for Music Information Retrieval Conference, pages 165–170, 2010.
[2] Teppo E. Ahonen and Kjell Lemström. Cover song identification using normalized compression distance. In Proceedings of the International Workshop on Machine Learning and Music, 2008.
[3] Jean-Julien Aucouturier and François Pachet. Scaling up music playlist generation. In IEEE International Conference on Multimedia and Expo 2002, pages 105–108, 2002.
[4] David Bainbridge and Tim Bell. The challenge of optical music recognition. Computers and the Humanities, 35:95–121, 2001.
[5] Mark A. Bartsch and Gregory H. Wakefield. To catch a chorus: using chroma-based representations for audio thumbnailing. In Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics, pages 15–18, 2001.
[6] Juan P. Bello. Audio-based cover song retrieval using approximate chord sequences: testing shifts, gaps, swaps and beats. In Proceedings of the 8th International Conference on Music Information Retrieval, 2007.
[7] Juan P. Bello and Jeremy Pickens. A robust mid-level representation for harmonic content in music signals. In Proceedings of the 6th International Conference on Music Information Retrieval, pages 304–311, 2005.
[8] Kerstin Bischoff, Claudiu S. Firan, Raluca Paiu, Wolfgang Nejdl, Cyril Laurier, and Mohammed Sordo. Music mood and theme classification – a hybrid approach. In Proceedings of the 10th International Society for Music Information Retrieval Conference, pages 657–662, 2009.
[9] Donald Byrd and Tim Crawford. Problems of music information retrieval in the real world. In Information Processing and Management, pages 249–272, 2002.
[10] Donald Byrd and Michael Fingerhut. The history of ISMIR – a short happy tale. D-Lib Magazine, 8(11), 2002.
[11] Donald Byrd and Megan Schindele. Prospects for improving optical music recognition with multiple recognizers. In Proceedings of the 7th International Conference on Music Information Retrieval, pages 41–46, 2006.
[12] Olmo Cornelis, Micheline Lesaffre, Dirk Moelants, and Marc Leman. Access to ethnic music: Advances and perspectives in content-based music information retrieval. Signal Processing, 90(4):1008–1031, 2010.
[13] J. Stephen Downie. Music information retrieval. Annual Review of Information Science and Technology, 37:295–340, 2003.
[14] J. Stephen Downie. The music information retrieval evaluation exchange (2005–2007): A window into music information retrieval research. Acoustical Science and Technology, 29(4):247–255, 2008.
[15] J. Stephen Downie, Donald Byrd, and Tim Crawford. Ten years of ISMIR: Reflections on challenges and opportunities. In Proceedings of the 10th International Society for Music Information Retrieval Conference, pages 13–18, 2009.
[16] Daniel P.W. Ellis and Graham E. Poliner. Identifying ’cover songs’ with chroma features and dynamic programming beat tracking. In IEEE Conference on Acoustics, Speech, and Signal Processing, 2007.
[17] Ali C. Gedik and Barış Bozkurt. Pitch-frequency histogram-based music information retrieval for Turkish music. Signal Processing, 90(4):1049–1063, 2010.
[18] Perfecto Herrera-Boyer, Anssi Klapuri, and Manuel Davy. Automatic classification of pitched musical instrument sounds. In Signal Processing Methods for Music Transcription, pages 163–200. Springer US, 2006.
[19] Jyh-Shing Roger Jang, Hong-Ru Lee, and Chia-Hui Yeh. Query by tapping: A new paradigm for content-based music retrieval from acoustic input. In Advances in Multimedia Information Processing, volume 2195 of Lecture Notes in Computer Science, pages 590–597. Springer Berlin Heidelberg, 2001.
[20] Jesper Højvang Jensen, Mads G. Christensen, and Søren Holdt Jensen. A chroma-based tempo-insensitive distance measure for cover song identification using the 2D autocorrelation function. In Proceedings of the Music Information Retrieval Evaluation eXchange 2008, 2008.
[21] Kristoffer Jensen. A causal rhythm grouping. In Proceedings of the Second International Conference on Computer Music Modeling and Retrieval, pages 83–95, 2004.
[22] Kristoffer Jensen, Jieping Xu, and Martin Zachariasen. Rhythm-based segmentation of popular Chinese music. In Proceedings of the 6th International Conference on Music Information Retrieval, pages 374–380, 2005.
[23] Michael Kassler. Toward musical information retrieval. Perspectives of New Music, 4(2):59–67, 1966.
[24] Anssi Klapuri and Manuel Davy, editors. Signal Processing Methods for Music Transcription. Springer, New York, 2006.
[25] Peter Knees, Tim Pohle, Markus Schedl, and Gerhard Widmer. Combining audio-based similarity with web-based data to accelerate automatic music playlist generation. In Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, pages 147–154, 2006.
[26] Cyril Laurier. Automatic Classification of Musical Mood by Content Based Analysis. PhD thesis, Universitat Pompeu Fabra, 2011.
[27] Cyril Laurier, Jens Grivolla, and Perfecto Herrera. Multimodal music mood classification using audio and lyrics. In Proceedings of the 7th International Conference on Machine Learning and Applications, pages 688–693, 2008.
[28] Kyogu Lee. Identifying cover songs from audio using harmonic representation. In Proceedings of the Music Information Retrieval Evaluation eXchange 2006, 2006.
[29] Serge Lemouton, Diemo Schwarz, and Nicola Orio. Score following: State of the art and beyond. In Proceedings of the Conference on New Instruments for Musical Expression, 2003.
[30] Kjell Lemström and Geraint A. Wiggins. Formalizing invariances for content-based music retrieval. In Proceedings of the 10th International Society for Music Information Retrieval Conference, pages 591–596, 2009.
[31] Tao Li, Mitsunori Ogihara, and George Tzanetakis, editors. Music Data Mining. CRC Press, 2012.
[32] Matija Marolt. A mid-level representation for melody-based retrieval in audio collections. IEEE Transactions on Multimedia, 10(8):1617–1625, December 2008.
[33] Cory McKay and Ichiro Fujinaga. Musical genre classification: Is it worth pursuing and how can it be improved? In Proceedings of the 7th International Conference on Music Information Retrieval, 2006.
[34] Dick Moelants, Olmo Cornelis, and Marc Leman. Exploring African tone scales. In Proceedings of the 10th International Society for Music Information Retrieval Conference, pages 489–494, 2009.
[35] Keyvan Mohajer, Majid Emami, Michal Grabowski, and James M. Hom. System and method for storing and retrieving non-text-based information. United States Patent 8041734 B2, 2011.
[36] Meinard Müller. Information Retrieval for Music and Motion. Springer Verlag, 2007.
[37] Nicola Orio. Music retrieval: A tutorial and review. Foundations and Trends in Information Retrieval, 1(1):1–90, 2006.
[38] Oscar Celma and Paul Lamere. If you like the Beatles you might like...: a tutorial on music recommendation. In Proceedings of the 16th ACM International Conference on Multimedia, pages 1157–1158, 2008.
[39] Elias Pampalk, Andreas Rauber, and Dieter Merkl. Content-based organization and visualization of music archives. In Proceedings of the 10th ACM International Conference on Multimedia, pages 570–579, 2002.
[40] Hélène Papadopoulos and Geoffroy Peeters. Large-scale study of chord estimation algorithms based on chroma representation and HMM. In Proceedings of the 5th International Conference on Content-Based Multimedia Indexing, 2007.
[41] Geoffroy Peeters. Chroma-based estimation of musical key from audio-signal analysis. In Proceedings of the 7th International Conference on Music Information Retrieval, 2006.
[42] Zbigniew W. Ras and Alicja A. Wieczorkowska, editors. Advances in Music Information Retrieval. Springer, 2010.
[43] Nicolas Scaringella, Giorgio Zoia, and Daniel Mlynek. Automatic genre classification of music content: a survey. IEEE Signal Processing Magazine, 23(2):133–141, 2006.
[44] Joan Serrà, Emilia Gómez, and Perfecto Herrera. Transposing chroma representations to a common key. In Proceedings of the IEEE CS Conference on The Use of Symbols to Represent Music and Multimedia Objects, pages 45–48, 2008.
[45] Joan Serrà, Emilia Gómez, and Perfecto Herrera. Audio Cover Song Identification and Similarity: Background, Approaches, Evaluation, and Beyond, volume 274 of Studies in Computational Intelligence, chapter 14, pages 307–332. Springer-Verlag Berlin / Heidelberg, 2010.
[46] Joan Serrà, Emilia Gómez, Perfecto Herrera, and Xavier Serra. Chroma binary similarity and local alignment applied to cover song identification. IEEE Transactions on Audio, Speech and Language Processing, 16:1138–1151, 2008.
[47] Joan Serrà, Xavier Serra, and Ralph G. Andrzejak. Cross recurrence quantification for cover song identification. New Journal of Physics, 11:093017, 2009.
[48] Paris Smaragdis and Judith C. Brown. Non-negative matrix factorization for polyphonic music transcription. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2003.
[49] Rajeswari Sridhar and T. V. Geetha. Raga identification of Carnatic music for music information retrieval. International Journal of Recent Trends in Engineering, 1(1):571–574, 2009.
[50] Wei-Ho Tsai, Hung-Ming Yu, and Hsin-Min Wang. Using the similarity of main melodies to identify cover versions of popular songs for music document retrieval. Journal of Information Science and Engineering, 24(6):1669–1687, 2008.
[51] Rainer Typke, Frans Wiering, and Remco C. Veltkamp. A survey of music information retrieval systems. In Proceedings of the 6th International Conference on Music Information Retrieval, pages 153–160, 2005.
[52] Avery Wang. An industrial-strength audio search algorithm. In Proceedings of the 4th International Conference on Music Information Retrieval, 2003.
[53] Avery Wang. The Shazam music recognition service. Communications of the ACM, 49(8):44–48, August 2006.

8. Summary of project results

Music recognition and broadcast monitoring market

According to the market study carried out by Music Ally, acoustic fingerprinting music recognition technologies have a strong potential for providing the means to monitor the entirety of the music performed in any given territory, across radio and television broadcasts as well as night clubs and other public environments.

While new open-source acoustic fingerprinting services mean that it is now easier for organisations to develop automated music recognition systems in-house (or commission them from software developers), this might not necessarily lead to the most proficient and cost-effective solutions. To balance this, the increasing availability of end-to-end monitoring services suggests that competition is increasing, with the potential to drive prices down as well as lead to the development of improved technologies.

For performance rights societies, this means that these technologies could have a very positive impact on their operations, enabling them to further enhance the accuracy with which they distribute their collections amongst their members, while simplifying the logistics of monitoring music usage. Nevertheless, partnerships between performance rights societies and service providers in this space require a deep level of collaboration, in order to ensure that the latter have a robust, up-to-date set of fingerprints in their databases, and that the former are provided with reports that meet the depth of data their operations require.

In Europe, the push to reduce the fragmentation of rights management is expected to increase competition between different countries’ performing rights societies. Consequently, having music recognition and monitoring systems in place could be one of the ways in which organizations seek to gain a competitive advantage over the others.
Live music identification pilot

In this research pilot, carried out in June 2012, BMAT’s music identification technology was applied to the automated reporting of musical works performed at a live music event. The cover music identification technology currently being developed by BMAT was for the first time put to the test in a real-life festival setting, to provide input for a new live music reporting concept currently being developed at Teosto. The piloted technology compared the live audio to a reference set of studio recordings and provided a list of matching song pairs.

The pilot was carried out with three Finnish bands (Nightwish, PMMP and Notkea Rotta) at Provinssirock, one of Finland’s largest rock festivals. The shows were recorded, analysed by BMAT, and the results were evaluated by Teosto. The results of the technology pilot were promising: the piloted technology provided very good results for two out of the three pilot shows, and worked especially well for works in the mainstream pop/rock genre. While certain limitations remain, it is likely that music identification technologies can in the near future provide a reliable way for music copyright organizations and other music industry players to detect and verify setlist data from live music events.

While automated music broadcast monitoring for online, radio and TV is already an established market, the automated identification of musical works performed at live shows remains a technological challenge. Changes in tempo, key, instrumentation and song structure, together with audience sounds and the acoustic characteristics of a live venue, make it difficult to accurately match a live version to a studio recording of the same song.

Evaluating the pilot results, Teosto also recognized three possible use scenarios for an automated live music identification service: use at large music festivals, use at active music venues such as clubs and cruise ships (integrated with a general music monitoring system for club use), and artist- or tour-specific use (to replace manual/online reporting).

DJ club monitoring pilot

The pilot was carried out in December 2012 – March 2013. Five recorded DJ live sets from four Finnish DJs were submitted to BMAT by Teosto and tested against BMAT’s commercial broadcast monitoring service Vericast, which is already in use by more than 30 performance rights organizations for worldwide music broadcast monitoring. The DJ sets were tested against three different reference audio sets. One reference set was provided by Teosto, based on the setlists provided by the DJs; the second was put together by BMAT (including, for example, alternate versions of the reference songs); and a third, larger reference set (also from BMAT) was used mainly for testing the performance of the algorithm.

The results of the DJ pilot were positive: out of all detectable tracks (i.e. excluding tracks with time stretching, see below), the BMAT Vericast system correctly recognized and reported every track (100%). Issues that still need to be solved, however, include the effects of time stretching and other processing commonly used by DJs, which in the pilot led to five tracks (out of a total of 44 tracks, 11%) not being identified by the piloted system. The recognized potential use cases for the piloted DJ club monitoring system are similar to those of the live music identification technology, with integrated club systems and artist/tour-specific uses seen as more prominent for the DJ genre.
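As a side note on the time-stretching issue mentioned above, the sketch below shows one simple way to generate time-stretched test clips for probing the robustness of a fingerprinting system. It assumes the open-source librosa and soundfile Python libraries, and the file names and stretch rates are hypothetical; it illustrates test-material preparation only and is not part of the piloted BMAT pipeline.

    import librosa
    import soundfile as sf

    # Create tempo-shifted variants of a reference track, roughly in the
    # range a DJ might use, so a recognition system can be tested against them.
    y, sr = librosa.load("reference_track.wav", mono=True)
    for rate in (0.94, 0.97, 1.03, 1.06):              # about -6 % to +6 % tempo change
        stretched = librosa.effects.time_stretch(y, rate=rate)
        sf.write("reference_track_rate_%d.wav" % int(rate * 100), stretched, sr)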
Consumer survey

In addition to the two technology pilots, the project included a consumer survey on the possibilities and potential of using crowdsourcing methods for gathering information about live gig setlists directly from the audience. The web survey was carried out in the web consumer panel of the Finnish market research company Taloustutkimus Oy (http://www.taloustutkimus.fi) in December 2012 – January 2013. The 639 survey respondents were persons who attend live music events at least once a month.

The consumer survey showed that among active Finnish music fans who frequently attend live shows, there is interest in using interactive services built around gig set lists. However, this will probably be a niche market at best, and from the point of view of Teosto, crowdsourcing set lists from audience members on a large scale is currently not a viable alternative to manual reporting and/or automated reporting technologies. However, having fans and audience members verify automatically generated set lists could be one way of improving the accuracy of automated set list creation.

9. Conclusions

The starting point for this research project was to analyze some of the effects that new and emerging digital technologies will have on the music industry and, more specifically from the point of view of a music performance rights society, on the management of different music-related copyrights. As mapping out all the possible effects of different new technologies on the management of music rights would be a task beyond the scope of a single research project, the focus of this project was narrowed down to a more manageable size. We decided to focus on one area of interest that has emerged during the last decade and will have a big impact on the way music usage data is gathered and processed within the music industry and collective management organizations: automated music recognition.

In the music industry, different types of automated content recognition technologies are already widely used for tasks such as the monitoring of broadcast music, or classifying and screening content in different ways in online music and video services. Although already in use, these technologies are probably not yet employed to their full potential, for a number of reasons – the most important being the lack of universal databases that would link metadata from recorded and broadcast music to the metadata provided by copyright organizations and publishers on the authors of musical works.

A new and emerging application area for these technologies is using automated music recognition systems for identifying live music and cover versions – i.e. identifying songs, or musical works, instead of finding and reporting identical matches of recorded versions broadcast on radio or TV. Computationally, developing reliable commercial music recognition services for live music is a much more difficult challenge than monitoring and reporting radio and TV music use. While academic research on the subject has been active, in 2012 there were, to our knowledge, no commercial live music content recognition services available on the market. Having followed the academic research on the subject, and having been in talks with providers of different broadcast monitoring services, Teosto wanted to test – if possible – one or several available solutions to the problem of live music content recognition in practice.
In the end, the focus of the research project was threefold: first, to carry out, if possible, pilot research on live music recognition in an actual live music setting, in order to evaluate the technology and create a concept for its use by music copyright organizations such as Teosto. Second, to evaluate the current music recognition and broadcast monitoring market in Europe and globally. And third, to survey the current academic research on automated music recognition systems (carried out mainly within the interdisciplinary research field of Music Information Retrieval), and to draw conclusions about the possible applications of that research for the management of music rights and within the broader music industry.

In the course of the project, two independent research pilots and a separate consumer survey were carried out. The goal of the technology pilots was to test new music identification systems in a real-life setting in order to gather data about the systems themselves, and to create an understanding of the requirements that need to be in place should a collecting society like Teosto adopt these types of technologies as a part of its reporting process. The focus was mainly on building a proof of concept for the technology; business implications such as investments, costs of running the system, cost/benefit analyses etc. were outside the scope of the present project, as the technology evaluated was not a released commercial product and from the outset probably not mature enough to be implemented in its current state.

The first pilot was carried out in June 2012 at the Provinssirock festival in Seinäjoki, Finland. The purpose of the festival pilot was to test and evaluate BMAT’s live music identification technology Vericast Covers in a live setting, with three Finnish bands from three different genres. The live shows were recorded in two versions, one from the mixing desk and one from the audience, in order to determine whether audio quality would have an effect on the identification results. The second technology pilot focused on DJ sets performed in a club environment. Five DJ sets from four Finnish DJs were recorded and tested against BMAT’s Vericast broadcast monitoring setup. From the results of the pilots, three usage scenarios/concepts were formulated from the point of view of a collection society.

The two technology pilots were successful in providing proof that automated music recognition services can, in addition to broadcast music monitoring, already be used in a club environment for identifying and reporting music played by DJs, and within one or two years possibly also in a live music setting for automated set list creation.

The pilots also pointed out a number of challenges and limitations that need to be addressed before the technologies can be adopted for large-scale use. The main challenge for all automated music recognition systems is twofold: in order to work efficiently, they need a representative reference audio database that is constantly updated, and there also needs to be a reliable way to automatically match the identification results to the relevant metadata – in the case of performance rights societies like Teosto, to the relevant author and publisher information. In addition, the tested live music identification and club/DJ monitoring systems had certain technical limitations that need to be improved upon to ensure reliable results.
The consumer survey showed that among active Finnish music fans who frequently attend live shows, there is interest in using interactive services built around gig set lists. However, the potential user base is very small, and from the point of view of Teosto, using crowdsourcing to collect set list information from audience members on a large scale is currently not a viable alternative to manual reporting and/or automated reporting technologies. However, using information gathered from fans and audience members to verify automatically generated set lists could be a way of improving the accuracy of automated set list creation in the future.

10. References

Project deliverable | Date | Type | Author
Teosto – BMAT Vericast Covers Pilot Report | 29.8.2012 | Technical pilot results report | BMAT
Teosto – BMAT Vericast Clubs Pilot Report | 21.3.2013 | Technical pilot results report | BMAT
Analysis of the Automatic Live Music Detection Experiment | 17.12.2012 | Research article | Teppo Ahonen
State Of The Art In Music Information Retrieval: What Could Be Applied For Copyright Management | 17.12.2012 | Research article | Teppo Ahonen
Music recognition and broadcast monitoring market research | 18.12.2012 | Market research report | Music Ally
Teosto ry – biisilistapalvelututkimus | 28.1.2013 | Market research report | Taloustutkimus
Project final report | 28.3.2013 | Project final report | Teosto

Seminar | Date | Type | Location
Musiikin tekijänoikeudet 2010-luvulla. Ennakkoinfo projektin tuloksista. | 31.1.2013 | Pilot results seminar | Erottajan Kasino, Helsinki
Technology, Music Rights, Licensing | 21.3.2013 | Project final seminar | Finlandia Hall, Helsinki

Presentation/other | Date | Type | Author
Teosto and BMAT carry out a pioneering live music identification pilot in Finland | 26.1.2013 | Press release | Teosto
Teosto ja BMAT kehittävät ensimmäisenä maailmassa livekeikkojen automaattista musiikintunnistusta | 26.1.2013 | Press release | Teosto
Musiikintunnistuspalvelut - markkinakatsaus | 31.1.2013 | Presentation | Ano Sirppiniemi
Mikä biisi tää on? Musiikintunnistusta livekeikoilla. Livepilotin toteutus ja tulokset | 31.1.2013 | Presentation | Turo Pekari
Biisilistoja yleisöltä? Crowdsourcingin mahdollisuudet Suomessa. Kyselytutkimuksen tulokset | 31.1.2013 | Presentation | Turo Pekari
Mikä biisi tää on? Musiikintunnistuspilotti Provinssissa | 8.2.2013 | Presentation | Ano Sirppiniemi, Turo Pekari
Emerging Technologies: Teosto’s Live Music Recognition and DJ Club Monitoring Pilots | 21.3.2013 | Presentation | Turo Pekari, Alex Loscos (BMAT)
State Of The Art In Music Information Retrieval Research: Applications For Copyright | 21.3.2013 | Presentation | Teppo Ahonen (University of Helsinki)
Majority Report: Visions On the Music Monitoring Landscape | 21.3.2013 | Presentation | Alex Loscos (BMAT)
Keynote: Issues In the Music Rights Value Chain | 21.3.2013 | Presentation | Karim Fanous (Music Ally)

Finnish Composers’ Copyright Society Teosto
Lauttasaarentie 1, 00200 Helsinki, Finland
Tel. +358 9 681 011, [email protected]
www.teosto.fi