DanceReProducer: An Automatic Mashup Music Video Generation
Proceedings of the SMC 2011 - 8th Sound and Music Computing Conference, July 2011, Padova, Italy

DANCEREPRODUCER: AN AUTOMATIC MASHUP MUSIC VIDEO GENERATION SYSTEM BY REUSING DANCE VIDEO CLIPS ON THE WEB

Tomoyasu Nakano†1, Sora Murofushi‡3, Masataka Goto†2, Shigeo Morishima‡3
† National Institute of Advanced Industrial Science and Technology (AIST), Japan
‡ Waseda University, Japan
1 t.nakano[at]aist.go.jp, 2 m.goto[at]aist.go.jp, 3 shigeo[at]waseda.jp

ABSTRACT

We propose a dance video authoring system, DanceReProducer, that can automatically generate a dance video clip appropriate to a given piece of music by segmenting and concatenating existing dance video clips. In this paper, we focus on the reuse of ever-increasing user-generated dance video clips on a video sharing web service. In a video clip consisting of music (audio signals) and image sequences (video frames), the image sequences are often synchronized with or related to the music. Such relationships are diverse in different video clips, but were not dealt with by previous methods for automatic music video generation. Our system employs machine learning and beat tracking techniques to model these relationships. To generate new music video clips, short image sequences that have been previously extracted from other music clips are stretched and concatenated so that the emerging image sequence matches the rhythmic structure of the target song. Besides automatically generating music videos, DanceReProducer offers a user interface in which a user can interactively change image sequences just by choosing different candidates. This way, people with little knowledge or experience in MAD movie generation can interactively create personalized video clips.

Copyright: © 2011 Tomoyasu Nakano et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

1. INTRODUCTION

User-generated video clips called MAD movies (http://en.wikipedia.org/wiki/MAD_Movie) or mashup videos, each of which is a derivative (mixture or combination) of some original video clips, are gaining popularity on the web, and a lot of them have been uploaded and are available on video sharing web services. In this paper, we focus on music video clips of dance scenes (dance video clips) in the form of MAD movies or mashup videos. Such a MAD music video clip consists of a musical piece (audio signals) and image sequences (video frames) taken from other original video clips. The original video clips are called 1st generation (primary or original) content, and the MAD video clips generated by users can be considered 2nd generation (secondary or derivative) content (Figure 1).

Figure 1. Generation of mashup music videos (user-generated music video clips) by reusing existing original content.

In a MAD video clip, good music-to-image synchronization with respect to rhythm, impression, and context is important. Although it is easy to enjoy watching MAD movies, it is not easy to generate them, because a creator needs to search existing video clips for image sequences that give impressions appropriate to a given target musical piece, segment and concatenate image sequences to fit the target piece, and time-stretch the sequences to match the tempo of the target piece, since existing video clips usually have tempi different from the tempo of the target piece. Moreover, for better music-to-image synchronization, the music structure and context of the musical piece and the image sequences should be taken into account, which requires considerable time and effort.

To give everybody a chance to enjoy such difficult MAD movie generation, we have developed a new system called DanceReProducer that can automatically generate a dance video clip for any given piece of music by segmenting, concatenating, and stretching existing dance video clips (Figure 2). This system provides an interface in which a user not only listens to music but also enjoys music visually by directing (supervising) the semi-automatic generation of dance video (image sequences). If the automatically generated video clip is satisfactory, the user can just watch it, but if the user does not like the generated image sequences for some musical sections (e.g., A, B, and C in Figure 2), the user can easily choose another favorite image sequence from ranked candidates for each musical section. These candidates are also automatically proposed by the system and also match the given musical section of the input piece according to our mapping model. This mapping model was trained through an analysis of a large amount of user-generated dance video clips available on a video sharing web service.

Figure 2. An automatic music video generation system, DanceReProducer, by reusing existing music video clips. (Music video clips on the web are segmented; image sequences are selected according to the estimated music structure of the input music, then stretched and concatenated into the output.)

In particular, we focus on the reuse of video clips of the 2nd, 3rd, and N-th generation content (Figure 1) as well as the 1st generation content. In other words, our system enables a user to generate a new mashup video clip by reusing existing mashup video clips on the web.

2. RELATED WORK

Previous works generated visual patterns based on some musical aspects, such as visualizing music chords by color [ ], visualizing musical mood [ ], and controlling a computer-graphics dancer under musical beats [ , ]. There were also previous works automatically generating music-synchronized video by reusing media content: for example, some reused images and photographs from the web [ , ], and others reused home videos [ , ] under audio changes [ ] or repetitive visual and aural patterns [ ]. Previous works, however, did not reuse dance video clips on the web to generate a new mashup video clip.
3. SYSTEM DESIGN

To develop DanceReProducer, we first considered the criteria that people use in judging "what is an appropriate image sequence for a particular piece of music", as described below. We then describe the functions of the system interface.

3.1 Criteria of natural/skillful relationships between an image sequence and music

To design the system, we considered the criteria from two aspects, local relationships and context (global) relationships, explained below, taking into account previous work [ , ] and the comments offered by human creators of MAD movies (some creators disclosed their creative processes on the web).

Local relationships: criteria for impression synchronization between the music and image sequences.

• Rhythm: Visual rhythms, such as dance motion, camera work, and cuts (e.g., dissolve), are synchronized with the beat and musical accent.
• Impression: Visual impressions, such as dance motion, color, brightness, and lighting, are synchronized with the musical impression.

Context relationships: criteria for context synchronization between music and image sequences.

• Music structure: Temporal changes in visual impression are synchronized with the music structure (e.g., verse A, chorus).
• Temporal continuity: An image sequence has temporal continuity, but the visual impression can be changed easily on a music structure boundary.

The above criteria are not all satisfied at any given time and are not mutually independent. However, they provide a useful foundation for generating an image sequence appropriate to a particular piece of music.

3.2 Image sequence generation

Manual mashup video generation is difficult and time-consuming. To enable more efficient generation, our system first automatically generates an image sequence appropriate to the music. However, the generated sequence may not be to the user's taste. In such cases, other sequence candidates are shown on a screen so that the user can simply choose a preferred one. Even though it would be difficult for a user to manually find another candidate from among a huge number of candidates, it is easy to interactively choose a preferred candidate.

We provide an overview of the image sequence generation and of the interface's functions below.

3.2.1 Automatic image sequence generation

To reuse existing content, we first gather dance video clips on a video sharing web service, and the system estimates the tempo and bar lines of the music (audio signals) in those video clips. In dealing with the local relationships, we assume that the music and its dance motions within each video clip are synchronized, and we use each bar (measure) of the music as the minimum unit for segmenting and concatenating image sequences. Hereafter, we denote an image sequence (series of video frames) for the bar-level minimum unit as a visual unit.

Second, the system searches for a visual unit appropriate to each bar of the input (target) musical piece. The units are time-stretched to the tempo of the input music and then concatenated to generate an image sequence. In this regard, to deal with the context relationships, the system selects visual units while taking into account music structure and temporal continuity. To satisfy each criterion described in Section 3.1, we implement the following processes.

• Rhythmic synchronization: A musical bar is used as the minimum unit for segmenting and concatenating. A visual unit is stretched to the input music tempo.
• Impression synchronization: By modeling the mapping between the extracted audio and visual features for impression, the system automatically selects a visual unit appropriate to the input music impression in each bar.
• Music structure and temporal continuity: By introducing costs representing the temporal continuity and music structure of the generated sequence, the system automatically selects an image sequence considering the context relationships.
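The stretch-and-concatenate step described in Section 3.2.1 can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the nearest-frame resampling strategy, and the `beats_per_bar` parameter are assumptions introduced here, and frames are represented by arbitrary Python objects.

```python
def bar_length_in_frames(bpm, beats_per_bar, fps=30.0):
    """Number of video frames covering one bar at the given tempo (assumed helper)."""
    return int(round(beats_per_bar * 60.0 / bpm * fps))


def stretch_unit(frames, target_len):
    """Time-stretch one visual unit to target_len frames by nearest-frame resampling.

    This resampling rule is an assumption; the paper only states that units are stretched.
    """
    if target_len <= 0 or not frames:
        return []
    src_len = len(frames)
    return [frames[min(src_len - 1, (i * src_len) // target_len)]
            for i in range(target_len)]


def assemble(units, target_bpm, beats_per_bar, fps=30.0):
    """Stretch every selected visual unit to the target bar length and concatenate them."""
    target_len = bar_length_in_frames(target_bpm, beats_per_bar, fps)
    video = []
    for unit in units:
        video.extend(stretch_unit(unit, target_len))
    return video
```

For example, `stretch_unit(list(range(4)), 8)` repeats each source frame twice, while `stretch_unit(list(range(8)), 4)` keeps every second frame. A production system would more likely interpolate or blend frames.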
3.2.2 Interface

Screenshots of the implemented DanceReProducer interface are shown in Figures 3 and 4. There are basic functions for viewing, such as a window showing the generated image sequence (Figure 3), functions to load input music and save the generated video, to play and stop/pause the generated video, and a playback-position "slider", together with the music structure estimated automatically [ ]. The green rectangular markers in the music structure represent chorus sections, and the blue markers represent other sections. In addition, the total duration of the input music is equally divided into […] sections.

Figure 3. Example of the DanceReProducer screen.

Figure 4. Example of interactive sequence selection. Four different image sequence candidates are previewed, and the lower-right candidate is chosen by a user.

This interface also provides the following functions to reflect the user's preferences.

• Interactive re-selection of a generated image sequence: By clicking the NG button (Figure 3), the user can see other sequence candidates on a screen and simply choose the preferred one (Figure 4). The user can see and compare different candidates during playback, and can choose his/her favorite sequence. Since this interactive re-selection function works on each section of the music structure (e.g., A, B, and C in Figure 2), the user can use this function to easily consider the music structure and context.
• Jumping to the beginning of sections: By clicking the jump button (Figure 3) or the visualized sections, a user can directly jump to and view the previous or the next section of a song.
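4. INTERNAL MECHANISM OF DANCEREPRODUCER

To develop DanceReProducer, we modeled the relationships between music and video, and then generated image sequences appropriate to input music by considering the local and context relationships. In general, it is difficult to model such relationships, but we solved this problem through training using a huge quantity of mashup video clips posted to the web. Since the content (videos) was made by humans, there were various types of mutual relationship between the music and the image sequences. This suggests that such videos can be used to learn the relationships through a machine-learning technique.

Modeling using the mashup clips suffers from two problems. One is that complex relationships exist, such as where "the same image sequences are used for different music" or "different image sequences are used for the same music" (Figure […]). Another problem is that the video quality varies strongly, and it is difficult to judge the possibility of its reuse. These obstacles make it difficult to model the relationships and were not dealt with in previous works.

Figure 5 gives an overview of the DanceReProducer system. The system consists of two procedures: database construction and video generation. In this section, we describe the details of the system and explain how we solve the above two problems in modeling using the mashup clips.

Figure 5. Overview of DanceReProducer, a dance video authoring system that can automatically generate a dance video clip appropriate to a given piece of music by segmenting, concatenating, and stretching existing dance video clips. (Database construction: A, gather videos; B, extract frame features; C, extract bar-level features; D, DCT; E, construct database. Video generation: F, extract bar-level features; G, reconstruct database; H, train mapping model; I, select visual units.)

4.1 Database construction

In the database construction, database videos are gathered via the web, and then audio and visual features are extracted from the videos through the following steps.

Step 1) Gather dance music videos via the web, and resample the sampling frequency of the music to 44.1 kHz and the frame rate of the image sequence to 30 fps (Figure 5 A).
Step 2) Estimate the bar lines of the videos by using beat tracking techniques (B).
Step 3) Extract feature vectors to learn their relationship (B, C).

Since the analysis frame matches the frame rate, the discrete time step (frame-time) is about 33 ms (about 1470 points). The extracted features in each frame-time are called frame features. The frame features are then integrated in each bar to obtain what are called bar-level features.

4.1.1 Beat tracking

Much work has been done on beat tracking [ ], and we plan to focus on using such techniques in the future, but our current implementation is a simple one which was effective in our preliminary experiment.

The system first calculates the power of the input audio signal, then calculates its autocorrelation values and estimates their peak time. Since this peak time represents the periodicity of the power, we use it as the tempo (one beat time). In this regard, to avoid octave errors (e.g., half/double tempo errors), the estimation is limited to tempi within a range of 60–120 bpm (beats per minute).

Second, the system calculates the cross-correlation between the power and a pulse signal generated under the estimated tempo. Since the peak time of the cross-correlation represents the first beat time, the system regards this time as the beginning time of the first bar. In addition, we assume that the dataset videos have a length of […] beats (one measure) in […] time, and then the system decides all bar lines mechanically.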
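A minimal sketch of the simple beat tracker of Section 4.1.1 is given below, assuming the power envelope has already been sampled once per frame-time (30 values per second). The function names are hypothetical, the comb-sum used for the first beat is one simple way to cross-correlate with a pulse train, and `beats_per_bar` is left as a parameter because the assumed meter is not recoverable from the text.

```python
def estimate_tempo(power, fps=30.0, bpm_min=60.0, bpm_max=120.0):
    """Pick the autocorrelation peak among lags that correspond to 60-120 bpm."""
    lag_min = int(round(60.0 * fps / bpm_max))  # shortest allowed beat period in frames
    lag_max = int(round(60.0 * fps / bpm_min))  # longest allowed beat period in frames
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(power[i] * power[i + lag] for i in range(len(power) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return 60.0 * fps / best_lag, best_lag


def estimate_first_beat(power, period):
    """Cross-correlate the power with a pulse train of the given period; return the best offset."""
    best_off, best_val = 0, float("-inf")
    for off in range(period):
        val = sum(power[i] for i in range(off, len(power), period))
        if val > best_val:
            best_off, best_val = off, val
    return best_off


def bar_lines(first_beat, period, beats_per_bar, n_frames):
    """Place bar lines mechanically every beats_per_bar beats (meter is an assumed parameter)."""
    return list(range(first_beat, n_frames, period * beats_per_bar))


# Synthetic check: one pulse every 18 frames at 30 fps is 100 bpm, first beat at frame 5.
power = [0.0] * 300
for i in range(5, 300, 18):
    power[i] = 1.0
bpm, period = estimate_tempo(power)
first = estimate_first_beat(power, period)
```

Restricting the lag search to the 60–120 bpm window is what prevents the half-tempo peak (here at lag 36) from being selected.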
4.1.2 Frame feature extraction (Music)

The frame features of music are defined with the help of previous work on relationships between audio and visual [ , ] and musical genre classification [ ]. These features represent musical accents and impressions. As the frame features for accents, to represent temporal change in the power of the audio signal, we extract the filter bank output (4 dims.) and spectral flux (1 dim.). As the frame features for impressions, to represent timbre, we extract the zero-crossing rate (1 dim.) and 12th-order MFCCs (mel-frequency cepstral coefficients) with a DC component (13 dims.).

4.1.3 Frame feature extraction (Image sequence)

The frame features of an image sequence are defined with the help of previous work on relationships between audio and visual [ ]. These features represent visual accents and impressions. To extract the features, the image resolution is resampled to 128 × 96. As the frame features for accents, to represent camera work and dance motion and related temporal changes, we extract the mean values of the temporal derivative of the well-known optical flow and brightness (2 dims.). We use a block-matching algorithm to detect the optical flow from image sequences: we use a 64 × 48 block which is shifted by […] (maximum range is […]). The frame features for impressions are the mean values and standard deviations of the hue, saturation, and brightness values (6 dims.). In addition, two-dimensional DCT (discrete cosine transform) coefficients are extracted (4 dims. for vertical and 3 dims. for horizontal).

4.1.4 Bar-level feature extraction

We propose a bar-level feature, which is an integration of the frame features in each bar. To extract features from one piece of music or one video clip, most previous work (e.g., musical genre classification) integrated features using the time average and its standard deviation [ ]. However, such integration drops temporal information of the audio-visual features.

In this paper, we integrate the frame features into bar-level features using the DCT (Figure 5 D). In each bar, the frame features are resampled to 16 points along the time axis, the system computes the DCT for each dimension, and then the DCT coefficients up to the 3rd order, including the DC component, are used as the bar-level features. Therefore, the number of dimensions of the bar-level features is four times the number of dimensions of the frame features.
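The bar-level integration of Section 4.1.4 can be sketched as below. The sketch assumes linear interpolation for the 16-point resampling and an unnormalized DCT-II; the paper does not state either choice, and the function names are hypothetical.

```python
import math


def resample_linear(values, n_points=16):
    """Resample one feature track to n_points along the time axis (linear interpolation assumed)."""
    if len(values) == 1:
        return [values[0]] * n_points
    out = []
    for j in range(n_points):
        pos = j * (len(values) - 1) / (n_points - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * values[lo] + frac * values[hi])
    return out


def dct2(x, n_coeffs=4):
    """First n_coeffs coefficients of an unnormalized DCT-II (index 0 is the DC component)."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi / n * (t + 0.5) * k) for t in range(n))
            for k in range(n_coeffs)]


def bar_level_feature(frames, n_points=16, n_coeffs=4):
    """Integrate the frame features of one bar: 4 DCT coefficients per frame-feature dimension."""
    feature = []
    for dim in range(len(frames[0])):
        track = [frame[dim] for frame in frames]
        feature.extend(dct2(resample_linear(track, n_points), n_coeffs))
    return feature
```

With the 19-dimensional audio frame features of Section 4.1.2 (4 + 1 + 1 + 13), this yields a 76-dimensional bar-level feature regardless of how many frames the bar contains, which is what allows bars from clips with different tempi to be compared.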
4.2 Video generation

In the video generation, to select visual units from the database, the system process consists of the following steps.

Step 1) Extract the bar-level features of a given musical piece (Figure 5 F).
Step 2) Reconstruct the database (G). To avoid generating a video with an unnaturally fast or slow tempo, visual units whose tempi are more than 20% above or below the input tempo are not used for the following steps.
Step 3) Apply PCA (principal component analysis) to all bar-level features of all bars and store the resulting low-dimensional (N-dimensional) features. The dimension N is decided based on the cumulative contribution ratio (≤ 95%). For our investigations, the dimensions of the audio and visual features described above were reduced from 76 to 62 and from 80 to 68, respectively. (Since the database is reconstructed depending on the tempo of the input, the reduced dimension is not constant.)
Step 4) Model the relationship between music and image sequence from the database (H). This step is explained in more detail below (Section 4.2.1).
Step 5) Select visual units under the criteria of the relationships described in Section 3.1 (I).

4.2.1 Linear regression models for multiple clusters

In this paper, a local cost is calculated by a linear regression model, which is used to learn the relationships between the audio and visual bar-level features. However, to model complex relationships, such as "the same visual units are used for different music" or "different visual units are used for the same music" (Figure […]), one regression model is insufficient.

Therefore, we propose a linear regression in which the system uses linear regression models for multiple clusters. The multiple clusters are obtained by applying k-means clustering to feature vectors, where a feature vector is defined as a concatenation of a bar-level audio feature of music and a bar-level visual feature of image sequences in the database. Note that this concatenated feature vector is used just for the clustering. For each cluster, a linear regression model is trained so that bar-level visual features can be predicted from bar-level audio (music) features (Figure 5 H).
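The cluster-wise regression of Section 4.2.1 can be illustrated with a deliberately small toy, in which each bar-level audio feature and each bar-level visual feature is a single number. Everything here is an assumption for illustration (scalar features, fixed initial centroids, closed-form simple regression, and choosing the cluster by the audio part of the centroid only); the actual system works on the PCA-reduced feature vectors.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))


def kmeans(points, centroids, n_iter=20):
    """Plain k-means on concatenated (audio, visual) vectors with given initial centroids."""
    groups = [[] for _ in centroids]
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids, groups


def fit_line(pairs):
    """Least-squares line visual = slope * audio + intercept for one cluster."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    var = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / var if var else 0.0
    return slope, my - slope * mx


def train(pairs, init_centroids):
    """Cluster the concatenated features, then fit one regression model per cluster."""
    centroids, groups = kmeans(pairs, init_centroids)
    models = [fit_line(g) if g else (0.0, 0.0) for g in groups]
    return centroids, models


def predict(audio, centroids, models):
    """Use the model of the cluster whose centroid is nearest in the audio part (assumption)."""
    j = nearest((audio,), [(c[0],) for c in centroids])
    slope, intercept = models[j]
    return slope * audio + intercept


# Two regimes that a single regression line could not represent.
pairs = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9),
         (10, 20), (11, 19), (12, 18), (13, 17), (14, 16)]
centroids, models = train(pairs, [pairs[0], pairs[-1]])
```

In this toy, the first cluster follows a rising line and the second a falling one, so the predicted visual feature depends on which cluster the input audio feature falls into.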
4.2.2 Image sequence selection under the criteria for natural/skillful relationships

By introducing costs representing the local and context relationships, we can solve this video generation problem by minimizing the costs through a Viterbi search (Figure 5 I).

The model of the cluster having the centroid nearest to the input features is selected, and visual features appropriate to the input audio features are estimated by using the model. To calculate the costs of the local relationships, the system calculates the distance between the estimated features and the visual features of all units.

To represent the costs of the context relationships, the musical structure and chorus sections are estimated using RefraiD [ ]. The estimated beginning and ending times of all sections are used as the boundaries of musical sections. However, sections less than […] bars in length are not used as a section for this purpose.

Let $d(n, k_m)$ be the Euclidean distance representing the local cost between the $n$-th ($1 \le n \le N$) bar-level feature of the input and the features of the $k$-th unit of the $m$-th video in the database. The local costs $c_l$ and accumulated costs $c_a$ are defined as follows:

\[
c_l(n, k_m) =
\begin{cases}
d(n, k_m) & \text{if } ch(n) = 1 \wedge ch(k_m) = 1, \\
p_c \times d(n, k_m) & \text{otherwise,}
\end{cases}
\]

\[
c_a(n, k_m) = \min_{\kappa, \mu}
\begin{cases}
c_l(n, k_m) + c_a(n-1, \kappa_\mu) & \text{if } (\mu = m \wedge \kappa = k - 1) \vee st(n) \neq st(n-1), \\
p_t \times c_l(n, k_m) + c_a(n-1, \kappa_\mu) & \text{otherwise,}
\end{cases}
\]

where $ch(n)$ returns 1 if $n$ is included in a chorus section, and $st(n)$ returns the number of the musical section that contains $n$. In other words, the penalty $p_t$ is not applied when a unit directly continues the preceding unit of the same video, or when bar $n$ starts a new musical section. A higher $p_c$ value means that units of chorus sections are more easily selected at a chorus section. A lower $p_t$ value means that the selected units have less time continuity. To minimize the accumulated cost at the $N$-th measure, the system selects the unit which has the minimum accumulated cost $d_{\min}$, and then an image sequence is generated by backtracing:

\[
d_{\min} = \operatorname*{argmin}_{k, m} \; c_a(N, k_m).
\]

The interactive re-selection function is implemented so that the system chooses four different candidates for each section (Figure 4). These candidates are made from four different accumulated costs, and then four image sequences are generated by backtracing. To expand the variety of generated image sequences, the chosen candidates are made from the […] minimum, […] minimum, […] minimum, and maximum accumulated costs. This enables generation of a variety of image sequences.
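The Viterbi search over the costs of Section 4.2.2 can be sketched as follows. This is a sketch under stated assumptions rather than the authors' code: the distances `d[n][u]` are taken as given, units are identified by hypothetical `(video, unit index)` pairs, the default values of `p_c` and `p_t` are arbitrary, and the transition penalty is skipped at section boundaries, following the temporal-continuity criterion of Section 3.1.

```python
def viterbi_select(units, d, ch_in, ch_unit, st, p_c=2.0, p_t=3.0):
    """Select one visual unit per bar by minimizing the accumulated cost.

    units:   list of (m, k) pairs, the k-th unit of the m-th video
    d:       d[n][u], distance between the estimate for bar n and unit u
    ch_in:   ch_in[n] == 1 if input bar n lies in a chorus section
    ch_unit: ch_unit[u] == 1 if unit u comes from a chorus section
    st:      st[n], number of the musical section containing bar n
    """
    n_bars, n_units = len(d), len(units)

    def local_cost(n, u):
        if ch_in[n] == 1 and ch_unit[u] == 1:
            return d[n][u]
        return p_c * d[n][u]

    acc = [[0.0] * n_units for _ in range(n_bars)]
    back = [[-1] * n_units for _ in range(n_bars)]
    for u in range(n_units):
        acc[0][u] = local_cost(0, u)

    for n in range(1, n_bars):
        boundary = st[n] != st[n - 1]
        for u, (m, k) in enumerate(units):
            cl = local_cost(n, u)
            best, best_v = float("inf"), -1
            for v, (mu, kappa) in enumerate(units):
                continuous = (mu == m and kappa == k - 1)
                step = cl if (continuous or boundary) else p_t * cl
                total = step + acc[n - 1][v]
                if total < best:
                    best, best_v = total, v
            acc[n][u], back[n][u] = best, best_v

    # Backtrace from the unit with the minimum accumulated cost at the last bar.
    u = min(range(n_units), key=lambda i: acc[-1][i])
    path = [u]
    for n in range(n_bars - 1, 0, -1):
        u = back[n][u]
        path.append(u)
    path.reverse()
    return [units[i] for i in path]


units = [(0, 0), (0, 1), (1, 0), (1, 1)]
d = [[1.0, 5.0, 2.0, 5.0],
     [5.0, 2.0, 5.0, 1.5]]
same_section = viterbi_select(units, d, [0, 0], [0, 0, 0, 0], st=[0, 0])
new_section = viterbi_select(units, d, [0, 0], [0, 0, 0, 0], st=[0, 1])
```

Within one section, the search prefers the consecutive units `(0, 0), (0, 1)` even though unit `(1, 1)` has the smallest distance for the second bar; when the second bar starts a new section, the jump to `(1, 1)` is no longer penalized.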
4.3 Model training weighted according to view counts

This paper focuses on the reuse of the MAD movies available on the web. Since there are many creators, the authoring quality of generated videos varies widely. In other words, each video will have a different level of reliability regarding the relationships between music and image. We assume that a video generated by a user having good MAD movie skills will have higher reliability and a higher possibility of reuse. Therefore, to model an image sequence appropriate to particular music, the system should introduce a weighting factor in the model training process, where a higher-quality video is given a greater weight.

To enable automatic judgment of the quality, we introduce the idea of using the view count of each video clip on the web as a weight, since the view count reflects the video quality. Let $\omega$ be an integer weighting factor defined as follows, where $V_c$ indicates the view count:

\[
\omega = \max\left( \lfloor \alpha \times \log_{10}(V_c) + 0.5 \rfloor + \beta,\; 0 \right).
\]

In our current implementation, $\alpha$ and $\beta$ are set to 2 and −7, respectively. This means that a view count of 10,000 corresponds to $\omega = 1$, while a view count of 100,000 corresponds to $\omega = 3$. To implement the weighted training, the number of bar-level audio-visual features (training samples) of a video clip is virtually increased by its $\omega$ (doubled by $\omega = 2$, for example) in training the linear regression models.
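The view-count weighting of Section 4.3 is simple enough to check numerically. The sketch below assumes that the "+ 0.5" term implements rounding to the nearest integer via a floor; the function names are hypothetical.

```python
import math


def view_count_weight(view_count, alpha=2.0, beta=-7):
    """Integer weight omega = max(floor(alpha * log10(Vc) + 0.5) + beta, 0)."""
    if view_count <= 0:
        return 0
    return max(int(math.floor(alpha * math.log10(view_count) + 0.5)) + beta, 0)


def replicate_samples(samples, weight):
    """Virtually increase the training samples of one clip by repeating each sample weight times."""
    return [s for s in samples for _ in range(weight)]
```

With the stated settings, `view_count_weight(10000)` gives 1 and `view_count_weight(100000)` gives 3, matching the text, and a clip with fewer than a few thousand views receives weight 0, so it contributes no training samples.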
5. IMPLEMENTATION OF DANCEREPRODUCER

In this section, we describe the dataset used and trial user comments regarding the system's effectiveness.

5.1 Dataset

To generate a dance video by segmenting and concatenating existing dance video, and to model the various relationships between music and an image sequence, the database should fulfill the following four conditions.

Condition 1) The main content of the video clips is dance.
Condition 2) The video clips are similar types of MAD movies, so that their mixture generated by our system can look like consistent content.
Condition 3) Each video clip has a view count by users on the web.
Condition 4) The number of available video clips is large enough.

As content fulfilling all of the above conditions, we used mashup videos which are generated from Japanese dance simulation games full of dance scenes: "THE IDOLM@STER" and "THE IDOLM@STER LIVE FOR YOU!" (http://www.bandainamcogames.co.jp/cs/list/idolmaster/). In addition, we also used dance videos which are generated using MikuMikuDance (MMD) (http://www.geocities.jp/higuchuu4/index_e.htm), a 3-dimensional human motion synthesizer for dance performance. Both kinds of videos can be found on the video sharing service NicoNicoDouga (http://www.nicovideo.jp/). To construct a database, we gathered […] of these mashup video clips and […] of these MMD video clips, all of which had a view count of over […] on NicoNicoDouga.

5.2 Trial usage and introspective comments

Many videos generated by DanceReProducer were synchronized regarding rhythm and impression between the music and image sequence. (Demonstration video clips generated by our system are available at http://staff.aist.go.jp/t.nakano/DanceReProducer/.) This suggests that the system can be effective and the modeling is appropriate.

Trial users of the system offered comments, especially regarding the effectiveness of the interactive re-selection function. A typical comment was that "the function was useful and effective"; in contrast, however, another user commented that "occasionally there was no appropriate candidate".

Some comments were on ways to improve the system performance. One user who had no experience in MAD movie generation said it would be useful to have "more candidates for the image sequence". Another comment, from a user who had MAD movie experience, was that the system needed an "adjustment function for the bar and boundary of the musical section."

6. CONCLUSION

DanceReProducer is a dance video authoring system that can automatically generate dance video appropriate to music by reusing existing dance video sequences. Trial usage of the system has shown that it is a useful tool for users with little knowledge or experience in MAD movie generation. Although only dance video content is currently supported in our implementation, our approach can be applied to any other music video clips.

One benefit of DanceReProducer is that a user does not need to engage in time-consuming manual generation. Moreover, the "reuse" approach described in this paper is novel in that it allows the use of ever-increasing user-generated content on the web. We expect the expansion of mashup content (n-th generation content) and its supporting systems to create an opportunity for a new form of entertainment. Remaining issues, such as a quantitative evaluation of this system, detailed feature extraction for dance motion like body motion detection, and an interface that can adjust measure or section boundaries, will be topics covered in our future work.

Acknowledgments

We thank Yuki Hasegawa and Tatsunori Hirai for their help.
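7. REFERENCES

[ ] T. X. Fujisawa, M. Tani, N. Nagata, and H. Katayose, "Music mood visualization based on quantitative model of chord perception," in Journal of Information Processing Society of Japan (in Japanese).
[ ] C. Laurier and P. Herrera, "Mood Cloud: A real-time music mood visualization tool," in Proc. of the 2008 Computers in Music Modeling and Retrieval Conference.
[ ] M. Goto, "An audio-based real-time beat tracking system for music with or without drum-sounds," in Journal of New Music Research.
[ ] T. Shiratori and K. Ikeuchi, "Synthesis of dance performance based on analyses of human motion and music," in IPSJ Transactions on Computer Vision and Image Media.
[ ] X.-S. Hua, L. Lu, and H.-J. Zhang, "Automatically Converting Photographic Series into Video," in Proc. of the 12th annual ACM international conference on Multimedia.
[ ] R. Cai, L. Zhang, F. Jing, W. Lai, and W.-Y. Ma, "Automated Music Video Generation using WEB Image Resource," in Proc. of the 32nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2007).
[ ] J. Foote, M. Cooper, and A. Girgensohn, "Creating music videos using automatic media analysis," in Proc. of the tenth ACM international conference on Multimedia.
[ ] X.-S. Hua, L. Lu, and H.-J. Zhang, "Automatic music video generation based on temporal pattern analysis," in Proc. of the 12th annual ACM international conference on Multimedia.
[ ] M. Goto, "A chorus-section detection method for musical audio signals and its application to a music listening station," in IEEE Trans. on Audio, Speech, and Language Processing.
[ ] O. Gillet, S. Essid, and G. Richard, "On the correlation of audio and visual segmentations of music videos," in IEEE Trans. on Circuits and Systems for Video Technology.
[ ] M. Nishiyama, T. Kitahara, K. Komatani, T. Ogata, and H. G. Okuno, "A Computational Model of Congruency between Music and Video in Multimedia Content," in IPSJ SIG Technical Reports 2007-MUS-069 (in Japanese).
[ ] G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," in IEEE Trans. on Speech and Audio Processing.

EyesWeb: http://www.infomus.org/EywMain.html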