DanceReProducer: An Automatic Mashup Music Video Generation

Proceedings of the SMC 2011 - 8th Sound and Music Computing Conference, July 2011, Padova, Italy
DANCEREPRODUCER: AN AUTOMATIC MASHUP MUSIC VIDEO
GENERATION SYSTEM BY REUSING DANCE VIDEO CLIPS
ON THE WEB
Tomoyasu Nakano†1, Sora Murofushi‡3, Masataka Goto†2, Shigeo Morishima‡3
† National Institute of Advanced Industrial Science and Technology (AIST), Japan
‡ Waseda University, Japan
1 t.nakano[at]aist.go.jp, 2 m.goto[at]aist.go.jp, 3 shigeo[at]waseda.jp
ABSTRACT
We propose a dance video authoring system, DanceReProducer, that can automatically generate a dance video clip appropriate to a given piece of music by segmenting and concatenating existing dance video clips. In this paper, we focus on the reuse of ever-increasing user-generated dance video clips on a video sharing web service. In a video clip consisting of music (audio signals) and image sequences (video frames), the image sequences are often synchronized with or related to the music. Such relationships are diverse in different video clips, but were not dealt with by previous methods for automatic music video generation. Our system employs machine learning and beat tracking techniques to model these relationships. To generate new music video clips, short image sequences that have been previously extracted from other music clips are stretched and concatenated so that the emerging image sequence matches the rhythmic structure of the target song. Besides automatically generating music videos, DanceReProducer offers a user interface in which a user can interactively change image sequences just by choosing different candidates. This way, people with little knowledge or experience in MAD movie generation can interactively create personalized video clips.
1. INTRODUCTION

User-generated video clips called MAD movies (http://en.wikipedia.org/wiki/MAD_Movie) or mashup videos, each of which is a derivative (mixture or combination) of some original video clips, are gaining popularity on the web, and a lot of them have been uploaded and are available on video sharing web services. In this paper, we focus on music video clips of dance scenes (dance video clips) in the form of MAD movies or mashup videos. Such a MAD music video clip consists of a musical piece (audio signals) and image sequences (video frames) taken from other original video clips. The original video clips are called 1st generation (primary or original) content, and the MAD video clips generated by users can be considered 2nd generation (secondary or derivative) content (Figure 1). In a MAD video clip, good music-to-image synchronization with respect to rhythm, impression, and context is important.

Figure 1. Generation of mashup music videos (user-generated music video clips) by reusing existing original content. [Diagram labels: original content (1st generation): music video, picture, music, dance video; creators; reuse + new; mashup music videos (user-generated video clips): 2nd generation, 3rd generation, ..., Nth generation.]

Although it is easy to enjoy watching MAD movies, it is not easy to generate them, because a creator needs to search existing video clips for image sequences that give impressions appropriate to a given target musical piece, segment and concatenate image sequences to fit the target piece, and time-stretch the sequences to match the tempo of the target piece, because existing video clips usually have tempi different from the tempo of the target piece. Moreover, for better music-to-image synchronization, the music structure and context of a musical piece and image sequences should be taken into account, but this requires considerable time and effort.

To give everybody a chance to enjoy such difficult MAD movie generation, we have developed a new system called DanceReProducer that can automatically generate a dance video clip for any given piece of music by segmenting, concatenating, and stretching existing dance video clips (Figure 2). This system provides an interface in which a user not only listens to music but also enjoys music visually by directing (supervising) the semi-automatic generation of dance video image sequences. If the automatically generated video clip is satisfactory, the user can just watch it, but if the user does not like the generated image sequences for some musical sections (e.g., A, B, and C in Figure 2), the user can easily choose another favorite image sequence from ranked candidates for each musical section. These candidates are also automatically proposed by the system and would also match a given musical section of the input piece according to our mapping model. This mapping model was trained through an analysis of a large amount of user-generated dance video clips available on a video sharing web service. In particular, we focus on the reuse of video clips of the 2nd, 3rd, and Nth generation content (Figure 1) as well as the 1st generation content. In other words, our system enables a user to generate a new mashup video clip by reusing existing mashup video clips on the web.

Copyright: (c) 2011 Tomoyasu Nakano et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Figure 2. An automatic music video generation system, DanceReProducer, by reusing existing music video clips. [Diagram labels: automatic mashup music video generation system; music video clips on the web (Video1, Video2, Video3, ..., VideoN); input: music; estimated music structure (A A B B B A A B B B C C C, with chorus marked); stretch and concatenate; output: image sequence.]
2. RELATED WORK

Previous works generated visual patterns based on some musical aspects, such as visualizing music chords by color [1], visualizing musical mood [2], and controlling a computer-graphics dancer under musical beats [3, 4]. There were also previous works automatically generating music-synchronized video by reusing media content: for example, some reused images and photographs from the web [5, 6], and others reused home videos [7, 8] under audio changes [7] or repetitive visual and aural patterns [8]. Previous works, however, did not reuse dance video clips on the web to generate a new mashup video clip.

3. SYSTEM DESIGN

To develop DanceReProducer, we first considered the criteria that people use in judging "what is an appropriate image sequence for a particular piece of music", as described below. We then describe functions of the system interface.

3.1 Criteria of natural/skillful relationships between an image sequence and music

To design the system, we considered the criteria from two aspects, local relationships and context (global) relationships, explained below, taking into account previous work [10, 11] and the comments offered by human creators of MAD movies (some creators disclosed their creative processes on the web).

Local relationships: criteria for impression synchronization between the music and image sequences.

• Rhythm: Visual rhythms such as dance motion, camera work, and cuts (e.g., dissolve) are synchronized with beat and musical accent.
• Impression: Visual impressions such as dance motion, color, brightness, and lighting are synchronized with the musical impression.

Context relationships: criteria for context synchronization between music and image sequences.

• Music structure: Temporal changes in the visual impression are synchronized with the music structure (e.g., verse A, chorus).
• Temporal continuity: The image sequence has temporal continuity, but the visual impression can be changed easily on a music structure boundary.

The above criteria are not all satisfied at any given time and are not mutually independent. However, they provide a useful foundation for generating an image sequence appropriate to a particular piece of music.

3.2 Image sequence generation

Mashup video generation done manually is difficult and time-consuming. To enable more efficient generation, our system first automatically generates an image sequence appropriate to the music. However, the generated sequence may not be to the user's taste. In such cases, other sequence candidates are shown on a screen so that the user can simply choose a preferred one. Even though it would be difficult for a user to manually find another candidate from among a huge number of candidates, it is easy to interactively choose a preferred candidate.

We provide an overview of the interface's image sequence generation and functions below.

3.2.1 Automatic image sequence generation

To reuse existing content, we first gather dance video clips on a video sharing web service, and the system estimates the tempo and bar lines of the music audio signals in those video clips. We assume the music and its dance motions within each video clip are synchronized while dealing with the local relationships, and use each bar (measure) of the music as the minimum unit for segmenting and concatenating image sequences. Hereafter, we denote an image sequence (series of video frames) for the bar-level minimum unit as a visual unit.

Second, the system searches for a visual unit appropriate to each bar of the input (target) musical piece. The units are time-stretched to the tempo of the input music and then concatenated to generate an image sequence. In this regard, to deal with the context relationships, the system selects visual units while taking into account music structure and temporal continuity.

To satisfy each criterion described in Section 3.1, we implement the following processes.

Rhythmic synchronization: A musical bar is used as the minimum unit for segmenting and concatenating. A visual unit is stretched to the input music tempo.
Impression synchronization: By modeling the mapping between the extracted audio and visual features for impression, the system automatically selects a visual unit appropriate to the input music impression in each bar.
Music structure and Temporal continuity: By introducing costs representing the temporal continuity and music structure of the generated sequence, the system automatically selects an image sequence considering the context relationships.

3.2.2 Interface

Screenshots of the implemented DanceReProducer interface are shown in Figures 3 and 4. There are basic functions for viewing, such as a window showing the generated image sequence (Figure 3), functions to load input music and save the generated video, to play and stop/pause the generated video, a playback-position "slider", and the music structure estimated automatically [9]. The green rectangular markers in the music structure represent chorus sections, and the blue markers represent other sections. In addition, the total duration of the input music is equally divided into [...] sections.

This interface also provides the following functions to reflect the user's preferences.

Interactive re-selection of a generated image sequence: By clicking the NG button (Figure 3), the user can see other sequence candidates on a screen and simply choose the preferred one (Figure 4). The user can see and compare different candidates during playback and can choose his/her favorite sequence. Since this interactive re-selection function works on each section of the music structure (e.g., A, B, and C in Figure 2), the user can use this function to easily consider the music structure and context.
Jumping to the beginning of sections: By clicking the jump button (Figure 3) or the visualized sections, a user can directly jump to and view the previous or the next section of a song.

Figure 3. Example of the DanceReProducer screen. [Screenshot with interface elements labeled 1 to 8.]

Figure 4. Example of interactive sequence selection. Four different image sequence candidates are previewed, and the lower-right candidate is chosen by a user.

4. INTERNAL MECHANISM OF DANCEREPRODUCER

To develop DanceReProducer, we modeled the relationships between music and video and then generated image sequences appropriate to input music by considering the local and context relationships. In general, it is difficult to model such relationships, but we solved this problem through training using a huge quantity of mashup video clips posted to the web. Since the content (videos) were made by humans, there were various types of mutual relationship between the music and the image sequences. This suggests that such videos can be used to learn the relationships through a machine-learning technique.

Modeling using the mashup clips suffers from two problems. One is that complex relationships exist, such as where "the same image sequences are used for different music" or "different image sequences are used for the same music" (Figure 1). Another problem is that the video quality varies strongly, and it is difficult to judge the possibility of its reuse. These obstacles make it difficult to model the relationships and were not dealt with by previous works.

Figure 5 gives an overview of the DanceReProducer system. The system consists of two procedures: database construction and video generation. In this section, we describe the details of the system and explain how we solve the above two problems in modeling using the mashup clips.

Figure 5. Overview of DanceReProducer, a dance video authoring system that can automatically generate a dance video clip appropriate to a given piece of music by segmenting, concatenating, and stretching existing dance video clips. [Diagram, database construction: (A) gather music video clips from the web (30 fps, 44.1 kHz); beat tracking into bars; frame feature extraction for music and image sequence; bar-level feature extraction (resampling to 16 points, DCT, 3rd order with DC); database construction with view counts. Video generation: (F) extract bar-level feature of the input music; (G) reconstruct the database around the input tempo (> 20%), PCA; (H) train mapping model (clustering, linear regression, weighting calculation from view counts); (I) select visual units (mapping, Euclidean distance, Viterbi search, music structure); stretch and concatenate; output.]

4.1 Database construction

In the database construction, database videos are gathered via the web, and then audio and visual features are extracted from the videos through the following steps.

Step 1) Gather dance music videos via the web, and resample the sampling frequency of the music to 44.1 kHz and the frame rate of the image sequence to 30 fps (Figure 5, A).
Step 2) Estimate the bar lines of the videos by using beat tracking techniques (B).
Step 3) Extract feature vectors to learn their relationship (B and C). Since the analysis frame matches the frame rate, the discrete time step (frame-time) is about 33 ms (about 1,470 points). The extracted features in each frame-time are called frame features. The frame features are then integrated in each bar to obtain what are called bar-level features.

4.1.1 Beat tracking

Much work has been done on beat tracking [3], and we plan to focus on using such techniques in the future, but our current implementation is a simple one, which was effective in our preliminary experiment.

The system first calculates the power of the input audio signal, then calculates its autocorrelation values and estimates their peak time. Since the peak time represents the periodicity of the power, we use it as the tempo (one beat time). In this regard, to avoid octave errors (e.g., half/double tempo error), the estimation is limited to tempi within a range of 60 - 120 bpm (beats per minute).

Second, the system calculates the cross-correlation between the power and a pulse signal generated under the estimated tempo. Since the peak time of the cross-correlation represents the first beat time, the system regards this time as the beginning time of the first bar. In addition, we assume that the dataset videos have a length of 4 beats (one measure) in 4/4 time, and then the system decides all bar lines mechanically.

4.1.2 Frame feature extraction (Music)

The frame features of music are defined with the help of previous work on relationships between audio and visual features [10, 11] and musical genre classification [12]. These features represent musical accents and impressions.

As the frame features for accents, to represent temporal change in the power of the audio signal, we extract the filter bank output (4 dims) and spectral flux (1 dim). As the frame features for impressions, to represent timbre, we extract the zero-crossing rate (1 dim) and 12th order MFCCs (mel-frequency cepstral coefficients) with a DC component (13 dims).

4.1.3 Frame feature extraction (Image sequence)

The frame features of an image sequence are defined with the help of previous work on relationships between audio and visual features []. These features represent visual accents and impressions. To extract the features, the image resolution is resampled to 128 × 96.

As the frame features for accents, to represent camera work and dance motion and related temporal changes, we extract the mean values of the temporal derivative of the well-known optical flow and brightness (2 dims). We use a block-matching algorithm to detect the optical flow from image sequences: we use a 64 × 48 block, which is shifted by [...] (maximum range is [...]). The frame features for impressions are the mean values and standard deviations of the hue, saturation, and brightness values (6 dims). In addition, 2-dimensional DCT (discrete cosine transform) coefficients are extracted (4 dims for vertical and 3 dims for horizontal).
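The simple tempo and first-beat estimation described in Section 4.1.1 can be illustrated with a minimal sketch. It assumes a power envelope sampled once per 30 fps frame-time. The function names `estimate_tempo` and `estimate_first_beat`, the mean removal before the autocorrelation, and the plain-Python loops are illustrative assumptions, not the authors' implementation.

```python
def estimate_tempo(power, frame_rate=30.0, min_bpm=60.0, max_bpm=120.0):
    """Return (bpm, beat_lag) from the autocorrelation peak of the power envelope.

    The lag search is restricted to the 60-120 bpm range to avoid
    half/double tempo (octave) errors.
    """
    n = len(power)
    mean = sum(power) / n
    centered = [p - mean for p in power]  # assumption: remove the mean first
    min_lag = max(1, round(frame_rate * 60.0 / max_bpm))
    max_lag = min(n - 1, round(frame_rate * 60.0 / min_bpm))
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return 60.0 * frame_rate / best_lag, best_lag


def estimate_first_beat(power, beat_lag):
    """Return the frame offset of the first beat.

    Cross-correlating the power with a pulse train of period beat_lag
    reduces to summing the power at the pulse positions for each offset.
    """
    best_offset, best_value = 0, float("-inf")
    for offset in range(beat_lag):
        value = sum(power[offset::beat_lag])
        if value > best_value:
            best_offset, best_value = offset, value
    return best_offset
```

For example, a synthetic envelope with one pulse every 18 frames at 30 fps corresponds to a beat period of 0.6 s, so the sketch reports 100 bpm. Bar lines would then follow mechanically from the first beat and the assumed number of beats per measure.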
4.2.1 Linear regression models for multiple clusters

In this paper, a local cost is calculated by a linear regression model, which is used to learn the relationships between the audio and visual bar-level features. However, to model complex relationships such as "the same visual units are used for different music" or "different visual units are used for the same music" (Figure 1), one regression model is insufficient.

Therefore, we propose an approach in which the system uses linear regression models for multiple clusters. The multiple clusters are obtained by applying k-means clustering to feature vectors, where a feature vector is defined as a concatenation of a bar-level audio feature of music and a bar-level visual feature of image sequences in the database. Note that this feature vector is used just for the clustering. For each cluster, a linear regression model is trained so that bar-level visual features can be predicted from bar-level audio (music) features (Figure 5, H).
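A toy sketch of the cluster-wise regression idea follows. To stay short it assumes one-dimensional audio and visual features, a fixed set of initial centroids, and cluster selection at prediction time by the audio part of each centroid. All names (`kmeans`, `fit_line`, `train`, `predict_visual`) and these simplifications are assumptions for illustration; the system described above works on PCA-reduced, high-dimensional bar-level features.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Lloyd's k-means on (audio, visual) points, starting from given centroids."""
    groups = [[] for _ in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups


def fit_line(pairs):
    """Least-squares fit of visual = slope * audio + intercept for one cluster."""
    if not pairs:
        return 0.0, 0.0
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    cov = sum((a - mean_a) * (v - mean_v) for a, v in pairs)
    slope = cov / var_a if var_a else 0.0
    return slope, mean_v - slope * mean_a


def train(pairs, initial_centroids):
    """Cluster the concatenated (audio, visual) vectors, then fit one model per cluster."""
    centroids, groups = kmeans(pairs, initial_centroids)
    return centroids, [fit_line(g) for g in groups]


def predict_visual(audio, centroids, models):
    """Use the model of the cluster whose centroid (audio part) is nearest to the input."""
    c = min(range(len(centroids)), key=lambda i: abs(audio - centroids[i][0]))
    slope, intercept = models[c]
    return slope * audio + intercept
```

With two well-separated groups of training pairs that follow different linear trends, each cluster recovers its own slope and intercept, which a single global regression could not represent.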
4.1.4 Bar-level feature extraction

We propose a bar-level feature, which is an integration of the frame features in each bar. To extract features from one piece of music or one video clip, in most previous work (e.g., musical genre classification), integration was done using the time average and its standard deviation [12]. However, such integration drops the temporal information of the audio-visual features.

In this paper, we integrate these frame features into bar-level features by using the DCT (Figure 5, D). In each bar, frame features are resampled to 16 points along the time axis, the system computes the DCT for each dimension, and then the DCT coefficients up to the 3rd order, with a DC component, are used as the bar-level features. Therefore, the number of dimensions of the bar-level features is four times the number of dimensions of the frame features.
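The integration step can be sketched in a few lines. The sketch assumes linear interpolation for the resampling and an unnormalized DCT-II, neither of which is specified in the text; the function names are hypothetical.

```python
import math


def resample(values, num_points=16):
    """Linearly interpolate a sequence onto num_points evenly spaced positions."""
    if len(values) == 1:
        return [values[0]] * num_points
    out = []
    for j in range(num_points):
        pos = j * (len(values) - 1) / (num_points - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def dct_ii(values, num_coeffs=4):
    """First num_coeffs unnormalized DCT-II coefficients (index 0 is the DC component)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


def bar_level_feature(frames):
    """Turn the frame feature vectors of one bar into a flat bar-level vector.

    Each frame feature dimension contributes four values (DC plus orders 1 to 3).
    """
    feature = []
    for d in range(len(frames[0])):
        track = resample([f[d] for f in frames])
        feature.extend(dct_ii(track))
    return feature
```

Under these assumptions, the 19 audio frame dimensions of Section 4.1.2 yield 76 bar-level dimensions, regardless of how many frames the bar contains.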
4.2 Video generation

In the video generation, to select visual units for each frame from the database, the system process consists of the following steps.

Step 1) Extract the bar-level features of a given musical piece (Figure 5, F).
Step 2) Reconstruct the database (G). To avoid generating a video with an unnaturally fast/slow tempo, visual units with tempi 20% above or below the input tempo are not used for the following steps.
Step 3) Apply PCA (principal component analysis) to all bar-level features of all bars and store low N-dimensional features. The dimension N is decided based on the cumulative contribution ratio (≤ 95%). For our investigations, the dimensions of the audio and visual features described above were reduced from 76 to 62 and from 80 to 68, respectively. (Since the database is reconstructed depending on the tempo of the input, the reduced dimension is not constant.)
Step 4) Model the relationship between music and image sequence from the database (H). This step is explained in more detail in Section 4.2.1.
Step 5) Select visual units under the criteria of the relationships described in Section 3.1 (I).

4.2.2 Image sequence selection under the criteria for natural/skillful relationships

By introducing costs representing the local and context relationships, we can solve this video generation problem by minimizing the costs through a Viterbi search (Figure 5, I). The model of the cluster having the centroid nearest to the input features is selected, and visual features appropriate to the input audio features are estimated by using the model. To calculate the costs of the local relationships, the system calculates the distance between the estimated features and the visual features of all units.

To represent the costs of the context relationships, a musical structure and chorus sections are estimated using RefraiD [9]. The estimated beginning and ending times of all sections are used as the boundaries of a musical section. However, sections less than [...] bars in length are not used as a section for this purpose.

Let $d(n, k_m)$ be the Euclidean distance representing the local cost between the $n$-th ($1 \le n \le N$) bar-level feature of the input and the features of the $k$-th unit of the $m$-th video in the database. The calculated local costs and accumulated costs are defined as follows:

$$
c_l(n, k_m) =
\begin{cases}
d(n, k_m) & \text{if } ch(n) = 1 \wedge ch(k_m) = 1, \\
p_c \times d(n, k_m) & \text{otherwise,}
\end{cases}
$$

$$
c_a(n, k_m) = \min_{\kappa, \mu}
\begin{cases}
c_l(n, k_m) + c_a(n-1, \kappa_\mu) & \text{if } (\mu = m \wedge \kappa = k - 1) \vee st(n) \neq st(n-1), \\
p_t \times c_l(n, k_m) + c_a(n-1, \kappa_\mu) & \text{otherwise,}
\end{cases}
$$

where $ch(n)$ returns 1 if $n$ is included in a chorus section, and $st(n)$ returns the number of the musical section that contains $n$. A higher $p_c$ value means that units from chorus sections are more easily selected at a chorus section. A lower $p_t$ value means that the selected units have less temporal continuity. To minimize the accumulated cost at the $N$-th measure, the system selects the unit which has the minimum accumulated cost $d_{min}$, and then an image sequence is generated by backtracing:

$$
d_{min} = \operatorname*{argmin}_{k, m} \, c_a(N, k_m).
$$

The interactive re-selection function is implemented so that the system chooses four different candidates for each section (Figure 4). These candidates are made from four different accumulated costs, and then four image sequences are generated by backtracing. To expand the variety of generated image sequences, the chosen candidates are made from the minimum, [...] minimum, [...] minimum, and maximum accumulated costs. This enables the generation of varied image sequences.

4.3 Model training weighted according to view counts

This paper focuses on the reuse of the MAD movies available on the web. Since there are many creators, the authoring quality of generated videos varies widely. In other words, each video will have a different level of reliability regarding the relationships between music and image. We assume that a video generated by a user having good MAD movie skills will have higher reliability and a higher possibility of reuse. Therefore, to model an image sequence appropriate to particular music, the system should introduce a weighting factor in the model training process, where a higher quality video will be given a greater weight.

To enable automatic judgment of the quality, we introduce the idea of using the view count of each video clip on the web as a weight, since the view count reflects the video quality. Let $\omega$ be an integer weighting factor defined as follows, where $V_c$ indicates the view count:

$$
\omega = \max\left( \lfloor \alpha \times \log_{10}(V_c) + 0.5 \rfloor + \beta,\; 0 \right).
$$

In our current implementation, $\alpha$ and $\beta$ are set to 2 and −7, respectively. This means a view count of 10,000 corresponds to $\omega = 1$, while a view count of 100,000 corresponds to $\omega = 3$. To implement the weighted training, the number of bar-level audio-visual features (training samples) of a video clip is virtually increased by its $\omega$ (doubled by $\omega = 2$, for example) in training the linear regression models.

5. IMPLEMENTATION OF DANCEREPRODUCER

In this section, we describe the dataset used and trial user comments regarding the system effectiveness.

5.1 Dataset

To generate a dance video by segmenting and concatenating existing dance video, and to model the various relationships between music and an image sequence, the database should fulfill the following four conditions.

Condition 1) The main content of the video clips is dance.
Condition 2) The video clips are similar types of MAD movies, so that their mixture generated by our system can look like consistent content.
Condition 3) Each video clip has a view count by users on the web.
Condition 4) The number of available video clips is large enough.

As content fulfilling all of the above conditions, we used mashup videos generated from the Japanese dance simulation games full of dance scenes, "THE IDOLM@STER" and "THE IDOLM@STER LIVE FOR YOU" (http://www.bandainamcogames.co.jp/cs/list/idolmaster/). In addition, we also used dance videos generated using MikuMikuDance (MMD) (http://www.geocities.jp/higuchuu4/index_e.htm), which is a 3-dimensional human motion synthesizer for dance performance. Both kinds of videos can be found on the video sharing service NicoNicoDouga (http://www.nicovideo.jp/). To construct a database, we gathered [...] of these mashup video clips and [...] of these MMD video clips, all of which had a view count of over [...] on NicoNicoDouga.

5.2 Trial usage and introspective comments

Many videos generated by DanceReProducer were synchronized regarding rhythm and impression between the music and image sequence (demonstration video clips generated by our system are available at http://staff.aist.go.jp/t.nakano/DanceReProducer/). This suggests that the system can be effective and that the modeling is appropriate.

Trial users of the system offered comments, especially regarding the effectiveness of the interactive re-selection function. A typical comment was that "the function was useful and effective"; in contrast, however, another user commented that "occasionally there was no appropriate candidate".

Some comments were on ways to improve the system performance. One user who had no experience in MAD movie generation said it would be useful to have "more candidates for the image sequence". Another comment, from a user who had MAD movie experience, was that the system needed an "adjustment function for the bar and boundary of the musical section".

6. CONCLUSION

DanceReProducer is a dance video authoring system that can automatically generate dance video appropriate to music by reusing existing dance video sequences. Trial usage of the system has shown that it is a useful tool for users with little knowledge or experience in MAD movie generation. Although only dance video content is currently supported in our implementation, our approach can be applied to any other music video clips.

One benefit of DanceReProducer is that a user does not need to engage in time-consuming manual generation. Moreover, the "reuse" approach described in this paper is novel in that it allows the use of ever-increasing user-generated content on the web. We expect the expansion of mashup content (N-th generation content) and its supporting systems to create an opportunity for a new form of entertainment. Remaining issues, such as a quantitative evaluation of this system, detailed feature extraction for dance motion such as body motion detection, and an interface that can adjust measure or section boundaries, will be topics covered in our future work.
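As a supplementary illustration of the cost-based selection in Section 4.2.2, the following sketch runs a Viterbi search over a small table of local distances. It assumes that the distances between the estimated and stored visual features are already computed, that a unit is described by a hypothetical tuple (video id, index within the video, chorus flag), and that the continuity penalty is waived at section boundaries. The function name `select_units` and the default penalty values are illustrative, not taken from the system.

```python
def select_units(dist, bar_chorus, bar_section, units, pc=2.0, pt=2.0):
    """Return one unit index per input bar, minimizing the accumulated cost.

    dist[n][u]   : distance between bar n's estimated visual feature and unit u
    bar_chorus   : per-bar flags, true if the bar lies in a chorus section
    bar_section  : per-bar section numbers
    units        : list of (video_id, index_in_video, is_chorus)
    """
    num_bars, num_units = len(dist), len(units)

    def local_cost(n, u):
        # No chorus penalty only when both the bar and the unit are chorus.
        both_chorus = bar_chorus[n] and units[u][2]
        return dist[n][u] if both_chorus else pc * dist[n][u]

    acc = [[local_cost(0, u) for u in range(num_units)]]
    back = [[None] * num_units]
    for n in range(1, num_bars):
        boundary = bar_section[n] != bar_section[n - 1]
        row, ptr = [], []
        for u in range(num_units):
            cl = local_cost(n, u)
            best_cost, best_prev = float("inf"), None
            for v in range(num_units):
                # Continuous: v is the unit right before u in the same video.
                continuous = (units[v][0] == units[u][0]
                              and units[v][1] == units[u][1] - 1)
                step = cl if (continuous or boundary) else pt * cl
                cost = acc[n - 1][v] + step
                if cost < best_cost:
                    best_cost, best_prev = cost, v
            row.append(best_cost)
            ptr.append(best_prev)
        acc.append(row)
        back.append(ptr)

    # Backtrace from the unit with the minimum accumulated cost at the last bar.
    u = min(range(num_units), key=lambda i: acc[-1][i])
    path = [u]
    for n in range(num_bars - 1, 0, -1):
        u = back[n][u]
        path.append(u)
    return path[::-1]
```

The quadratic loop over unit pairs keeps the sketch readable; a practical implementation over a large database would need to prune candidates.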
Acknowledgments

We thank Yuki Hasegawa and Tatsunori Hirai for their help.

7. REFERENCES

[1] T. X. Fujisawa, M. Tani, N. Nagata, and H. Katayose, "Music mood visualization based on quantitative model of chord perception," in Journal of Information Processing Society of Japan (in Japanese).
[2] C. Laurier and P. Herrera, "Mood Cloud: A real-time music mood visualization tool," in Proc. of the 2008 Computers in Music Modeling and Retrieval Conference.
[3] M. Goto, "An audio-based real-time beat tracking system for music with or without drum-sounds," in Journal of New Music Research.
[4] T. Shiratori and K. Ikeuchi, "Synthesis of dance performance based on analyses of human motion and music," in IPSJ Transactions on Computer Vision and Image Media.
[5] X.-S. Hua, L. Lu, and H.-J. Zhang, "Automatically Converting Photographic Series into Video," in Proc. of the 12th annual ACM international conference on Multimedia.
[6] R. Cai, L. Zhang, F. Jing, W. Lai, and W.-Y. Ma, "Automated Music Video Generation using WEB Image Resource," in Proc. of the 32nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2007).
[7] J. Foote, M. Cooper, and A. Girgensohn, "Creating music videos using automatic media analysis," in Proc. of the tenth ACM international conference on Multimedia.
[8] X.-S. Hua, L. Lu, and H.-J. Zhang, "Automatic music video generation based on temporal pattern analysis," in Proc. of the 12th annual ACM international conference on Multimedia.
[9] M. Goto, "A chorus-section detection method for musical audio signals and its application to a music listening station," in IEEE Trans. on Audio, Speech, and Language Processing.
[10] O. Gillet, S. Essid, and G. Richard, "On the correlation of audio and visual segmentations of music videos," in IEEE Trans. on Circuits and Systems for Video Technology.
[11] M. Nishiyama, T. Kitahara, K. Komatani, T. Ogata, and H. G. Okuno, "A Computational Model of Congruency between Music and Video in Multimedia Content," in IPSJ SIG Technical Reports 2007-MUS-069 (in Japanese).
[12] G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," in IEEE Trans. on Speech and Audio Processing.

EyesWeb: http://www.infomus.org/EywMain.html
