Visualisation and Analysis of the Internet Movie Database∗
Adel Ahmed†
School of IT, University of Sydney
NICTA, Australia
Vladimir Batagelj‡
Discrete and Computational Mathematics
University of Ljubljana, Slovenia
Xiaoyan Fu§
NICTA, Australia
Seok-Hee Hong¶
School of IT, University of Sydney
NICTA, Australia
Damian Merrick‖
School of IT, University of Sydney
NICTA, Australia
Andrej Mrvar∗∗
Social Science Informatics
University of Ljubljana, Slovenia
∗ This paper is based on the winning entry of the Graph Drawing Competition 2005 [7] and an invited presentation at the Sunbelt Viszard Session [9].
† e-mail: [email protected]
‡ e-mail: [email protected]
§ e-mail: [email protected]
¶ e-mail: [email protected]
‖ e-mail: [email protected]
∗∗ e-mail: [email protected]
ABSTRACT
In this paper, we present a case study for the visualisation and analysis of large and complex temporal multivariate networks derived from the Internet Movie Database (IMDB). Our approach is to integrate network analysis methods with visualisation in order to address scalability and complexity issues. In particular, we defined new analysis methods, the (p,q)-core and the 4-ring, to identify important dense subgraphs and short cycles in huge bipartite graphs. We applied island analysis to a specific time slice in order to identify important and meaningful subgraphs. Further, a temporal Kevin Bacon graph and a temporal two-mode network are extracted in order to provide insight into the evolution of the network.
Keywords: Large and Complex Networks, Case Study, Visualisation, Network Analysis, IMDB.
Index Terms: H.5.2 [Information Interfaces and Presentation]: User Interfaces—Algorithms; I.3.6 [Computer Graphics]: Methodology and Techniques—
1 INTRODUCTION
Recent technological advances have produced large amounts of data and, consequently, many large and complex network models across a number of domains. Examples include:
• Webgraphs: the entities are web pages and the relationships are hyperlinks. These graphs are huge: the whole graph consists of billions of nodes.
• Social networks: These include telephone call graphs (used to trace terrorists), money movement networks (used to detect money laundering), and citation or collaboration networks. Their size ranges from medium to very large.
• Biological networks: Protein-protein interaction (PPI) networks, metabolic pathways, gene regulatory networks and phylogenetic networks are used by biologists to analyse and engineer biochemical materials. In general, they are smaller, with thousands of nodes; however, the relationships in these networks are very complex.
Understanding these networks is a key enabler for many applications. Good analysis methods are needed for these networks, and
some are available. However, such methods are not useful unless
the results are effectively communicated to humans. Visualisation
can be an effective tool for understanding such networks.
Good visualisation reveals the hidden structure of the networks and
amplifies human understanding, thus leading to new insights, new
findings and possible predictions for the future.
We can identify the following challenging research issues for
analysis and visualisation of large and complex networks:
• Scalability: Webgraphs, or the telephone call graphs gathered by AT&T, have billions of nodes. In some cases, it is impossible to visualise the whole graph, or even to load the whole graph into main memory. Hence, the design of new analysis and visualisation methods for huge networks is a key research challenge in fields ranging from databases to computer graphics.
• Complexity: Relationships between actors in a social network, for example, can have a multitude of attributes (observed behaviour can be confirmed or unconfirmed, and relationships can be directed or undirected and weighted by probabilities). Biological networks are also complex in nature; metabolic pathways, for instance, have only a few thousand nodes, but their relationships and interactions are very complex. The data may be given by nature, but some parts of it may still be unknown to scientists. The design of analysis and visualisation methods that resolve these complexity issues is the second research challenge.
• Network Dynamics: Real-world networks are always changing over time. Some networks, such as social networks and webgraphs, evolve relatively slowly. In other cases, such as telephone call networks, the data form a very fast-streamed graph. Effective and efficient modeling, analysis and visualisation of dynamic networks are challenging research topics.
One approach to these challenges is to integrate analysis with visualisation and interaction. Analysis tools for networks are not useful without visualisation, and visualisation tools are not useful unless they are linked to analysis. Further, interaction is necessary to extract more details and insights from the visualisation.
In this paper, we present a case study of our approach to integrating analysis, visualisation and interaction, using large and complex temporal multivariate networks derived from the IMDB (Internet Movie Database). The IMDB is a huge and very rich data set with many attributes, and it has become a challenging data set for visualisation researchers [7, 9].
For example, a multi-scale approach for the visualisation of small-world networks was applied to data sets from IMDB [3]. A visualisation approach for dynamic affiliation networks, in which events are characterised by a set of descriptors, was presented in [6]. A radial ripple metaphor was devised to display the passing of time and to convey relations among the different constituents through an appropriate layout; note that this method is suited to an egocentric perspective.
Figure 1: Arcs with multiplicity at least 8
As the first step of our approach, we integrate network analysis methods [5, 10] with visualisation. In particular, we define new analysis methods, the (p,q)-core and the 4-ring, to identify important dense subgraphs and short cycles in huge bipartite graphs. We apply island analysis to a specific time slice in order to identify important and meaningful subgraphs of the large and complex network. Further, a temporal Kevin Bacon graph and a temporal two-mode network are extracted and visualised in order to provide insight into the evolution of the IMDB data set.
This paper is organised as follows. In the next section, we present a simple analysis of the IMDB data set. In Section 3, we present the integration of network analysis methods with visualisation for large bipartite graphs, including (p,q)-cores, 4-rings and islands. Section 4 presents a visual analysis based on the Kevin Bacon number. Section 5 presents a galaxy-metaphor visualisation of a temporal two-mode actor-movie network, and a visual analysis of the two-mode network with company attributes. Section 6 concludes.
2 BASIC CHARACTERISTICS OF IMDB
The source of the original data is the Internet Movie Database. We transformed the contest data into a temporal network with some additional vectors and partitions describing the properties of vertices. The IMDB network is bipartite (two-mode) and has 1324748 = 428440 + 896308 vertices and 3792390 arcs. Of these arcs, 9927 are multiple (parallel) arcs. The nature of the multiple arcs can be seen in Figure 1, which displays all arcs with multiplicity of at least 8.
Note that in the analysis that follows, we treat multiple arcs as single arcs. The IMDB network consists of 132714 weak components.
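The preprocessing described above (counting parallel arcs, treating them as single arcs, and counting weak components) can be sketched as follows. This is a minimal illustration on a toy arc list, not the Pajek procedure used for the paper; the vertex names and the helper names `collapse_multiple_arcs` and `weak_components` are hypothetical.

```python
from collections import Counter


def collapse_multiple_arcs(arcs):
    """Collapse parallel arcs; return (sorted simple arcs, multiplicity per pair)."""
    multiplicity = Counter(arcs)
    return sorted(multiplicity), multiplicity


def weak_components(vertices, arcs):
    """Count weakly connected components with a union-find structure."""
    parent = {v: v for v in vertices}

    def find(v):
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, w in arcs:
        ru, rw = find(u), find(w)
        if ru != rw:
            parent[ru] = rw
    return len({find(v) for v in vertices})


# Toy two-mode network: movies m*, actors a*; ("m1", "a1") occurs twice, m3 is isolated.
arcs = [("m1", "a1"), ("m1", "a1"), ("m1", "a2"), ("m2", "a3")]
vertices = {"m1", "m2", "m3", "a1", "a2", "a3"}
simple, multiplicity = collapse_multiple_arcs(arcs)
heavy = [pair for pair, n in multiplicity.items() if n >= 2]  # Figure 1 uses n >= 8
print(simple, heavy, weak_components(vertices, simple))
```

Collapsing before the component count mirrors the choice of treating multiple arcs as single: parallel arcs never change connectivity, but they would distort degree-based measures such as the cores below.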
3 VISUALISATION AND ANALYSIS OF LARGE BIPARTITE NETWORKS
There are few specialised methods for directly analysing bipartite networks, especially large ones. Because of the size of the IMDB network, the standard reduction of the entire network to one of the two derived 1-mode networks was not an option. This motivated us to design and implement two new methods for the analysis of bipartite networks:
• bipartite version of cores: (p, q)-cores
• 4-ring weights on lines
Table 1: (p, q : n1, n2) for IMDB
  p     q :   n1    n2 |  p   q :   n1    n2 |  p   q :  n1   n2
  1  1590 : 1590     1 | 22  24 : 1854  1153 | 43  14 :  29   83
  2   516 :  788     3 | 23  23 :   47    56 | 44  14 :  29   83
  3   212 : 1705    18 | 24  23 :   34    39 | 45  13 :  30   95
  4   151 : 4330   154 | 25  22 :   42    53 | 46  13 :  29   94
  5   131 : 4282   209 | 26  22 :   31    38 | 47  12 :  29  101
  6   115 : 3635   223 | 27  22 :   31    38 | 48  12 :  28  100
  7   101 : 3224   244 | 28  20 :   36    53 | 49  12 :  26   95
  8    88 : 2860   263 | 29  20 :   35    52 | 50  11 :  27  111
  9    77 : 3467   393 | 30  19 :   35    59 | 51  11 :  26  110
 10    69 : 3150   428 | 31  19 :   35    59 | 52  11 :  16   79
 11    63 : 2442   382 | 32  19 :   34    57 | 53  10 :  35  162
 12    56 : 2479   454 | 33  18 :   34    62 | 54  10 :  35  162
 13    50 : 3330   716 | 34  18 :   34    62 | 55  10 :  34  162
 14    46 : 2460   596 | 35  18 :   33    61 | 56  10 :  34  162
 15    42 : 2663   739 | 36  17 :   33    65 | 57   9 :  35  187
 16    39 : 2173   678 | 37  16 :   33    75 | 58   9 :  33  180
 17    35 : 2791   995 | 38  16 :   30    73 | 59   9 :  33  180
 18    32 : 2684  1080 | 39  16 :   29    70 | 60   9 :  32  178
 19    30 : 2395  1063 | 40  15 :   29    77 | 61   9 :  31  177
 20    28 : 2216  1087 | 41  15 :   28    76 | 62   9 :  31  177
 21    26 : 1988  1087 | 42  15 :   28    76 | 63   8 :  31  202
3.1 (p, q)-core Analysis
The subset of vertices $C \subseteq V$ is a (p, q)-core in a bipartite (2-mode) network $N = (V_1, V_2; L)$, $V = V_1 \cup V_2$, if and only if
a. in the induced subnetwork $K = (C_1, C_2; L(C))$, with $C_1 = C \cap V_1$ and $C_2 = C \cap V_2$, it holds that $\forall v \in C_1: \deg_K(v) \ge p$ and $\forall v \in C_2: \deg_K(v) \ge q$;
b. C is the maximal subset of V satisfying condition a.
The basic properties of bipartite cores are:
• $C(0, 0) = V$
• $K(p, q)$ is not always connected
• $(p_1 \le p_2) \land (q_1 \le q_2) \Rightarrow C(p_2, q_2) \subseteq C(p_1, q_1)$
Using (p, q)-cores, we can identify important dense structures in large and complex networks. We designed a very efficient O(m) algorithm to find (p, q)-cores, where m is the number of lines, and implemented it in Pajek.
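A (p, q)-core can be computed by repeatedly deleting vertices whose degree falls below the threshold of their side, which touches each vertex and line a bounded number of times and so runs in linear time. The following is a minimal sketch of that peeling idea under our own assumptions (an adjacency-set representation and the hypothetical function name `pq_core`); it is not the Pajek implementation.

```python
from collections import deque


def pq_core(adj, part1, p, q):
    """Return the (p, q)-core of a bipartite graph as a set of vertices.

    adj   -- dict mapping every vertex to the set of its neighbours
    part1 -- vertices of V1 (degree threshold p); all others belong to V2 (threshold q)
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    need = {v: (p if v in part1 else q) for v in adj}
    queue = deque(v for v in adj if deg[v] < need[v])
    removed = set(queue)
    # Each vertex is queued at most once and each line is scanned at most twice,
    # so the peeling runs in O(n + m) time.
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                if deg[u] < need[u]:
                    removed.add(u)
                    queue.append(u)
    return set(adj) - removed


edges = [("a1", "b1"), ("a1", "b2"), ("a1", "b3"),
         ("a2", "b1"), ("a2", "b2"), ("a2", "b3"), ("a1", "b4")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
V1 = {"a1", "a2"}
print(sorted(pq_core(adj, V1, 3, 2)))  # the pendant b4 is peeled off
```

Because the result is the maximal subset satisfying the degree conditions, raising either threshold can only shrink it, which is the nesting property listed above.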
Since there are many (p, q)-cores, we must decide how to select the interesting ones among them. To help the user in these decisions, we implemented a table of core characteristics: $n_1 = |C_1(p, q)|$, $n_2 = |C_2(p, q)|$ and $k$, the number of components in $K(p, q)$ (see Tables 1 and 2). We look for (p, q)-cores where
• $n_1 + n_2 \le$ a selected threshold
• there are big jumps from $C(p-1, q)$ and $C(p, q-1)$ to $C(p, q)$.
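The two selection criteria can be turned into a simple filter over a table of core sizes. The sketch below is only one possible reading of "big jumps": it assumes the sizes n1 + n2 are available in a dictionary keyed by (p, q) and uses a hypothetical ratio parameter `jump`; neither the data layout nor the ratio test is taken from the paper.

```python
def candidate_cores(size, threshold, jump=2.0):
    """Pick (p, q) pairs whose core is small and much smaller than its neighbours.

    size      -- dict mapping (p, q) to n1 + n2 of the (p, q)-core
    threshold -- upper bound on n1 + n2
    jump      -- hypothetical factor: C(p-1, q) and C(p, q-1) must be at least
                 jump times larger than C(p, q)
    """
    picks = []
    for (p, q), n in sorted(size.items()):
        if n == 0 or n > threshold:
            continue
        left, below = size.get((p - 1, q)), size.get((p, q - 1))
        if left is None or below is None:
            continue  # neighbouring cores were not tabulated
        if left >= jump * n and below >= jump * n:
            picks.append((p, q))
    return picks


sizes = {(1, 1): 100, (1, 2): 90, (2, 1): 95, (2, 2): 10, (3, 2): 9, (2, 3): 8}
print(candidate_cores(sizes, threshold=50))
```

A ratio is used rather than an absolute difference so that the same `jump` value behaves similarly for large and small cores; an interactive tool would let the user tune both parameters.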
For example, we selected the (247,2)-core and the (27,22)-core; see Figures 2 and 3. From the labels we can see that the corresponding topics are wrestling and pornography.
3.2 4-ring Analysis
A k-ring is a simple closed chain of length k. Using k-rings, we can define a weight on edges as $w_k(e)$ = the number of different k-rings containing the edge $e \in E$.
Since for a complete graph $K_r$, $r \ge k \ge 3$, we have $w_k(K_r) = (r-2)!/(r-k)!$ (the weight of every edge of $K_r$), the edges belonging to cliques have large weights. Therefore, these weights can be used to identify the dense parts of a network. For example, all r-cliques of a network belong to the edge cut at level $r - 2$ for the weight $w_3$, that is, to the subnetwork of edges with $w_3(e) \ge r - 2$.
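The clique formula follows from a short counting argument, written out below. The same argument applied to a complete bipartite graph $K_{p,q}$ gives the last line; that case matters here because the IMDB network is bipartite, so its densest parts are complete bipartite subgraphs rather than cliques.

```latex
% Fix an edge e = uv of K_r. A k-ring through e is u, v, x_1, ..., x_{k-2}, u
% with distinct x_i in V \ {u, v}; reading it from u towards v fixes the order.
\begin{aligned}
w_k(e) &= \underbrace{(r-2)(r-3)\cdots(r-k+1)}_{k-2\ \text{factors}}
        = \frac{(r-2)!}{(r-k)!}, \\
w_3(e) &= r-2, \qquad w_4(e) = (r-2)(r-3), \\
% In K_{p,q}, a 4-ring through e = uv (u on the side of size p) is u, v, u', v', u
% with u' \ne u on the side of u and v' \ne v on the side of v:
w_4(e) &= (p-1)(q-1) \quad \text{for every edge } e \text{ of } K_{p,q}.
\end{aligned}
```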
Table 2: (p, q : n1, n2) for IMDB
Figure 2: (247,2)-core
Figure 3: (27,22)-core
The 3-ring weights were already available [8]. However, since the IMDB network is bipartite, it has no odd cycles and therefore no 3-rings. The densest substructures are complete bipartite subgraphs $K_{p,q}$, which contain many 4-rings. This motivated us to design a method to compute 4-ring weights, which we implemented in Pajek.
Figure 4: Charlie Brown
To identify interesting substructures, we applied the simple islands procedure to the weight $w_4$. Computing the $w_4$ weights takes around three minutes on a 1400 MHz computer with 1 GB of RAM, and determining the islands takes 13 seconds. We obtained 12465 simple line islands on 56086 vertices, with the following size distribution:
Size  Freq | Size  Freq | Size  Freq | Size  Freq
   2  5512 |   20    19 |   38     4 |   59     2
   3  1978 |   21    18 |   39     3 |   61     1
   4  1639 |   22    15 |   40     2 |   64     1
   5   968 |   23     9 |   42     2 |   67     1
   6   666 |   24    13 |   43     3 |   70     1
   7   394 |   25    12 |   45     3 |   73     1
   8   257 |   26     6 |   46     4 |   76     1
   9   209 |   27     6 |   47     5 |   82     1
  10   148 |   28     5 |   48     1 |   86     1
  11   118 |   29     6 |   49     2 |  106     1
  12    87 |   30     3 |   50     2 |  122     1
  13    55 |   31     6 |   51     1 |  135     1
  14    62 |   32     5 |   52     2 |  144     1
  15    46 |   33     3 |   53     1 |  163     1
  16    39 |   34     1 |   54     2 |  269     1
  17    27 |   35     5 |   55     1 |  301     1
  18    28 |   36     4 |   57     1 |  332     2
  19    29 |   37     7 |   58     1 |  673     1
There are 94 islands of size at least 30, and only 10 of size over 100. The largest island corresponds to wrestling. Each island represents a special topic. We visualised only some of them; see, for example, Figures 4, 5, 6, 7 and 8.
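Every 4-ring through an edge uv has the form u, v, y, x, u with x a neighbour of u and y a neighbour of v, so the weight $w_4$ of a line can be counted from neighbourhood intersections. The sketch below illustrates that counting rule on small graphs; the function names and the adjacency-set input format are assumptions, and this is not the Pajek method.

```python
def four_ring_weights(adj):
    """Return {frozenset({u, v}): w4} for a simple undirected graph.

    adj maps every vertex to the set of its neighbours.
    """
    weights = {}
    for u, neighbours in adj.items():
        for v in neighbours:
            edge = frozenset((u, v))
            if edge in weights:
                continue
            targets = adj[v] - {u}  # candidates for y
            # x ranges over N(u) \ {v}; each y in N(x) & targets closes u-v-y-x-u once.
            weights[edge] = sum(len(adj[x] & targets) for x in neighbours if x != v)
    return weights


def complete_bipartite(p, q):
    """Adjacency sets of K_{p,q} with sides ('a', i) and ('b', j)."""
    left = [("a", i) for i in range(p)]
    right = [("b", j) for j in range(q)]
    adj = {v: set(right) for v in left}
    adj.update({v: set(left) for v in right})
    return adj


print(set(four_ring_weights(complete_bipartite(3, 4)).values()))  # (3-1)*(4-1) = 6
```

In a bipartite network the check `x != v` and the removal of u from the targets are what prevent degenerate "rings" that revisit an endpoint; the cost per edge is bounded by the degree of u times the degree of v.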
3.3 Time Slices and Island Analysis
By extracting a time slice from the complete network, we can identify the main groups in selected time periods. Islands can identify important subgraphs of large networks based on the values of attributes [4].
To illustrate this, we extracted the time slice 1935-1950. There are 223 simple islands [4] for $w_4$ on 1774 vertices. For example, we selected island 6, ’Dona Macabra’; see Figure 9.
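A set of vertices is a line island when some spanning tree inside it has all line weights strictly above every weight on lines leaving the set; such sets are exactly the connected components (with at least two vertices) of the line cuts {e : w(e) >= t} over all thresholds t. The sketch below enumerates line islands that way with union-find and filters them by a size range. It is a simplified illustration under our own assumptions: it does not compute Pajek's simple islands, and the function name and parameters are hypothetical.

```python
def line_islands(vertices, weighted_edges, min_size=2, max_size=None):
    """Enumerate line islands as connected components of line cuts.

    vertices       -- iterable of vertex ids
    weighted_edges -- iterable of (u, v, w) triples of an undirected network
    Returns a set of frozensets whose sizes lie in [min_size, max_size].
    """
    parent = {v: v for v in vertices}
    members = {v: {v} for v in parent}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edges = sorted(weighted_edges, key=lambda e: e[2], reverse=True)
    islands = set()
    start = 0
    while start < len(edges):
        # Add every line of the current weight level t; this yields the cut w >= t.
        level = edges[start][2]
        end = start
        while end < len(edges) and edges[end][2] == level:
            end += 1
        batch = edges[start:end]
        for u, v, _ in batch:
            ru, rv = find(u), find(v)
            if ru != rv:
                if len(members[ru]) < len(members[rv]):
                    ru, rv = rv, ru
                parent[rv] = ru
                members[ru] |= members.pop(rv)
        # Components touched at this level are components of the cut, hence islands.
        for u, _, _ in batch:
            islands.add(frozenset(members[find(u)]))
        start = end
    upper = len(parent) if max_size is None else max_size
    return {c for c in islands if min_size <= len(c) <= upper}


edges = [("a", "b", 5), ("b", "c", 4), ("c", "d", 1), ("d", "e", 3)]
for island in sorted(line_islands("abcde", edges, max_size=3), key=sorted):
    print(sorted(island))
```

Applied to a time slice, the same function would simply receive only the lines whose movies fall inside the chosen years.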
4 TEMPORAL CO-STARRING NETWORK: KEVIN-BACON NETWORK
We extracted a small, important subset of the actors in the IMDB network and constructed from it a dynamic visualisation of a 1-mode network showing the co-appearance of actors in films.
Figure 5: Mower, Jack and Phelps, Lee
Figure 6: Adult
Figure 7: Shawqi, Farid and El-Meliguy, Mahmoud
Figure 8: Polizeiruf 110 and Starkes Team
To define a sufficiently small important subgraph, we first considered only nodes in the network with a Kevin Bacon number of 1. The Kevin Bacon number of an actor is analogous to the Erdős number of a mathematician: it is the length of the shortest path in the movie star collaboration network from the actor to Kevin Bacon.
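On the two-mode actor-movie network, the Kevin Bacon number can be computed by a breadth-first search that alternates between actors and movies, so each actor-to-actor step passes through one shared movie. This is a minimal sketch assuming a dictionary from actors to their movies; the function and data names are illustrative, not from the paper.

```python
from collections import deque


def bacon_numbers(actor_movies, source):
    """Breadth-first Kevin Bacon numbers over a two-mode actor-movie network.

    actor_movies maps each actor to the set of movies they appeared in.
    Returns {actor: number} for every actor connected to source.
    """
    movie_cast = {}
    for actor, movies in actor_movies.items():
        for m in movies:
            movie_cast.setdefault(m, set()).add(actor)
    number = {source: 0}
    seen_movies = set()
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        for m in actor_movies.get(actor, ()):
            if m in seen_movies:
                continue  # an actor with a smaller number already expanded this movie
            seen_movies.add(m)
            for co in movie_cast[m]:
                if co not in number:
                    number[co] = number[actor] + 1
                    queue.append(co)
    return number


cast = {"Bacon, Kevin": {"M1"}, "A": {"M1", "M2"}, "B": {"M2"}, "C": {"M3"}}
numbers = bacon_numbers(cast, "Bacon, Kevin")
kb1 = {a for a, n in numbers.items() if n == 1}
print(numbers, kb1)  # C shares no chain of movies with Kevin Bacon and gets no number
```

Marking each movie once keeps the search linear in the number of actor-movie lines, which matters at the scale of the full network.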
The data set was divided into time slices of a decade in length (e.g. 1920s, 1930s, etc.), and in each decade the set of actors was reduced to only those who had co-starred in at least 5 films with another actor with a Kevin Bacon number of 1. The sizes of the graphs for each of these time slices are given in Table 3.
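The per-decade reduction can be expressed as counting, within each decade, how many films each pair of actors with Kevin Bacon number 1 shared. The sketch below is one reading of the rule under stated assumptions: movies are given as (year, cast) pairs, both actors of a qualifying pair must have Kevin Bacon number 1, and edge weights count co-appearances, matching the edge widths used in the layouts. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def decade_subset(movies, kb1, decade_start, min_films=5):
    """Reduce the KB1 actors to those sharing >= min_films films within one decade.

    movies -- iterable of (year, cast) pairs, cast being a set of actor names
    Returns (kept actors, {(a, b): co-appearances} among the kept actors).
    """
    pair_counts = Counter()
    for year, cast in movies:
        if decade_start <= year < decade_start + 10:
            # Sorting gives each unordered pair a single canonical key.
            pair_counts.update(combinations(sorted(cast & kb1), 2))
    actors = {a for pair, n in pair_counts.items() if n >= min_films for a in pair}
    edges = {pair: n for pair, n in pair_counts.items()
             if pair[0] in actors and pair[1] in actors}
    return actors, edges


kb1 = {"A", "B", "C", "D"}
movies = ([(1961, {"A", "B", "X"})] * 5 + [(1965, {"A", "C"})]
          + [(1962, {"C", "D"})] * 3 + [(1955, {"C", "D"})] * 5)
actors, edges = decade_subset(movies, kb1, 1960)
print(actors, edges)
```

Counting pairs is quadratic in the cast size of each film, which is acceptable here because the casts are first intersected with the small KB1 set.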
The 1-mode co-starring networks of these reduced sets of actors were constructed for each decade, and a three-dimensional layout was generated for each using the scale-free network layout [2] in GEOMI [1]. Nodes in the force-directed layout were restricted to lie on one of three concentric spheres, depending on the degree of the node [2]. The colour of each node was also used to indicate its degree. The size of each node depended on the number of movies in which the corresponding actor starred in that particular decade. Similarly, the width of an edge was used to represent the number of co-appearances of the two actors in that decade.
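The restriction to three concentric spheres can be approximated by projecting each node radially onto the sphere assigned to its degree class, for example after every force-directed iteration. The degree cut-offs, the radii and the choice of putting high-degree nodes on the inner sphere below are hypothetical placeholders; the actual rule of the scale-free layout [2] may differ.

```python
import math


def sphere_radius(degree, cutoffs=(3, 10), radii=(3.0, 2.0, 1.0)):
    """Hypothetical degree classes: low degree -> outer sphere, hubs -> inner sphere."""
    if degree < cutoffs[0]:
        return radii[0]
    if degree < cutoffs[1]:
        return radii[1]
    return radii[2]


def project_to_sphere(pos, radius):
    """Move a 3-D point along the ray from the origin onto the given sphere."""
    x, y, z = pos
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        return (radius, 0.0, 0.0)  # arbitrary direction for a node at the centre
    scale = radius / norm
    return (x * scale, y * scale, z * scale)


# A node of degree 12 at (3, 4, 0) is pulled onto the inner sphere of radius 1.
print(project_to_sphere((3.0, 4.0, 0.0), sphere_radius(12)))
```

Projecting after each iteration keeps the force model unchanged while enforcing the constraint, at the cost of some oscillation near the sphere surfaces.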
To effectively illustrate the evolution of the co-starring network, we display smooth animations between the layouts of subsequent decades. Each animation is broken into several stages shown one after the other, in order to aid retention of the mental map. First, nodes and edges not present in the second layout are faded out. Nodes present in both the first and second layouts are then animated to their new positions in the second layout. Nodes new to the second layout burst out from the centre and come to rest in their calculated positions, and finally new edges are faded in to show the new collaborations in the second decade. The animation can be downloaded from http://www.it.usyd.edu.au/~dmerrick/gd05contest/gd05final.avi
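The staged transition between two decade layouts can be planned with simple set differences on nodes and edges, with positions interpolated during the move and burst stages. The sketch below assumes layouts stored as dictionaries from node to coordinate tuple and edges stored as frozensets; the stage names are illustrative and not taken from GEOMI.

```python
def transition_stages(old_pos, new_pos, old_edges, new_edges):
    """Plan the staged animation between two consecutive decade layouts."""
    old_nodes, new_nodes = set(old_pos), set(new_pos)
    return {
        "fade_out_nodes": old_nodes - new_nodes,   # stage 1
        "fade_out_edges": old_edges - new_edges,   # stage 1
        "move_nodes": old_nodes & new_nodes,       # stage 2
        "burst_nodes": new_nodes - old_nodes,      # stage 3, starting at the centre
        "fade_in_edges": new_edges - old_edges,    # stage 4
    }


def lerp(a, b, t):
    """Linear interpolation between positions a and b, with t in [0, 1]."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


old_pos = {"A": (0.0, 0.0, 0.0), "B": (1.0, 0.0, 0.0)}
new_pos = {"B": (2.0, 0.0, 0.0), "C": (0.0, 1.0, 0.0)}
stages = transition_stages(old_pos, new_pos, {frozenset("AB")}, {frozenset("BC")})
centre = (0.0, 0.0, 0.0)
# Halfway through stage 3, the new node C sits midway between the centre and its target.
print(stages["burst_nodes"], lerp(centre, new_pos["C"], 0.5))
```

Keeping the stages separate, rather than morphing everything at once, means that at any moment only one kind of change is on screen, which is the point of the staging for the mental map.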
Figure 9: Dona Macabra
Table 3: Graph sizes per decade of co-starring network
KB1                       |       V |       E
Initial                   | 1324748 | 3792390
all decades, no filtering |    2742 |  336060
1910s, ≥ 5 films          |      16 |      18
1920s, ≥ 5 films          |       4 |       2
1930s, ≥ 5 films          |      25 |      53
1940s, ≥ 5 films          |      17 |      17
1950s, ≥ 5 films          |      19 |      18
1960s, ≥ 5 films          |      16 |      35
1970s, ≥ 5 films          |      79 |     411
1980s, ≥ 5 films          |      59 |      73
1990s, ≥ 5 films          |     207 |     425
2000s, ≥ 5 films          |     124 |     208
Figure 10: The co-starring actors visualisation (1960s)
This process was repeated for all decade slices from 1911 through to 2004, and the result can be seen in the downloadable animation. Figures 10 to 14 show snapshots of the animation from the 1960s through to the early 2000s.
The visualisation revealed a number of interesting facts. One unexpected finding was the substantial number of actors with a Kevin Bacon number of 1 in the early years of the twentieth century, some of whom clearly could not have co-starred in a film with Kevin Bacon. This revealed some problems in the collection of the movie data set: the years of some movies had been recorded incorrectly, while edges to movies that shared a name with a movie of a prior decade were all recorded as belonging to the earlier movie.
In the 1960s (Figure 10), the visualisation shows a clique involving US President John F. Kennedy. This is due to Kennedy’s assassination in 1963 and the subsequent barrage of documentaries detailing the event. The other people in the clique (Jacqueline Kennedy, John and Nellie Connally, etc.) were all present at the assassination. They appear in this data set because the movie JFK, starring Kevin Bacon, included real archive footage of the assassination. The Kennedys continue into later decades in the visualisation, illustrating the vast number of documentary films based on this event.
The 1970s, shown in Figure 11, sees the first large connected
group of Hollywood actors that continue as big names to this day.
James Earl Jones, Robert Redford, Steve Martin and John Travolta
all appear in this group.
Figure 11: The co-starring actors visualisation (1970s)
The visualisation of the 1980s (Figure 12) highlights some particularly close-knit groups of actors. Comedy stars Chevy Chase, Dan Aykroyd and Bill Murray appear due to roles in Saturday Night Live, Caddyshack and Spies Like Us. Also present are Jim Cummings, Jack Angel and Rob Paulsen, who have quite high degrees due to their work as voice actors in many short cartoons and episodes.
These groups continue into the 1990s, where the groups of actors
become much larger and more highly connected (Figure 13). More
well-established modern actors like Whoopi Goldberg, Tom Hanks
and Dennis Hopper become particularly prominent in this decade.
Finally, in the 2000s, we see some particularly interesting and unexpected phenomena (Figure 14). First, music stars such as Britney Spears, Beyoncé Knowles and Sheryl Crow appear with very high degree and connectedness, due to their participation in numerous music award shows. Second, on the other side of the visualisation, the popular actor Arnold Schwarzenegger links politicians to the movie stars and musicians in the rest of the co-starring network. This was primarily due to Schwarzenegger’s entry into politics, when he became governor of the US state of California. Following this event, he appeared in several political documentaries in which Bill Clinton also appeared. Bill Clinton, in turn, is linked through documentaries and archival footage to other famous politicians, such as Ronald Reagan, Richard Nixon and John F. Kennedy.
Figure 12: The co-starring actors visualisation (1980s)
Figure 13: The co-starring actors visualisation (1990s)
Figure 14: The co-starring actors visualisation (2000s)
5 A GALAXY OF MOVIE STARS OF TEMPORAL ACTOR MOVIE NETWORK
This section describes a galaxy of movie stars for the temporal actor-movie network, shown as an animation (to give an overview), and a visualisation of the network for specific time slices (to show the details).
First, we consider a “galaxy of stars” metaphor for the movie-actor network. The main idea is to map the “movie stars” into a movie (i.e. an animation) of a galaxy of stars that displays actor-movie interactions.
Representing as much information as possible without introducing overwhelming visual complexity has always been a challenge when visualising large data sets. To reduce visual complexity, we define important subgraphs, taking the “stars” of the IMDB to be as follows:
• every star actor must have been in more than 12 movies over the whole time period
• every star movie must have more than 12 actors
• each star actor must have played in between three and six movies in each year
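The three conditions can be applied to a list of (actor, movie, year) appearances as below. This is a sketch under assumptions that the text does not fix: cast sizes are counted over all actors, and the per-year rule is checked only for the years in which the actor appears. The function name and parameters are hypothetical.

```python
from collections import defaultdict


def star_subgraph(appearances, actor_more_than=12, movie_more_than=12, per_year=(3, 6)):
    """Return the (actor, movie) pairs in which both endpoints are "stars".

    appearances -- iterable of (actor, movie, year) triples
    """
    rows = list(appearances)
    movies_of = defaultdict(set)
    cast_of = defaultdict(set)
    yearly = defaultdict(lambda: defaultdict(set))  # actor -> year -> movies
    for actor, movie, year in rows:
        movies_of[actor].add(movie)
        cast_of[movie].add(actor)
        yearly[actor][year].add(movie)

    low, high = per_year
    star_actors = {
        a for a, ms in movies_of.items()
        if len(ms) > actor_more_than
        and all(low <= len(ys) <= high for ys in yearly[a].values())
    }
    star_movies = {m for m, cast in cast_of.items() if len(cast) > movie_more_than}
    return {(a, m) for a, m, _ in rows if a in star_actors and m in star_movies}


# Toy data with lowered thresholds so the rules have something to act on.
rows = [("A", "M1", 2000), ("A", "M2", 2000), ("A", "M3", 2001),
        ("B", "M1", 2000), ("B", "M2", 2000), ("B", "M3", 2001),
        ("B", "M4", 2001), ("B", "M5", 2001),
        ("C", "M1", 2000), ("C", "M2", 2000)]
print(sorted(star_subgraph(rows, actor_more_than=2, movie_more_than=1, per_year=(1, 2))))
```

Here B is rejected for exceeding the per-year range in 2001 and C for having too few movies overall, so only A's lines survive.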
We again use a bipartite (2-mode) network model with two types of nodes: actor nodes and movie nodes. Actor nodes are displayed as stars in the night sky, and edges are displayed as faint lines joining up “constellations” of actors (see Figure 15). Edges are drawn with bends between actor and movie nodes, but the movie nodes themselves are hidden. In this manner, collaboration between actors can easily be seen. The picture thus not only reduces the visual complexity (especially for edges) but also represents actor-movie and actor-actor interactions at the same time.
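Hiding the movie nodes means that each movie effectively links every pair of its actors. The sketch below shows that actor-actor view as the standard one-mode projection of the bipartite graph, weighting each actor pair by the number of shared movies. It is an illustrative simplification: the paper keeps the bent actor-movie edges rather than replacing them. The function name and input format are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def actor_collaborations(edges):
    """Project actor-movie edges onto actor pairs.

    edges: iterable of (actor, movie) pairs (hypothetical input format).
    Returns {(a1, a2): number_of_shared_movies} with a1 < a2; each hidden
    movie node contributes one link for every pair of actors in its cast.
    """
    cast = defaultdict(set)  # movie -> actors appearing in it
    for actor, movie in edges:
        cast[movie].add(actor)

    weights = defaultdict(int)
    for actors in cast.values():
        for a, b in combinations(sorted(actors), 2):
            weights[(a, b)] += 1
    return dict(weights)
```

For three actors who share five movies, each of the three pairs receives weight 5.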
To produce an overview of the temporal network dynamics, we computed a layout for each year from 1907 to 2004 and produced an animation. A two-dimensional force-directed layout was generated for each year’s subgraph using GEOMI [1]. The animation moves between successive yearly layouts, in a similar manner to the animation of the co-starring actors network in the previous section. The animation is available from
http://www.it.usyd.edu.au/~dmerrick/gd05contest/gd05-final.avi
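The paper does not say how the animation moves between yearly layouts. As an assumption-laden sketch (not GEOMI’s method), the simplest scheme is linear interpolation of node positions between consecutive layouts. Nodes present in only one of the two layouts are held at their known position here; fading them in or out would be an alternative. The names `tween` and `animation_frames` are hypothetical.

```python
def tween(layout_a, layout_b, t):
    """Linearly interpolate two layouts ({node: (x, y)}) at time t in [0, 1].

    Nodes present in only one layout stay fixed at that layout's position.
    """
    frame = {}
    for node in layout_a.keys() | layout_b.keys():
        if node in layout_a and node in layout_b:
            (x0, y0), (x1, y1) = layout_a[node], layout_b[node]
            frame[node] = (x0 + (x1 - x0) * t, y0 + (y1 - y0) * t)
        else:
            frame[node] = layout_a.get(node, layout_b.get(node))
    return frame


def animation_frames(layouts_by_year, steps=10):
    """Return a list of frames: `steps` frames per consecutive pair of years,
    followed by the final year's layout."""
    years = sorted(layouts_by_year)
    frames = []
    for y0, y1 in zip(years, years[1:]):
        for k in range(steps):
            frames.append(tween(layouts_by_year[y0], layouts_by_year[y1], k / steps))
    if years:
        frames.append(dict(layouts_by_year[years[-1]]))
    return frames
```

Interpolating only between consecutive years keeps the motion local, so an actor’s drift across decades shows up as a continuous trajectory.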
Having obtained an overview of the temporal network from the animation, we now focus on the details of specific years to observe interesting patterns in particular time periods.
Figure 16 shows part of the layout for 1918. The three actors shown co-starred in five movies together, and they did not appear in any other movies. Only one of these movies includes actors from outside the group. This kind of pattern is usually found in the early years.
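The 1918 pattern can be stated as three counts for a group of actors within one year’s network. The first is the number of movies they share. The second is how many of those shared movies also include outsiders. The third is how many other movies any member appears in. The sketch below is a hypothetical helper for that check, not part of the authors’ tool chain. It assumes one year’s (actor, movie) edges as input.

```python
from collections import defaultdict


def group_profile(edges, group):
    """Summarise how closed a group of actors is within one year's edges.

    edges: iterable of (actor, movie) pairs; group: collection of actor names.
    Returns counts of:
      shared         - movies in which every group member appears
      with_outsiders - shared movies whose cast also includes non-members
      other_movies   - movies with some, but not all, group members
    """
    group = set(group)
    cast = defaultdict(set)  # movie -> actors appearing in it
    for actor, movie in edges:
        cast[movie].add(actor)

    shared = {m for m, c in cast.items() if group <= c}
    with_outsiders = {m for m in shared if cast[m] - group}
    other = {m for m, c in cast.items() if c & group and not group <= c}
    return {"shared": len(shared),
            "with_outsiders": len(with_outsiders),
            "other_movies": len(other)}
```

The group in Figure 16, as described in the text, would give five shared movies, one of them with outsiders, and no other movies.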
Figures 17 and 18 show a different pattern. Both are taken from the layout for 1983. In Figure 17, nineteen actors co-starred in a masterpiece. In Figure 18, the same group of people starred in a series of movies together, whilst also appearing in other movies with actors from outside the group. Comparing Figures 17 and 18 with the early-years pattern in Figure 16 offers some insight into trends in the movie industry.
Figure 15: A frame from the galaxy of stars animation
Figure 16: Actor collaboration pattern in early years.
Figure 17: Many actors co-starring in one movie.
Figure 18: Same group of people in several movies.
Further insights can be gained by combining company attributes in the visualisation, as Figures 19 to 22 show. In 1985 there are two clusters. To assist with analysis, we display the movie nodes with their labels. The two clusters are normal movies and adult movies.
Figures 19 to 22 show some patterns in the evolution. Before the 1990s, these two types of movies were clearly separated, meaning that they were produced by different companies with different actors; that is, the two groups seldom collaborated. However, the two groups then started to merge into one big group, as actors began to move between companies to collaborate. For example, in the layout for 1994 it is difficult to separate the two groups. This may indicate a change in the movie industry, as well as in the social network of actors. This visualisation can be a useful supplement to formal analysis methods.
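Figures 19 to 22 are read visually. As a hedged illustration only, and not a measure used in this paper, the merging could also be tracked numerically. One option is the share of actors active in each year who appear in movies of both categories. The `category` mapping is a hypothetical input, for example a normal/adult labelling derived from company attributes.

```python
from collections import defaultdict


def crossover_share(credits, category):
    """Per year, the fraction of active actors appearing in more than one category.

    credits: iterable of (actor, movie, year) triples (hypothetical format).
    category: {movie: label}, e.g. an assumed "normal"/"adult" labelling.
    """
    labels = defaultdict(set)  # (year, actor) -> categories of that actor's movies
    for actor, movie, year in credits:
        labels[(year, actor)].add(category[movie])

    totals, crossers = defaultdict(int), defaultdict(int)
    for (year, _actor), cats in labels.items():
        totals[year] += 1
        if len(cats) > 1:
            crossers[year] += 1
    return {y: crossers[y] / totals[y] for y in sorted(totals)}
```

A share near zero corresponds to clearly separated clusters, while a rising share would be consistent with the merging seen in the later layouts.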
6 CONCLUSION
Integrating good analysis methods with appropriate visualisation methods is an effective way to gain insight into large and complex networks. Our next step is to further integrate various analysis methods with visualisation on different data sets. A formal evaluation of the insights and knowledge derived then needs to be carried out.
Ultimately, appropriate interaction methods need to be integrated
in order to complete our visual analysis framework for large and
complex networks.
REFERENCES
[1] A. Ahmed, T. Dwyer, M. Forster, X. Fu, J. Ho, S. Hong, D.
Koschützki, C. Murray, N. Nikolov, A. Tarassov, R. Taib and K. Xu,
GEOMI: GEometry for Maximum Insight, Proc. of Graph Drawing
2006, pp. 468-479, 2006.
[2] A. Ahmed, T. Dwyer, S. Hong, C. Murray, L. Song and Y. Wu, Visualisation and Analysis of Large and Complex Scale-free Networks,
Proc. of EuroVis 2005, pp. 1-8, 2005.
[3] D. Auber, Y. Chiricota, F. Jourdan and G. Melançon, Multiscale Visualization of Small World Networks, Proc. of InfoVis, pp. 75-81, 2003.
[4] V. Batagelj, Analysis of large networks - Islands, Dagstuhl seminar
03361: Algorithmic Aspects of Large and Complex Networks, 2003.
[5] U. Brandes and T. Erlebach, Network Analysis: Methodological Foundations, Springer, 2005.
[6] U. Brandes, M. Hoefer and C. Pich, Affiliation Dynamics with an Application to Movie-Actor Biographies, Proc. of EuroVis 2006, pp. 179-186, 2006.
[7] Graph Drawing 2005 Competition, http://gd2005.org/
[8] Pajek, http://vlado.fmf.uni-lj.si/pub/networks/pajek/
[9] Sunbelt XXVI 2006 Viszard Session.
[10] S. Wasserman and K. Faust, Social Network Analysis: Methods and
Applications, Cambridge University Press, 1994.
Figure 19: Layout of 1985
Figure 20: Layout of 1988
Figure 21: Layout of 1991
Figure 22: Layout of 1994