VISUALIZING MOVIE SCRIPTS

A long time ago, in a galaxy far, far away, data analysts were talking about the upcoming Star Wars movie. One of them had never seen any episode before, so they decided to make the movies more accessible to this poor fellow.

Relationships matter

The SW universe is full of strange-looking characters. They talk and beep a lot, and sometimes it is hard to keep track of who talked to whom. So we took the movie scripts and started coding hard.

The conversation graphs represent relationships (edges) between characters (nodes) who talk in the same scene. The scripts were split into scenes, and for each scene the names of all characters who speak in it were listed. Characters who were mentioned more than three times were selected, and their conversations were visualized as force-directed graphs. The thickness of an edge indicates how frequently the two characters speak in the same scene. The size of a node indicates how frequently that character talks across all scenes. The exceptions are R2-D2, who doesn’t talk but makes noises like beeping, squealing, howling, shrieking, screaming or chirping, and Chewbacca, who among other things barks, growls, howls and grunts.
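The counting behind such a graph can be sketched in a few lines of Python. This is only an illustration, not the code used for the graphs: the function name `build_conversation_graph`, the input format (one list of speaker names per scene, one entry per spoken line) and the toy scenes are all assumptions, and the "more than three times" threshold is applied here to spoken lines.

```python
from collections import Counter
from itertools import combinations


def build_conversation_graph(scenes, min_count=3):
    """Build node and edge weights from scenes.

    scenes: list of scenes, each a list of speaker names,
            one entry per spoken line (assumed input format).
    """
    # Node weight: how many lines a character speaks across all scenes.
    node_weight = Counter(name for scene in scenes for name in scene)
    # Keep only characters that occur more than min_count times.
    kept = {name for name, n in node_weight.items() if n > min_count}

    # Edge weight: number of scenes in which both characters speak.
    edge_weight = Counter()
    for scene in scenes:
        speakers = sorted(set(scene) & kept)
        for a, b in combinations(speakers, 2):
            edge_weight[(a, b)] += 1

    nodes = {name: node_weight[name] for name in kept}
    return nodes, dict(edge_weight)


# Toy scenes, invented for illustration only.
scenes = [
    ["LUKE", "HAN", "LUKE", "LEIA"],
    ["LUKE", "HAN", "HAN"],
    ["LEIA", "HAN", "LEIA", "LUKE"],
    ["THREEPIO", "LUKE"],
    ["LEIA", "HAN"],
]
nodes, edges = build_conversation_graph(scenes)
print(nodes)
print(edges)
```

The resulting node and edge weights could then be handed to any force-directed layout, with node size and edge thickness scaled by these counts.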

Conversation graph of the Original Trilogy

Click on a node to fade out all but its immediate neighbours.

Click again to bring them back.

Hover over a node to see more information about that character's conversations.

Conversation graph of the Prequel Trilogy

Click on a node to fade out all but its immediate neighbours.

Click again to bring them back.

Hover over a node to see more information about that character's conversations.

Conversation graph of the film The Force Awakens

Click on a node to fade out all but its immediate neighbours.

Click again to bring them back.

Hover over a node to see more information about that character's conversations.

Mood

Mood, or if you like fancy names, sentiment, also matters a lot. We analyzed the scripts using sentiment lexicons (one for positive and one for negative words) and assigned a mood score to every scene in the movies. Characters who appeared more than 50 times were selected to show where and how frequently their names are mentioned in the scripts. The seven lines represent the seven Star Wars episodes. The height of a bar shows how frequently the character is mentioned in that particular scene. The colour of a square indicates the general sentiment of the text of that scene.

Words with negative associations (such as “darkness”, “menace” and “death”) make scenes more gray and words with positive associations (such as “love”, “proud” and “hopeful”) make them more blue.
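A lexicon-based mood score of this kind can be sketched as follows. The scoring formula is an assumption, since the exact way the scores were computed is not stated here: this version counts positive and negative words in a scene and returns their normalized difference. The function name `mood_score` is hypothetical, and the two tiny word sets stand in for the full lexicons.

```python
import re

# Tiny stand-ins for the full positive and negative lexicons (assumption).
POSITIVE = {"love", "proud", "hopeful"}
NEGATIVE = {"darkness", "menace", "death"}


def mood_score(scene_text, positive=POSITIVE, negative=NEGATIVE):
    """Return a score in [-1, 1]: -1 is all negative, +1 is all positive."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", scene_text.lower())
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    total = pos + neg
    # Scenes without any lexicon words are treated as neutral.
    return 0.0 if total == 0 else (pos - neg) / total


print(mood_score("I love you. I am proud and hopeful."))
print(mood_score("Darkness and death, a menace."))
```

A score near -1 would map to the gray end of the colour scale and a score near +1 to the blue end.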

Character mentions and mood for each scene

Click on a character name to see the frequency of its mentions and the mood of the scenes associated with the name.

Created by Precognox

Contact us to see what we can do for you!

Kitti Balogh

data analyst

Krisztina Szucs

data viz designer

Viktoria Verecze

web designer

Zoltan Varju

computational linguist

Team members do NOT disclose any information regarding the person who hasn’t seen the movies yet, except that he is a really nice guy.

Source of the Star Wars movie scripts: http://www.imsdb.com

Source of the sentiment lexicons: https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon

Source of the Star Wars logo: Wikimedia Commons