Importing Packages¶
In [1]:
import json
import pandas as pd
Data source¶
Data sample¶
{
  "episodes": [
    {
      "seasonNum": 1,
      "episodeNum": 1,
      "episodeTitle": "Winter Is Coming",
      "episodeLink": "/title/tt1480055/",
      "episodeAirDate": "2011-04-17",
      "episodeDescription": "Jon Arryn, the Hand of the King, is dead. King Robert Baratheon plans to ask his oldest friend, Eddard Stark, to take Jon's place. Across the sea, Viserys Targaryen plans to wed his sister to a nomadic warlord in exchange for an army.",
      "openingSequenceLocations": [
        "King's Landing",
        "Winterfell",
        "The Wall",
        "Pentos"
      ],
      "scenes": [
        {
          "sceneStart": "0:00:40",
          "sceneEnd": "0:01:45",
          "location": "The Wall",
          "subLocation": "Castle Black",
          "characters": [
            {"name": "Gared"},
            {"name": "Waymar Royce"},
            {"name": "Will"}
          ]
        }
      ]
    }
  ]
}
Reading data¶
In [2]:
# The context manager closes the file handle automatically, even if parsing fails.
# The parsed contents stay available in `data` after the file is closed.
with open('../data/episodes.json') as f:
    data = json.load(f)
Parsing the JSON File into Tidy Format¶
Tidy data sets have a consistent structure, which makes them easy to manipulate, model, and visualize. The core idea is to arrange the data so that each variable is a column and each observation (or case) is a row. In this notebook, one observation is one character appearing in one scene.
Source: https://www.wikiwand.com/en/Tidy_data
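To make the definition concrete, the sketch below flattens the single scene from the data sample above into tidy rows using plain Python. The names `sample_episode` and `tidy_rows` are illustrative only and are not part of the notebook's pipeline; the dictionary is a trimmed copy of the sample, not real output.

```python
# Illustrative only: a trimmed copy of the episode shown in the data sample.
sample_episode = {
    "seasonNum": 1,
    "episodeNum": 1,
    "scenes": [
        {
            "sceneStart": "0:00:40",
            "sceneEnd": "0:01:45",
            "characters": [
                {"name": "Gared"},
                {"name": "Waymar Royce"},
                {"name": "Will"},
            ],
        }
    ],
}

# One observation = one character appearing in one scene,
# so every (scene, character) pair becomes its own row.
tidy_rows = [
    {
        "season_num": sample_episode["seasonNum"],
        "episode_num": sample_episode["episodeNum"],
        "character_name": character["name"],
        "scene_start_time": scene["sceneStart"],
        "scene_end_time": scene["sceneEnd"],
    }
    for scene in sample_episode["scenes"]
    for character in scene["characters"]
]

for row in tidy_rows:
    print(row)
```

The nested scene with three characters becomes three rows that share the same season, episode, and scene times, which is the shape the parsing step in this section aims for.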
In [3]:
data_list = []
for episode in data['episodes']:
    seasonNum = episode['seasonNum']
    episodeNum = episode['episodeNum']
    for scene in episode['scenes']:
        sceneStart = scene['sceneStart']
        sceneEnd = scene['sceneEnd']
        # Emit one row per character in the scene.
        for character in scene['characters']:
            characterName = character['name']
            row = [seasonNum, episodeNum, characterName, sceneStart, sceneEnd]
            data_list.append(row)
Saving parsed data into Pandas Dataframe¶
In [4]:
df = pd.DataFrame(
    data=data_list,
    columns=['season_num', 'episode_num', 'character_name', 'scene_start_time', 'scene_end_time'],
)
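As a possible alternative to the manual loop, `pd.json_normalize` can flatten the same nesting in one call. This is a sketch under assumptions rather than the notebook's method: `sample` is a hypothetical stand-in for `data` built from the data sample above, and `alt_df` is an illustrative name.

```python
import pandas as pd

# Hypothetical stand-in for `data`, trimmed from the data sample.
sample = {
    "episodes": [
        {
            "seasonNum": 1,
            "episodeNum": 1,
            "scenes": [
                {
                    "sceneStart": "0:00:40",
                    "sceneEnd": "0:01:45",
                    "characters": [
                        {"name": "Gared"},
                        {"name": "Waymar Royce"},
                        {"name": "Will"},
                    ],
                }
            ],
        }
    ]
}

# record_path walks episodes -> scenes -> characters (one row per character).
# meta pulls the episode-level and scene-level fields onto each row.
flat = pd.json_normalize(
    sample["episodes"],
    record_path=["scenes", "characters"],
    meta=[
        "seasonNum",
        "episodeNum",
        ["scenes", "sceneStart"],
        ["scenes", "sceneEnd"],
    ],
)

# Rename to the column names used in this notebook and fix the column order.
alt_df = flat.rename(
    columns={
        "seasonNum": "season_num",
        "episodeNum": "episode_num",
        "name": "character_name",
        "scenes.sceneStart": "scene_start_time",
        "scenes.sceneEnd": "scene_end_time",
    }
)[["season_num", "episode_num", "character_name", "scene_start_time", "scene_end_time"]]

print(alt_df)
```

The explicit loop is easier to adapt when rows need custom logic; `json_normalize` is shorter when the structure is regular. Columns pulled in through `meta` may come back with `object` dtype, so a cast such as `astype(int)` may be needed before numeric work.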
Final Table¶
In [5]:
display(df.head())