1. Create a DataFrame from the data file:
import pandas
df = pandas.read_csv('movies.csv')
print(df)
color director_name ... aspect_ratio movie_facebook_likes
0 Color James Cameron ... 1.78 33000
1 Color Gore Verbinski ... 2.35 0
2 Color Sam Mendes ... 2.35 85000
3 Color Christopher Nolan ... 2.35 164000
4 NaN Doug Walker ... NaN 0
... ... ... ... ... ...
5038 Color Scott Smith ... NaN 84
5039 Color NaN ... 16.00 32000
5040 Color Benjamin Roberds ... NaN 16
5041 Color Daniel Hsia ... 2.35 660
5042 Color Jon Gunn ... 1.85 456
[5043 rows x 28 columns]
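Printing the whole DataFrame hides most of the 28 columns behind "...". A minimal sketch of a quicker first look, assuming a small inline sample in place of movies.csv (the three rows and the `sample_csv` name are illustrative, not the real file):

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for movies.csv.
sample_csv = io.StringIO(
    "color,director_name,duration\n"
    "Color,James Cameron,178\n"
    "Color,Gore Verbinski,169\n"
    ",Doug Walker,\n"
)
df = pd.read_csv(sample_csv)

print(df.shape)         # (number of rows, number of columns)
print(df.head(2))       # first two rows only
print(df.isna().sum())  # count of missing values per column
```

On the real file, `df.shape` should match the `[5043 rows x 28 columns]` footer shown above.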
2. Build a new table called movies_df2 that keeps only the following columns, rename them in French, then display the new table:
movies_df2 = df[['movie_title', 'director_name', 'duration', 'title_year', 'imdb_score', 'budget', 'gross']]
movies_df2 = movies_df2.rename(columns = {"movie_title": "Titre", "director_name": "Réalisateur", "duration": "Durée en min", "title_year": "Année", "imdb_score": "Score IMDb", "budget": "Budget", "gross": "Recette"})
print(movies_df2)
Titre ... Recette
0 Avatar ... 760505847.0
1 Pirates of the Caribbean: At World's End ... 309404152.0
2 Spectre ... 200074175.0
3 The Dark Knight Rises ... 448130642.0
4 Star Wars: Episode VII - The Force Awakens ... ... NaN
... ... ... ...
5038 Signed Sealed Delivered ... NaN
5039 The Following ... NaN
5040 A Plague So Pleasant ... NaN
5041 Shanghai Calling ... 10443.0
5042 My Date with Drew ... 85222.0
[5043 rows x 7 columns]
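The selection and the renaming can also be driven by a single dictionary, so that the list of kept columns and their French names cannot drift apart. A minimal sketch on a two-row sample (the `french_names` variable is an illustrative name, and only three of the seven columns are shown):

```python
import pandas as pd

# Two rows taken from the output above, plus one column that will be dropped.
df = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre"],
    "director_name": ["James Cameron", "Sam Mendes"],
    "gross": [760505847.0, 200074175.0],
    "color": ["Color", "Color"],
})

# Keys are the columns to keep, values are their French names.
french_names = {
    "movie_title": "Titre",
    "director_name": "Réalisateur",
    "gross": "Recette",
}

# list(french_names) yields the keys, so selection and renaming share one source.
movies_df2 = df[list(french_names)].rename(columns=french_names)
print(movies_df2.columns.tolist())
```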
3. Remove the rows that contain missing values:
movies_df2 = movies_df2.dropna()
print(movies_df2)
Titre ... Recette
0 Avatar ... 760505847.0
1 Pirates of the Caribbean: At World's End ... 309404152.0
2 Spectre ... 200074175.0
3 The Dark Knight Rises ... 448130642.0
5 John Carter ... 73058679.0
... ... ... ...
5033 Primer ... 424760.0
5034 Cavite ... 70071.0
5035 El Mariachi ... 2040920.0
5037 Newlyweds ... 4584.0
5042 My Date with Drew ... 85222.0
[3890 rows x 7 columns]
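This step shrinks the table from 5043 to 3890 rows, so 1153 rows had at least one missing value in the seven kept columns. The sketch below shows how that count can be checked, and how `dropna` can be restricted to chosen columns with `subset`; the three-row table is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: row B lacks a budget, row C lacks a gross.
movies = pd.DataFrame({
    "Titre": ["A", "B", "C"],
    "Budget": [1.0, np.nan, 3.0],
    "Recette": [2.0, 5.0, np.nan],
})

cleaned = movies.dropna()                       # drop rows with any NaN
only_budget = movies.dropna(subset=["Budget"])  # drop rows where Budget is NaN

print(len(movies) - len(cleaned))  # number of rows removed
print(movies.isna().sum())         # missing values per column
```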
4. Keep only the films released in 2000 or later:
condition = (movies_df2['Année']>=2000)
movies_df2 = movies_df2[condition]
print(movies_df2)
Titre ... Recette
0 Avatar ... 760505847.0
1 Pirates of the Caribbean: At World's End ... 309404152.0
2 Spectre ... 200074175.0
3 The Dark Knight Rises ... 448130642.0
5 John Carter ... 73058679.0
... ... ... ...
5027 The Circle ... 673780.0
5033 Primer ... 424760.0
5034 Cavite ... 70071.0
5037 Newlyweds ... 4584.0
5042 My Date with Drew ... 85222.0
[2838 rows x 7 columns]
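Note that filtering keeps the original row labels, which is why the output above jumps from 3 to 5 and ends at 5042 although only 2838 rows remain. If consecutive labels are preferred, `reset_index(drop=True)` renumbers them. A minimal sketch with made-up rows (`recent` and `renumbered` are illustrative names):

```python
import pandas as pd

# Hypothetical sample with one film released before 2000.
movies = pd.DataFrame({
    "Titre": ["Old", "New", "Newer"],
    "Année": [1995.0, 2000.0, 2010.0],
})

# Boolean mask: True where the release year is 2000 or later.
recent = movies[movies["Année"] >= 2000]
print(recent.index.tolist())  # original labels are kept

# Renumber from 0 without keeping the old labels as a column.
renumbered = recent.reset_index(drop=True)
print(renumbered.index.tolist())
```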
5. Display this table sorted from the lowest-rated to the highest-rated film:
print(movies_df2.sort_values(by = "Score IMDb"))
Titre ... Recette
2834 Justin Bieber: Never Say Never ... 73000942.0
2295 Superbabies: Baby Geniuses 2 ... 9109322.0
2268 Disaster Movie ... 14174654.0
3505 Who's Your Caddy? ... 5694308.0
3340 Glitter ... 4273372.0
... ... ... ...
4029 City of God ... 7563397.0
270 The Lord of the Rings: The Fellowship of the R... ... 313837577.0
97 Inception ... 292568851.0
339 The Lord of the Rings: The Return of the King ... 377019252.0
66 The Dark Knight ... 533316061.0
[2838 rows x 7 columns]
6. Which of these films earned the highest gross (you may use a sort to find the answer)?
print(movies_df2.sort_values(by = "Recette"))
# Avatar earned the highest gross.
Titre ... Recette
3330 Skin Trade ... 162.0
4607 The Jimmy Show ... 703.0
4606 In Her Line of Fire ... 721.0
4915 The Trials of Darryl Hunt ... 1111.0
4758 Detention of the Dead ... 1332.0
... ... ... ...
66 The Dark Knight ... 533316061.0
17 The Avengers ... 623279547.0
794 The Avengers ... 623279547.0
29 Jurassic World ... 652177271.0
0 Avatar ... 760505847.0
[2838 rows x 7 columns]
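Sorting works, but the answer can also be read off directly with `idxmax` or `nlargest`. A minimal sketch using four rows copied from the output above, with their original index labels:

```python
import pandas as pd

# Four rows from the sorted output above, keeping their original labels.
movies = pd.DataFrame(
    {
        "Titre": ["The Dark Knight", "The Avengers", "Jurassic World", "Avatar"],
        "Recette": [533316061.0, 623279547.0, 652177271.0, 760505847.0],
    },
    index=[66, 17, 29, 0],
)

# idxmax returns the index label of the largest value.
top_label = movies["Recette"].idxmax()
print(movies.loc[top_label, "Titre"])

# nlargest returns the n rows with the largest values, largest first.
top_two = movies.nlargest(2, "Recette")
print(top_two)
```

The sorted output also lists The Avengers twice (labels 17 and 794). If those rows are true duplicates, `drop_duplicates` is one way to remove them before ranking; whether that is appropriate depends on the rest of the columns, which are not visible here.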
7. Create a new column that measures the profitability (Recette - Budget) of each film and add it to the table (ignore any warnings):
rentab = (movies_df2['Recette'] - movies_df2['Budget'])
movies_df2['Rentabilité'] = rentab
print(movies_df2)
Titre ... Rentabilité
0 Avatar ... 523505847.0
1 Pirates of the Caribbean: At World's End ... 9404152.0
2 Spectre ... -44925825.0
3 The Dark Knight Rises ... 198130642.0
5 John Carter ... -190641321.0
... ... ... ...
5027 The Circle ... 663780.0
5033 Primer ... 417760.0
5034 Cavite ... 63071.0
5037 Newlyweds ... -4416.0
5042 My Date with Drew ... 84122.0
[2838 rows x 8 columns]
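The warning mentioned in this step is most likely pandas' SettingWithCopyWarning, raised because movies_df2 was produced by filtering another DataFrame and pandas cannot tell whether the assignment targets a copy. One common way to avoid it, sketched here under that assumption, is to take an explicit `.copy()` when filtering. The budgets are derived from the outputs above (Recette minus Rentabilité), and the "Old film" row is hypothetical:

```python
import pandas as pd

movies = pd.DataFrame({
    "Titre": ["Avatar", "Spectre", "Old film"],
    "Année": [2009.0, 2015.0, 1990.0],
    "Budget": [237000000.0, 245000000.0, 1.0],
    "Recette": [760505847.0, 200074175.0, 2.0],
})

# .copy() makes the filtered table independent of the original,
# so adding a column to it is unambiguous.
recent = movies[movies["Année"] >= 2000].copy()
recent["Rentabilité"] = recent["Recette"] - recent["Budget"]
print(recent)
```

The original `movies` table is left untouched: the new column exists only in `recent`.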
8. Sort the table to display the films from the least profitable to the most profitable:
print(movies_df2.sort_values(by = "Rentabilité"))
Titre Réalisateur ... Recette Rentabilité
2988 The Host Joon-ho Bong ... 2201412.0 -1.221330e+10
3859 Lady Vengeance Chan-wook Park ... 211667.0 -4.199788e+09
3005 Fateless Lajos Koltai ... 195888.0 -2.499804e+09
2334 Steamboy Katsuhiro Ôtomo ... 410388.0 -2.127110e+09
3075 Kabhi Alvida Naa Kehna Karan Johar ... 3275443.0 -6.967246e+08
... ... ... ... ... ...
66 The Dark Knight Christopher Nolan ... 533316061.0 3.483161e+08
17 The Avengers Joss Whedon ... 623279547.0 4.032795e+08
794 The Avengers Joss Whedon ... 623279547.0 4.032795e+08
29 Jurassic World Colin Trevorrow ... 652177271.0 5.021773e+08
0 Avatar James Cameron ... 760505847.0 5.235058e+08
[2838 rows x 8 columns]
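Because some losses exceed a billion, pandas switches the column to scientific notation (for example -1.221330e+10). A minimal sketch of how the display could be made easier to read, by selecting the columns of interest and setting a temporary float format; the value for The Host is the rounded figure from the output above:

```python
import pandas as pd

movies = pd.DataFrame({
    "Titre": ["The Host", "Avatar"],
    "Recette": [2201412.0, 760505847.0],
    "Rentabilité": [-1.221330e10, 523505847.0],  # first value is rounded
})

# ascending=False puts the most profitable film first.
ranked = movies.sort_values(by="Rentabilité", ascending=False)

# The format applies only inside the with-block.
with pd.option_context("display.float_format", "{:,.0f}".format):
    print(ranked[["Titre", "Rentabilité"]])
```

A loss of about 12 billion against a gross of about 2 million is implausible for a single film, so the extreme rows may reflect budgets recorded in a different currency or unit than the gross. That is an assumption worth checking against the data source, not something the output confirms.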
9. Which films have an IMDb score above 8 and yet were not profitable?
condition1 = (movies_df2['Score IMDb'] > 8)
condition2 = (movies_df2['Rentabilité'] < 0)
print(movies_df2[condition1 & condition2])
Titre ... Rentabilité
1298 Amélie ... -43798339.0
1329 Baahubali: The Beginning ... -11528148.0
1373 Rush ... -11096291.0
1448 The Pianist ... -2480678.0
1978 Warrior ... -11348338.0
2047 Howl's Moving Castle ... -19289545.0
2373 Spirited Away ... -8950114.0
2829 Downfall ... -7998060.0
2830 The Sea Inside ... -7913655.0
2914 Tae Guk Gi: The Brotherhood of War ... -11689814.0
3553 Elite Squad ... -3991940.0
3849 Requiem for a Dream ... -890722.0
3854 Donnie Darko ... -3772117.0
3931 Samsara ... -1398153.0
4033 The Hunt ... -3189032.0
4105 Oldboy ... -818710.0
4286 No End in Sight ... -569815.0
4289 In the Shadow of the Moon ... -865951.0
4585 The Act of Killing ... -515779.0
4748 The Other Dream Team ... -366222.0
[20 rows x 8 columns]
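The parentheses around each condition matter: `&` binds more tightly than `>` and `<`, so writing the comparison without them would not combine the two tests as intended. A minimal sketch on made-up rows (titles and values are hypothetical), which also uses `.loc` to return only the titles:

```python
import pandas as pd

# Hypothetical sample: only Film A is both highly rated and unprofitable.
movies = pd.DataFrame({
    "Titre": ["Film A", "Film B", "Film C"],
    "Score IMDb": [8.4, 9.0, 5.0],
    "Rentabilité": [-1000.0, 5000.0, -200.0],
})

# Each comparison is parenthesised before being combined with &.
mask = (movies["Score IMDb"] > 8) & (movies["Rentabilité"] < 0)

# .loc[rows, column] filters rows and selects one column in a single step.
acclaimed_flops = movies.loc[mask, "Titre"].tolist()
print(acclaimed_flops)
```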