Solutions - Tabular data: Pandas

1. Create a dataframe from the data file:


import pandas

# Load the CSV file into a DataFrame and display it
df = pandas.read_csv('movies.csv')
print(df)

			
      color      director_name  ...  aspect_ratio  movie_facebook_likes
0     Color      James Cameron  ...          1.78                 33000
1     Color     Gore Verbinski  ...          2.35                     0
2     Color         Sam Mendes  ...          2.35                 85000
3     Color  Christopher Nolan  ...          2.35                164000
4       NaN        Doug Walker  ...           NaN                     0
...     ...                ...  ...           ...                   ...
5038  Color        Scott Smith  ...           NaN                    84
5039  Color                NaN  ...         16.00                 32000
5040  Color   Benjamin Roberds  ...           NaN                    16
5041  Color        Daniel Hsia  ...          2.35                   660
5042  Color           Jon Gunn  ...          1.85                   456

[5043 rows x 28 columns]
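
Because `print(df)` abbreviates wide tables with `...`, it can help to look at the shape and the column names separately. The sketch below is only an illustration: it reads a tiny made-up CSV held in memory (the rows are hypothetical stand-ins for `movies.csv`) so that it runs on its own.

```python
import io
import pandas

# Hypothetical three-row sample standing in for movies.csv
sample_csv = io.StringIO(
    "movie_title,director_name,title_year\n"
    "Film A,Director X,1999\n"
    "Film B,,2005\n"
    "Film C,Director Z,2012\n"
)
sample_df = pandas.read_csv(sample_csv)

print(sample_df.shape)          # (number of rows, number of columns)
print(list(sample_df.columns))  # full list of column names
print(sample_df.head(2))        # first two rows only
```

The empty field in the second row is read as `NaN`, the same marker that appears in the output above.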

2. Build a new table named movies_df2 that keeps only the columns selected below, rename them in French, then display the new table:


# Keep only the seven columns of interest
movies_df2 = df[['movie_title', 'director_name', 'duration', 'title_year', 'imdb_score', 'budget', 'gross']]
# Rename them in French
movies_df2 = movies_df2.rename(columns={"movie_title": "Titre", "director_name": "Réalisateur", "duration": "Durée en min", "title_year": "Année", "imdb_score": "Score IMDb", "budget": "Budget", "gross": "Recette"})
print(movies_df2)

		
                                                  Titre  ...      Recette
0                                               Avatar   ...  760505847.0
1             Pirates of the Caribbean: At World's End   ...  309404152.0
2                                              Spectre   ...  200074175.0
3                                The Dark Knight Rises   ...  448130642.0
4     Star Wars: Episode VII - The Force Awakens    ...  ...          NaN
...                                                 ...  ...          ...
5038                           Signed Sealed Delivered   ...          NaN
5039                         The Following               ...          NaN
5040                              A Plague So Pleasant   ...          NaN
5041                                  Shanghai Calling   ...      10443.0
5042                                 My Date with Drew   ...      85222.0

[5043 rows x 7 columns]
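
The selection and the renaming can also be driven by a single mapping, so that the list of columns is written only once. This is a minimal sketch on hypothetical data; only the column names mirror `movies.csv`, and `french_names` is a name introduced here for illustration.

```python
import pandas

# Hypothetical data; only the column names mirror movies.csv
sample_df = pandas.DataFrame({
    "movie_title": ["Film A", "Film B"],
    "duration": [120.0, 95.0],
    "color": ["Color", "Color"],
})

# Original name -> French name, for the columns we want to keep
french_names = {"movie_title": "Titre", "duration": "Durée en min"}

# Select the keys of the mapping, then rename them in one expression
sample_fr = sample_df[list(french_names)].rename(columns=french_names)
print(sample_fr.columns.tolist())
```

Columns that are absent from the mapping, such as `color` here, are simply left out of the result.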

3. Remove the rows that contain missing values:


# dropna() removes every row that has at least one NaN
movies_df2 = movies_df2.dropna()
print(movies_df2)

		
                                          Titre  ...      Recette
0                                       Avatar   ...  760505847.0
1     Pirates of the Caribbean: At World's End   ...  309404152.0
2                                      Spectre   ...  200074175.0
3                        The Dark Knight Rises   ...  448130642.0
5                                  John Carter   ...   73058679.0
...                                         ...  ...          ...
5033                                    Primer   ...     424760.0
5034                                    Cavite   ...      70071.0
5035                               El Mariachi   ...    2040920.0
5037                                 Newlyweds   ...       4584.0
5042                         My Date with Drew   ...      85222.0

[3890 rows x 7 columns]
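
Before dropping rows, it can be useful to count the missing values per column to see where they come from. The following sketch uses hypothetical values and assumes the French column names chosen earlier.

```python
import pandas

# Hypothetical data with one missing budget and one missing gross
sample_df = pandas.DataFrame({
    "Titre": ["Film A", "Film B", "Film C"],
    "Budget": [1000.0, None, 3000.0],
    "Recette": [1500.0, 800.0, None],
})

# Number of NaN values in each column
missing_per_column = sample_df.isna().sum()
print(missing_per_column)

# Rows with at least one NaN are removed; index labels are preserved
cleaned = sample_df.dropna()
print(len(sample_df), "->", len(cleaned))
```

Since `dropna` keeps the original index labels, gaps appear in the index, which is why label 4 is absent from the output above.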

4. Keep only the films released from the year 2000 onward:


# Boolean Series: True for the rows whose year is 2000 or later
condition = (movies_df2['Année'] >= 2000)
movies_df2 = movies_df2[condition]
print(movies_df2)

		
                                          Titre  ...      Recette
0                                       Avatar   ...  760505847.0
1     Pirates of the Caribbean: At World's End   ...  309404152.0
2                                      Spectre   ...  200074175.0
3                        The Dark Knight Rises   ...  448130642.0
5                                  John Carter   ...   73058679.0
...                                         ...  ...          ...
5027                                The Circle   ...     673780.0
5033                                    Primer   ...     424760.0
5034                                    Cavite   ...      70071.0
5037                                 Newlyweds   ...       4584.0
5042                         My Date with Drew   ...      85222.0

[2838 rows x 7 columns]
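
The filter works because the comparison produces one boolean per row, and indexing with that boolean Series keeps the rows marked `True`. Here is a small sketch on hypothetical data that prints the intermediate mask.

```python
import pandas

# Hypothetical data using the renamed "Année" column
sample_df = pandas.DataFrame({
    "Titre": ["Film A", "Film B", "Film C"],
    "Année": [1999.0, 2000.0, 2012.0],
})

# One True/False value per row
mask = sample_df["Année"] >= 2000
print(mask.tolist())

# Indexing with the mask keeps only the rows where it is True
recent = sample_df[mask]
print(recent)
```

Note that `>=` includes the year 2000 itself, which matches the wording of the question.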

5. Display this table sorted from the lowest-rated to the highest-rated film:


# sort_values sorts in ascending order by default
print(movies_df2.sort_values(by="Score IMDb"))

		
                                                  Titre  ...      Recette
2834                    Justin Bieber: Never Say Never   ...   73000942.0
2295                      Superbabies: Baby Geniuses 2   ...    9109322.0
2268                                    Disaster Movie   ...   14174654.0
3505                                 Who's Your Caddy?   ...    5694308.0
3340                                           Glitter   ...    4273372.0
...                                                 ...  ...          ...
4029                                       City of God   ...    7563397.0
270   The Lord of the Rings: The Fellowship of the R...  ...  313837577.0
97                                           Inception   ...  292568851.0
339      The Lord of the Rings: The Return of the King   ...  377019252.0
66                                     The Dark Knight   ...  533316061.0

[2838 rows x 7 columns]
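
To read the ranking from the top instead, `sort_values` accepts `ascending=False`, and `head` limits the display to the first rows. The sketch below relies on hypothetical scores.

```python
import pandas

# Hypothetical scores
sample_df = pandas.DataFrame({
    "Titre": ["Film A", "Film B", "Film C"],
    "Score IMDb": [6.1, 8.8, 7.4],
})

# Highest score first; sort_values returns a new, sorted DataFrame
best_first = sample_df.sort_values(by="Score IMDb", ascending=False)
print(best_first.head(2))
```

The original table is left untouched, so the sort has to be assigned to a variable if the new order must be kept.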

6. Which of these films earned the highest gross (you may use a sort to find the answer)?


print(movies_df2.sort_values(by="Recette"))

# Avatar earned the highest gross (last row of the ascending sort).

		
                           Titre  ...      Recette
3330                 Skin Trade   ...        162.0
4607             The Jimmy Show   ...        703.0
4606        In Her Line of Fire   ...        721.0
4915  The Trials of Darryl Hunt   ...       1111.0
4758      Detention of the Dead   ...       1332.0
...                          ...  ...          ...
66              The Dark Knight   ...  533316061.0
17                 The Avengers   ...  623279547.0
794                The Avengers   ...  623279547.0
29               Jurassic World   ...  652177271.0
0                        Avatar   ...  760505847.0

[2838 rows x 7 columns]
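
A sort is not the only way to reach this answer. As an alternative sketch on hypothetical data, `idxmax` returns the index label of the largest value and `nlargest` returns the top rows directly.

```python
import pandas

# Hypothetical gross figures
sample_df = pandas.DataFrame({
    "Titre": ["Film A", "Film B", "Film C"],
    "Recette": [1500.0, 800.0, 9000.0],
})

# Index label of the row with the highest gross, then its title
top_label = sample_df["Recette"].idxmax()
top_title = sample_df.loc[top_label, "Titre"]
print(top_title)

# The two highest-grossing rows, largest first
top_two = sample_df.nlargest(2, "Recette")
print(top_two)
```

Both calls avoid printing the whole sorted table when only the top of the ranking matters.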

7. Create a new column measuring the profitability (Recette - Budget) of each film and add it to the table (ignore any warnings):


# Element-wise difference between the two columns
rentab = (movies_df2['Recette'] - movies_df2['Budget'])
movies_df2['Rentabilité'] = rentab
print(movies_df2)

		
                                          Titre  ...  Rentabilité
0                                       Avatar   ...  523505847.0
1     Pirates of the Caribbean: At World's End   ...    9404152.0
2                                      Spectre   ...  -44925825.0
3                        The Dark Knight Rises   ...  198130642.0
5                                  John Carter   ... -190641321.0
...                                         ...  ...          ...
5027                                The Circle   ...     663780.0
5033                                    Primer   ...     417760.0
5034                                    Cavite   ...      63071.0
5037                                 Newlyweds   ...      -4416.0
5042                         My Date with Drew   ...      84122.0

[2838 rows x 8 columns]
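
The warning mentioned in the question is most likely the `SettingWithCopyWarning` that many pandas versions raise when a column is added to a table obtained by filtering another one. One possible way to avoid it, shown here on hypothetical data, is to take an explicit copy of the filtered table before adding the column.

```python
import pandas

# Hypothetical data using the renamed columns
sample_df = pandas.DataFrame({
    "Titre": ["Film A", "Film B", "Film C"],
    "Année": [1999.0, 2005.0, 2012.0],
    "Budget": [1000.0, 1000.0, 3000.0],
    "Recette": [1500.0, 1500.0, 2800.0],
})

# .copy() makes the filtered table independent of sample_df
filtered = sample_df[sample_df["Année"] >= 2000].copy()

# Adding a column to the copy does not touch sample_df
filtered["Rentabilité"] = filtered["Recette"] - filtered["Budget"]
print(filtered)
```

The extra copy costs some memory, which is usually negligible for a table of a few thousand rows.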

8. Sort the table to display the films from the least profitable to the most profitable:


# Ascending sort: the largest losses come first
print(movies_df2.sort_values(by="Rentabilité"))

		
                        Titre        Réalisateur  ...      Recette   Rentabilité
2988                The Host        Joon-ho Bong  ...    2201412.0 -1.221330e+10
3859          Lady Vengeance      Chan-wook Park  ...     211667.0 -4.199788e+09
3005                Fateless        Lajos Koltai  ...     195888.0 -2.499804e+09
2334                Steamboy     Katsuhiro Ôtomo  ...     410388.0 -2.127110e+09
3075  Kabhi Alvida Naa Kehna         Karan Johar  ...    3275443.0 -6.967246e+08
...                       ...                ...  ...          ...           ...
66           The Dark Knight   Christopher Nolan  ...  533316061.0  3.483161e+08
17              The Avengers         Joss Whedon  ...  623279547.0  4.032795e+08
794             The Avengers         Joss Whedon  ...  623279547.0  4.032795e+08
29            Jurassic World     Colin Trevorrow  ...  652177271.0  5.021773e+08
0                     Avatar       James Cameron  ...  760505847.0  5.235058e+08

[2838 rows x 8 columns]
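
When a column mixes very large and small values, pandas may switch to scientific notation, as in the `Rentabilité` column above. If plain figures are easier to read, a formatting function can be passed to `to_string`. This is a sketch with made-up amounts, not the values of the dataset.

```python
import pandas

# Hypothetical profitability figures of very different magnitudes
sample_df = pandas.DataFrame({
    "Titre": ["Film A", "Film B"],
    "Rentabilité": [500000000.0, -12000000000.0],
})

ordered = sample_df.sort_values(by="Rentabilité")

# Format floats with thousands separators and no decimals
text = ordered.to_string(float_format="{:,.0f}".format)
print(text)
```

Only the display changes here; the values stored in the table remain floating-point numbers.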

9. Which films have an IMDb score above 8 and yet were not profitable?


# Two boolean Series combined row by row with & (logical AND)
condition1 = (movies_df2['Score IMDb'] > 8)
condition2 = (movies_df2['Rentabilité'] < 0)
print(movies_df2[condition1 & condition2])

		
                                    Titre  ... Rentabilité
1298                              Amélie   ... -43798339.0
1329            Baahubali: The Beginning   ... -11528148.0
1373                                Rush   ... -11096291.0
1448                         The Pianist   ...  -2480678.0
1978                             Warrior   ... -11348338.0
2047                Howl's Moving Castle   ... -19289545.0
2373                       Spirited Away   ...  -8950114.0
2829                            Downfall   ...  -7998060.0
2830                      The Sea Inside   ...  -7913655.0
2914  Tae Guk Gi: The Brotherhood of War   ... -11689814.0
3553                         Elite Squad   ...  -3991940.0
3849                 Requiem for a Dream   ...   -890722.0
3854                        Donnie Darko   ...  -3772117.0
3931                             Samsara   ...  -1398153.0
4033                            The Hunt   ...  -3189032.0
4105                              Oldboy   ...   -818710.0
4286                     No End in Sight   ...   -569815.0
4289           In the Shadow of the Moon   ...   -865951.0
4585                  The Act of Killing   ...   -515779.0
4748                The Other Dream Team   ...   -366222.0

[20 rows x 8 columns]
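
Since the printed result hides the score behind `...`, it may be clearer to select the relevant columns together with the rows by using `.loc`. The sketch below uses hypothetical data and also shows the inline form, where each comparison needs its own parentheses because `&` binds more tightly than `>` and `<`.

```python
import pandas

# Hypothetical scores and profitability figures
sample_df = pandas.DataFrame({
    "Titre": ["Film A", "Film B", "Film C"],
    "Score IMDb": [8.5, 8.2, 6.0],
    "Rentabilité": [-100.0, 300.0, -50.0],
})

acclaimed = sample_df["Score IMDb"] > 8
unprofitable = sample_df["Rentabilité"] < 0

# Rows matching both conditions, restricted to three columns
result = sample_df.loc[acclaimed & unprofitable,
                       ["Titre", "Score IMDb", "Rentabilité"]]
print(result)

# Same filter written inline: the parentheses are required
inline = sample_df[(sample_df["Score IMDb"] > 8) & (sample_df["Rentabilité"] < 0)]
print(inline["Titre"].tolist())
```

The comparison `> 8` is strict, so a film rated exactly 8.0 would not be selected.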