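1. Introduction to pandas

Data visualization is, at its core, a way of analyzing data. For data analysis in Python, pandas is an indispensable tool: a powerful data-analysis package that also integrates plotting, so data processing, analysis, and visualization all live in one library. The plotting built into pandas covers most everyday needs and saves much of the work of calling matplotlib directly.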

pandas is typically used to quickly draw the following kinds of plots:

'line' for line plots
'bar' or 'barh' for bar plots
'hist' for histograms
'box' for box plots
'area' for area plots
'scatter' for scatter plots
'pie' for pie plots

If you already know Matplotlib, you can start using these right away; if not, the walkthrough below should get you up to speed quickly.

2. Dataset

Before visualizing the data, let's first take a look at the dataset.

3. Visualization

3.1 Line plots

The following code draws a line plot.

import matplotlib.pyplot as plt

# Select the row for Algeria and plot it as a line
df.loc['Algeria'].plot(kind='line', label='Algeria')
plt.legend(loc='upper left')

A single line of code draws a line for the row whose index is 'Algeria'. Note that I excluded the Total column here.
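The text mentions excluding the Total column but does not show how. A minimal sketch of one way to do it, assuming a small made-up frame (`df_demo` and its numbers are hypothetical and stand in for the article's dataset):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numbers for illustration only (not the article's data).
years = ["1980", "1981", "1982"]
df_demo = pd.DataFrame(
    [[10, 20, 30], [5, 15, 25]],
    index=["Algeria", "Albania"],
    columns=years,
)
df_demo["Total"] = df_demo[years].sum(axis=1)

# Drop the aggregate column so it is not plotted as if it were a year.
row = df_demo.drop(columns="Total").loc["Algeria"]

fig, ax = plt.subplots()
row.plot(kind="line", label="Algeria", ax=ax)
ax.legend(loc="upper left")
```

Leaving Total in would add one extra point, far larger than any single year, at the end of the line.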

Not that impressive? Take a look at the following code and its result first:

df.T[['Albania', 'Algeria', 'Argentina']].plot(kind='line')

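Just like that, we drew three lines, and the legend was generated automatically, which already shows where this is more convenient than plain matplotlib. One thing to note is that I transposed the dataset, that is, swapped its rows and columns. The new dataset looks like this:

When plotting data with multiple rows, pandas uses the index as the X axis and the values of each column as the Y axis, and each column becomes its own line. In other words, one line is drawn per column, which is why the transpose is needed.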
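The effect of the transpose can be checked on a tiny frame. The sketch below uses made-up numbers (`df_demo` is hypothetical) and plots the same data with and without `.T`, then reads back the legend labels to show which dimension became the lines:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numbers: rows are countries, columns are years.
df_demo = pd.DataFrame(
    [[1, 2, 3], [10, 20, 30], [100, 200, 300]],
    index=["Albania", "Algeria", "Argentina"],
    columns=["1980", "1981", "1982"],
)

fig, (ax_raw, ax_t) = plt.subplots(1, 2, figsize=(10, 4))

# Without transposing: one line per year, countries along the X axis.
df_demo.plot(kind="line", ax=ax_raw)

# After transposing: one line per country, years along the X axis.
df_demo.T.plot(kind="line", ax=ax_t)

raw_labels = [t.get_text() for t in ax_raw.get_legend().get_texts()]
t_labels = [t.get_text() for t in ax_t.get_legend().get_texts()]
print(raw_labels)
print(t_labels)
```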

3.2 Histograms

Consider the following code:

import numpy as np

# Split the data into 10 equal-width bins: count holds the number of
# data points in each bin, bin_edges holds the bin boundaries
count, bin_edges = np.histogram(df_can['2013'])

df_can['2013'].plot(kind='hist', figsize=(8, 5), xticks=bin_edges)

plt.title('Histogram of Immigration from 195 countries in 2013')  # add a title to the histogram
plt.ylabel('Number of Countries')  # add y-label
plt.xlabel('Number of Immigrants')  # add x-label

plt.show()

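When drawing a histogram, pandas also splits the data into 10 bins by default, matching the binning of np.histogram, so we do not need to pass in any extra data; we only need to ask for a histogram. The xticks argument is optional, but without it the bin intervals on the X axis are harder to read.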
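The claim that the two binnings agree can be checked directly. This is a small sketch on synthetic values (`values` is a hypothetical stand-in for the `'2013'` column); it compares NumPy's counts with the bar heights pandas actually draws:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic values standing in for one year's column (not real data).
rng = np.random.default_rng(0)
values = pd.Series(rng.integers(0, 1000, size=195), name="2013")

# np.histogram defaults to 10 equal-width bins, which gives 11 edges.
count, bin_edges = np.histogram(values)

fig, ax = plt.subplots(figsize=(8, 5))
values.plot(kind="hist", xticks=bin_edges, ax=ax)

# Heights of the bars pandas drew, to compare with NumPy's counts.
heights = [int(p.get_height()) for p in ax.patches]
print(len(count), len(bin_edges))
print(heights == count.tolist())
```

Passing `xticks=bin_edges` puts a tick on every bin boundary, which is what makes the intervals readable.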

Just as with line plots, we can also draw several histograms at once. Again, the data needs to be transposed first.

df_t = df_can.loc[['Denmark', 'Norway', 'Sweden'], years].transpose()
df_t.head()

The first 5 rows are as follows:

df_t.plot(kind='hist', figsize=(10, 6))

plt.title('Histogram of Immigration from Denmark, Norway, and Sweden from 1980 - 2013')
plt.ylabel('Number of Years')
plt.xlabel('Number of Immigrants')

plt.show()

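Three sets of histogram bars were indeed drawn. However, some of the series appear to be covered by others and cannot be seen, which does not make for a good plot. We therefore pass a few more arguments to plot so that the covered data stays visible.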

count, bin_edges = np.histogram(df_t, 15)

# un-stacked histogram
df_t.plot(kind='hist',
          figsize=(10, 6),
          bins=15,
          alpha=0.6,
          xticks=bin_edges,
          color=['coral', 'darkslateblue', 'mediumseagreen']
         )

plt.title('Histogram of Immigration from Denmark, Norway, and Sweden from 1980 - 2013')
plt.ylabel('Number of Years')
plt.xlabel('Number of Immigrants')

plt.show()

Here we changed the number of bins and, more importantly, passed the alpha transparency parameter, which makes the covered data visible.

Another variation is to stack the histograms:

df_t.plot(kind='hist',
          figsize=(10, 6),
          bins=15,
          stacked=True,
          xticks=bin_edges,
          color=['coral', 'darkslateblue', 'mediumseagreen']
         )
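To see the two variations side by side, here is a sketch on synthetic data (`df_demo` and its random values are assumptions, not the Denmark/Norway/Sweden figures). It draws the semi-transparent overlay and the stacked version on neighbouring axes, using the same shared bin edges:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in: 34 "years" for three countries (not real data).
rng = np.random.default_rng(1)
df_demo = pd.DataFrame(
    rng.integers(0, 300, size=(34, 3)),
    columns=["Denmark", "Norway", "Sweden"],
)

# Bin edges computed over all three columns at once.
count, bin_edges = np.histogram(df_demo, 15)

fig, (ax_overlay, ax_stacked) = plt.subplots(1, 2, figsize=(12, 5))

# Overlapping bars made see-through with alpha.
df_demo.plot(kind="hist", bins=15, alpha=0.6, xticks=bin_edges, ax=ax_overlay)

# Bars piled on top of each other instead of overlapping.
df_demo.plot(kind="hist", bins=15, stacked=True, xticks=bin_edges, ax=ax_stacked)

n_bars_overlay = len(ax_overlay.patches)
n_bars_stacked = len(ax_stacked.patches)
tallest_stack = max(p.get_y() + p.get_height() for p in ax_stacked.patches)
```

In the stacked version the top of each pile is the combined count for that bin, so the tallest pile should match the largest entry of `count`.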

3.3 Bar charts

First, take a look at the data we are about to visualize.

df_iceland = df_can.loc['Iceland', years]
df_iceland.head()

The data is Iceland's immigration data for 1980-2013; only the first 5 rows are shown below.

Drawing a bar chart is simple. The code is as follows:

# step 2: plot data
df_iceland.plot(kind='bar', figsize=(10, 6))
df_iceland.plot(kind='line')

plt.xlabel('Year')  # add x-label to the plot
plt.ylabel('Number of immigrants')  # add y-label to the plot
plt.title('Icelandic immigrants to Canada from 1980 to 2013')  # add title to the plot

plt.show()

Bar charts can be vertical or horizontal; the one above is vertical.

df_iceland.plot(kind='barh', figsize=(10, 6))

Simply change kind='bar' to kind='barh'.
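A short sketch of the difference, on a made-up Series (`s_demo` is hypothetical): with 'bar' the values become bar heights, with 'barh' they become bar widths and the index moves to the Y axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up yearly values for illustration only.
s_demo = pd.Series(
    [3, 1, 4, 1, 5],
    index=["1980", "1981", "1982", "1983", "1984"],
)

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))

s_demo.plot(kind="bar", ax=ax_v)   # vertical bars, index on the X axis
ax_v.set_xlabel("Year")

s_demo.plot(kind="barh", ax=ax_h)  # horizontal bars, index on the Y axis
ax_h.set_ylabel("Year")

bar_heights = [p.get_height() for p in ax_v.patches]
barh_widths = [p.get_width() for p in ax_h.patches]
```

Note that the axis labels have to be swapped as well when switching to 'barh'.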

3.4 Pie charts

First, take a look at the data the pie chart will be drawn from:

The pie chart is drawn with the following code:

# autopct create %, start angle represent starting point
df_continents['Total'].plot(kind='pie',
                            figsize=(5, 6),
                            autopct='%1.f%%',  # add in percentages
                            startangle=90,     # start angle 90° (Africa)
                            shadow=True,       # add shadow
                            )

plt.title('Immigration to Canada by Continent [1980 - 2013]')
plt.axis('equal')  # Sets the pie chart to look like a circle.

plt.show()

Some of the labels overlap. To fix this, we pass in a few more arguments:

colors_list = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue', 'lightgreen', 'pink']
explode_list = [0.1, 0, 0, 0, 0.1, 0.1]  # ratio for each continent with which to offset each wedge.

df_continents['Total'].plot(kind='pie',
                            figsize=(15, 6),
                            autopct='%1.1f%%',
                            startangle=90,
                            shadow=True,
                            labels=None,          # turn off labels on pie chart
                            pctdistance=1.12,     # the ratio between the center of each pie slice and the start of the text generated by autopct
                            colors=colors_list,   # add custom colors
                            explode=explode_list  # 'explode' lowest 3 continents
                            )

# scale the title up by 12% to match pctdistance
plt.title('Immigration to Canada by Continent [1980 - 2013]', y=1.12)

plt.axis('equal')

# add legend
plt.legend(labels=df_continents.index, loc='upper left')

plt.show()
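The same idea can be tried on a small made-up Series: turn off the wedge labels, move the percentages outside the pie, and put the category names in a legend instead. In this sketch `totals` and its category names are hypothetical, and the legend is given the wedge handles explicitly:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Wedge

# Made-up totals that sum to 100 so the percentages are easy to read.
totals = pd.Series(
    [50, 30, 10, 6, 3, 1],
    index=["A", "B", "C", "D", "E", "F"],
    name="Total",
)
explode_list = [0, 0, 0, 0.1, 0.1, 0.1]  # pull out the three smallest wedges

fig, ax = plt.subplots(figsize=(8, 6))
totals.plot(
    kind="pie",
    ax=ax,
    autopct="%1.1f%%",
    startangle=90,
    labels=None,        # no text labels on the wedges
    pctdistance=1.12,   # percentages just outside the pie
    explode=explode_list,
)
ax.set_ylabel("")
ax.axis("equal")

# Names go into a legend instead of onto the wedges.
wedges = [p for p in ax.patches if isinstance(p, Wedge)]
ax.legend(handles=wedges, labels=list(totals.index), loc="upper left")

pct_texts = [t.get_text() for t in ax.texts if t.get_text().endswith("%")]
legend_texts = [t.get_text() for t in ax.get_legend().get_texts()]
```

Small wedges are the ones whose labels collide, which is why only those are exploded here.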

3.5 Area plots

The data used is as follows:

fig, (ax1, ax2) = plt.subplots(2)
df_CI.plot(kind='area', stacked=False, ax=ax1)
df_CI.plot(kind='area', ax=ax2)

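In ax1, it is as if the lines for India and China were drawn and the area under each was then filled. In ax2, stacked=True (the default for area plots), so the values are added on top of each other, in DataFrame column order from left to right.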
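The stacking behaviour can be verified by reading back the boundary line of the second column. A minimal sketch with made-up numbers (`df_demo` is a hypothetical stand-in for `df_CI`):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only.
df_demo = pd.DataFrame(
    {"India": [1, 2, 3], "China": [2, 2, 2]},
    index=[1980, 1981, 1982],
)

fig, (ax1, ax2) = plt.subplots(2)
df_demo.plot(kind="area", stacked=False, ax=ax1)
df_demo.plot(kind="area", ax=ax2)  # stacked=True is the default for area plots

# Upper boundary of the second column (China) in each panel.
top_unstacked = list(ax1.lines[1].get_ydata())
top_stacked = list(ax2.lines[1].get_ydata())
print(top_unstacked)
print(top_stacked)
```

Unstacked, China's boundary is its own values; stacked, it is India plus China, because China is the column to the right.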

3.6 Box plots

An overview of the data:

df_CI.plot(kind='box')
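As a rough guide to reading the result: each box spans the first to third quartile of a column, with a line at the median. The sketch below, on made-up numbers (`df_demo` is hypothetical), draws the box plot and prints the same quartiles from `describe()` so they can be compared against the picture:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only.
df_demo = pd.DataFrame(
    {"India": [1, 2, 3, 4, 5], "China": [10, 20, 30, 40, 50]}
)

fig, ax = plt.subplots()
df_demo.plot(kind="box", ax=ax)  # one box per column

# Quartiles that correspond to the bottom, middle line, and top of each box.
summary = df_demo.describe().loc[["25%", "50%", "75%"]]
print(summary)
```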

3.7 Scatter plots

Part of the data:

df_tot.plot(kind='scatter', x='year', y='total', figsize=(10, 6), color='darkblue')

plt.title('Total Immigration to Canada from 1980 - 2013')
plt.xlabel('Year')
plt.ylabel('Number of Immigrants')

plt.show()

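Unlike the other plot kinds, a scatter plot needs the names of the x and y columns, which are then looked up in df_tot, much like passing x and y to pyplot.plot().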
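Because x and y are looked up by column name, the data has to be a DataFrame with both as real columns. One possible way to get there from a year-indexed Series is sketched below; the numbers and the name `df_tot_demo` are made up, and how the article built `df_tot` is not shown in the text:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up yearly totals indexed by year strings (not real data).
totals = pd.Series([100, 120, 150, 130], index=["1980", "1981", "1982", "1983"])

# Move the index into a column and give both columns explicit names.
df_tot_demo = totals.reset_index()
df_tot_demo.columns = ["year", "total"]
df_tot_demo["year"] = df_tot_demo["year"].astype(int)  # numeric X axis

fig, ax = plt.subplots(figsize=(10, 6))
df_tot_demo.plot(kind="scatter", x="year", y="total", color="darkblue", ax=ax)

# (x, y) coordinates of the plotted points.
points = ax.collections[0].get_offsets()
```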
