pandas: a thorough look at groupby and agg for grouping and summarizing tabular data

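This groupby post is not well written; it is too complicated. In practice only a handful of patterns are used regularly, and laying those out with one example would make them easy to understand and reuse. I will write another post later.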



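What groupby does: splits the data into groups
groupby + agg (aggregation functions): after grouping, applies functions such as 'sum', 'mean', 'max', 'min', ... to each group

By default groupby groups along the vertical direction (over rows), axis=0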

DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.random.randn(5),
                   'data2': np.random.randn(5)})
print(df)

      data1     data2 key1 key2
0 -0.410122  0.247895    a  one
1 -0.627470 -0.989268    a  two
2  0.179488 -0.054570    b  one
3 -0.299878 -1.640494    b  two
4 -0.297191  0.954447    a  one

Grouping and iterating over the groups

list(df.groupby(['key1']))  # list() gives: [(group1), (group2), ...]

[('a',
        data1     data2 key1 key2
  0 -0.410122  0.247895    a  one
  1 -0.627470 -0.989268    a  two
  4 -0.297191  0.954447    a  one),
 ('b',
        data1     data2 key1 key2
  2  0.179488 -0.054570    b  one
  3 -0.299878 -1.640494    b  two)]

Calling list() gives: [(group1), (group2), ...]
Each piece (group) is a (name, group) tuple

1. Group by key1 (a single column), that is, by the values in key1

A groupby object supports iteration and yields a sequence of 2-tuples: (group name, data chunk), (group name, data chunk), ...

for name, group in df.groupby(['key1']):
    print(name)
    print(group)

a
      data1     data2 key1 key2
0 -0.410122  0.247895    a  one
1 -0.627470 -0.989268    a  two
4 -0.297191  0.954447    a  one
b
      data1     data2 key1 key2
2  0.179488 -0.054570    b  one
3 -0.299878 -1.640494    b  two
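
A note on versions: to my knowledge, pandas 2.0 and later yield one-element tuples such as ('a',) as the group name when the key is a one-element list like ['key1'], whereas the output above (from an older release) prints a plain 'a'. A minimal sketch, assuming the same df layout with fixed values copied from the printed table, that passes the key as a plain string so the names stay scalar:

```python
import pandas as pd

# Fixed values copied from the table printed above (the original used random data).
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "data1": [-0.410122, -0.627470, 0.179488, -0.299878, -0.297191],
})

# A scalar key (a string, not a one-element list) yields scalar group names.
names = [name for name, group in df.groupby("key1")]
print(names)

# Each group is an ordinary DataFrame, so len() gives its row count.
sizes = {name: len(group) for name, group in df.groupby("key1")}
print(sizes)
```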

2. Group by [key1, key2] (several columns)

With multiple keys, the 2-tuples look like: ((k1, k2), data chunk), ((k1, k2), data chunk), ...
The first element is a tuple made up of the key values

for name, group in df.groupby(['key1', 'key2']):
    print(name)  # name = (k1, k2)
    print(group)

('a', 'one')
      data1     data2 key1 key2
0 -0.410122  0.247895    a  one
4 -0.297191  0.954447    a  one
('a', 'two')
     data1     data2 key1 key2
1 -0.62747 -0.989268    a  two
('b', 'one')
      data1    data2 key1 key2
2  0.179488 -0.05457    b  one
('b', 'two')
      data1     data2 key1 key2
3 -0.299878 -1.640494    b  two

3. Group by a function

4. Group by a dict

5. Group by index level

6. Mixing functions with arrays, lists, dicts, and Series is not a problem either, because everything is eventually converted to an array
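
Items 3 to 6 are listed without examples, so here is a small sketch of each. The frame `people`, the `mapping` dict, and the MultiIndex names are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical data: the index labels are what functions and dicts operate on.
people = pd.DataFrame(
    {"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]},
    index=["Joe", "Steve", "Wes", "Jim"],
)

# 3. Function: called once per index label; its return value is the group name.
by_len = people.groupby(len).sum()

# 4. Dict: maps index labels to group names.
mapping = {"Joe": "red", "Steve": "blue", "Wes": "red", "Jim": "blue"}
by_dict = people.groupby(mapping).sum()

# 5. Index level: group on one level of a MultiIndex.
midx = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["letter", "number"]
)
s = pd.Series([1, 2, 3, 4], index=midx)
by_level = s.groupby(level="letter").sum()

# 6. Mixed keys: a function and a list together give a two-level grouping.
mixed = people.groupby([len, ["one", "one", "one", "two"]]).sum()

print(by_len, by_dict, by_level, mixed, sep="\n\n")
```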


Turning the pieces into a dict

dict(list(df.groupby(['key1'])))  # dict(list(...))

{'a':       data1     data2 key1 key2
 0 -0.410122  0.247895    a  one
 1 -0.627470 -0.989268    a  two
 4 -0.297191  0.954447    a  one,
 'b':       data1     data2 key1 key2
 2  0.179488 -0.054570    b  one
 3 -0.299878 -1.640494    b  two}
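
If only one or two pieces are needed, building the whole dict is not necessary. A short sketch, assuming the same df layout with fixed values copied from the printed table, using `get_group` and the `groups` attribute of the groupby object:

```python
import pandas as pd

# Fixed values copied from the table printed earlier.
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
    "data1": [-0.410122, -0.627470, 0.179488, -0.299878, -0.297191],
    "data2": [0.247895, -0.989268, -0.054570, -1.640494, 0.954447],
})

grouped = df.groupby("key1")

# get_group returns the rows of a single group as a DataFrame.
group_a = grouped.get_group("a")
print(group_a)

# groups maps each group name to the index labels of its rows.
labels_b = list(grouped.groups["b"])
print(labels_b)
```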

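Statistics and computations after grouping

1. After grouping, return a Series containing the size of each group

Grouping by key1

df.groupby(['key1']).size()

key1
a    3
b    2
dtype: int64

# aside: dict() accepts any iterable of two-item sequences, which is why dict(list(...)) works above
dict(['a1', 'x2', 'e3'])

{'a': '1', 'e': '3', 'x': '2'}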

Grouping by [key1, key2]

df.groupby(['key1', 'key2']).size()

key1  key2
a     one     2
      two     1
b     one     1
      two     1
dtype: int64

2. Group data1 by key1 and compute the mean of the data1 column

df['data1'].groupby(df['key1']).mean()  # groupby itself computes nothing, it only defines the groups; mean() does the work

key1
a   -0.444928
b   -0.060195
Name: data1, dtype: float64

df.groupby(['key1'])['data1'].mean()
# read as: group df by key1, then take the mean of df['data1'] within each group
# equivalent to: df.groupby(['key1']).data1.mean()

key1
a   -0.444928
b   -0.060195
Name: data1, dtype: float64

Notes:
groupby does not perform any computation. It only sets up the grouping.
The data (a Series) is aggregated according to the group key, producing a new Series whose index holds the unique values of the key1 column.

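Indexing a groupby object this way returns a grouped DataFrame (if a list or array of column names is passed) or a grouped Series (if a single column name is passed)

df.groupby(['key1'])['data1'].size()

key1
a    3
b    2
Name: data1, dtype: int64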
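
The two notes above (groupby is lazy, and column selection returns a grouped object) can be checked directly. A minimal sketch, assuming the same df layout with fixed values copied from the printed table:

```python
import pandas as pd

# Fixed values copied from the table printed earlier.
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "data1": [-0.410122, -0.627470, 0.179488, -0.299878, -0.297191],
    "data2": [0.247895, -0.989268, -0.054570, -1.640494, 0.954447],
})

grouped = df.groupby("key1")          # no aggregation has happened yet

kind_all = type(grouped).__name__                 # grouped DataFrame
kind_single = type(grouped["data1"]).__name__     # one column name -> grouped Series
kind_list = type(grouped[["data1"]]).__name__     # list of names -> grouped DataFrame
print(kind_all, kind_single, kind_list)

# The computation only runs when an aggregation method is called.
means = grouped["data1"].mean()
print(means)
```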

3. Group data1 by [key1, key2] and compute the mean of data1

df['data1'].groupby([df['key1'], df['key2']]).mean()

key1  key2
a     one    -0.353657
      two    -0.627470
b     one     0.179488
      two    -0.299878
Name: data1, dtype: float64

df.groupby(['key1', 'key2'])['data1'].mean()
# equivalent to: df.groupby(['key1', 'key2']).data1.mean()

key1  key2
a     one    -0.353657
      two    -0.627470
b     one     0.179488
      two    -0.299878
Name: data1, dtype: float64

Because the data was grouped by two keys, the resulting Series has a hierarchical index made up of the unique key pairs. unstack() pivots the inner level into columns:

df.groupby(['key1', 'key2'])['data1'].mean().unstack()

key2       one       two
key1
a    -0.353657 -0.627470
b     0.179488 -0.299878

In the examples above the group keys are all Series (or column names). In fact, a group key can be any array of the appropriate length, which makes grouping very flexible.
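
To illustrate keys that are not columns of the frame at all, here is a sketch that groups by two NumPy arrays. The arrays `states` and `years` are invented for the example; the only requirement is that their length matches the number of rows.

```python
import numpy as np
import pandas as pd

# Fixed values copied from the table printed earlier.
df = pd.DataFrame({
    "data1": [-0.410122, -0.627470, 0.179488, -0.299878, -0.297191],
})

# Hypothetical external keys, one entry per row of df.
states = np.array(["Ohio", "California", "California", "Ohio", "Ohio"])
years = np.array([2005, 2005, 2006, 2005, 2006])

# The arrays are used as group keys exactly like columns would be.
result = df["data1"].groupby([states, years]).mean()
print(result)
```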


Grouping along the horizontal direction (over columns)

Group the columns by their data type (df.dtypes)

df has two data types, float64 and object, so the columns fall into two groups: (dtype('float64'), piece) and (dtype('O'), piece)

list(df.groupby(df.dtypes, axis=1))

[(dtype('float64'),
        data1     data2
  0 -0.410122  0.247895
  1 -0.627470 -0.989268
  2  0.179488 -0.054570
  3 -0.299878 -1.640494
  4 -0.297191  0.954447),
 (dtype('O'),
    key1 key2
  0    a  one
  1    a  two
  2    b  one
  3    b  two
  4    a  one)]
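
As far as I know, recent pandas releases deprecate the axis=1 argument of groupby, so the call above may warn or fail depending on the installed version. The following is one possible workaround, not the original author's method: collect the column names per dtype in plain Python and slice the frame. The string column may be reported as object or as a dedicated string dtype depending on the pandas version, so the sketch keys the pieces by the dtype's string form.

```python
import pandas as pd

# Fixed values copied from the table printed earlier.
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
    "data1": [-0.410122, -0.627470, 0.179488, -0.299878, -0.297191],
    "data2": [0.247895, -0.989268, -0.054570, -1.640494, 0.954447],
})

# Collect column names under the string form of their dtype.
cols_by_dtype = {}
for col, dtype in df.dtypes.items():
    cols_by_dtype.setdefault(str(dtype), []).append(col)

# One sub-frame per dtype, analogous to the (dtype, piece) pairs above.
pieces = {name: df[cols] for name, cols in cols_by_dtype.items()}
for name, piece in pieces.items():
    print(name)
    print(piece)
```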

Using agg

groupby + agg can apply several functions to the groupby result at once

Parameters of the SeriesGroupBy method agg():
aggregate(self, func_or_funcs, *args, **kwargs)
func: function, string, dictionary, or list of strings/functions
Returns: the aggregated Series

s = pd.Series([10, 20, 30, 40])
s

0    10
1    20
2    30
3    40
dtype: int64

for n, g in s.groupby([1, 1, 2, 2]):
    print(n)
    print(g)

1
0    10
1    20
dtype: int64
2
2    30
3    40
dtype: int64

s.groupby([1, 1, 2, 2]).min()

1    10
2    30
dtype: int64

# equivalent to:
s.groupby([1, 1, 2, 2]).agg('min')

1    10
2    30
dtype: int64

s.groupby([1, 1, 2, 2]).agg(['min', 'max'])  # wrap the names in [] because func is a single argument

   min  max
1   10   20
2   30   40
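
The parameter description above says func may also be a function, which the string-only examples do not show. A small sketch with a user-defined function; the name `spread` is made up for illustration:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

def spread(x):
    """Range of the group: max minus min."""
    return x.max() - x.min()

# A single callable gives one value per group.
single = s.groupby([1, 1, 2, 2]).agg(spread)
print(single)

# Strings and callables can be mixed in a list; the column takes the function's name.
multi = s.groupby([1, 1, 2, 2]).agg(["min", "max", spread])
print(multi)
```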

Typical usage:

df

      data1     data2 key1 key2
0 -0.410122  0.247895    a  one
1 -0.627470 -0.989268    a  two
2  0.179488 -0.054570    b  one
3 -0.299878 -1.640494    b  two
4 -0.297191  0.954447    a  one

Comparing the following calls shows what agg is good for:

df.groupby(['key1'])['data1'].min()

key1
a   -0.627470
b   -0.299878
Name: data1, dtype: float64

df.groupby(['key1'])['data1'].agg({'min'})  # note: {'min'} is a set, not a dict

           min
key1
a    -0.627470
b    -0.299878

# Recommended:
df.groupby(['key1']).agg({'data1': 'min'})  # minimum of data1 within each group; the column is still named data1

         data1
key1
a    -0.627470
b    -0.299878

# After grouping by key1, aggregate the minimum and maximum of data1 in each group:
df.groupby(['key1'])['data1'].agg({'min', 'max'})  # a set again, so the column order is not guaranteed

           max       min
key1
a    -0.297191 -0.627470
b     0.179488 -0.299878

# Recommended:
df.groupby(['key1']).agg({'data1': ['min', 'max']})

         data1
           min       max
key1
a    -0.627470 -0.297191
b    -0.299878  0.179488

The columns of a groupby result can also be renamed inside agg (not recommended, as the warning below shows; even renaming the columns in a separate step afterwards is preferable)

# For data1, rename min to a and max to b
df.groupby(['key1'])['data1'].agg({'a': 'min', 'b': 'max'})  # 'min' and 'max' here are function names

d:\python27\lib\site-packages\ipykernel_launcher.py:2: FutureWarning: using a dict on a Series for aggregation
is deprecated and will be removed in a future version

             a         b
key1
a    -0.627470 -0.297191
b    -0.299878  0.179488
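
The FutureWarning above refers to passing a renaming dict to agg on a grouped Series. Newer pandas versions (0.25 and later, to my knowledge) offer named aggregation through keyword arguments for the same purpose. A sketch, assuming the same df layout with fixed values copied from the printed table:

```python
import pandas as pd

# Fixed values copied from the table printed earlier.
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "data1": [-0.410122, -0.627470, 0.179488, -0.299878, -0.297191],
})

# On a grouped Series: new_column_name=function.
renamed = df.groupby("key1")["data1"].agg(a="min", b="max")
print(renamed)

# On a grouped DataFrame: new_column_name=(source_column, function).
renamed_df = df.groupby("key1").agg(a=("data1", "min"), b=("data1", "max"))
print(renamed_df)
```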