Scraping Super Lotto (大乐透) historical lottery data with Python, and testing predictions...

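It has been a while since I practiced writing a crawler in Python, so this time I tried scraping Super Lotto (大乐透) results. Picking the right page matters a lot: some pages load their data dynamically, while others serve it directly as JSON, which is very easy to scrape. I looked for an easy page for quite a while and could not find one, so I just scraped the official China Sports Lottery site, http://www.lottery.gov.cn/historykj/history.jspx?_ltype=dlt. Here is the code: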

from bs4 import BeautifulSoup as bs
import requests


def get_url():
    # Walk through pages 1 to 90 of the paginated history.
    for i in range(1, 91):
        url = 'http://www.lottery.gov.cn/historykj/history_' + str(i) + '.jspx?_ltype=dlt'
        html = requests.get(url).text
        rows = bs(html, 'lxml').find('tbody').find_all('tr')
        # Append one space-separated line per table row; the with block closes the file.
        with open('data_recent', 'a') as f:
            for row in rows:
                number = row.get_text().strip().replace('\r', '').replace('\t', '').replace('\n', ' ')
                f.write(number + '\n')


if __name__ == '__main__':
    get_url()
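The chained `replace` calls in the scraper can leave runs of spaces when a cell is empty or padded. A minimal sketch of an alternative clean-up step, assuming a row's text is just whitespace-separated fields; `normalize_row` is a hypothetical helper and the sample string is made up:

```python
def normalize_row(raw_text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, CR, LF) into one space."""
    # str.split() with no argument splits on any whitespace and drops empty pieces.
    return " ".join(raw_text.split())


if __name__ == "__main__":
    sample = "\r\n\t19001\r\n\t01\t05  12\n 23 35\r\n\t02\n11\t\n"
    print(normalize_row(sample))
```

Because every field ends up separated by exactly one space, a file written this way can later be split on single spaces without surprises.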

The result looks like this:


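I tried this once before but lost the code, and since I sometimes want to refresh the data I wrote it again. The old version used urllib; lately I have found requests simpler and more convenient in many cases. Here a plain get is enough, whereas with urllib I also had to add headers, because the bare URL alone would not open. One more remark: I learned BeautifulSoup first and got used to it, but I find lxml simpler and handier, since you can right-click an element in the browser developer tools and copy its XPath. What follows is just my own experimentation. First, let's plot the data and take a look!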

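import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # only needed to register '3d' on older matplotlib

# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0.
data = pd.read_csv(r'C:\Users\Administrator\jupyter\dale1.csv', sep=' ', header=None,
                   on_bad_lines='skip').values
data = data[:, 2:]  # drop the first two columns

fig = plt.figure(figsize=(10, 10))
# fig.gca(projection='3d') no longer works on matplotlib 3.6+; add_subplot does.
ax = fig.add_subplot(projection='3d')
x = range(100)
for i in range(1, 8):
    z = data[:100, i - 1]   # values of ball i over 100 draws
    y = np.full(100, i)     # put each ball on its own plane
    ax.plot(x, y, z, label='ball %d' % i)
ax.legend()
# ax.set_xlim = [0, 8]
plt.tight_layout()
plt.savefig('img_3d.png')
plt.show()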

Note that this code uses my older data file; the freshly scraped data needs a little processing to match that format. Without further ado, here is the result:


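The 3D plot can also be rotated. As the figure or the code shows, I took the first 100 draws and plotted how each of the 7 balls fluctuates. It is easy to read this way: the first number is always smaller than the second, and so on, so the lines are neatly layered. If you instead draw one line per draw through its 7 numbers, the lines swing up and down and are hard to tell apart, as shown: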

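I drew many plots and saw nothing in them, so let's move on to statistics! Starting from the numbers of any given draw, I look at all earlier draws and, for each ball position showing that number, estimate the probability that the next value increases or decreases. The result looks like this (PS: this one is from an earlier Double Color Ball (双色球) run; for Super Lotto I compute it directly in the code below, and it can of course be printed too):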

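This is the probability that the value will increase when balls 1 to 7 (rows 0 to 6) take the values 1 to 36 (columns 0 to 35), computed over all draws before the current one; the decrease and no-change cases are analogous. First the scraped data has to be processed so that the most recent draws come last, that is, the index order is reversed. I tried for a long time to read the file with pandas and failed; a closer look showed that the data is irregular: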

Luckily the trailing fields are not needed, so let's use a different approach:

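# Keep only the first 26 characters of each line; the rest is not needed.
# The with blocks close both files, so no explicit close() calls are required.
with open(r'C:\Users\Administrator\jupyter\data_recent.csv', 'r', encoding='utf-8') as f:
    with open(r'.\simple_data.csv', 'a') as file:
        for line in f:
            file.write(line[:26] + '\n')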
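Cutting at a fixed 26 characters only works while every row has the same width. A hedged alternative is to cut by fields instead, assuming each row starts with an issue number followed by the 7 drawn numbers (8 fields in total); that layout is an inference from the 26-character slice, and both helper names and the sample rows are hypothetical:

```python
def keep_leading_fields(line: str, n_fields: int = 8) -> str:
    """Return the first n_fields whitespace-separated fields of a row."""
    return " ".join(line.split()[:n_fields])


def simplify(lines, n_fields: int = 8):
    """Trim each row to n_fields fields and skip rows that are too short."""
    result = []
    for line in lines:
        if len(line.split()) >= n_fields:
            result.append(keep_leading_fields(line, n_fields))
    return result


if __name__ == "__main__":
    rows = ["19001 01 05 12 23 35 02 11 3 10000000 extra", "", "bad row"]
    print(simplify(rows))
```

Rows with fewer than 8 fields are dropped rather than written out truncated, so the output file stays rectangular for pandas.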

Now the statistics can be computed:

import numpy as np
import pandas as pd

data = pd.read_csv(r'C:\Users\Administrator\jupyter\simple_data.csv', sep=' ', header=None)
data = data.sort_index(ascending=False).values  # reverse the row order
data = data[:, 1:]                              # drop the first column


def fengbu(i):
    """Statistics over the draws before draw i, per ball position and value."""
    # abb[(l, n)]: changes seen in the next draw whenever ball l showed n.
    abb = {}
    for l in range(7):
        for n in range(1, 36):
            abb[l, n] = []
            for qiu in range(i - 1):
                if data[qiu][l] == n:
                    abb[l, n].append(data[qiu + 1][l] - data[qiu][l])

    dict1 = {}   # weighted change for each (key, change) pair
    dict2 = {}   # probability that the value increases
    add1 = {}    # number of increases
    reduce = {}  # number of decreases
    da = {}      # weighted sum of the increases
    jian = {}    # weighted sum of the decreases
    for n, l in abb.items():
        add1[n] = sum(1 for m in l if m > 0)
        reduce[n] = sum(1 for m in l if m < 0)
        da[n] = 0
        jian[n] = 0
        # This gives the probability table shown earlier; decreases are the mirror case.
        dict2[n] = round(add1[n] / (reduce[n] + add1[n] + 1), 4)
        # First decide whether an increase or a decrease is more likely for the
        # current number. Then refine the likelier side: use the probability of
        # each specific change as a weight, multiply it by that change, and sum
        # everything to get the predicted change for the next draw.
        for m in set(l):
            if m > 0:
                dict1[n, m] = round(l.count(m) / add1[n], 4) * m
                da[n] += dict1[n, m]
            elif m < 0:
                dict1[n, m] = round(l.count(m) / reduce[n], 4) * m
                jian[n] += dict1[n, m]
            else:
                dict1[n, m] = 0  # the number did not change

    # The dicts keep insertion order (l, then n), so they reshape into 7 x 35 matrices.
    da1 = np.array(list(da.values())).reshape(7, 35)
    jian1 = np.array(list(jian.values())).reshape(7, 35)
    dict21 = np.array(list(dict2.values())).reshape(7, 35)
    return da1, jian1, dict21


def predict(i):
    for red in range(7):
        # current mean and standard deviation of each ball position
        print(round(data[:, red].mean(), 4), round(data[:, red].std(), 4))
    da1, jian1, dict21 = fengbu(i)
    predicted = np.zeros(7)  # renamed so it does not shadow predict()
    for l in range(7):
        m = data[i][l]
        # The original loop stopped at 33 and so skipped the values 34 and 35.
        if 1 <= m <= 35:
            if dict21[l][m - 1] > 0.5:
                # probability of an increase, weighted sum, current value
                print(dict21[l][m - 1], da1[l][m - 1], data[i][l])
                predicted[l] = data[i][l] + da1[l][m - 1]
            elif dict21[l][m - 1] < 0.5:
                print(dict21[l][m - 1], jian1[l][m - 1], data[i][l])
                predicted[l] = data[i][l] + jian1[l][m - 1]
    print('Draw %d, result: %s' % (i, data[i]))
    print('So the predicted next draw is: %s' % predicted)
    print('The actual next draw is: %s' % data[i + 1])
    print('*' * 50)


if __name__ == '__main__':
    predict(1641)
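The core of the statistic is easier to follow on a single column of toy numbers. The sketch below is a simplified restatement, not the original code: it uses one ball position, made-up values, and hypothetical helper names. Up to rounding, the "probability times change" sum in `fengbu` is just the average of the observed increases (or decreases).

```python
from collections import defaultdict


def next_step_changes(column):
    """Map each value to the list of changes observed in the following draw."""
    changes = defaultdict(list)
    for current, following in zip(column, column[1:]):
        changes[current].append(following - current)
    return changes


def up_probability(deltas):
    """Share of increases, with the same +1 in the denominator as fengbu()."""
    ups = sum(1 for d in deltas if d > 0)
    downs = sum(1 for d in deltas if d < 0)
    return ups / (ups + downs + 1)


def mean_change(deltas, positive=True):
    """Average of the increases (or of the decreases); 0.0 if there are none."""
    picked = [d for d in deltas if d != 0 and (d > 0) == positive]
    return sum(picked) / len(picked) if picked else 0.0


if __name__ == "__main__":
    column = [3, 5, 3, 8, 3, 1, 3, 5]   # made-up history of one ball position
    deltas = next_step_changes(column)[3]
    p_up = up_probability(deltas)
    step = mean_change(deltas, positive=p_up > 0.5)
    print(deltas, p_up, 3 + step)
```

For the value 3 in this toy column the observed changes are 2, 5, -2 and 2, so an increase is the likelier direction and the prediction adds the average increase to the current value.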


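Double Color Ball works the same way: change range(1,36) to range(1,34) and reshape(7,35) to reshape(7,33). This is mildly interesting: the best I have seen is 5 numbers exactly right with the other two within 2 of the true values, but most of the time, well... After all, this is purely statistical: if the more probable direction were always right, that probability would have to approach 1. So individual predictions are sometimes too large or too small; when a number has never appeared before at a position, there is no prediction and the value stays 0; and sometimes two predicted values come out equal. You can change the code at the end to print only the results, without the mean and standard deviation, and run a few more cases: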
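Counting "exactly right" and "within 2" by eye gets tedious over many draws. A small sketch of a scoring helper, assuming predictions and actual numbers are compared position by position after rounding; `score` is a hypothetical function and the numbers in the demo are made up, not results from the script above:

```python
def score(predicted, actual, tolerance=2):
    """Return (exact hits, near misses within tolerance), position by position."""
    exact = 0
    near = 0
    for p, a in zip(predicted, actual):
        diff = abs(round(p) - a)
        if diff == 0:
            exact += 1
        elif diff <= tolerance:
            near += 1
    return exact, near


if __name__ == "__main__":
    predicted = [3.2, 9.0, 15.6, 22.0, 30.1, 4.0, 9.9]
    actual = [3, 9, 16, 22, 30, 6, 11]
    print(score(predicted, actual))
```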

What happens if we try predicting with a neural network? Here it is:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras import Input, Model, layers
from keras.optimizers import RMSprop

# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0.
data = pd.read_csv(r'C:\Users\Administrator\jupyter\dale1.csv', sep=' ', header=None,
                   on_bad_lines='skip').values
data = data[:, 2:]

# Standardize with statistics from the first 1500 draws.
mean = data[:1500].mean(axis=0)
std = data[:1500].std(axis=0)
data1 = data.astype('float64')  # in-place float arithmetic would fail on an integer array
data1 -= mean
data1 /= std

# Each sample is a sequence of length 1: shape (samples, 1, 7).
train_data = np.expand_dims(data1[:1400], axis=1)
val_data = np.expand_dims(data1[1400:1550], axis=1)
test_data = np.expand_dims(data1[1550:len(data) - 1], axis=1)

# Labels are the unscaled numbers of the following draw, one array per ball
# (red1 to red5, then blue1 and blue2).
train_labels = [data[1:1401, k] for k in range(7)]
val_labels = [data[1401:1551, k] for k in range(7)]
test_labels = [data[1551:, k] for k in range(7)]

post_input = Input(shape=(None, 7), name='post_input')
lstm = layers.LSTM(150, dropout=0.2, recurrent_dropout=0.2, activation='relu',
                   return_sequences=True)(post_input)
lstm1 = layers.LSTM(250, dropout=0.2, recurrent_dropout=0.2, activation='relu')(lstm)
x = layers.Dense(360, activation='relu')(lstm1)
for _ in range(5):
    x = layers.Dense(250, activation='relu')(x)
x = layers.Dense(140, activation='relu')(x)
x = layers.Dense(70, activation='relu')(x)
# x = layers.Dropout(0.3)(x)

# One single-unit regression head per ball.
names = ['red1', 'red2', 'red3', 'red4', 'red5', 'blue1', 'blue2']
outputs = [layers.Dense(1, name=name)(x) for name in names]

model = Model(post_input, outputs)
model.compile(optimizer=RMSprop(1e-4), loss=['mse'] * 7, metrics=['acc'] * 7)
history = model.fit(train_data, train_labels, batch_size=20, epochs=50,
                    validation_data=(val_data, val_labels))

# Plot the losses, leaving out the first 3 epochs.
loss = history.history['loss'][3:]
val_loss = history.history['val_loss'][3:]
epochs = range(1, len(loss) + 1)
plt.figure()
plt.plot(epochs, loss, 'r', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
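The slicing in the network code pairs each draw with the one that follows it. A minimal sketch of that one-step offset on plain lists, assuming the draws are stored in the order they are fed to the model; `split_with_next_draw_labels` is a hypothetical helper, not part of the original script:

```python
def split_with_next_draw_labels(draws, train_end, val_end):
    """Pair every draw with its successor, then cut into train/val/test parts."""
    inputs = draws[:-1]   # the last draw has no successor, so it is never an input
    labels = draws[1:]    # labels[k] is the draw that follows inputs[k]
    return {
        "train": (inputs[:train_end], labels[:train_end]),
        "val": (inputs[train_end:val_end], labels[train_end:val_end]),
        "test": (inputs[val_end:], labels[val_end:]),
    }


if __name__ == "__main__":
    parts = split_with_next_draw_labels(list(range(10)), train_end=6, val_end=8)
    for name, (xs, ys) in parts.items():
        print(name, xs, ys)
```

With train_end=1400 and val_end=1550 this reproduces the same index ranges as the hand-written slices, which makes off-by-one mistakes easier to spot.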

The loss curves look like this:



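As expected: driven by the loss function, the model just converges toward a fixed value, which keeps the training loss decreasing steadily no matter how the numbers vary, while on the validation data the loss is sometimes very large. So every prediction it makes also hovers around that fixed value. (Encoding the targets as one-hot gives the same kind of result, just a different set of fixed values.) So the earlier statistical approach feels a bit better. But could there be a loss function that fits this kind of fluctuation better than MSE? Comments are welcome! At least this served as hands-on Python practice.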
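One way to read the flat predictions, offered as an interpretation rather than a diagnosis of this particular model: when the input carries no usable signal, the single constant that minimizes MSE is the mean of the targets, so a network trained with MSE is pulled toward it. The toy targets below are made up for illustration, and both function names are hypothetical:

```python
def mse(values, constant):
    """Mean squared error of predicting the same constant for every value."""
    return sum((v - constant) ** 2 for v in values) / len(values)


def best_constant(values, candidates):
    """Return the candidate constant with the lowest MSE."""
    return min(candidates, key=lambda c: mse(values, c))


if __name__ == "__main__":
    targets = [1, 7, 12, 20, 35]            # made-up numbers
    mean = sum(targets) / len(targets)
    print(mean, best_constant(targets, range(1, 36)))
```

The grid search lands on the same number as the arithmetic mean, which is why swapping the network architecture alone is unlikely to change this behavior under an MSE loss.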
