I. Algorithm principles

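Linear regression is the simplest regression algorithm in machine learning. Multiple linear regression is linear regression in which each sample has several features. For a sample i with n features, the prediction can be written as the familiar equation:

$$\hat{y}_i = w_0 + w_1 x_{i1} + w_2 x_{i2} + \dots + w_n x_{in}$$

Here w_0 is the intercept (called b in the code below) and w_1, ..., w_n are the regression coefficients.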

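In multiple linear regression, the loss function (the squared L2 norm of the residuals, also known as the residual sum of squares, RSS) is defined as follows, where m is the number of samples:

$$RSS = \sum_{i=1}^{m} (y_i - \hat{y}_i)^2$$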

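Solving for the parameters by minimizing the RSS between the true and predicted values is called the method of least squares.
The procedure is simple. The first step in finding an extremum is usually to take the first derivative and set it to zero, and least squares is no exception. Writing the predictions in matrix form as Xw (with a column of ones in X for the intercept), we differentiate RSS with respect to the parameter vector w:

$$\frac{\partial RSS}{\partial w} = \frac{\partial\,(y - Xw)^{T}(y - Xw)}{\partial w} = -2X^{T}(y - Xw) = 0 \quad\Longrightarrow\quad w = (X^{T}X)^{-1}X^{T}y$$

The last step requires X^T X to be invertible.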
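Setting the derivative of RSS to zero leads to a linear system that can be solved directly. The following is a minimal sketch of that closed-form solution, not part of the original listing: the helper name `fit_normal_equation` and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np

def fit_normal_equation(X, y):
    """Least-squares fit via the normal equation; returns (w, b)."""
    # Prepend a column of ones so the intercept is learned as an extra weight.
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    # Solve (X^T X) theta = X^T y rather than forming the inverse explicitly.
    theta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
    return theta[1:], theta[0]

# Synthetic, noise-free data generated from y = 1 + 2*x1 - 3*x2 (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 1.0 + X_demo @ np.array([2.0, -3.0])

w_demo, b_demo = fit_normal_equation(X_demo, y_demo)
print(w_demo, b_demo)
```

Using `np.linalg.solve` avoids computing an explicit matrix inverse, which is usually the more numerically stable choice; it still fails if X^T X is singular.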

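For a simple function, we can take the gradient, set it to zero, find all solutions (also called critical points) and pick the one with the smallest function value, which is the global minimum. For functions with very many variables and a very complex structure, however, solving for a closed-form (analytical) solution is extremely hard and often practically impossible. Instead we can solve the problem iteratively: start from a random solution and try to improve it a little in each iteration. After a large number of iterations we obtain a reasonably good solution.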

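One such iterative method is gradient descent. The gradient of a function points in the direction of steepest ascent, so its negative points in the direction of steepest descent, the direction in which the function value locally decreases fastest. In each iteration (which can also be called a training epoch) we therefore compute the gradient of the loss function with respect to each coefficient and subtract it, multiplied by a factor called the learning rate, from the old parameters to obtain the new ones.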
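The update rule can be sketched on a one-dimensional toy problem. The function `gradient_descent` and the objective f(w) = (w - 3)^2 below are illustrative assumptions, not part of the original tutorial.

```python
def gradient_descent(grad, w0, learning_rate=0.1, n_iters=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(n_iters):
        # New parameter = old parameter - learning rate * gradient
        w = w - learning_rate * grad(w)
    return w

# f(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_min)
```

With this learning rate the distance to the minimum shrinks by a factor of 0.8 per step; a learning rate above 1.0 would make the iterates diverge for this particular function.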

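In standard gradient descent, the gradient is computed over the entire dataset. This is often undesirable because the computation can be expensive. In practice the dataset is randomly split into chunks called batches, and the parameters are updated once per batch. This approach is called stochastic gradient descent (more precisely, mini-batch stochastic gradient descent when each batch holds more than one sample).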
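A minimal sketch of the batched variant for linear regression follows. The function name `minibatch_sgd`, its default hyperparameters and the synthetic data are assumptions for illustration; the tutorial's own training loop further down uses the full dataset at every step.

```python
import numpy as np

def minibatch_sgd(X, y, learning_rate=0.1, epochs=200, batch_size=16, seed=0):
    """Fit y ~ X @ w + b with mini-batch stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        # Reshuffle once per epoch so the batches differ between epochs.
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            error = X[idx] @ w + b - y[idx]
            # Gradient of the half mean squared error on this batch only.
            w -= learning_rate * X[idx].T @ error / len(idx)
            b -= learning_rate * error.mean()
    return w, b

# Synthetic, noise-free data (illustrative only).
rng = np.random.default_rng(1)
X_sgd = rng.normal(size=(200, 3))
y_sgd = X_sgd @ np.array([1.5, -2.0, 0.5]) + 4.0

w_sgd, b_sgd = minibatch_sgd(X_sgd, y_sgd)
print(w_sgd, b_sgd)
```

Each update touches only `batch_size` rows, so its cost does not grow with the size of the dataset; the price is a noisier gradient estimate.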

II. Implementing it by hand with NumPy

import numpy as np
import pandas as pd

Initialize the dims-dimensional weight vector w and the bias b to 0.

### Initialize model parameters
def initialize_params(dims):
    '''
    Input:
    dims: number of features in the training data
    Output:
    w: initialized weights
    b: initialized bias
    '''
    # Initialize the weights as a zero matrix of shape (dims, 1)
    w = np.zeros((dims, 1))
    # Initialize the bias to zero
    b = 0
    return w, b

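Given the input matrix X, the label vector y and the current coefficients w and b, compute the loss, the predictions y_hat and the gradients of w and b. The gradients of the loss with respect to w and b are obtained with the chain rule.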
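The chain-rule step can be written out as follows. This is a sketch that assumes the mean squared error over m samples as the loss, matching the `loss` line in the listing; note that the listing's `dw` and `db` leave out the constant factor 2, which only rescales the effective learning rate.

```latex
\begin{aligned}
\hat{y} &= Xw + b, \qquad L = \frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2 \\
\frac{\partial L}{\partial w}
  &= \sum_{i=1}^{m}\frac{\partial L}{\partial \hat{y}_i}\,\frac{\partial \hat{y}_i}{\partial w}
   = \frac{2}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)x_i
   = \frac{2}{m}\,X^{T}\left(\hat{y} - y\right) \\
\frac{\partial L}{\partial b}
  &= \sum_{i=1}^{m}\frac{\partial L}{\partial \hat{y}_i}\,\frac{\partial \hat{y}_i}{\partial b}
   = \frac{2}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)
\end{aligned}
```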

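### Define the core of the model
### It has three parts: the linear regression formula, the mean squared loss and the parameter gradients
def linear_loss(X, y, w, b):
    '''
    Input:
    X: input feature matrix
    y: label vector
    w: weight matrix
    b: bias
    Output:
    y_hat: predictions of the linear model
    loss: mean squared loss
    dw: first-order partial derivative with respect to the weights
    db: first-order partial derivative with respect to the bias
    '''
    # Number of training samples
    num_train = X.shape[0]
    # Number of features
    num_feature = X.shape[1]
    # Linear regression prediction
    y_hat = np.dot(X, w) + b
    # Mean squared loss between the predictions and the true labels
    loss = np.sum((y_hat - y) ** 2) / num_train
    # Partial derivative with respect to the weights
    # (the constant factor 2 is omitted; it is absorbed into the learning rate)
    dw = np.dot(X.T, (y_hat - y)) / num_train
    # Partial derivative with respect to the bias (same omitted factor 2)
    db = np.sum((y_hat - y)) / num_train
    return y_hat, loss, dw, db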

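Define the training procedure for the linear regression model: set the learning rate and the number of training iterations, and at every iteration move w and b a small step against the gradient.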

### Define the training procedure for the linear regression model
def linear_train(X, y, learning_rate=0.01, epochs=10000):
    '''
    Input:
    X: input feature matrix
    y: label vector
    learning_rate: learning rate
    epochs: number of training iterations
    Output:
    loss_his: mean squared loss at every iteration
    params: dictionary of the optimized parameters
    grads: dictionary of the gradients at the last iteration
    '''
    # Empty list that records the training loss
    loss_his = []
    # Initialize the model parameters
    w, b = initialize_params(X.shape[1])
    # Training loop (runs exactly `epochs` iterations)
    for i in range(1, epochs + 1):
        # Compute the predictions, loss and gradients for this iteration
        y_hat, loss, dw, db = linear_loss(X, y, w, b)
        # Gradient descent update
        w += -learning_rate * dw
        b += -learning_rate * db
        # Record the loss of this iteration
        loss_his.append(loss)
        # Print the current loss every 10000 iterations
        if i % 10000 == 0:
            print('epoch %d loss %f' % (i, loss))
    # Save the optimized parameters in a dictionary
    params = {
        'w': w,
        'b': b
    }
    # Save the gradients of the last iteration in a dictionary
    grads = {
        'dw': dw,
        'db': db
    }
    return loss_his, params, grads

Load the dataset.

from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
data = diabetes.data
target = diabetes.target
print(data.shape)
print(target.shape)
print(data[:5])
print(target[:5])

Preprocess the data: shuffle it, then split it into a training set and a test set.

# Import the sklearn diabetes dataset loader
from sklearn.datasets import load_diabetes
# Import the sklearn shuffle function
from sklearn.utils import shuffle

# Load the diabetes dataset
diabetes = load_diabetes()
# Get the inputs and the labels
data, target = diabetes.data, diabetes.target
# Shuffle the dataset
X, y = shuffle(data, target, random_state=13)
# Split into training and test sets at a ratio of 8/2
offset = int(X.shape[0] * 0.8)
# Training set
X_train, y_train = X[:offset], y[:offset]
# Test set
X_test, y_test = X[offset:], y[offset:]
# Reshape the training labels into a column vector
y_train = y_train.reshape((-1, 1))
# Reshape the test labels into a column vector
y_test = y_test.reshape((-1, 1))
# Print the shapes of the training and test sets
print("X_train's shape: ", X_train.shape)
print("X_test's shape: ", X_test.shape)
print("y_train's shape: ", y_train.shape)
print("y_test's shape: ", y_test.shape)

Train the model.

# Train the linear regression model
loss_his, params, grads = linear_train(X_train, y_train, 0.01, 200000)
# Print the trained model parameters
print(params)

Predict with the model.

### Define the prediction function for linear regression
def predict(X, params):
    '''
    Input:
    X: test dataset
    params: trained model parameters
    Output:
    y_pred: model predictions
    '''
    # Get the model parameters
    w = params['w']
    b = params['b']
    # Predict
    y_pred = np.dot(X, w) + b
    return y_pred

# Predict on the test set
y_pred = predict(X_test, params)
# Print the first five predictions
print(y_pred[:5])

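For regression algorithms, checking only whether the predicted values are accurate is not enough. Beyond the numeric values themselves, we also want the model to capture the "patterns" in the data, such as its distribution and monotonicity, and MSE cannot tell us whether that information has been captured.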

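We therefore want a metric that, in addition to judging whether the predicted values are correct, also judges whether the model has fitted enough of the information beyond the values. When studying the dimensionality-reduction algorithm PCA, we used variance to measure the amount of information in the data: the larger the variance, the more information the data carries, and this information includes not only the magnitude of the values but also the patterns we want the model to capture. To measure how much of this information the model captures, we define R² and the explained variance score (explained_variance_score, EVS):

$$R^2 = 1 - \frac{\sum_{i=1}^{m}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{m}(y_i - \bar{y})^2}$$

$$EVS = 1 - \frac{\mathrm{Var}(y - \hat{y})}{\mathrm{Var}(y)}$$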

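Here y is the true label, ŷ is the prediction and ȳ is the mean of the true labels; dividing the sum of squared deviations of y from ȳ by the sample count m gives the variance. Variance essentially measures how far each y value lies from the sample mean: the larger the differences, the more information the values carry. In R², the numerator is the sum of squared differences between the true and predicted values, that is, the total information the model failed to capture, and the denominator is the information carried by the true labels. R² therefore equals 1 minus the share of the information in the true labels that the model failed to capture, so the closer it is to 1 the better.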
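A small worked example can make the scale of R² concrete. The helper `r2_by_hand` and the four-point dataset are illustrative assumptions; the arithmetic follows the numerator and denominator described above.

```python
import numpy as np

def r2_by_hand(y_true, y_pred):
    """R^2 = 1 - (information not captured) / (information in the labels)."""
    ss_res = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
r2_perfect = r2_by_hand(y_true, y_true)                      # exact predictions
r2_mean = r2_by_hand(y_true, np.full(4, y_true.mean()))      # always predict the mean
r2_reversed = r2_by_hand(y_true, y_true[::-1])               # trend predicted backwards
print(r2_perfect, r2_mean, r2_reversed)  # 1.0 0.0 -3.0
```

Always predicting the mean scores 0, and a model that gets the trend backwards scores below 0, so R² is not bounded below even though 1 is its upper limit.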

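In scikit-learn, R² can be obtained in three ways. The first is to import r2_score from metrics and pass it the true values and the predictions, in that order. The second is to call the score method of LinearRegression. The third is to pass 'r2' as the scoring argument in cross-validation.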
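The three routes can be sketched side by side as follows. The variable names and the 80/20 split with `random_state=0` are assumptions for illustration, while `r2_score`, `LinearRegression.score` and `cross_val_score(..., scoring="r2")` are the scikit-learn entry points the paragraph refers to.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split

X_all, y_all = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)

# 1) r2_score from sklearn.metrics: true values first, then predictions.
r2_metric = r2_score(y_te, model.predict(X_te))
# 2) The estimator's own score method returns R^2 for regressors.
r2_method = model.score(X_te, y_te)
# 3) Cross-validation with scoring="r2" returns one score per fold.
r2_folds = cross_val_score(LinearRegression(), X_all, y_all, cv=5, scoring="r2")

print(r2_metric, r2_method, r2_folds)
```

The first two numbers agree because `score` computes the same quantity internally. Swapping the two arguments of `r2_score` generally gives a different value, since R² is not symmetric in the true and predicted values.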

Here we define our own version:

### Define the R2 coefficient function
def r2_score(y_test, y_pred):
    '''
    Input:
    y_test: test set labels
    y_pred: test set predictions
    Output:
    r2: R2 coefficient
    '''
    # Mean of the test labels
    y_avg = np.mean(y_test)
    # Total sum of squares
    ss_tot = np.sum((y_test - y_avg) ** 2)
    # Residual sum of squares
    ss_res = np.sum((y_test - y_pred) ** 2)
    # Compute R2
    r2 = 1 - (ss_res / ss_tot)
    return r2

Prediction results:

import matplotlib.pyplot as plt

f = X_test.dot(params['w']) + params['b']
plt.scatter(range(X_test.shape[0]), y_test)
plt.plot(f, color='darkorange')
plt.xlabel('X_test')
plt.ylabel('y_test')
plt.show()

plt.plot(loss_his, color='blue')
plt.xlabel('epochs')
plt.ylabel('loss')
plt.show()

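To get a more reliable estimate of how the model generalizes, we commonly use K-fold cross-validation, which trains the model several times, each time on a different subset of the training data, and validates on the fold that was held out.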

import random
from sklearn.utils import shuffle

X, y = shuffle(data, target, random_state=13)
X = X.astype(np.float32)
# Append the labels as the last column so each row holds one full sample
data = np.concatenate((X, y.reshape((-1, 1))), axis=1)

def k_fold_cross_validation(items, k, randomize=True):
    if randomize:
        items = list(items)
        random.shuffle(items)
    # Fold i takes every k-th item, starting at position i
    slices = [items[i::k] for i in range(k)]
    for i in range(k):
        validation = slices[i]
        training = [item for s in slices if s is not validation for item in s]
        training = np.array(training)
        validation = np.array(validation)
        yield training, validation

# Collect the validation score of every fold
valid_scores = []
for training, validation in k_fold_cross_validation(data, 5):
    X_train = training[:, :10]
    y_train = training[:, -1].reshape((-1, 1))
    X_valid = validation[:, :10]
    y_valid = validation[:, -1].reshape((-1, 1))
    #print(X_train.shape, y_train.shape, X_valid.shape, y_valid.shape)
    loss_his, params, grads = linear_train(X_train, y_train, 0.001, 100000)
    print('final training loss is', loss_his[-1])
    y_pred = predict(X_valid, params)
    # Mean squared error on the held-out fold
    valid_score = np.sum((y_pred - y_valid) ** 2) / len(X_valid)
    valid_scores.append(valid_score)
    print('valid score is', valid_score)
print('five-fold cross validation score is', np.mean(valid_scores))

III. Using scikit-learn

sklearn provides a complete linear regression module.

from sklearn.datasets import load_diabetes
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
data = diabetes.data
target = diabetes.target
X, y = shuffle(data, target, random_state=13)
X = X.astype(np.float32)
y = y.reshape((-1, 1))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score

regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)

y_pred = regr.predict(X_test)

# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print('Mean squared error: %.2f' % mean_squared_error(y_test, y_pred))
# R2 score: 1 is perfect prediction
print('R2 score: %.2f' % r2_score(y_test, y_pred))
print(r2_score(y_test, y_pred))

# Plot outputs
plt.scatter(range(X_test.shape[0]), y_test, color='red')
plt.plot(range(X_test.shape[0]), y_pred, color='blue', linewidth=3)

plt.xticks(())
plt.yticks(())

plt.show()

Cross-validation:

import numpy as np
from sklearn.utils import shuffle
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

### K-fold cross-validation
def cross_validate(model, x, y, folds=5, repeats=5):
    ypred = np.zeros((len(y), repeats))
    score = np.zeros(repeats)
    for r in range(repeats):
        i = 0
        print('Cross Validating - Run', str(r + 1), 'out of', str(repeats))
        # Shuffle the data before each repeat
        x, y = shuffle(x, y, random_state=r)
        # shuffle=True is needed for random_state to have an effect;
        # the seed changes with r so every repeat gets a different split
        kf = KFold(n_splits=folds, shuffle=True, random_state=r + 1000)
        for train_ind, test_ind in kf.split(x):
            print('Fold', i + 1, 'out of', folds)
            xtrain, ytrain = x[train_ind, :], y[train_ind]
            xtest, ytest = x[test_ind, :], y[test_ind]
            model.fit(xtrain, ytrain)
            #print(xtrain.shape, ytrain.shape, xtest.shape, ytest.shape)
            # Store the out-of-fold predictions for this repeat only
            ypred[test_ind, r] = model.predict(xtest).ravel()
            i += 1
        # R2 of the out-of-fold predictions (true values first)
        score[r] = r2_score(y, ypred[:, r])
    print('\nOverall R2:', str(score))
    print('Mean:', str(np.mean(score)))
    print('Deviation:', str(np.std(score)))

cross_validate(regr, X, y, folds=5, repeats=5)

Model analysis:

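class sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=None)

Note that the normalize parameter was deprecated in scikit-learn 1.0 and removed in 1.2, so recent versions no longer accept it.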

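None of these parameters is indispensable to the model. This suggests that the performance of linear regression depends mainly on the data itself rather than on our tuning skill, and linear regression therefore places high demands on the data.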

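Key attributes available after fitting: coef_ (the coefficient vector w) and intercept_ (the intercept, written as b above).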
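As a brief sketch, the two fitted attributes that are usually inspected first are `coef_` and `intercept_`. The variable names below are assumptions for illustration, and the model is fitted on the full diabetes dataset only to keep the example short.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

X_attr, y_attr = load_diabetes(return_X_y=True)
reg_attr = LinearRegression().fit(X_attr, y_attr)

# coef_ holds one weight per feature; intercept_ holds the bias term.
print(reg_attr.coef_.shape)
print(reg_attr.intercept_)
```

Because the features returned by `load_diabetes` are mean-centered, the fitted intercept coincides with the mean of the target here; that is a property of this dataset, not of linear regression in general.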
