Original Translation | Starting from Neural Networks: 25 Terms and Concepts Every Deep Learning Beginner Should Know (Part 2)

Artificial intelligence, deep learning and machine learning: whether or not you understand these concepts yet, you should learn them. Otherwise, within three years you will be left behind like the extinct dinosaurs.

—— Mark Cuban (billionaire owner of the NBA's Dallas Mavericks)

6) Input / Output / Hidden Layers – As the names suggest, the input layer is the layer that receives the input signal and is the first layer of the network; the output layer is the layer that delivers the output signal and is the last layer of the network.

The processing layers in between are the network's "hidden layers". These hidden layers perform specific operations on the incoming signal and pass the output they generate on to the next layer. The input and output layers are visible to us, while the intermediate layers are hidden.

7) MLP (Multilayer Perceptron) – A single neuron cannot perform highly complex tasks, so we need large numbers of neurons stacked together to generate the outputs we need.

The simplest such network consists of an input layer, a hidden layer and an output layer, each with several neurons, and every neuron in one layer is connected to every neuron in the next layer. Networks of this kind are also called fully connected networks.

8) Forward Propagation – Forward propagation refers to the passage of the input signal through the hidden layers to the output layer.

In forward propagation the signal travels in a single, forward direction: the input layer supplies the input to the hidden layers, which then generate the output. There is no backward movement anywhere in this process.
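
As a minimal sketch of the last three terms (layers, an MLP, and forward propagation), the NumPy snippet below pushes a small batch of inputs through one hidden layer and an output layer. The layer sizes and the sigmoid activation are illustrative assumptions, not something fixed by the article.

import numpy as np
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)    # input layer (4 features) -> hidden layer (5 neurons)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)    # hidden layer -> output layer (1 neuron)
def forward(x):
    hidden = sigmoid(x @ W1 + b1)                # the hidden layer processes the signal...
    return sigmoid(hidden @ W2 + b2)             # ...and hands its output to the output layer
x = rng.normal(size=(3, 4))                      # a batch of 3 examples
print(forward(x).shape)                          # (3, 1): one output per example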

9) Cost Function – Once we have built a network, the network tries to bring its output as close as possible to the actual value. We measure how accurately it does this with a cost function (or loss function), which penalizes the network when it makes mistakes.

Our objective when running the network is to raise prediction accuracy and reduce error as far as possible, and thus to minimize the cost function. The most optimized output is the one at which the cost (or loss) function reaches its minimum.

If we define the cost function to be the mean squared error, it can be written as follows (the original shows the formula as an image; this is its standard form):

C = (1/2m) Σ (y - a)², summed over the training examples,

where m is the number of training inputs, a is the predicted value and y is the actual value for that particular example.

The learning process revolves around minimizing this cost.
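
A quick NumPy check of that cost, with made-up predicted and actual values:

import numpy as np
y = np.array([1.0, 0.0, 1.0, 1.0])       # actual values
a = np.array([0.9, 0.2, 0.8, 0.6])       # predicted values
m = len(y)                               # number of training inputs
cost = np.sum((y - a) ** 2) / (2 * m)    # mean squared error cost
print(cost)                              # 0.03125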

10) Gradient Descent – Gradient descent is an optimization algorithm for minimizing the cost. Think of it intuitively: walking down a hill, you take small steps downwards rather than leaping straight to the foot of the hill.

So what we do is this: starting from a point x, we move down a little, by Δh, updating our position to x - Δh, and we repeat this until we reach the "foot of the hill". The foot of the hill is the point of minimum cost.

Mathematically, to find a local minimum of a function we take steps proportional to the negative of the function's gradient; in other words, gradient descent searches along the negative gradient direction, and the closer it gets to the target the smaller the gradient, so the steps shrink and progress slows.

11) Learning Rate – The learning rate is defined as the amount by which the cost function is reduced in each iteration. Put simply, the rate at which we descend towards the minimum of the cost function is the learning rate. It must be chosen very carefully: it should be neither so large that the optimal solution is overshot, nor so small that the network needs very many steps to converge, or never converges at all.
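
A minimal sketch of gradient descent and the learning rate on the toy cost C(x) = (x - 3)^2; the starting point, the 0.1 learning rate and the 50 steps are all arbitrary choices for illustration:

def grad(x):                 # derivative of the toy cost C(x) = (x - 3)**2
    return 2 * (x - 3)
x = 0.0                      # starting point on the "hill"
learning_rate = 0.1          # too large overshoots, too small crawls
for step in range(50):
    x = x - learning_rate * grad(x)   # small step along the negative gradient
print(x)                     # close to 3, the minimum of the cost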

12) Backpropagation – When we define a neural network, each node is assigned random weights and bias values.

After one iteration we can compute the overall error of the network from the output it produced; this error is then fed back together with the gradient of the cost function, and the weights are adjusted accordingly so that the error is smaller in the next iteration.

This process of adjusting the weights using the gradient of the cost function is called backpropagation. In backpropagation the signal moves backwards: the error, together with the gradient of the cost function, flows from the output layer back through the hidden layers, and the weights are updated along the way.
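
To make the update concrete, here is a hedged one-neuron sketch: a single linear neuron with a squared-error cost, whose gradient is computed by hand and used to adjust the weight and bias. The data and the 0.1 learning rate are assumptions for illustration only:

import numpy as np
x = np.array([0.5, 1.5, 2.0]); y = np.array([1.0, 3.0, 4.0])   # toy data lying on y = 2x
w, b, lr = 0.0, 0.0, 0.1                                       # random-ish start, assumed learning rate
for _ in range(500):
    a = w * x + b                        # forward pass: the neuron's prediction
    error = a - y                        # how far off the network is
    dw = np.mean(error * x)              # gradient of the squared-error cost w.r.t. w
    db = np.mean(error)                  # gradient w.r.t. b
    w, b = w - lr * dw, b - lr * db      # adjust the weights against the gradient
print(round(w, 2), round(b, 2))          # approaches 2.0 and 0.0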

13) Batches – When training a neural network, instead of sending all of the input through in one go, we randomly divide it into several chunks (batches) of equal size.

Compared with feeding the entire dataset into the network at once, training on batches makes the resulting model more general.
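
A sketch of random, equal-size batching; the batch size of 32 and the made-up data are assumed values:

import numpy as np
X, y = np.random.rand(256, 10), np.random.rand(256, 1)   # 256 made-up examples with 10 features
batch_size = 32
idx = np.random.permutation(len(X))                       # shuffle so every chunk is random
batches = [(X[idx[i:i + batch_size]], y[idx[i:i + batch_size]])
           for i in range(0, len(X), batch_size)]
print(len(batches), batches[0][0].shape)                  # 8 batches, each of shape (32, 10)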

14) Epochs – An epoch is one full pass over all of the batches, with one forward propagation and one backward propagation for each, so one epoch means a single forward and backward pass over all of the input data.

The number of epochs used to train the network is up to you. More epochs will often give a more accurate model, but training will also take longer, and you should bear in mind that if the number of epochs is too high, the network may overfit.
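
A skeleton of how epochs and batches fit together in a training loop. The train_step function is a stand-in (not defined in the article) for one forward and one backward pass on a batch:

import numpy as np
X, y = np.random.rand(256, 10), np.random.rand(256, 1)    # made-up dataset
batch_size, num_epochs = 32, 10                            # both are illustrative choices
def train_step(xb, yb):
    pass                                 # stand-in for one forward and one backward pass
for epoch in range(num_epochs):          # more epochs: often higher accuracy, but longer and riskier
    idx = np.random.permutation(len(X))  # reshuffle the data every epoch
    for i in range(0, len(X), batch_size):
        train_step(X[idx[i:i + batch_size]], y[idx[i:i + batch_size]])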

15) Dropout – Dropout is a regularization technique that prevents the network from overfitting. As the name suggests, during training some of the neurons in the hidden layers are randomly ignored (dropped).

This means that training takes place on several different architectures of the network. Dropout works like an ensemble: the outputs of several differently structured networks are combined to produce the final output.

Paper: https://arxiv.org/pdf/1207.0580.pdf
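
A hedged sketch of (inverted) dropout applied to one layer's activations during training; the 0.5 keep probability is an assumed hyperparameter:

import numpy as np
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))               # activations of a hidden layer (made-up)
keep_prob = 0.5                                # fraction of neurons kept on this pass
mask = rng.random(hidden.shape) < keep_prob    # a different random sub-network every pass
dropped = hidden * mask / keep_prob            # rescale so the expected activation is unchanged
print(dropped)                                 # dropout is switched off at test time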

16) Batch Normalization – Batch normalization is like a checkpoint set up in a river to monitor the water level.

It is there to ensure that the data the next layer receives has a suitable distribution. While a neural network is being trained, the weights change after every gradient-descent step, and with them the distribution of the data being passed on.

But the next layer expects a distribution similar to what it saw before, so the data is explicitly normalized before each hand-off to the next layer.
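
The core of batch normalization sketched in NumPy: normalize each feature over the batch, then let learnable scale (gamma) and shift (beta) parameters restore flexibility. The epsilon, gamma and beta values are the usual defaults, assumed here for illustration:

import numpy as np
x = np.random.randn(64, 16) * 5 + 3            # a batch whose distribution has drifted (made-up)
eps = 1e-5
mean = x.mean(axis=0)                          # per-feature mean over the batch
var = x.var(axis=0)                            # per-feature variance over the batch
x_hat = (x - mean) / np.sqrt(var + eps)        # zero mean, unit variance per feature
gamma, beta = np.ones(16), np.zeros(16)        # learnable scale and shift
out = gamma * x_hat + beta                     # what the next layer receives
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # roughly 0s and 1s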

Convolutional Neural Networks

17) Filters – A filter in a CNN is a weight matrix that is multiplied with a patch of the input image to produce the corresponding convolved output.

For example, for a 28×28 image, a 3×3 filter is multiplied in turn with each 3×3 patch of the image to produce the convolved output.

The filter is usually smaller than the original image, and, like the weights, the filter values are updated during backpropagation to minimize the cost: the filter slides over the image, multiplying each 3×3 patch in turn to produce the convolved feature.
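
A direct, unoptimized sketch of sliding a 3x3 filter over a 28x28 image as just described; the image and filter values are random stand-ins:

import numpy as np
rng = np.random.default_rng(0)
image = rng.random((28, 28))                   # a 28x28 single-channel image
filt = rng.normal(size=(3, 3))                 # a 3x3 filter (a small weight matrix)
out = np.zeros((26, 26))                       # 28 - 3 + 1 = 26 positions in each direction
for i in range(26):
    for j in range(26):
        patch = image[i:i + 3, j:j + 3]        # the 3x3 patch under the filter
        out[i, j] = np.sum(patch * filt)       # elementwise multiply, then sum
print(out.shape)                               # (26, 26) convolved output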

18) CNN (Convolutional Neural Network) – Convolutional neural networks are mainly applied to image data. Suppose the input has size 28×28×3 (28 pixels × 28 pixels × RGB values); for a traditional neural network that already means 2,352 (28×28×3) variables, and as the image grows the number of variables rises sharply.

Convolving the image reduces the number of variables (as already mentioned under filters). As the filter slides over the width and the height of the image it produces a two-dimensional activation map, and stacking all of these activation maps along the depth direction gives the final output.

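
To see why convolving helps, compare the raw counts for a 28x28x3 input; the choice of 32 filters of size 3x3 is an assumption made only for the comparison:

h, w, c = 28, 28, 3
dense_inputs = h * w * c                       # 2352 variables feeding every neuron of a dense layer
filters, k = 32, 3                             # say, 32 filters of size 3x3 (an illustrative choice)
conv_params = filters * (k * k * c + 1)        # 896 weights plus biases, regardless of image size
print(dense_inputs, conv_params)               # 2352 896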

19) Pooling – To reduce the number of variables further and to prevent overfitting, it is common to insert pooling layers between the convolutional layers.

The most common pooling operation is max pooling with a 2×2 filter, which takes the maximum value of each 2×2 block of the original image to form a new, smaller matrix.

Other operations such as average pooling have also been tried, but in practice max pooling tends to work better.
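
Max pooling over 2x2 blocks, in plain NumPy on a made-up 4x4 input:

import numpy as np
x = np.array([[1., 3., 2., 4.],
              [5., 6., 1., 2.],
              [7., 2., 9., 1.],
              [3., 4., 6., 8.]])
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))   # max of each non-overlapping 2x2 block
print(pooled)                                     # [[6. 4.] [7. 9.]]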

20) Padding – Padding usually means adding extra zeros around the edges of an image so that the convolved output has the same size as the input; this is called same padding.

With same padding, the image obtained after applying the filters is the same size as the actual image.

Valid padding means keeping only the real ("valid") pixels of the image and adding nothing, so the size of the data keeps shrinking as it passes through the convolutions.
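
The size bookkeeping for same versus valid padding, plus zero-padding with NumPy; the 3x3 filter and 28x28 image continue the running example:

import numpy as np
image = np.random.rand(28, 28)
k = 3                                           # filter size
padded = np.pad(image, pad_width=1)             # one ring of zeros: "same" padding for a 3x3 filter
same_out = padded.shape[0] - k + 1              # 28: output matches the input size
valid_out = image.shape[0] - k + 1              # 26: no zeros added, output shrinks
print(padded.shape, same_out, valid_out)        # (30, 30) 28 26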

21) Data Augmentation – Data augmentation means creating new data out of the data we already have, increasing the amount of training data in the hope of improving prediction accuracy.

For example, in digit recognition the 9s we encounter may be tilted or rotated, so rotating the training images moderately, and thereby enlarging the training set, can make the model more accurate.

Operations such as rotating or brightening raise the quality of the training data, and this process is called data augmentation.
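
A small sketch of augmentation on a single image array: two slight rotations and a brightened copy, each becoming an extra training example. It assumes SciPy is available for the rotation; the image itself is a random stand-in:

import numpy as np
from scipy.ndimage import rotate                 # assumes SciPy is installed
image = np.random.rand(28, 28)                   # stand-in for a training image of a digit
augmented = [
    rotate(image, angle=15, reshape=False),      # slightly rotated copy
    rotate(image, angle=-15, reshape=False),     # rotated the other way
    np.clip(image * 1.3, 0.0, 1.0),              # brightened copy
]
print(len(augmented), augmented[0].shape)        # 3 extra examples, each still 28x28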

Recurrent Neural Networks

22) Recurrent Neuron – For a recurrent neuron, the output it produces becomes its own input on the next step, and this is repeated for t time steps.

Unrolling a recurrent neuron is like connecting t different neurons in series; the strength of this kind of neuron is that it produces a more comprehensive output.

23) RNN (Recurrent Neural Network) – Recurrent neural networks are typically used for sequential data, where the previous item's output is used in predicting the next one. RNNs contain loops, and these loops on the neurons allow them to store earlier information for a while, which is what makes the prediction possible.

As with the recurrent neuron, in an RNN the output of the hidden layer becomes its input again, and this repeats t times before the result is passed on to the next layer. The final output is therefore more comprehensive, and information seen earlier is retained for longer.

The hidden layers then propagate the error backwards through these unrolled steps to update the weights; this is known as backpropagation through time (BPTT).
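
A hedged sketch of the recurrence itself: the same weights are applied at every time step, and the hidden state produced at one step is fed back in at the next. The sizes, the tanh activation and the random sequence are illustrative assumptions; the backward pass through these unrolled steps would be BPTT:

import numpy as np
rng = np.random.default_rng(0)
input_size, hidden_size, steps = 4, 6, 5
W_xh = rng.normal(size=(input_size, hidden_size))   # input -> hidden weights (shared across steps)
W_hh = rng.normal(size=(hidden_size, hidden_size))  # hidden -> hidden weights (the loop)
b_h = np.zeros(hidden_size)
xs = rng.normal(size=(steps, input_size))           # a sequence of t inputs
h = np.zeros(hidden_size)                           # initial hidden state
for t in range(steps):
    h = np.tanh(xs[t] @ W_xh + h @ W_hh + b_h)      # the previous state feeds back in
print(h.round(3))                                   # final state summarizing the whole sequence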

24) Vanishing Gradient Problem – The vanishing gradient problem arises when the gradient of the activation function is very small. During backpropagation, the weights are multiplied again and again by these small gradients.

As a result they become smaller and smaller and tend to "vanish" the deeper the recursion goes, which makes the network forget its long-range dependencies. This is a fairly common problem in recurrent neural networks, where long-range dependencies matter a great deal.

It can largely be avoided by using activation functions such as ReLU, which do not have small gradients.
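
The arithmetic behind the problem in a few lines: multiplying by a small activation gradient at each of 50 steps leaves almost nothing (0.25, the largest gradient the sigmoid can produce, is used as the assumed per-step factor):

gradient = 1.0
for _ in range(50):
    gradient *= 0.25          # each layer or time step multiplies in a small activation gradient
print(gradient)               # about 8e-31: the signal has effectively vanished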

25) Exploding Gradient Problem – The exploding gradient problem is the exact opposite of the vanishing gradient problem: here the gradient of the activation function is too large.

During backpropagation, the large gradients at some nodes make their weights very large, drowning out the influence of the other nodes on the result. The problem can be avoided effectively by clipping the gradient, that is, setting a maximum value the gradient is allowed to take.
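
And the standard fix for the exploding case, sketched as clipping by global norm; the threshold of 5.0 and the gradient values are assumed:

import numpy as np
grads = [np.array([30.0, -40.0]), np.array([5.0])]        # made-up, very large gradients
max_norm = 5.0                                            # the largest total norm we allow
total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))  # about 50.2 here
if total_norm > max_norm:
    grads = [g * max_norm / total_norm for g in grads]    # shrink all gradients by the same factor
print([g.round(3) for g in grads])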

End Notes

I hope you enjoyed this article.

It gives a high-level overview of the basic concepts of deep learning, and I hope that after reading it you have a first understanding of these terms. I have tried to explain everything in the simplest possible language; if you have any questions or corrections, please feel free to leave a comment.

For the first half of this article, see: Original Translation | Starting from Neural Networks: 25 Terms and Concepts Every Deep Learning Beginner Should Know (Part 1)


Translated by: Lighthouse Big Data (燈塔大數據)
