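This article walks through how a packet travels, step by step, from the network card (NIC) to a process on a Linux system.
For a far more detailed treatment, I strongly recommend the two articles listed in the references at the end.
Only physical Ethernet NICs are covered here, not virtual devices, and the reception of a single UDP packet is used as the running example.
The function call chains shown come from kernel 3.13.0. On other kernel versions the function names and paths may differ, but the underlying principles should be the same, or differ only in minor details.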
A NIC needs a driver to work. The driver is a module loaded into the kernel that connects the NIC to the kernel's network subsystem. When it is loaded, the driver registers itself with the network subsystem; when the corresponding NIC receives a packet, the network subsystem calls that driver to process the data.
The diagram below shows how a packet gets into memory and starts being processed by the kernel's network subsystem:
                   +-----+
                   |     |                            Memory
+--------+   1     |     |  2  DMA     +--------+--------+--------+--------+
| Packet |-------->| NIC |------------>| Packet | Packet | Packet | ...... |
+--------+         |     |             +--------+--------+--------+--------+
                   |     |<--------+
                   +-----+         |
                      |            +---------------+
                      |                            |
                    3 | Raise IRQ                  | Disable IRQ
                      |                          5 |
                      |                            |
                      ↓                            |
                   +-----+                   +------------+
                   |     |  Run IRQ handler  |            |
                   | CPU |------------------>| NIC Driver |
                   |     |       4           |            |
                   +-----+                   +------------+
                                                   |
                                                6  | Raise soft IRQ
                                                   |
                                                   ↓
The soft IRQ triggers the soft IRQ handler in the kernel's network subsystem. The flow then continues as follows:
                                                     +-----+
                                             17      |     |
                                        +----------->| NIC |
                                        |            |     |
                                        |Enable IRQ  +-----+
                                        |
                                        |
                                  +------------+                                      Memory
                                  |            |        Read           +--------+--------+--------+--------+
                 +--------------->| NIC Driver |<--------------------- | Packet | Packet | Packet | ...... |
                 |                |            |          9            +--------+--------+--------+--------+
                 |                +------------+
                 |                      |    |        skb
            Poll | 8      Raise softIRQ | 6  +-----------------+
                 |                      |             10       |
                 |                      ↓                      ↓
         +---------------+  Call  +-----------+      +------------------+        +--------------------+  12  +---------------------+
         | net_rx_action |<-------| ksoftirqd |      | napi_gro_receive |------->| enqueue_to_backlog |----->| CPU input_pkt_queue |
         +---------------+   7    +-----------+      +------------------+   11   +--------------------+      +---------------------+
                                                               |                                                        | 13
                                                            14 |        + - - - - - - - - - - - - - - - - - - - - - - - +
                                                               ↓        ↓
                                                   +--------------------------+    15      +------------------------+
                                                   | __netif_receive_skb_core |----------->| packet taps(AF_PACKET) |
                                                   +--------------------------+            +------------------------+
                                                               |
                                                               | 16
                                                               ↓
                                                      +-----------------+
                                                      | protocol layers |
                                                      +-----------------+
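One way to observe the soft IRQ stage from user space is the per-CPU NET_RX counter in /proc/softirqs. The following is a minimal sketch of a parser for that file's layout (a header row of CPU names followed by `NAME: count count ...` rows). The helper names `parse_softirqs` and `read_net_rx` and the `SAMPLE` text are made up for illustration and are not part of the original article.

```python
from pathlib import Path

# Hypothetical sample mimicking the layout of /proc/softirqs.
SAMPLE = """\
                    CPU0       CPU1
          HI:          1          0
      NET_TX:        120         98
      NET_RX:      51234      49876
"""


def parse_softirqs(text):
    """Return {softirq_name: [count_for_cpu0, count_for_cpu1, ...]}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    counters = {}
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        values = [int(v) for v in rest.split()]
        counters[name.strip()] = values[:len(cpus)]
    return counters


def read_net_rx(path="/proc/softirqs"):
    """Read the NET_RX row from a live system (Linux only)."""
    return parse_softirqs(Path(path).read_text()).get("NET_RX")
```

Calling `read_net_rx()` twice a few seconds apart and comparing the values gives a rough picture of which CPUs are running the receive soft IRQ.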
enqueue_to_backlog is also called by netif_rx, and netif_rx is the function the lo device calls when it sends a packet.
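The per-CPU backlog queue is one place where packets can be dropped, and the kernel exposes related counters in /proc/net/softnet_stat. Below is a rough sketch of a parser, assuming the usual layout of one row per CPU with hexadecimal columns, of which the first three are packets processed, packets dropped because the backlog was full, and the number of times net_rx_action ran out of budget. The function name and the sample rows are illustrative only.

```python
# Hypothetical sample: two CPUs, hexadecimal columns as in /proc/net/softnet_stat.
SAMPLE = """\
0000a1b2 00000003 00000001 00000000 00000000
00000010 00000000 00000000 00000000 00000000
"""


def parse_softnet_stat(text):
    """Return one dict per row with the first three counters decoded from hex."""
    rows = []
    for index, line in enumerate(text.splitlines()):
        fields = [int(field, 16) for field in line.split()]
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        rows.append({
            "row": index,                # rows follow CPU order
            "processed": fields[0],      # packets handled on this CPU
            "dropped": fields[1],        # dropped: backlog queue was full
            "time_squeeze": fields[2],   # net_rx_action stopped early
        })
    return rows
```

On a live system the input would come from `open("/proc/net/softnet_stat").read()`; a steadily growing `dropped` value suggests the backlog queue is overflowing.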
Because this is a UDP packet, it enters the IP layer first, and the calls then proceed down through the functions one level at a time:
          |
          |
          ↓         promiscuous mode &&
      +--------+    PACKET_OTHERHOST (set by driver)   +------------------+
      | ip_rcv |-------------------------------------->| drop this packet |
      +--------+                                       +------------------+
          |
          |
          ↓
+---------------------+
| NF_INET_PRE_ROUTING |
+---------------------+
          |
          |
          ↓
      +---------+
      |         | enabled ip forward  +------------+        +-----------------+
      | routing |-------------------->| ip_forward |------->| NF_INET_FORWARD |
      |         |                     +------------+        +-----------------+
      +---------+                                                    |
          |                                                          |
          | destination IP is local                                  ↓
          ↓                                                  +---------------+
 +------------------+                                        | dst_output_sk |
 | ip_local_deliver |                                        +---------------+
 +------------------+
          |
          |
          ↓
 +------------------+
 | NF_INET_LOCAL_IN |
 +------------------+
          |
          |
          ↓
    +-----------+
    | UDP layer |
    +-----------+
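The branching in the IP-layer diagram can be summarized as a small decision function. This is a toy model only, not kernel code: the `Packet` class, the `ip_receive_path` function and its parameters are invented for illustration, netfilter hooks are treated as pass-through, and the final "drop" for a non-local destination with forwarding disabled is an assumption that the diagram does not show.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dst_ip: str
    pkt_type: str = "PACKET_HOST"  # the driver sets PACKET_OTHERHOST for foreign frames


def ip_receive_path(pkt, local_ips, forwarding_enabled):
    """Return the list of stages a packet passes through in this toy model."""
    if pkt.pkt_type == "PACKET_OTHERHOST":
        # Seen only because the NIC is in promiscuous mode: dropped in ip_rcv.
        return ["ip_rcv", "drop"]
    path = ["ip_rcv", "NF_INET_PRE_ROUTING", "routing"]
    if pkt.dst_ip in local_ips:
        return path + ["ip_local_deliver", "NF_INET_LOCAL_IN", "UDP layer"]
    if forwarding_enabled:
        return path + ["ip_forward", "NF_INET_FORWARD", "dst_output_sk"]
    return path + ["drop"]  # assumption: not local and forwarding is off
```

For example, `ip_receive_path(Packet("10.0.0.1"), {"10.0.0.1"}, False)` ends at "UDP layer", which is the path the rest of the article follows.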
          |
          |
          ↓
      +---------+            +-----------------------+
      | udp_rcv |----------->| __udp4_lib_lookup_skb |
      +---------+            +-----------------------+
          |
          |
          ↓
 +--------------------+      +-----------+
 | sock_queue_rcv_skb |----->| sk_filter |
 +--------------------+      +-----------+
          |
          |
          ↓
 +------------------+
 | __skb_queue_tail |
 +------------------+
          |
          |
          ↓
  +---------------+
  | sk_data_ready |
  +---------------+
Once sk_data_ready has been called, processing of the packet is complete and it waits for the application to read it. All of the functions above execute in soft IRQ context.
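A packet only reaches the socket's receive queue if there is room for it: sock_queue_rcv_skb drops the packet when the socket's receive buffer is already full. The size of that buffer can be inspected and requested through the standard SO_RCVBUF socket option. The sketch below uses a hypothetical helper name, `rcvbuf_after_request`; note that on Linux the kernel typically doubles the requested value and caps it at the net.core.rmem_max sysctl, so the value read back may differ from the one requested.

```python
import socket


def rcvbuf_after_request(requested_bytes):
    """Return (size_before, size_after) of a UDP socket's receive buffer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        # Ask the kernel for a different receive buffer size.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        after = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return before, after


if __name__ == "__main__":
    print(rcvbuf_after_request(64 * 1024))
```

A larger buffer gives a slow reader more slack before packets are dropped at this stage, at the cost of more kernel memory per socket.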
An application usually receives data in one of two ways. In the first, recvfrom blocks while waiting for data; when the socket is notified, recvfrom is woken up and reads the data from the receive queue. In the second, the application watches the socket with epoll or select and, once notified, calls recvfrom to read the data from the receive queue. Both approaches receive the packets correctly.
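The second approach can be sketched in a few lines of user-space code. This is a minimal, self-contained example rather than anything from the original article: the function name `receive_one`, the payload and the timeout are arbitrary, and the packet is sent over the loopback address, so it enters the stack through the lo device instead of a physical NIC driver.

```python
import select
import socket


def receive_one(payload=b"hello", timeout=2.0):
    """Send one UDP datagram to ourselves and read it back via select + recvfrom."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        sender.sendto(payload, receiver.getsockname())

        # Wait until the kernel reports the socket as readable, i.e. a
        # datagram has been placed on its receive queue.
        readable, _, _ = select.select([receiver], [], [], timeout)
        if not readable:
            return None  # nothing arrived within the timeout
        data, _addr = receiver.recvfrom(2048)
        return data


if __name__ == "__main__":
    print(receive_one())
```

Dropping the `select` call and calling `recvfrom` directly gives the first, blocking variant.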
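Understanding the receive path clarifies where packets can be monitored and modified and under what conditions they may be dropped, which is useful when troubleshooting network problems. Knowing where the netfilter hooks sit also helps with understanding how iptables is used, and makes Linux virtual network devices easier to understand later on.
The next few articles will cover Linux virtual network devices and iptables.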
Monitoring and Tuning the Linux Networking Stack: Receiving Data
Illustrated Guide to Monitoring and Tuning the Linux Networking Stack: Receiving Data
NAPI