Neighbour subsystem: ARP input


2022-06-24

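Two conditions must be met before a neighbour table entry is successfully added:

1. the local host uses the entry;

2. the peer host has confirmed it.

Adding an entry also brings in the NUD (Neighbour Unreachability Detection) mechanism: an entry goes through a series of state transitions between its creation in NUD_NONE and becoming usable in NUD_REACHABLE.

Depending on the order in which the two conditions are met, there are two paths:
Use first, then confirm: NUD_NONE -> NUD_INCOMPLETE -> NUD_REACHABLE
Confirm first, then use: NUD_NONE -> NUD_STALE -> NUD_DELAY -> NUD_PROBE -> NUD_REACHABLE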
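
The two paths can be replayed with a toy state machine. This is only a sketch: the state values mirror the NUD_* flags in include/net/neighbour.h, but the event names and the functions nud_step() and nud_run() are invented here for illustration and do not exist in the kernel.

```c
#include <stddef.h>

/* State values mirror the NUD_* bit flags in include/net/neighbour.h. */
enum nud_state {
    NUD_NONE = 0x00, NUD_INCOMPLETE = 0x01, NUD_REACHABLE = 0x02,
    NUD_STALE = 0x04, NUD_DELAY = 0x08, NUD_PROBE = 0x10,
};

/* Hypothetical event names, invented for this sketch. */
enum nud_event {
    EV_LOCAL_USE,     /* the local host wants to send through the entry */
    EV_REQUEST_SEEN,  /* an ARP request from the neighbour was received */
    EV_CONFIRMED,     /* a unicast ARP reply (confirmation) arrived */
    EV_DELAY_EXPIRED, /* the NUD_DELAY timer ran out without confirmation */
};

/* Apply one event; events that do not apply leave the state unchanged. */
enum nud_state nud_step(enum nud_state s, enum nud_event ev)
{
    switch (ev) {
    case EV_LOCAL_USE:
        if (s == NUD_NONE)
            return NUD_INCOMPLETE;
        if (s == NUD_STALE)
            return NUD_DELAY;
        return s;
    case EV_REQUEST_SEEN:
        return s == NUD_NONE ? NUD_STALE : s;
    case EV_CONFIRMED:
        if (s == NUD_INCOMPLETE || s == NUD_DELAY || s == NUD_PROBE)
            return NUD_REACHABLE;
        return s;
    case EV_DELAY_EXPIRED:
        return s == NUD_DELAY ? NUD_PROBE : s;
    }
    return s;
}

/* Replay a sequence of events starting from state s. */
enum nud_state nud_run(enum nud_state s, const enum nud_event *evs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        s = nud_step(s, evs[i]);
    return s;
}
```

Feeding { EV_LOCAL_USE, EV_CONFIRMED } walks the first path, and { EV_REQUEST_SEEN, EV_LOCAL_USE, EV_DELAY_EXPIRED, EV_CONFIRMED } walks the second; both end in NUD_REACHABLE.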

 

 

/*
* Process an arp request.
*/

static int arp_process(struct net *net, struct sock *sk, struct sk_buff *skb)
{
struct net_device *dev = skb->dev;
struct in_device *in_dev = __in_dev_get_rcu(dev);
struct arphdr *arp;
unsigned char *arp_ptr;
struct rtable *rt;
unsigned char *sha;
unsigned char *tha = NULL;
__be32 sip, tip;
u16 dev_type = dev->type;
int addr_type;
struct neighbour *n;
struct dst_entry *reply_dst = NULL;
bool is_garp = false;

/* arp_rcv below verifies the ARP header and verifies the device
* is ARP'able.
*/

if (!in_dev) /* dev->ip_ptr: make sure the device has an IPv4 configuration block */
goto out_free_skb;

arp = arp_hdr(skb);

switch (dev_type) {
default:
if (arp->ar_pro != htons(ETH_P_IP) ||
htons(dev_type) != arp->ar_hrd)
goto out_free_skb;
break;
case ARPHRD_ETHER:
case ARPHRD_FDDI:
case ARPHRD_IEEE802:
/*
* ETHERNET, and Fibre Channel (which are IEEE 802
* devices, according to RFC 2625) devices will accept ARP
* hardware types of either 1 (Ethernet) or 6 (IEEE 802.2).
* This is the case also of FDDI, where the RFC 1390 says that
* FDDI devices should accept ARP hardware of (1) Ethernet,
* however, to be more robust, we'll accept both 1 (Ethernet)
* or 6 (IEEE 802.2)
*/
if ((arp->ar_hrd != htons(ARPHRD_ETHER) &&
arp->ar_hrd != htons(ARPHRD_IEEE802)) ||
arp->ar_pro != htons(ETH_P_IP))
goto out_free_skb;
break;
case ARPHRD_AX25:
if (arp->ar_pro != htons(AX25_P_IP) ||
arp->ar_hrd != htons(ARPHRD_AX25))
goto out_free_skb;
break;
case ARPHRD_NETROM:
if (arp->ar_pro != htons(AX25_P_IP) ||
arp->ar_hrd != htons(ARPHRD_NETROM))
goto out_free_skb;
break;
}

/* Understand only these message types
Only ARP requests and ARP replies are processed.
*/

if (arp->ar_op != htons(ARPOP_REPLY) &&
arp->ar_op != htons(ARPOP_REQUEST))
goto out_free_skb;

/*
* Extract fields
*/
arp_ptr = (unsigned char *)(arp + 1);
sha = arp_ptr;
arp_ptr += dev->addr_len;
memcpy(&sip, arp_ptr, 4); /* sender IP (sip) */
arp_ptr += 4;
switch (dev_type) {
#if IS_ENABLED(CONFIG_FIREWIRE_NET)
case ARPHRD_IEEE1394:
break;
#endif
default:
tha = arp_ptr;
arp_ptr += dev->addr_len;
}
/* target IP (tip) */
memcpy(&tip, arp_ptr, 4);
/*
* Check for bad requests for 127.x.x.x and requests for multicast
* addresses. If this is one such, delete it.
*/
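/*
Drop the frame if the target IP is multicast, or if it is a loopback
address while route_localnet is disabled.
route_localnet controls whether a device may route packets whose source
or destination address is in 127/8, i.e. packets coming from or going to lo:
Do not consider loopback addresses as martian source or destination while routing.
This enables the use of 127/8 for local routing purposes
*/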
if (ipv4_is_multicast(tip) ||
(!IN_DEV_ROUTE_LOCALNET(in_dev) && ipv4_is_loopback(tip)))
goto out_free_skb;

/*
* For some 802.11 wireless deployments (and possibly other networks),
* there will be an ARP proxy and gratuitous ARP frames are attacks
* and thus should not be accepted.
*/
if (sip == tip && IN_DEV_ORCONF(in_dev, DROP_GRATUITOUS_ARP)) /* drop gratuitous ARP frames */
goto out_free_skb;

/*
* Special case: We must set Frame Relay source Q.922 address
*/
if (dev_type == ARPHRD_DLCI)
sha = dev->broadcast;

/*
* Process entry. The idea here is we want to send a reply if it is a
* request for us or if it is a request for someone else that we hold
* a proxy for. We want to add an entry to our cache if it is a reply
* to us or if it is a request for our address.
* (The assumption for this last is that if someone is requesting our
* address, they are probably intending to talk to us, so it saves time
* if we cache their address. Their address is also probably not in
* our cache, since ours is not in their cache.)
*
* Putting this another way, we only care about replies if they are to
* us, in which case we add them to the cache. For requests, we care
* about those for us and those for our proxies. We reply to both,
* and in the case of requests for us we add the requester to the arp
* cache.
*/

if (arp->ar_op == htons(ARPOP_REQUEST) && skb_metadata_dst(skb))
reply_dst = (struct dst_entry *)
iptunnel_metadata_reply(skb_metadata_dst(skb),
GFP_ATOMIC);

/* Special case: IPv4 duplicate address detection packet (RFC2131)
An ARP probe used to detect address conflicts.
*/
if (sip == 0) {
/* reply only when the target IP is a local address of this host */
if (arp->ar_op == htons(ARPOP_REQUEST) &&
inet_addr_type_dev_table(net, dev, tip) == RTN_LOCAL &&
!arp_ignore(in_dev, sip, tip))
/* arp_ignore controls whether the host answers ARP requests it receives */
/* send the ARP reply */
arp_send_dst(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip,
sha, dev->dev_addr, sha, reply_dst);
goto out_consume_skb;
}
/* For an ARP request, look up the route using the target IP (tip) */
if (arp->ar_op == htons(ARPOP_REQUEST) &&
ip_route_input_noref(skb, tip, sip, 0, dev) == 0) {

rt = skb_rtable(skb);
addr_type = rt->rt_type;

if (addr_type == RTN_LOCAL) { /* ARP request addressed to this host */
int dont_send;

dont_send = arp_ignore(in_dev, sip, tip);
if (!dont_send && IN_DEV_ARPFILTER(in_dev))
dont_send = arp_filter(sip, tip, dev);
if (!dont_send) {
/* neigh_event_ns() ends up calling neigh_update(neigh, lladdr, NUD_STALE, NEIGH_UPDATE_F_OVERRIDE, 0) to update the neighbour entry */
n = neigh_event_ns(&arp_tbl, sha, &sip, dev);
if (n) {
arp_send_dst(ARPOP_REPLY, ETH_P_ARP,
sip, dev, tip, sha,
dev->dev_addr, sha,
reply_dst);
neigh_release(n);
}
}
goto out_consume_skb;
} else if (IN_DEV_FORWARD(in_dev)) { /* the ARP request is not for this host */
if (addr_type == RTN_UNICAST &&
(arp_fwd_proxy(in_dev, dev, rt) ||
arp_fwd_pvlan(in_dev, dev, rt, sip, tip) ||
(rt->dst.dev != dev &&
pneigh_lookup(&arp_tbl, net, &tip, dev, 0)))) {
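/*
1. Is proxying allowed (arp_fwd_proxy / arp_fwd_pvlan)?
2. Or do the ARP input and output devices differ, and does the table hold a proxy entry for tip (pneigh_lookup)?
Does pairing neigh_event_ns() with neigh_release() mean the entry is created and then freed right away? No.
A neigh is freed only when neigh->refcnt reaches 0. It is created with refcnt = 1,
neigh_event_ns() takes one more reference and neigh_release() drops it again,
so refcnt is back at 1 here.
The entry is freed only when neigh_release() is later called on its own.
*/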
n = neigh_event_ns(&arp_tbl, sha, &sip, dev);
if (n)
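neigh_release(n); /* drop the reference taken by neigh_event_ns() */
/* If the frame comes from the proxy queue, or is addressed to this host,
or the proxy delay is 0, send the proxy reply immediately.
*/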
if (NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED ||
skb->pkt_type == PACKET_HOST ||
NEIGH_VAR(in_dev->arp_parms, PROXY_DELAY) == 0) {
arp_send_dst(ARPOP_REPLY, ETH_P_ARP,
sip, dev, tip, sha,
dev->dev_addr, sha,
reply_dst);
} else {
/* the proxy reply must be delayed: queue the frame and start the timer */
pneigh_enqueue(&arp_tbl,
in_dev->arp_parms, skb);
goto out_free_dst;
}
goto out_consume_skb;
}
}
}

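/* Update our ARP tables
Reached for ARP replies and for ARP requests not handled above.
*/
/* The last argument of __neigh_lookup() is 0: look up only, do not create on a miss. */
/* The lookup key is sip. */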
n = __neigh_lookup(&arp_tbl, &sip, dev, 0);
/*
arp_accept - BOOLEAN
Define behavior for gratuitous ARP frames who's IP is not
already present in the ARP table:
0 - don't create new entries in the ARP table
1 - create new entries in the ARP table

Both replies and requests type gratuitous arp will trigger the
ARP table to be updated, if this setting is on.

If the ARP table already contains the IP address of the
gratuitous arp frame, the arp table will be updated regardless
if this setting is on or off.

*/
addr_type = -1;
if (n || IN_DEV_ARP_ACCEPT(in_dev)) {
/* is this a gratuitous ARP? */
is_garp = arp_is_garp(net, dev, &addr_type, arp->ar_op,
sip, tip, sha, tha);
}

if (IN_DEV_ARP_ACCEPT(in_dev)) {
/* Unsolicited ARP is not accepted by default.
It is possible, that this option should be enabled for some
devices (strip is candidate)
*/
if (!n &&
(is_garp || /* gratuitous ARP: create a neigh */
(arp->ar_op == htons(ARPOP_REPLY) &&
(addr_type == RTN_UNICAST ||
(addr_type < 0 &&
/* postpone calculation to as late as possible */
inet_addr_type_dev_table(net, dev, sip) ==
RTN_UNICAST)))))
n = __neigh_lookup(&arp_tbl, &sip, dev, 1);
}

if (n) {
int state = NUD_REACHABLE;
int override;

/* If several different ARP replies follows back-to-back,
use the FIRST one. It is possible, if several proxy
agents are active. Taking the first reply prevents
arp trashing and chooses the fastest router.
*/
override = time_after(jiffies,
n->updated +
NEIGH_VAR(n->parms, LOCKTIME)) ||
is_garp;

/* Broadcast replies and request packets
do not assert neighbour reachability.
*/
if (arp->ar_op != htons(ARPOP_REPLY) ||
skb->pkt_type != PACKET_HOST)
state = NUD_STALE;
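/* A unicast ARP reply addressed to this host sets NUD_REACHABLE;
anything else sets NUD_STALE. If more than LOCKTIME has passed since
the last update (or the frame is a gratuitous ARP),
NEIGH_UPDATE_F_OVERRIDE is passed.
*/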
neigh_update(n, sha, state,
override ? NEIGH_UPDATE_F_OVERRIDE : 0, 0);
neigh_release(n);
}
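/*
Use first, then confirm: NUD_NONE -> NUD_INCOMPLETE -> NUD_REACHABLE
Confirm first, then use: NUD_NONE -> NUD_STALE -> NUD_DELAY -> NUD_PROBE -> NUD_REACHABLE
NEIGH_CB(skb) is simply skb->cb, declared in struct sk_buff as char cb[48]. It is the
private data area (control buffer) that each protocol layer may use
for its own per-packet data.
The neighbour code stores a struct neighbour_cb there, with the fields
sched_next and flags (8 bytes on a 32-bit build). It is used by the proxy ARP
queue: sched_next is the time at which the frame is next scheduled,
and flags holds flags such as LOCALLY_ENQUEUED.

On an ARP request: NUD_NONE -> NUD_STALE.
On an ARP reply: NUD_INCOMPLETE/NUD_DELAY/NUD_PROBE -> NUD_REACHABLE.

Judging from the code above, NUD_NONE -> NUD_REACHABLE (a new entry created from a unicast reply when arp_accept is set)
and NUD_INCOMPLETE -> NUD_STALE (a request or broadcast reply updating an unresolved entry) also appear possible.

The neigh_timer_handler timer and the neigh_periodic_work work item change NUD states asynchronously.
neigh_timer_handler covers NUD_INCOMPLETE, NUD_DELAY, NUD_PROBE and NUD_REACHABLE;
neigh_periodic_work covers NUD_STALE. Note that there is one neigh_timer_handler timer per entry
but only a single neigh_periodic_work: NUD_STALE entries do not need a timer each,
a periodic expiry scan is enough, which saves a lot of resources.
neigh_update is dedicated to updating an entry's state, while neigh_event_send
updates the state when an entry is used for resolution.
*/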
out_consume_skb:
consume_skb(skb);

out_free_dst:
dst_release(reply_dst);
return NET_RX_SUCCESS;

out_free_skb:
kfree_skb(skb);
return NET_RX_DROP;
}
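
Two steps of arp_process are easy to try out in userspace: walking the variable part of the ARP payload, and choosing the state and override flag passed to neigh_update(). The following is a minimal sketch assuming Ethernet/IPv4 (6-byte hardware addresses); arp_extract(), arp_pick_state() and arp_may_override() are names made up for this example, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HLEN 6  /* Ethernet address length; stands in for dev->addr_len */

enum { NUD_REACHABLE = 0x02, NUD_STALE = 0x04 };  /* values as in include/net/neighbour.h */

struct arp_fields {
    const uint8_t *sha;  /* sender hardware address */
    uint32_t sip;        /* sender IP, network byte order */
    const uint8_t *tha;  /* target hardware address */
    uint32_t tip;        /* target IP, network byte order */
};

/* payload points just past the fixed ARP header, like (arp + 1) in arp_process. */
int arp_extract(const uint8_t *payload, size_t len, struct arp_fields *out)
{
    const uint8_t *p = payload;

    if (len < 2 * HLEN + 8)
        return -1;              /* too short for sha, sip, tha, tip */
    out->sha = p;
    p += HLEN;
    memcpy(&out->sip, p, 4);    /* memcpy avoids unaligned loads */
    p += 4;
    out->tha = p;
    p += HLEN;
    memcpy(&out->tip, p, 4);
    return 0;
}

/* Only a unicast reply addressed to this host asserts reachability. */
int arp_pick_state(bool is_reply, bool unicast_to_host)
{
    return (is_reply && unicast_to_host) ? NUD_REACHABLE : NUD_STALE;
}

/* Wrap-safe "a is after b", the same idea as the kernel's time_after(). */
static bool ticks_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Overwrite the cached address only after the lock time, or for gratuitous ARP. */
bool arp_may_override(unsigned long now, unsigned long updated,
                      unsigned long locktime, bool is_garp)
{
    return ticks_after(now, updated + locktime) || is_garp;
}
```

The lock-time check is what makes the first of several back-to-back replies win: a second reply that arrives within locktime of the first does not get the override flag.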

 

 

/*
arp_ignore: define different modes for sending replies in response to received ARP requests that resolve local target IP addresses:
0 - (default): reply for any local target IP address, configured on any interface
(including addresses on the loopback interface), whether or not it is configured on the receiving interface
1 - reply only if the target IP address is local address configured on the incoming interface
2 - reply only if the target IP address is local address configured on the incoming interface and both
with the sender's IP address are part from same subnet on this interface
3 - do not reply for local addresses configured with scope host, only resolutions for global and link addresses are replied
4-7 - reserved
8 - do not reply for all local addresses
*/
static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip)
{
struct net *net = dev_net(in_dev->dev);
int scope;

switch (IN_DEV_ARP_IGNORE(in_dev)) {
case 0: /* Reply, the tip is already validated */
return 0;
case 1: /* Reply only if tip is configured on the incoming interface */
sip = 0;
scope = RT_SCOPE_HOST;
break;
case 2: /*
* Reply only if tip is configured on the incoming interface
* and is in same subnet as sip
*/
scope = RT_SCOPE_HOST;
break;
case 3: /* Do not reply for scope host addresses */
sip = 0;
scope = RT_SCOPE_LINK;
in_dev = NULL;
break;
case 4: /* Reserved */
case 5:
case 6:
case 7:
return 0;
case 8: /* Do not reply */
return 1;
default:
return 0;
}
return !inet_confirm_addr(net, in_dev, sip, tip, scope);
}
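
The switch in arp_ignore only prepares the arguments for inet_confirm_addr(). The sketch below restates that mapping as data so it can be inspected; struct ignore_plan and arp_ignore_plan() are hypothetical names used for illustration, and the actual address lookup is left out.

```c
#include <stdbool.h>

/* Numeric values of RT_SCOPE_LINK and RT_SCOPE_HOST from the rtnetlink UAPI. */
enum { SCOPE_LINK = 253, SCOPE_HOST = 254 };

struct ignore_plan {
    bool decided;     /* true: the answer is known without an address lookup */
    bool drop;        /* meaningful only when decided is true */
    bool match_sip;   /* false: sip is zeroed, the sender's subnet is not checked */
    bool any_device;  /* true: in_dev is set to NULL, all interfaces are searched */
    int scope;        /* scope passed on to the address lookup */
};

/* Translate an arp_ignore mode into the parameters the listing would use. */
struct ignore_plan arp_ignore_plan(int mode)
{
    struct ignore_plan p = { false, false, false, false, SCOPE_HOST };

    switch (mode) {
    case 1:                       /* tip must be on the incoming interface */
        break;
    case 2:                       /* ... and in the same subnet as sip */
        p.match_sip = true;
        break;
    case 3:                       /* skip scope-host addresses */
        p.scope = SCOPE_LINK;
        p.any_device = true;
        break;
    case 8:                       /* never reply */
        p.decided = true;
        p.drop = true;
        break;
    default:                      /* 0, reserved 4-7, unknown values: reply */
        p.decided = true;
        break;
    }
    return p;
}
```

Modes 1 to 3 are the only ones that reach the lookup; every other value is decided on the spot.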
/*
Using the sender IP and target IP of the ARP request, look up the output route back to the sender of the request.
arp_filter - BOOLEAN
1 - Allows you to have multiple network interfaces on the same subnet,
and have the ARPs for each interface be answered based on whether or not the kernel would route a packet
from the ARP'd IP out that interface (therefore you must use source based routing for this to work). In other words
it allows control of which cards (usually 1) will respond to an arp request.

0 - (default) The kernel can respond to arp requests with addresses from other interfaces. This may seem wrong but
it usually makes sense, because it increases the chance of successful communication. IP addresses are owned by the
complete host on Linux, not by particular interfaces. Only for more complex setups like load-balancing, does this behaviour cause problems.
arp_filter for the interface will be enabled if at least one of conf/{all,interface}/arp_filter is set to TRUE, it will be disabled otherwise

This parameter decides whether to reply based on the source IP of the ARP request.

With arp_filter set to 0, if an ARP request arrives on some interface and the target IP belongs to this host,
the host replies whether or not the target IP is configured on the receiving interface.
The MAC address in the reply is that of the interface that received the request.

With arp_filter set to 1, if an ARP request arrives on some interface and the target IP belongs to this host
(again not necessarily on the receiving interface),
the host looks up which interface the route to the request's source IP goes through.
If that is the receiving interface, it replies with that interface's MAC address;
otherwise it does not reply.
*/
static int arp_filter(__be32 sip, __be32 tip, struct net_device *dev)
{
struct rtable *rt;
int flag = 0;
/*unsigned long now; */
struct net *net = dev_net(dev);

rt = ip_route_output(net, sip, tip, 0, 0);
if (IS_ERR(rt))
return 1;
if (rt->dst.dev != dev) {
__NET_INC_STATS(net, LINUX_MIB_ARPFILTER);
flag = 1;
}
ip_rt_put(rt);
return flag;
}
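
The decision made by arp_filter can be imitated with a toy routing table: reply only if the route back to the sender leaves through the interface the request arrived on. This is a simplified sketch; struct toy_route, toy_route_lookup() and toy_arp_filter() are invented for the example, addresses are in host byte order, and the source-based routing that the real lookup takes into account is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* One toy route: destination prefix, netmask and outgoing interface index. */
struct toy_route {
    uint32_t prefix;
    uint32_t mask;
    int ifindex;
};

/* Longest-prefix match on the destination; returns -1 when there is no route. */
int toy_route_lookup(const struct toy_route *tbl, size_t n, uint32_t dst)
{
    int best_if = -1;
    uint32_t best_mask = 0;
    int found = 0;

    for (size_t i = 0; i < n; i++) {
        if ((dst & tbl[i].mask) != tbl[i].prefix)
            continue;
        if (!found || tbl[i].mask > best_mask) {
            found = 1;
            best_mask = tbl[i].mask;
            best_if = tbl[i].ifindex;
        }
    }
    return best_if;
}

/* Returns 1 when the reply should be suppressed, 0 when it may be sent. */
int toy_arp_filter(const struct toy_route *tbl, size_t n,
                   uint32_t sip, int rx_ifindex)
{
    int out_if = toy_route_lookup(tbl, n, sip);

    if (out_if < 0)
        return 1;                 /* no route back to the sender */
    return out_if != rx_ifindex;  /* the route leaves via another interface */
}
```

With two interfaces on neighbouring subnets, a request from 10.0.0.5 is answered only on the interface that routes 10.0.0.0/24.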

 

 

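/* Called when a timer expires for a neighbour entry.
neigh_timer_handler is the per-entry timer function.
A timer is armed while a neigh is in NUD_INCOMPLETE, NUD_DELAY, NUD_PROBE or NUD_REACHABLE; neigh_timer_handler handles the expiry for each of these states.

In NUD_REACHABLE there are three possible outcomes, matching the three branches of the conditional below.
neigh->confirmed is the last time a confirmation was received from the neighbour; neigh->used is the last time the entry was used.
- Timer expired, but a confirmation arrived within reachable_time: keep the state and re-arm the timer for neigh->confirmed + reachable_time.
- Timer expired, no confirmation, but the entry was used within delay_probe_time: move to NUD_DELAY and re-arm the timer for now + delay_probe_time.
- Timer expired, neither confirmed nor used: the entry is suspect; move to NUD_STALE rather than deleting it at once. neigh_periodic_work() periodically reclaims NUD_STALE entries.

In NUD_DELAY there are two possible outcomes, matching the two branches below.
- A confirmation arrived in the meantime: move to NUD_REACHABLE and store the next check time in next.
- No confirmation arrived: move to NUD_PROBE and store the next check time in next.
NUD_DELAY is inserted between NUD_STALE and NUD_PROBE to cut down on ARP traffic: the hope is that a confirmation arrives within the delay, so that no new address resolution is needed.
*/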

static void neigh_timer_handler(unsigned long arg)
{
unsigned long now, next;
struct neighbour *neigh = (struct neighbour *)arg;
unsigned int state;
int notify = 0;

write_lock(&neigh->lock);

state = neigh->nud_state;
now = jiffies;
next = now + HZ;

if (!(state & NUD_IN_TIMER))
goto out;

if (state & NUD_REACHABLE) {
/* NUD_REACHABLE, three branches:
confirmed within reachable_time: stay, re-arm for confirmed + reachable_time;
used within delay_probe_time: move to NUD_DELAY;
otherwise: move to NUD_STALE and leave removal to neigh_periodic_work().
*/
if (time_before_eq(now,
neigh->confirmed + neigh->parms->reachable_time)) {
neigh_dbg(2, "neigh %p is still alive\n", neigh);
next = neigh->confirmed + neigh->parms->reachable_time;
} else if (time_before_eq(now,
neigh->used +
NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME))) {
neigh_dbg(2, "neigh %p is delayed\n", neigh);
neigh->nud_state = NUD_DELAY;
neigh->updated = jiffies;
neigh_suspect(neigh);
next = now + NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME);
} else {
neigh_dbg(2, "neigh %p is suspected\n", neigh);
neigh->nud_state = NUD_STALE;
neigh->updated = jiffies;
neigh_suspect(neigh);
notify = 1;
}
} else if (state & NUD_DELAY) {
/* NUD_DELAY, two branches:
confirmed within delay_probe_time: move to NUD_REACHABLE;
otherwise: move to NUD_PROBE and reset the probe counter.
*/
if (time_before_eq(now,
neigh->confirmed +
NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME))) {
neigh_dbg(2, "neigh %p is now reachable\n", neigh);
neigh->nud_state = NUD_REACHABLE;
neigh->updated = jiffies;
neigh_connect(neigh);
notify = 1;
next = neigh->confirmed + neigh->parms->reachable_time;
} else {
neigh_dbg(2, "neigh %p is probed\n", neigh);
neigh->nud_state = NUD_PROBE;
neigh->updated = jiffies;
atomic_set(&neigh->probes, 0);
notify = 1;
next = now + NEIGH_VAR(neigh->parms, RETRANS_TIME);
}
} else {
/* NUD_PROBE|NUD_INCOMPLETE
In NUD_PROBE or NUD_INCOMPLETE only the next check time is stored in next:
both states send ARP probes, and their transitions depend on
how address resolution proceeds. */
next = now + NEIGH_VAR(neigh->parms, RETRANS_TIME);
}

if ((neigh->nud_state & (NUD_INCOMPLETE | NUD_PROBE)) &&
atomic_read(&neigh->probes) >= neigh_max_probes(neigh)) {
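/* After the timer-driven transitions above, an entry in NUD_PROBE or
NUD_INCOMPLETE is about to send another ARP probe. The probe count is
checked first: once it reaches the limit the peer is considered
unresponsive, and the entry moves to NUD_FAILED and is invalidated. */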
neigh->nud_state = NUD_FAILED;
notify = 1;
neigh_invalidate(neigh);
goto out;
}
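/*
neigh_timer_handler thus handles expiry for every state that arms a timer;
the state transition diagram shows the transitions it is involved in.
Note the NUD_DELAY -> NUD_REACHABLE transition: arp_process also moves an
entry from NUD_DELAY to NUD_REACHABLE when an ARP reply arrives.
The difference is that arp_process reacts to an ARP-level confirmation,
whereas neigh_timer_handler reacts to a confirmation from layer 4,
which shows up as an updated neigh->confirmed.
*/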
if (neigh->nud_state & NUD_IN_TIMER) {
if (time_before(next, jiffies + HZ/2))
next = jiffies + HZ/2;
if (!mod_timer(&neigh->timer, next))
neigh_hold(neigh);
}
if (neigh->nud_state & (NUD_INCOMPLETE | NUD_PROBE)) {
neigh_probe(neigh);
} else {
out:
write_unlock(&neigh->lock);
}

if (notify)
neigh_update_notify(neigh, 0);

neigh_release(neigh);
}
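
The state changes made by neigh_timer_handler can be isolated from locking, timers and probing. The sketch below keeps only the transitions; struct toy_neigh and toy_timer_expire() are hypothetical names, time is a plain tick counter, and no probe is actually sent or counted.

```c
#include <stdbool.h>

enum {
    NUD_INCOMPLETE = 0x01, NUD_REACHABLE = 0x02, NUD_STALE = 0x04,
    NUD_DELAY = 0x08, NUD_PROBE = 0x10, NUD_FAILED = 0x20,
};

/* Just the fields the transitions depend on. */
struct toy_neigh {
    int state;
    unsigned long confirmed;        /* last confirmation from the peer */
    unsigned long used;             /* last use by the local host */
    unsigned long reachable_time;
    unsigned long delay_probe_time;
    int probes;                     /* probes sent so far */
    int max_probes;
};

/* Wrap-safe "a <= b", the same idea as the kernel's time_before_eq(). */
static bool ticks_before_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) <= 0;
}

/* Apply one timer expiry at time now and return the resulting state. */
int toy_timer_expire(struct toy_neigh *n, unsigned long now)
{
    if (n->state == NUD_REACHABLE) {
        if (ticks_before_eq(now, n->confirmed + n->reachable_time))
            ;                                   /* still alive: keep state */
        else if (ticks_before_eq(now, n->used + n->delay_probe_time))
            n->state = NUD_DELAY;               /* recently used: wait a bit */
        else
            n->state = NUD_STALE;               /* neither confirmed nor used */
    } else if (n->state == NUD_DELAY) {
        if (ticks_before_eq(now, n->confirmed + n->delay_probe_time)) {
            n->state = NUD_REACHABLE;           /* confirmed by an upper layer */
        } else {
            n->state = NUD_PROBE;               /* start active probing */
            n->probes = 0;
        }
    }

    /* Too many unanswered probes: give up on the neighbour. */
    if ((n->state == NUD_INCOMPLETE || n->state == NUD_PROBE) &&
        n->probes >= n->max_probes)
        n->state = NUD_FAILED;

    return n->state;
}
```

Note that an entry entering NUD_PROBE resets its probe counter first, so it cannot fail in the same expiry in which it starts probing.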

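neigh_periodic_work: the periodic function for NUD_STALE entries
While a neigh is in NUD_STALE it simply waits. If the host uses it, it moves to NUD_DELAY; if it stays unused for longer than gc_staletime, neigh_periodic_work() removes it from the table and releases it.

Unlike NUD_INCOMPLETE, NUD_DELAY, NUD_PROBE and NUD_REACHABLE, which each arm a per-entry timer, NUD_STALE is checked asynchronously: neigh_periodic_work() is triggered periodically and scans for expired entries.

At the end of each run the work item is queued again with a delay of 1/2 base_reachable_time, so neigh_periodic_work runs once every 1/2 base_reachable_time. The call has the form (the listing below uses queue_delayed_work() for the same purpose):
schedule_delayed_work(&tbl->gc_work, tbl->parms.base_reachable_time >> 1);
neigh_periodic_work runs periodically, but an entry must not be reclaimed right after it has been added.

The policy is that gc_staletime is larger than 1/2 base_reachable_time. For ARP the defaults are gc_staletime = 60 seconds and base_reachable_time = 30 seconds.

In other words, neigh_periodic_work runs every 15 seconds, while an unused entry only becomes eligible for removal 60 seconds after it was last used. An entry therefore lives at least 60 seconds and is reclaimed at most about 15 seconds after that.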

/*
* It is random distribution in the interval (1/2)*base...(3/2)*base.
* It corresponds to default IPv6 settings and is not overridable,
* because it is really reasonable choice.

neigh_periodic_work periodically recomputes reachable_time (every 300 seconds in the listing below). Note that
reachable_time is drawn from 1/2 base to 3/2 base, where base = base_reachable_time. When an entry is in NUD_REACHABLE
a timer of length reachable_time is armed,
so an entry that receives no confirmation stays in NUD_REACHABLE for between 1/2 and 3/2 of base_reachable_time.
*/

unsigned long neigh_rand_reach_time(unsigned long base)
{
return base ? (prandom_u32() % base) + (base >> 1) : 0;
}
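
The range of neigh_rand_reach_time follows directly from the expression: the remainder lies in [0, base - 1] and base / 2 is added to it. A small sketch makes the bounds explicit; the random source is passed in as a parameter (standing in for prandom_u32()) so the result is deterministic, and the helper names are made up for this example.

```c
/* Same arithmetic as neigh_rand_reach_time(), with the random value supplied by the caller. */
unsigned long rand_reach_time(unsigned long base, unsigned int rnd)
{
    return base ? (rnd % base) + (base >> 1) : 0;
}

/* Smallest value the function can return for a given base. */
unsigned long reach_time_min(unsigned long base)
{
    return base >> 1;
}

/* Largest value the function can return for a given base. */
unsigned long reach_time_max(unsigned long base)
{
    return base ? (base - 1) + (base >> 1) : 0;
}
```

For base = 30000 (30 seconds expressed in milliseconds, chosen only as an example) the result ranges from 15000 to 44999, that is, from 1/2 base up to just under 3/2 base.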

static void neigh_periodic_work(struct work_struct *work)
{
struct neigh_table *tbl = container_of(work, struct neigh_table, gc_work.work);
struct neighbour *n;
struct neighbour __rcu **np;
unsigned int i;
struct neigh_hash_table *nht;

NEIGH_CACHE_STAT_INC(tbl, periodic_gc_runs);

write_lock_bh(&tbl->lock);
nht = rcu_dereference_protected(tbl->nht,
lockdep_is_held(&tbl->lock));

/*
* periodically recompute ReachableTime from random function
*/

if (time_after(jiffies, tbl->last_rand + 300 * HZ)) {
struct neigh_parms *p;
tbl->last_rand = jiffies;
list_for_each_entry(p, &tbl->parms_list, list)
p->reachable_time =
neigh_rand_reach_time(NEIGH_VAR(p, BASE_REACHABLE_TIME));
}

if (atomic_read(&tbl->entries) < tbl->gc_thresh1)
goto out;

for (i = 0 ; i < (1 << nht->hash_shift); i++) {
np = &nht->hash_buckets[i];

while ((n = rcu_dereference_protected(*np,
lockdep_is_held(&tbl->lock))) != NULL) {
unsigned int state;

write_lock(&n->lock);

state = n->nud_state;
if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
write_unlock(&n->lock);
goto next_elt;
}

if (time_before(n->used, n->confirmed))
n->used = n->confirmed;
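/*
The scan walks the whole neighbour table, every entry in every hash bucket.
An entry that nobody else references and that is NUD_FAILED, or has not
been used within gc_staletime, is removed from the table.
*/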
if (atomic_read(&n->refcnt) == 1 &&
(state == NUD_FAILED ||
time_after(jiffies, n->used + NEIGH_VAR(n->parms, GC_STALETIME)))) {
*np = n->next;
n->dead = 1;
write_unlock(&n->lock);
neigh_cleanup_and_release(n);
continue;
}
write_unlock(&n->lock);

next_elt:
np = &n->next;
}
/*
* It's fine to release lock here, even if hash table
* grows while we are preempted.
*/
write_unlock_bh(&tbl->lock);
cond_resched();
write_lock_bh(&tbl->lock);
nht = rcu_dereference_protected(tbl->nht,
lockdep_is_held(&tbl->lock));
}
out:
/* Cycle through all hash buckets every BASE_REACHABLE_TIME/2 ticks.
* ARP entry timeouts range from 1/2 BASE_REACHABLE_TIME to 3/2
* BASE_REACHABLE_TIME.
*/
queue_delayed_work(system_power_efficient_wq, &tbl->gc_work,
NEIGH_VAR(&tbl->parms, BASE_REACHABLE_TIME) >> 1);
write_unlock_bh(&tbl->lock);
}
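
The per-entry test inside the scan can be written as a pure predicate. The following sketch assumes simplified inputs (plain integers instead of struct neighbour and atomic counters); toy_gc_should_remove() is a name invented for illustration.

```c
#include <stdbool.h>

enum {
    NUD_INCOMPLETE = 0x01, NUD_REACHABLE = 0x02, NUD_STALE = 0x04,
    NUD_DELAY = 0x08, NUD_PROBE = 0x10, NUD_FAILED = 0x20,
    NUD_PERMANENT = 0x80,
};

/* States that have a per-entry timer running. */
#define NUD_IN_TIMER (NUD_INCOMPLETE | NUD_REACHABLE | NUD_DELAY | NUD_PROBE)

/* Wrap-safe "a is after b", the same idea as the kernel's time_after(). */
static bool ticks_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Would the periodic scan remove this entry at time now? */
bool toy_gc_should_remove(int state, int refcnt,
                          unsigned long used, unsigned long confirmed,
                          unsigned long now, unsigned long gc_staletime)
{
    /* Permanent entries and entries with a running timer are skipped. */
    if (state & (NUD_PERMANENT | NUD_IN_TIMER))
        return false;

    /* A confirmation counts as a use. */
    if ((long)(used - confirmed) < 0)
        used = confirmed;

    /* Only the table's own reference may remain. */
    return refcnt == 1 &&
           (state == NUD_FAILED || ticks_after(now, used + gc_staletime));
}
```

The refcnt == 1 condition is what protects entries that are still referenced elsewhere, for example by a cached route.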

 

[Figure: neighbour subsystem ARP input, state transition diagram]

 

 

arp_announce - INTEGER
Define different restriction levels for announcing the local
source IP address from IP packets in ARP requests sent on
interface:
0 - (default) Use any local address, configured on any interface
1 - Try to avoid local addresses that are not in the target's
subnet for this interface. This mode is useful when target
hosts reachable via this interface require the source IP
address in ARP requests to be part of their logical network
configured on the receiving interface. When we generate the
request we will check all our subnets that include the
target IP and will preserve the source address if it is from
such subnet. If there is no such subnet we select source
address according to the rules for level 2.
2 - Always use the best local address for this target.
In this mode we ignore the source address in the IP packet
and try to select local address that we prefer for talks with
the target host. Such local address is selected by looking
for primary IP addresses on all our subnets on the outgoing
interface that include the target IP address. If no suitable
local address is found we select the first local address
we have on the outgoing interface or on all other interfaces,
with the hope we will receive reply for our request and
even sometimes no matter the source IP address we announce.

The max value from conf/{all,interface}/arp_announce is used.

Increasing the restriction level gives more chance for
receiving answer from the resolved target while decreasing
the level announces more valid sender's information.

arp_ignore - INTEGER
Define different modes for sending replies in response to
received ARP requests that resolve local target IP addresses:
0 - (default): reply for any local target IP address, configured
on any interface
1 - reply only if the target IP address is local address
configured on the incoming interface
2 - reply only if the target IP address is local address
configured on the incoming interface and both with the
sender's IP address are part from same subnet on this interface
3 - do not reply for local addresses configured with scope host,
only resolutions for global and link addresses are replied
4-7 - reserved
8 - do not reply for all local addresses

The max value from conf/{all,interface}/arp_ignore is used
when ARP request is received on the {interface}

arp_accept - BOOLEAN
Define behavior for gratuitous ARP frames who's IP is not
already present in the ARP table:
0 - don't create new entries in the ARP table
1 - create new entries in the ARP table

Both replies and requests type gratuitous arp will trigger the
ARP table to be updated, if this setting is on.

If the ARP table already contains the IP address of the
gratuitous arp frame, the arp table will be updated regardless
if this setting is on or off.
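
For arp_announce and arp_ignore the text above says the effective level is the larger of the "all" and per-interface values. A minimal userspace sketch of reading and combining them could look like this; the helper names are invented, and the interface name is whatever the caller passes in.

```c
#include <stddef.h>
#include <stdio.h>

/* Build "/proc/sys/net/ipv4/conf/<iface>/<name>"; returns -1 if buf is too small. */
int sysctl_path(char *buf, size_t len, const char *iface, const char *name)
{
    int n = snprintf(buf, len, "/proc/sys/net/ipv4/conf/%s/%s", iface, name);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read a single integer from a file; returns -1 on any failure. */
int read_int_file(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = fscanf(f, "%d", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* "The max value from conf/{all,interface}/... is used." */
int effective_level(int conf_all, int conf_iface)
{
    return conf_all > conf_iface ? conf_all : conf_iface;
}
```

A caller would build the two paths for "all" and for the interface in question, read both values, and pass them to effective_level().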

 

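HTTP proxy server (layer 3/4/7 proxy), network event library common components, kernel drivers, camera drivers, TCP/IP network stack, netfilter, bridge: it feels like I have read all of this before! Just do good deeds and do not ask about the future. --身高体重180的胖子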
