ARP table quiz in Cumulus Linux
The weird situation: you notice IPv4 packet loss while IPv6 works as expected over the same physical link:
--- 2a02:4780:face:f00d::1 ping statistics ---
1368 packets transmitted, 1368 received, 0% packet loss, time 1367006ms
rtt min/avg/max/mdev = 0.217/0.285/0.667/0.045 ms
root@sg-nme-leaf1:~#

--- 10.0.31.1 ping statistics ---
1366 packets transmitted, 1348 received, 1% packet loss, time 1365005ms
rtt min/avg/max/mdev = 0.147/0.227/0.874/0.064 ms
root@sg-nme-leaf1:~#
There is a lot to check, but here is what I did to solve this issue.
First I checked the ptmd status, which surprised me even more: BFD was failed, but the BGP session was UP. That’s because we use a single IPv6 session for both IPv4 and IPv6.
The 10.0.31.1 BFD session is down while 2a02:4780:face:f00d::1 is up. Why? For the same reason: packet loss.
Checking for drops/errors with
ethtool -S swp48 | grep -iE "drop|err|disc" gave nothing except the usual drop counts, which are normal due to ACLs, bursty buffer congestion, etc.
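To avoid eyeballing hundreds of counters, a small filter that prints only the non-zero drop/error/discard stats can help. This is a sketch: swp48 and the counter names below are examples, since `ethtool -S` output differs between NIC drivers.

```shell
# Print only the non-zero drop/error/discard counters from `ethtool -S`.
# Counter names here are illustrative; they vary between drivers.
nonzero_drops() {
  grep -iE 'drop|err|disc' | awk -F': ' '$2 + 0 > 0 { gsub(/^ +/, "", $1); print $1 ": " $2 }'
}

# On the switch: ethtool -S swp48 | nonzero_drops
# Demo on sample output:
printf '     rx_dropped: 0\n     tx_errors: 0\n     rx_discards: 1234\n' | nonzero_drops
# → rx_discards: 1234
```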
If IPv6 works while IPv4 does not, it seemed related to the ARP table. I double-checked the ARP table entries with
ip neigh | wc -l. It was around 6k. Nothing special either, just a well-worn node.
Unfortunately, Broadcom devices have no native ASIC monitoring that could give me stats on buffers, packet counts, queue lengths, etc.
I would have liked to inspect buffer congestion, but never mind. Continuing.
Running this command in a terminal, I noticed host entries dropping when the packet loss happens:
watch -n 1 'date >> /tmp/host_count.log ; cat /cumulus/switchd/run/route_info/host/count_0 >> /tmp/host_count.log ; tail /tmp/host_count.log'

Fri Jul 17 08:14:06 UTC 2020
12702
Fri Jul 17 08:14:07 UTC 2020
9281
The count dropped from ~12k to ~9k and kept varying. The maximum is 16k, but that’s not the issue here since the count never gets anywhere near 16k.
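The same sampling can be done without `watch`, which is handy if you want the log to outlive the terminal session. A sketch of that loop; the counter path is the one from above, with a `COUNT_FILE` override added here so the sketch can be dry-run on a non-Cumulus box:

```shell
# Append a timestamp and the switchd host-entry count to a log, once per call.
# COUNT_FILE/LOG can be overridden for testing outside of Cumulus Linux.
COUNT_FILE="${COUNT_FILE:-/cumulus/switchd/run/route_info/host/count_0}"
LOG="${LOG:-/tmp/host_count.log}"

log_host_count() {
  date >> "$LOG"
  cat "$COUNT_FILE" >> "$LOG"
}

# On the switch: while sleep 1; do log_host_count; done
```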
dmesg is clean. If the neighbour table itself were overflowing (hitting the hard limit), there would be an error message in the
dmesg output, like:
kernel: Neighbour table overflow.
Just in case, I checked
net.ipv4.neigh.default.gc_thresh1, which was at its default of 128. As I mentioned above, the current ARP entry count was around 6k.
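For context, all three thresholds are worth reading at once (the Linux defaults are 128/512/1024): gc_thresh1 is the floor below which garbage collection never runs, gc_thresh2 is a soft limit, gc_thresh3 the hard cap. A quick way to dump them:

```shell
# Dump the IPv4 neighbor-cache GC thresholds from /proc.
for t in gc_thresh1 gc_thresh2 gc_thresh3; do
  printf '%s = %s\n' "$t" "$(cat /proc/sys/net/ipv4/neigh/default/$t)"
done

# Compare against the current entry count (on the switch):
# ip -4 neigh | wc -l
```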
gc_thresh1 is the minimum number of entries to keep in the cache: garbage collection is not triggered while the table holds fewer entries than that. With the threshold at 128 and ~6k entries, the garbage collector was free to prune thousands of entries at a time. That’s why I saw host entries drop when the packet loss happens. I double-checked this a few times and confirmed it.
Raising gc_thresh1 to slightly below the number of ARP entries we have fixed the problem.
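To make the change persistent, a sysctl drop-in works. This is a sketch, assuming ~6k entries: the value 5500 and the file name are examples only, so pick a number slightly below your own entry count.

```shell
# Example only: set gc_thresh1 slightly below the current ~6k ARP entries.
cat <<'EOF' > /etc/sysctl.d/90-neigh-gc.conf
net.ipv4.neigh.default.gc_thresh1 = 5500
EOF
sysctl -p /etc/sysctl.d/90-neigh-gc.conf
```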
If you raise gc_thresh1 higher than the number of entries you have, you will probably end up with lots of FAILED/STALE entries in the neighbor table, and fun things can start happening (lots of TCP spurious retransmissions, out-of-order packets, and so on).
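A quick way to watch for that failure mode is to summarize the neighbor table by state. A sketch; normally fed from `ip neigh`, demoed here on sample lines since the table contents are host-specific:

```shell
# Count neighbor entries per state (REACHABLE/STALE/FAILED/...).
# The state is the last field of each `ip neigh` output line.
neigh_by_state() {
  awk '{print $NF}' | sort | uniq -c | sort -rn
}

# On the switch: ip neigh | neigh_by_state
# Demo on sample lines:
printf '%s\n' \
  '10.0.31.2 dev swp48 lladdr aa:bb:cc:dd:ee:01 REACHABLE' \
  '10.0.31.3 dev swp48 lladdr aa:bb:cc:dd:ee:02 STALE' \
  '10.0.31.4 dev swp48 FAILED' | neigh_by_state
```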
So keep a careful eye on this ;-)