Date:   Wed, 15 Jun 2022 11:51:16 -0400
From:   Jonathan Toppins <jtoppins@...hat.com>
To:     Jay Vosburgh <jay.vosburgh@...onical.com>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Veaceslav Falico <vfalico@...il.com>,
        Andy Gospodarek <andy@...yhouse.net>,
        Hangbin Liu <liuhangbin@...il.com>
Subject: Re: Any reason why arp monitor keeps emitting netlink failover
 events?

On 6/14/22 20:26, Jay Vosburgh wrote:
> Jonathan Toppins <jtoppins@...hat.com> wrote:
> 
>> On 6/14/22 11:29, Jay Vosburgh wrote:
>>> Jonathan Toppins <jtoppins@...hat.com> wrote:
>>>
>>>> On net-next/master from today, I see netlink failover events being emitted
>>>> from an active-backup bond. In the ip monitor dump you can see the bond is
>>>> up (according to the link status) but keeps emitting failover events and I
>>>> am not sure why. This appears to be an issue also on Fedora 35 and CentOS
>>>> 8 kernels. The configuration appears to be correct, though I could be
>>>> missing something. Thoughts?
>>> 	Anything showing up in the dmesg?  There's only one place that
>>> generates the FAILOVER notifier, and it ought to have a corresponding
>>> message in the kernel log.
>>> 	Also, I note that the link1_1 veth has a lot of failures:
>>
>> Yes, all those failures are created by the setup; I waited about 5 minutes
>> before dumping the link info. The failover occurs about every second. Note
>> this is just a representation of a physical network so that others can run
>> the setup. The script `bond-bz2094911.sh` easily reproduces the issue,
>> which I dumped with cat below in the original email.
>>
>> Here is the kernel log, I have dynamic debug enabled for the entire
>> bonding module:
> 
> 	I set up the test, and added some additional instrumentation to
> bond_ab_arp_inspect, and what seems to be going on is that the
> dev_trans_start for link1_1 is never updating.  The "down to up"
> transition for the ARP monitor only checks last_rx, but the "up to down"
> check for the active interface requires both TX and RX recently
> ("recently" being within the past missed_max * arp_interval).
> 
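To make that asymmetry concrete, here is a small userspace model of the
check described above; recent(), now_ms() and the hard-coded timestamps
are illustrative stand-ins, not the bonding code itself:

#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

static long now_ms(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* "recently" == within missed_max * arp_interval milliseconds */
static bool recent(long last, long now, int missed_max, int arp_interval)
{
        return now - last <= (long)missed_max * arp_interval;
}

int main(void)
{
        int missed_max = 2, arp_interval = 1000;
        long now = now_ms();
        long last_rx = now - 500;       /* ARP replies keep arriving        */
        long last_tx = now - 600000;    /* trans_start never updated (veth) */

        /* down->up needs only recent RX; the active slave needs TX and RX */
        bool backup_eligible = recent(last_rx, now, missed_max, arp_interval);
        bool active_stays_up = backup_eligible &&
                               recent(last_tx, now, missed_max, arp_interval);

        printf("backup eligible to come up: %s\n", backup_eligible ? "yes" : "no");
        printf("active slave stays up:      %s\n", active_stays_up ? "yes" : "no");
        return 0;
}

With a stale last_tx the active interface fails its check every interval
while a backup only needs recent RX, which matches a failover roughly
every arp_interval as seen in the monitor output.
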
> 	This looks to be due to HARD_TX_LOCK not actually locking for
> NETIF_F_LLTX devices:
> 
> #define HARD_TX_LOCK(dev, txq, cpu) {                   \
>         if ((dev->features & NETIF_F_LLTX) == 0) {      \
>                 __netif_tx_lock(txq, cpu);              \
>         } else {                                        \
>                 __netif_tx_acquire(txq);                \
>         }                                               \
> }
> 
> 	in combination with
> 
> static inline void txq_trans_update(struct netdev_queue *txq)
> {
>          if (txq->xmit_lock_owner != -1)
>                  WRITE_ONCE(txq->trans_start, jiffies);
> }
> 
> 	causes the trans_start update to be skipped on veth devices.
> 
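For contrast, the txq_trans_cond_update() helper used in the fix below
updates trans_start without looking at xmit_lock_owner; as of recent
kernels it is roughly (approximate, see include/linux/netdevice.h):

static inline void txq_trans_cond_update(struct netdev_queue *txq)
{
        unsigned long now = jiffies;

        if (READ_ONCE(txq->trans_start) != now)
                WRITE_ONCE(txq->trans_start, now);
}

so an LLTX driver can call it from its own xmit path, which is what the
veth change below does.
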
> 	And, sure enough, if I apply the following, the test case
> appears to work:
> 
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 466da01ba2e3..2cb833b3006a 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -312,6 +312,7 @@ static bool veth_skb_is_eligible_for_gro(const struct net_device *dev,
>   static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
>   {
>   	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
> +	struct netdev_queue *queue = NULL;
>   	struct veth_rq *rq = NULL;
>   	struct net_device *rcv;
>   	int length = skb->len;
> @@ -329,6 +330,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
>   	rxq = skb_get_queue_mapping(skb);
>   	if (rxq < rcv->real_num_rx_queues) {
>   		rq = &rcv_priv->rq[rxq];
> +		queue = netdev_get_tx_queue(dev, rxq);
>   
>   		/* The napi pointer is available when an XDP program is
>   		 * attached or when GRO is enabled
> @@ -340,6 +342,8 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
>   
>   	skb_tx_timestamp(skb);
>   	if (likely(veth_forward_skb(rcv, skb, rq, use_napi) == NET_RX_SUCCESS)) {
> +		if (queue)
> +			txq_trans_cond_update(queue);
>   		if (!use_napi)
>   			dev_lstats_add(dev, length);
>   	} else {
> 
> 
> 	I'm not entirely sure this is the best way to get the
> trans_start updated in veth, but LLTX devices need to handle it
> internally (and others do, e.g., tun).
> 
> 	Could you test the above and see if it resolves the problem in
> your environment as well?
> 
> 	-J
> 
> ---
> 	-Jay Vosburgh, jay.vosburgh@...onical.com
> 

Hi Jay,

This patch appears to work; you can apply my Tested-by.

Tested-by: Jonathan Toppins <jtoppins@...hat.com>

Now this exposes an easily reproducible bonding issue with 
bond_should_notify_peers(): every second the bond issues a NOTIFY_PEERS 
event. The same notify-peers behavior has been observed with physical 
hardware drivers (tg3, i40e, igb). I have not traced the code yet, but 
wanted to point it out. Run the same reproducer script and start 
monitoring the bond:

[root@...ora ~]# ip -ts -o monitor link dev bond0
[2022-06-15T11:30:44.337568] 9: bond0: 
<BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
group default event NOTIFY PEERS \    link/ether ce:d3:22:ef:13:d0 brd 
ff:ff:ff:ff:ff:ff
[2022-06-15T11:30:45.361381] 9: bond0: 
<BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
group default event NOTIFY PEERS \    link/ether ce:d3:22:ef:13:d0 brd 
ff:ff:ff:ff:ff:ff
[.. trimmed ..]
[2022-06-15T11:30:56.618621] [2022-06-15T11:30:56.622657] 9: bond0: 
<BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
group default event NOTIFY PEERS \    link/ether ce:d3:22:ef:13:d0 brd 
ff:ff:ff:ff:ff:ff
[2022-06-15T11:30:57.647644] 9: bond0: 
<BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
group default event NOTIFY PEERS \    link/ether ce:d3:22:ef:13:d0 brd 
ff:ff:ff:ff:ff:ff

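The output above shows a NOTIFY PEERS event roughly every second. For
context on why that looks wrong: as I understand it, the notifications
are gated by a send_peer_notif counter that is armed on failover from
the num_peer_notif parameter and decremented each time a notification
goes out, so it should run out after a few events. A minimal userspace
model of that expectation (a sketch only; the names mirror the bonding
parameters but this is not the kernel code):

#include <stdio.h>

struct bond_model {
        int num_peer_notif;     /* module parameter, default 1   */
        int send_peer_notif;    /* armed on failover, counts down */
};

static void failover(struct bond_model *b)
{
        b->send_peer_notif = b->num_peer_notif;
}

static void monitor_tick(struct bond_model *b, int second)
{
        if (b->send_peer_notif > 0) {
                b->send_peer_notif--;
                printf("t=%ds: NOTIFY_PEERS (%d left)\n",
                       second, b->send_peer_notif);
        }
}

int main(void)
{
        struct bond_model b = { .num_peer_notif = 1 };
        int s;

        failover(&b);                   /* one failover event      */
        for (s = 0; s < 5; s++)         /* arp_interval = 1s ticks */
                monitor_tick(&b, s);    /* expect a single event   */
        return 0;
}

A stream of events every second would mean the counter is effectively
being re-armed, or its decrement skipped, on every pass of the monitor.
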
In another shell take down the active interface:
# ip link set link1_1 down

We get the failover below, as expected.

[2022-06-15T11:30:58.671501] [2022-06-15T11:30:58.671576] 
[2022-06-15T11:30:58.671611] [2022-06-15T11:30:58.671643] 
[2022-06-15T11:30:58.671676] [2022-06-15T11:30:58.671709] 9: bond0: 
<BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
group default event BONDING FAILOVER \    link/ether ce:d3:22:ef:13:d0 
brd ff:ff:ff:ff:ff:ff
[2022-06-15T11:30:58.671782] 9: bond0: 
<BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
group default event NOTIFY PEERS \    link/ether ce:d3:22:ef:13:d0 brd 
ff:ff:ff:ff:ff:ff
[2022-06-15T11:30:58.676862] 9: bond0: 
<BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
group default event NOTIFY PEERS \    link/ether ce:d3:22:ef:13:d0 brd 
ff:ff:ff:ff:ff:ff
[2022-06-15T11:30:58.676948] 9: bond0: 
<BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
group default event RESEND IGMP \    link/ether ce:d3:22:ef:13:d0 brd 
ff:ff:ff:ff:ff:ff

Now bring link1_1 back up and notice there are no more NOTIFY_PEERS 
events every second. The issue actually stops with the first failover; I 
only brought the primary back up for completeness.

# ip link set link1_1 up

[2022-06-15T11:31:01.629256] [2022-06-15T11:31:01.630275] 
[2022-06-15T11:31:01.742927] [2022-06-15T11:31:01.742991] 
[2022-06-15T11:31:01.743021] [2022-06-15T11:31:01.743045] 
[2022-06-15T11:31:01.743070] [2022-06-15T11:31:01.743094] 9: bond0: 
<BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
group default event BONDING FAILOVER \    link/ether ce:d3:22:ef:13:d0 
brd ff:ff:ff:ff:ff:ff
[2022-06-15T11:31:01.743151] 9: bond0: 
<BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
group default event NOTIFY PEERS \    link/ether ce:d3:22:ef:13:d0 brd 
ff:ff:ff:ff:ff:ff
[2022-06-15T11:31:01.746412] 9: bond0: 
<BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
group default event RESEND IGMP \    link/ether ce:d3:22:ef:13:d0 brd 
ff:ff:ff:ff:ff:ff

-Jon
