netdev - Re: Any reason why arp monitor keeps emitting netlink failover events?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8133.1655252763@famine>
Date:   Tue, 14 Jun 2022 17:26:03 -0700
From:   Jay Vosburgh <jay.vosburgh@...onical.com>
To:     Jonathan Toppins <jtoppins@...hat.com>
cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Veaceslav Falico <vfalico@...il.com>,
        Andy Gospodarek <andy@...yhouse.net>,
        Hangbin Liu <liuhangbin@...il.com>
Subject: Re: Any reason why arp monitor keeps emitting netlink failover events?

Jonathan Toppins <jtoppins@...hat.com> wrote:

>On 6/14/22 11:29, Jay Vosburgh wrote:
>> Jonathan Toppins <jtoppins@...hat.com> wrote:
>> 
>>> On net-next/master from today, I see netlink failover events being emitted
>>>from an active-backup bond. In the ip monitor dump you can see the bond is
>>> up (according to the link status) but keeps emitting failover events and I
>>> am not sure why. This appears to be an issue also on Fedora 35 and CentOS
>>> 8 kernels. The configuration appears to be correct, though I could be
>>> missing something. Thoughts?
>> 	Anything showing up in the dmesg?  There's only one place that
>> generates the FAILOVER notifier, and it ought to have a corresponding
>> message in the kernel log.
>> 	Also, I note that the link1_1 veth has a lot of failures:
>
>Yes all those failures are created by the setup, I waited about 5 minutes
>before dumping the link info. The failover occurs about every second. Note
>this is just a representation of a physical network so that others can run
>the setup. The script `bond-bz2094911.sh`, easily reproduces the issue
>which I dumped with cat below in the original email.
>
>Here is the kernel log, I have dynamic debug enabled for the entire
>bonding module:

	I set up the test, and added some additional instrumentation to
bond_ab_arp_inspect, and what seems to be going on is that the
dev_trans_start for link1_1 is never updating.  The "down to up"
transition for the ARP monitor only checks last_rx, but the "up to down"
check for the active interface requires both TX and RX recently
("recently" being within the past missed_max * arp_interval).

	This looks to be due to HARD_TX_LOCK not actually locking for
NETIF_F_LLTX devices:

#define HARD_TX_LOCK(dev, txq, cpu) {                           if ((dev->features & NETIF_F_LLTX) == 0) {                      __netif_tx_lock(txq, cpu);                      } else {                                                        __netif_tx_acquire(txq);                        }                                               }

	in combination with

static inline void txq_trans_update(struct netdev_queue *txq)
{
        if (txq->xmit_lock_owner != -1)
                WRITE_ONCE(txq->trans_start, jiffies);
}

	causes the trans_start update to be skipped on veth devices.

	And, sure enough, if I apply the following, the test case
appears to work:

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 466da01ba2e3..2cb833b3006a 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -312,6 +312,7 @@ static bool veth_skb_is_eligible_for_gro(const struct net_device *dev,
 static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
+	struct netdev_queue *queue = NULL;
 	struct veth_rq *rq = NULL;
 	struct net_device *rcv;
 	int length = skb->len;
@@ -329,6 +330,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 	rxq = skb_get_queue_mapping(skb);
 	if (rxq < rcv->real_num_rx_queues) {
 		rq = &rcv_priv->rq[rxq];
+		queue = netdev_get_tx_queue(dev, rxq);
 
 		/* The napi pointer is available when an XDP program is
 		 * attached or when GRO is enabled
@@ -340,6 +342,8 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	skb_tx_timestamp(skb);
 	if (likely(veth_forward_skb(rcv, skb, rq, use_napi) == NET_RX_SUCCESS)) {
+		if (queue)
+			txq_trans_cond_update(queue);
 		if (!use_napi)
 			dev_lstats_add(dev, length);
 	} else {


	I'm not entirely sure this is the best way to get the
trans_start updated in veth, but LLTX devices need to handle it
internally (and others do, e.g., tun).

	Could you test the above and see if it resolves the problem in
your environment as well?

	-J

---
	-Jay Vosburgh, jay.vosburgh@...onical.com