Message-ID: <20250917055355.GA31577@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
Date: Tue, 16 Sep 2025 22:53:55 -0700
From: Erni Sri Satya Vennela <ernis@...ux.microsoft.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: kys@...rosoft.com, haiyangz@...rosoft.com, wei.liu@...nel.org,
decui@...rosoft.com, andrew+netdev@...n.ch, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, longli@...rosoft.com,
kotaranov@...rosoft.com, horms@...nel.org,
shradhagupta@...ux.microsoft.com, dipayanroy@...ux.microsoft.com,
shirazsaleem@...rosoft.com, rosenp@...il.com,
linux-hyperv@...r.kernel.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-rdma@...r.kernel.org
Subject: Re: [PATCH 2/2] net: mana: Add standard counter rx_missed_errors
On Tue, Sep 16, 2025 at 03:22:54PM +0200, Paolo Abeni wrote:
> On 9/15/25 5:58 AM, Erni Sri Satya Vennela wrote:
> > Report standard counter stats->rx_missed_errors
> > using hc_rx_discards_no_wqe from the hardware.
> >
> > Add a dedicated workqueue to periodically run
> > mana_query_gf_stats every 2 seconds to get the latest
> > info in eth_stats and define a driver capability flag
> > to notify hardware of the periodic queries.
> >
> > To avoid repeated failures and log flooding, the workqueue
> > is not rescheduled if mana_query_gf_stats fails.
>
> Can the failure root cause be a "transient" one? If so, this looks like
> a dangerous strategy; in such a scenario, AFAICS, stats will be broken
> until the device is removed and re-probed.
>
We are working on using the stats query as a health check for the
hardware and its channel. If the query fails even once, the VF needs to
be reset, similar to a probe. The hardware team also confirmed that even
a one-time or temporary failure requires a VF reset.
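
For illustration only, here is a minimal userspace sketch of the policy
discussed above: a periodic stats poll that stops re-arming itself on the
first failure and flags that a reset is needed. This is not the driver
code. The names stats_poller, poller_init, poller_tick and query_fail_at
are hypothetical, and the query callback merely stands in for
mana_query_gf_stats. The sketch assumes that one call to poller_tick
corresponds to one expiry of the 2-second period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the stats query: returns 0 on success. */
typedef int (*query_fn)(void *ctx);

/* Hypothetical model of the periodic work item, not a kernel structure. */
struct stats_poller {
	query_fn query;        /* stands in for mana_query_gf_stats */
	void *ctx;             /* opaque argument passed to the query */
	bool armed;            /* true while the next run is still scheduled */
	bool reset_needed;     /* set once the query has failed */
	unsigned int runs;     /* number of times the query was executed */
};

static void poller_init(struct stats_poller *p, query_fn query, void *ctx)
{
	assert(p != NULL && query != NULL);
	p->query = query;
	p->ctx = ctx;
	p->armed = true;
	p->reset_needed = false;
	p->runs = 0;
}

/*
 * Model one expiry of the polling period. The query runs only while the
 * poller is armed. On the first failure the poller is disarmed (no
 * reschedule, so no repeated failures or log flooding) and a reset is
 * requested. Returns true if another run remains scheduled.
 */
static bool poller_tick(struct stats_poller *p)
{
	assert(p != NULL);
	if (!p->armed)
		return false;

	p->runs++;
	if (p->query(p->ctx) != 0) {
		p->armed = false;
		p->reset_needed = true;
	}
	return p->armed;
}

/* Hypothetical test query: fails exactly on call number fail_on. */
struct fail_at {
	unsigned int calls;
	unsigned int fail_on;
};

static int query_fail_at(void *ctx)
{
	struct fail_at *f = ctx;

	f->calls++;
	return (f->calls == f->fail_on) ? -1 : 0;
}
```

With a query that fails on its third call, the first two ticks keep the
poller armed, the third disarms it and sets reset_needed, and any later
tick does nothing. Clearing the flag and re-arming would happen only as
part of the reset path, which is outside this sketch.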
- Vennela
> /P