Message-ID: <IA3PR11MB8986697A94FB36E893C7E87FE5A7A@IA3PR11MB8986.namprd11.prod.outlook.com>
Date: Fri, 5 Dec 2025 08:26:54 +0000
From: "Loktionov, Aleksandr" <aleksandr.loktionov@...el.com>
To: Jesse Brandeburg <jbrandeb@...nel.org>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>
CC: "Brandeburg, Jesse" <jbrandeburg@...udflare.com>, "Nguyen, Anthony L"
<anthony.l.nguyen@...el.com>, "Keller, Jacob E" <jacob.e.keller@...el.com>,
IWL <intel-wired-lan@...ts.osuosl.org>, "Kitszel, Przemyslaw"
<przemyslaw.kitszel@...el.com>, Andrew Lunn <andrew+netdev@...n.ch>, "David
S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, "Jakub
Kicinski" <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Brett Creeley
<brett.creeley@...el.com>
Subject: RE: [Intel-wired-lan] [PATCH net v1] ice: stop counting UDP csum
mismatch as rx_errors
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@...osl.org> On Behalf
> Of Jesse Brandeburg
> Sent: Tuesday, December 2, 2025 12:39 AM
> To: netdev@...r.kernel.org
> Cc: Brandeburg, Jesse <jbrandeburg@...udflare.com>; Nguyen, Anthony L
> <anthony.l.nguyen@...el.com>; Keller, Jacob E
> <jacob.e.keller@...el.com>; IWL <intel-wired-lan@...ts.osuosl.org>;
> Kitszel, Przemyslaw <przemyslaw.kitszel@...el.com>; Andrew Lunn
> <andrew+netdev@...n.ch>; David S. Miller <davem@...emloft.net>; Eric
> Dumazet <edumazet@...gle.com>; Jakub Kicinski <kuba@...nel.org>; Paolo
> Abeni <pabeni@...hat.com>; Brett Creeley <brett.creeley@...el.com>
> Subject: [Intel-wired-lan] [PATCH net v1] ice: stop counting UDP csum
> mismatch as rx_errors
>
> From: Jesse Brandeburg <jbrandeburg@...udflare.com>
>
> Since the beginning, the Intel ice driver has counted receive checksum
> offload mismatches into the rx_errors member of the rtnl_link_stats64
> struct. In ethtool -S these show up as rx_csum_bad.nic.
>
> I believe counting these in rx_errors is fundamentally wrong, as it's
> pretty clear from the comments in if_link.h and from every other statistic
> the driver is summing into rx_errors, that all of them would cause a
> "hardware drop" except for the UDP checksum mismatch, as well as the fact
> that all the other causes for rx_errors are L2 reasons, and this L4 UDP
> "mismatch" is an outlier.
>
> A last nail in the coffin is that rx_errors is monitored in production and
> can indicate a bad NIC/cable/Switch port, but instead some random series of
> UDP packets with bad checksums will now trigger this alert. This false
> positive makes the alert useless and affects us as well as other companies.
>
> This packet with presumably a bad UDP checksum is *already* passed to the
> stack, just not marked as offloaded by the hardware/driver. If it is
> dropped by the stack it will show up as UDP_MIB_CSUMERRORS.
>
> And one more thing, none of the other Intel drivers, and at least bnxt_en
> and mlx5 both don't appear to count UDP offload mismatches as rx_errors.
>
> Here is a related customer complaint:
> https://community.intel.com/t5/Ethernet-Products/ice-rx-errros-is-too-sensitive-to-IP-TCP-attack-packets-Intel/td-p/1662125
>
> Fixes: 4f1fe43c920b ("ice: Add more Rx errors to netdev's rx_error counter")
> Cc: Tony Nguyen <anthony.l.nguyen@...el.com>
> Cc: Jake Keller <jacob.e.keller@...el.com>
> Cc: IWL <intel-wired-lan@...ts.osuosl.org>
> Signed-off-by: Jesse Brandeburg <jbrandeburg@...udflare.com>
> --
> I am sending this to net as I consider it a bug, and it will backport cleanly.
> ---
> drivers/net/ethernet/intel/ice/ice_main.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index 86f5859e88ef..d004acfa0f36 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -6995,7 +6995,6 @@ void ice_update_vsi_stats(struct ice_vsi *vsi)
> cur_ns->rx_errors = pf->stats.crc_errors +
> pf->stats.illegal_bytes +
> pf->stats.rx_undersize +
> - pf->hw_csum_rx_error +
Good day, Jesse,
It looks like this removes the only place where 'hw_csum_rx_error' is actually used.
What about removing its declaration and calculation as well?
> pf->stats.rx_jabber +
> pf->stats.rx_fragments +
> pf->stats.rx_oversize;
> --
> 2.47.3
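
For illustration only, here is a minimal standalone sketch of what the rx_errors sum would look like with the hunk above applied. The struct `port_stats_sketch` and the function `rx_errors_after_patch` are hypothetical stand-ins, not the driver's actual definitions; only the field names are taken from the diff.

```c
#include <stdint.h>

/* Hypothetical stand-in for the port stats fields named in the diff. */
struct port_stats_sketch {
	uint64_t crc_errors;
	uint64_t illegal_bytes;
	uint64_t rx_undersize;
	uint64_t rx_jabber;
	uint64_t rx_fragments;
	uint64_t rx_oversize;
};

/*
 * Sum only the L2 causes into rx_errors, as in the patched hunk.
 * The checksum mismatch count is passed in purely to show that it
 * is no longer part of the sum (assumption: it stays a separate counter).
 */
uint64_t rx_errors_after_patch(const struct port_stats_sketch *s,
			       uint64_t hw_csum_rx_error)
{
	(void)hw_csum_rx_error; /* intentionally not counted */

	return s->crc_errors +
	       s->illegal_bytes +
	       s->rx_undersize +
	       s->rx_jabber +
	       s->rx_fragments +
	       s->rx_oversize;
}
```

With this shape, a burst of packets with bad UDP checksums leaves rx_errors unchanged, so an alert on rx_errors would only fire on the L2 causes listed.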