[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20181116015242.wmm6mooubjnum7qv@gondor.apana.org.au>
Date: Fri, 16 Nov 2018 09:52:42 +0800
From: Herbert Xu <herbert@...dor.apana.org.au>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: netdev@...r.kernel.org, Tom Herbert <tom@...bertland.com>,
Eric Dumazet <edumazet@...gle.com>
Subject: Re: [Patch net] net: invert the check of detecting hardware RX
checksum fault
On Thu, Nov 15, 2018 at 03:16:02PM -0800, Cong Wang wrote:
> The following evidences indicate this check is likely wrong:
>
> 1. In the assignment "skb->csum_valid = !sum", sum==0 indicates a valid checksum.
>
> 2. __skb_checksum_complete() always returns sum, and TCP packets are dropped
> only when it returns non-zero. So non-zero indicates a failure.
>
> 3. In __skb_checksum_validate_complete(), we have a nearly same check, where
> zero is considered as success.
>
> 4. csum_fold() already does the one’s complement, this indicates 0 should
> be considered as a successful validation.
>
> 5. We have triggered this fault for many times, but InCsumErrors field in
> /proc/net/snmp remains 0.
>
> Base on the above, non-zero should be used as a checksum mismatch.
>
> I tested this with mlx5 driver, no warning or InCsumErrors after 1 hour.
>
> Fixes: fb286bb2990a ("[NET]: Detect hardware rx checksum faults correctly")
> Cc: Herbert Xu <herbert@...dor.apana.org.au>
> Cc: Tom Herbert <tom@...bertland.com>
> Cc: Eric Dumazet <edumazet@...gle.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@...il.com>
> ---
> net/core/datagram.c | 4 ++--
> net/core/dev.c | 2 +-
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index 57f3a6fcfc1e..e542a9a212a7 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -733,7 +733,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len)
> __sum16 sum;
>
> sum = csum_fold(skb_checksum(skb, 0, len, skb->csum));
> - if (likely(!sum)) {
> + if (unlikely(sum)) {
> if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
> !skb->csum_complete_sw)
> netdev_rx_csum_fault(skb->dev);
Normally if the hardware's partial checksum is valid then we just
trust it and send the packet along. However, if the partial
checksum is invalid we don't trust it and we will compute the
whole checksum manually which is what ends up in sum.
netdev_rx_csum_fault is meant to warn about the situation where
a packet with a valid checksum (i.e., sum == 0) was given to us
by the hardware with a partial checksum that was invalid.
So changing it to sum here is wrong.
Can you give more information as to how you got the warnings with
mlx5? It sounds like there may be a real bug there because if you
are getting the warning then it means that a packet with an invalid
hardware-computed partial checksum passed the manual check and
was actually valid. This implies that either the hardware or the
driver is broken.
Cheers,
--
Email: Herbert Xu <herbert@...dor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Powered by blists - more mailing lists