lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 16 Nov 2018 12:06:00 -0800
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     Herbert Xu <herbert@...dor.apana.org.au>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Tom Herbert <tom@...bertland.com>,
        Eric Dumazet <edumazet@...gle.com>
Subject: Re: [Patch net] net: invert the check of detecting hardware RX
 checksum fault

On Thu, Nov 15, 2018 at 8:50 PM Herbert Xu <herbert@...dor.apana.org.au> wrote:
>
> On Thu, Nov 15, 2018 at 06:23:38PM -0800, Cong Wang wrote:
> >
> > > Normally if the hardware's partial checksum is valid then we just
> > > trust it and send the packet along.  However, if the partial
> > > checksum is invalid we don't trust it and we will compute the
> > > whole checksum manually which is what ends up in sum.
> >
> > Not sure if I understand partial checksum here, but it is the
> > CHECKSUM_COMPLETE case which I am trying to fix, not
> > CHECKSUM_PARTIAL.
>
> What I meant by partial checksum is the checksum produced by the
> hardware on RX.  In the kernel we call that CHECKSUM_COMPLETE.
> CHECKSUM_PARTIAL is the absence of the substantial part of the
> checksum which is something we use in the kernel primarily for TX.
>
> Yes the names are confusing :)

Yeah, understood. The hardware provides skb->csum in this case, but
we keep adjusting it each time when we change skb->data.


>
> > So, in other word, a checksum *match* is the intended to detect
> > this HW RX checksum fault?
>
> Correct.  Or more likely it's probably a bug in either the driver
> or if there are overlaying code such as VLAN then in that code.
>
> Basically if the RX checksum is buggy, it's much more likely to
> cause a valid packet to be rejected than to cause an invalid packet
> to be accepted, because we still verify that checksum against the
> pseudoheader.  So we only attempt to catch buggy hardware/drivers
> by doing a second manual verification for the case where the packet
> is flagged as invalid.

Hmm, now I see how it works. Actually it uses the differences between
these two check's as the difference between hardware checksum with
skb_checksum().

I will send a patch to add a comment there to avoid confusion.


>
> > Sure, my case is nearly same with Pawel's, except I have no vlan:
> > https://marc.info/?l=linux-netdev&m=154086647601721&w=2
>
> Can you please provide your backtrace?

I already did:
https://marc.info/?l=linux-netdev&m=154092211305599&w=2

Note, the offending commit has been backported to 4.14, which
is why I saw this warning. I have no idea why it is backported
from the beginning, it is just an optimization, doesn't fix any bug,
IMHO.

Also, it is much harder for me to reproduce it than Pawel who
saw the warning every second. Sometimes I need 1 hour to trigger
it, sometimes other people here needs 10+ hours to trigger it.

Let me see if I can add vlan on my side to make it more reproducible,
it seems hard as our switch doesn't use vlan either.

We have warnings with conntrack involved too, I can provide it too
if you are interested.

I tend to revert it for -stable, at least that is what I plan to do
on my side unless there is a fix coming soon.

Thanks.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ