lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0Uf8_Eb0G+k6N4vz69HefM3_oCkj7PB30z3keBCR4HAUnQ@mail.gmail.com>
Date:	Tue, 29 Mar 2016 19:20:27 -0700
From:	Alexander Duyck <alexander.duyck@...il.com>
To:	Jesse Brandeburg <jesse.brandeburg@...il.com>
Cc:	Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
	Dave Miller <davem@...emloft.net>,
	Alexander Duyck <aduyck@...antis.com>,
	NetDEV list <netdev@...r.kernel.org>,
	Neil Horman <nhorman@...hat.com>, sassmann@...hat.com,
	John Greene <jogreene@...hat.com>,
	Jesse Brandeburg <jesse.brandeburg@...el.com>
Subject: Re: [net-next 02/16] i40e/i40evf: Rewrite logic for 8 descriptor per
 packet check

On Tue, Mar 29, 2016 at 5:34 PM, Jesse Brandeburg
<jesse.brandeburg@...il.com> wrote:
> stupid gmail... sent again to the list, sorry for duplicate (but
> formatted better this time)
>
>
> Hey Alex, this patch appears to have caused a regression (probably both
> i40e/i40evf).  Easily reproducible running rds-stress (see the thread titled
> "i40e card Tx resets", the middle post by sowmini has the repro steps,
> ignore the pktgen discussion in the thread.)

I'll see about setting up rds tomorrow.  It should be pretty straight forward.

> I've spent some time trying to figure out the right fix, but keep getting
> stuck in the complicated logic of the function, so I'm sending this in the
> hope you'll take a look.
>
> This is (one example of) the skb that doesn't get linearized:
> skb_print_bits:  > headlen=114 datalen=33824 data=238 mac=238 nh=252
> h=272 gso=1448 frag=17

So the code as I had written it up should be verifying we use no more
than 8 descriptors as that was what the data sheet states.  You might
want to request a spec update if the hardware only supports 7 data
descriptors per segment for a TSO as that isn't what is in the
documentation.

> skb_print_bits: frag[0]: len: 256
> skb_print_bits: frag[1]: len: 48
> skb_print_bits: frag[2]: len: 256
> skb_print_bits: frag[3]: len: 48
> skb_print_bits: frag[4]: len: 256
> skb_print_bits: frag[5]: len: 48
> skb_print_bits: frag[6]: len: 4096
>
> This descriptor ^^^ is the 8th, I believe the hardware mechanism faults on
> this input.

Is this something that was previously being linearized?  It doesn't
seem like it would be.  We are only 8 descriptors in and that 7th
fragment should have flushed the count back to 1 with a remainder.

If you are wanting to trigger a linearize with the updated code you
would just need to update 2 lines.  Replace "I40E_MAX_BUFFER_TXD - 1"
with just I40E_MAX_BUFFER_TXD" in the line where we subtract that
value from nr_frags.  Then remove one of the "sum +=
skb_frag_size(++frag);" lines.  That should update the code so that
you only evaluate 5 buffers at a time instead of 6.  It should force
things to coalesce if we would span 8 or more descriptors instead of 9
or more which is what the code currently does based on what was
required in the EAS.

> I added a print of the sum at each point it is subtracted or added to after
> initially being set, sum7/8 are in the for loop.
>
> skb_print_bits: frag[7]: len: 4096
> skb_print_bits: frag[8]: len: 48
> skb_print_bits: frag[9]: len: 4096
> skb_print_bits: frag[10]: len: 4096
> skb_print_bits: frag[11]: len: 48
> skb_print_bits: frag[12]: len: 4096
> skb_print_bits: frag[13]: len: 4096
> skb_print_bits: frag[14]: len: 48
> skb_print_bits: frag[15]: len: 4096
> skb_print_bits: frag[16]: len: 4096
> __i40e_chk_linearize: sum1: -1399
> __i40e_chk_linearize: sum2: -1143
> __i40e_chk_linearize: sum3: -1095
> __i40e_chk_linearize: sum4: -839
> __i40e_chk_linearize: sum5: -791
> __i40e_chk_linearize: sum7: 3305
> __i40e_chk_linearize: sum8: 3257
> __i40e_chk_linearize: sum7: 7353
> __i40e_chk_linearize: sum8: 7097
> __i40e_chk_linearize: sum7: 7145
> __i40e_chk_linearize: sum8: 7097
> __i40e_chk_linearize: sum7: 11193
> __i40e_chk_linearize: sum8: 10937
> __i40e_chk_linearize: sum7: 15033
> __i40e_chk_linearize: sum8: 14985
> __i40e_chk_linearize: sum7: 15033
>
> This was the descriptors generated from the above:
> d[054] = 0x0000000000000000 0x16a0211400000011
> d[055] = 0x0000000bfbaa40ee 0x000001ca02871640

len 72

> d[056] = 0x0000000584bcd000 0x0000040202871640

len 256

> d[057] = 0x0000000c0bfea9d0 0x000000c202871640

len 48

> d[058] = 0x0000000584bcd100 0x0000040202871640

len 256

> d[059] = 0x0000000c0bfeaa00 0x000000c202871640

len 48

> d[05a] = 0x0000000584bcd200 0x0000040202871640

len 256

> d[05b] = 0x0000000c0bfeaa30 0x000000c202871640

len 48

> d[05c] = 0x000000056d5f0000 0x0000400202871640

len 4096

>From what I can see we are at 8 descriptors and have exceeded 1 MSS.
According to the EAS this is supposed to work.

> d[05d] = 0x000000056d5f1000 0x0000400202871640
> d[05e] = 0x0000000c0bfeaa60 0x000000c202871640
> d[05f] = 0x00000005f2762000 0x0000400202871640
> d[060] = 0x00000005f765e000 0x0000400202871640
> d[061] = 0x0000000c0bfeaa90 0x000000c202871640
> d[062] = 0x0000000574928000 0x0000400202871640
> d[063] = 0x0000000568ba5000 0x0000400202871640
> d[064] = 0x0000000c0bfeaac0 0x000000c202871640
> d[065] = 0x00000005f68cd000 0x0000400202871640
> d[066] = 0x0000000585a2a000 0x0000400202871670
>
>
> On Fri, Feb 19, 2016 at 3:54 AM Jeff Kirsher <jeffrey.t.kirsher@...el.com>
> wrote:
>>
>> From: Alexander Duyck <aduyck@...antis.com>
>>
>> This patch is meant to rewrite the logic for how we determine if we can
>> transmit the frame or if it needs to be linearized.
>>
>> +       /* Initialize size to the negative value of gso_size minus 1.  We
>> +        * use this as the worst case scenerio in which the frag ahead
>> +        * of us only provides one byte which is why we are limited to 6
>> +        * descriptors for a single transmit as the header and previous
>> +        * fragment are already consuming 2 descriptors.
>> +        */
>> +       sum = 1 - gso_size;
>> +
>> +       /* Add size of frags 1 through 5 to create our initial sum */
>> +       sum += skb_frag_size(++frag);
>
>
> I'm pretty sure this code skips frag[0] due to the pre-increment, the bug
> seems to occur in the algorithm because skb->data contains L2/L3/L4 header
> plus some data, and should counts as 1 descriptor for every subsequent sent
> packet, and if the incoming skb has 7 chunks that add up to < MSS then we
> get in trouble.

Yep we were skipping frag 0 on purpose.  There is a comment earlier
that basically states that frag 0 cannot borrow from an earlier
descriptor.  The assumption is that if we have a frag it contains at
least 1 byte.  So if the 6 frags following the first one do not
provide at least gso_size - 1 in data the packet needs to be
linearized.

The code as written was meant to cap us at 8 data descriptors per
packet.  It sounds like we are wanting to reduce that to 7 if I
understand what you are describing correctly.

>> +       sum += skb_frag_size(++frag);
>> +       sum += skb_frag_size(++frag);
>> +       sum += skb_frag_size(++frag);
>> +       sum += skb_frag_size(++frag);
>> +
>> +       /* Walk through fragments adding latest fragment, testing it, and
>> +        * then removing stale fragments from the sum.
>> +        */
>> +       stale = &skb_shinfo(skb)->frags[0];
>> +       for (;;) {
>> +               sum += skb_frag_size(++frag);
>> +
>> +               /* if sum is negative we failed to make sufficient progress */
>> +               if (sum < 0)
>> +                       return true;
>> +
>> +               /* use pre-decrement to avoid processing last fragment */
>> +               if (!--nr_frags)
>> +                       break;
>> +
>> +               sum -= skb_frag_size(++stale);
>
>
> I think this line also skips stale[0]

Right we skip frag 0 and the last fragment in the packet.  The general
idea was that we were preventing us from going over 8 descriptors per
packet so I was only checking 6 fragments starting at the second and
as long as we completed one MSS in those 6 descriptors we were
guaranteed to complete a packet in less than 8 total descriptors.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ