Date:   Mon, 16 Oct 2017 09:11:30 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     "Neftin, Sasha" <sasha.neftin@...el.com>
Cc:     David Laight <David.Laight@...lab.com>,
        Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "nhorman@...hat.com" <nhorman@...hat.com>,
        "sassmann@...hat.com" <sassmann@...hat.com>,
        "jogreene@...hat.com" <jogreene@...hat.com>
Subject: Re: [net-next 6/9] e1000e: fix buffer overrun while the I219 is
 processing DMA transactions

On Mon, Oct 16, 2017 at 3:24 AM, Neftin, Sasha <sasha.neftin@...el.com> wrote:
> On 10/11/2017 12:07, David Laight wrote:
>>
>> From: Jeff Kirsher
>>>
>>> Sent: 10 October 2017 18:22
>>> Intel 100/200 Series Chipset platforms reduced the round-trip
>>> latency for the LAN Controller DMA accesses, causing in some high
>>> performance cases a buffer overrun while the I219 LAN Connected
>>> Device is processing the DMA transactions. I219LM and I219V devices
>>> can fall into an unrecoverable Tx hang under very stressful UDP traffic
>>> and repeated reconnection of the Ethernet cable. This Tx hang of the LAN
>>> Controller is only recovered if the system is rebooted. Slightly slow
>>> down DMA access by reducing the number of outstanding requests.
>>> This workaround could have an impact on TCP traffic performance
>>> on the platform. Disabling TSO eliminates performance loss for TCP
>>> traffic without a noticeable impact on CPU performance.
>>>
>>> Please, refer to I218/I219 specification update:
>>> https://www.intel.com/content/www/us/en/embedded/products/networking/
>>> ethernet-connection-i218-family-documentation.html
>>>
>>> Signed-off-by: Sasha Neftin <sasha.neftin@...el.com>
>>> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@...el.com>
>>> Reviewed-by: Raanan Avargil <raanan.avargil@...el.com>
>>> Tested-by: Aaron Brown <aaron.f.brown@...el.com>
>>> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@...el.com>
>>> ---
>>>   drivers/net/ethernet/intel/e1000e/netdev.c | 8 +++++---
>>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
>>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>>> index ee9de3500331..14b096f3d1da 100644
>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>> @@ -3021,8 +3021,8 @@ static void e1000_configure_tx(struct e1000_adapter
>>> *adapter)
>>>
>>>         hw->mac.ops.config_collision_dist(hw);
>>>
>>> -       /* SPT and CNP Si errata workaround to avoid data corruption */
>>> -       if (hw->mac.type >= e1000_pch_spt) {
>>> +       /* SPT and KBL Si errata workaround to avoid data corruption */
>>> +       if (hw->mac.type == e1000_pch_spt) {
>>>                 u32 reg_val;
>>>
>>>                 reg_val = er32(IOSFPC);
>>> @@ -3030,7 +3030,9 @@ static void e1000_configure_tx(struct e1000_adapter
>>> *adapter)
>>>                 ew32(IOSFPC, reg_val);
>>>
>>>                 reg_val = er32(TARC(0));
>>> -               reg_val |= E1000_TARC0_CB_MULTIQ_3_REQ;
>>> +               /* SPT and KBL Si errata workaround to avoid Tx hang */
>>> +               reg_val &= ~BIT(28);
>>> +               reg_val |= BIT(29);
>>
>> Shouldn't some more of the commit message about what this is doing
>> be in the comment?
>
> A link to the specification update is provided:
> https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/i218-i219-ethernet-connection-spec-update.pdf?asset=9561.
> This is Intel's public edition.
>>
>> And shouldn't the 28 and 29 be named constants?
>
> (28 and 29) You can easily see from the code that the value has been
> changed from 3 to 2. I thought there was no point in adding flags here.

I have to agree with David. This isn't clear, and the change moves
away from being clear and toward being very murky.

You already had the E1000_TARC0_CB_MULTIQ_3_REQ define. It shouldn't
be hard to come up with a bitmask that defines the full width of the
field you are updating so that you can use that mask to clear out the
value, and then also define a value for "MULTIQ_2_REQ" to replace
the value you were using before. That is assuming we still want to go
this route.
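
A minimal sketch of what that could look like, written as standalone C
rather than driver code. The EX_ names and the helper function are
hypothetical and are not taken from the e1000e headers; only the bit
positions (28 and 29) and the values (3 and 2) come from the quoted
patch and discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names. The field position (bits 28-29) is taken from the patch. */
#define EX_TARC0_CB_MULTIQ_SHIFT  28
#define EX_TARC0_CB_MULTIQ_MASK   (UINT32_C(3) << EX_TARC0_CB_MULTIQ_SHIFT)
#define EX_TARC0_CB_MULTIQ_3_REQ  (UINT32_C(3) << EX_TARC0_CB_MULTIQ_SHIFT)
#define EX_TARC0_CB_MULTIQ_2_REQ  (UINT32_C(2) << EX_TARC0_CB_MULTIQ_SHIFT)

/* Clear the whole two-bit field, then write the requested value into it. */
static inline uint32_t ex_tarc0_set_cb_multiq(uint32_t reg_val, uint32_t req)
{
	/* The requested value must not touch bits outside the field. */
	assert((req & ~EX_TARC0_CB_MULTIQ_MASK) == 0);
	return (reg_val & ~EX_TARC0_CB_MULTIQ_MASK) | req;
}
```

With something along these lines, the two BIT() lines in the quoted hunk
would collapse to
reg_val = ex_tarc0_set_cb_multiq(reg_val, EX_TARC0_CB_MULTIQ_2_REQ);
which produces the same register value as clearing bit 28 and setting
bit 29, while naming both the field and the setting.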

He also has a point about using netif_set_gso_max_size() to restrict
the GSO size. If that would work for something like this, it might be
the preferred way to go, since it would avoid the issue the current
patch introduces: requiring TSO to be disabled in order to avoid
"performance loss", which in this case I assume refers only to
throughput without taking CPU into account.

- Alex
