Message-ID: <eb427583ff2444dcae18e1e37fb27918@EX13D11EUB003.ant.amazon.com>
Date:   Wed, 11 Mar 2020 13:24:17 +0000
From:   "Jubran, Samih" <sameehj@...zon.com>
To:     "Machulsky, Zorik" <zorik@...zon.com>,
        Josh Triplett <josh@...htriplett.org>
CC:     "Belgazal, Netanel" <netanel@...zon.com>,
        "Kiyanovski, Arthur" <akiyano@...zon.com>,
        "Tzalik, Guy" <gtzalik@...zon.com>,
        "Bshara, Saeed" <saeedb@...zon.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: Re: [PATCH] ena: Speed up initialization 90x by reducing poll
 delays

Hi Josh,

Thanks for taking the time to write this patch. I hit a bug while testing it. I haven't pinpointed the root cause yet, but it looks to me like a race in the netlink infrastructure.

Here is the bug scenario:
1. Created a c5.24xlarge instance in AWS in the N. Virginia region using the default Amazon Linux 2 AMI
2. Applied your patch on top of net-next v5.2 and installed the kernel (currently I'm able to boot only net-next v5.2; later versions of net-next hit errors at boot time)
3. Ran "rmmod ena && insmod ena.ko" twice

Result:
The interface is not in up state

Expected result:
The interface should be in up state

What I know so far:
* ena_probe() seems to finish with no errors whatsoever
* adding prints / delays to ena_probe() makes the bug vanish or become less likely to occur, depending on how much delay I add
* ena_up() is not called at all when the bug occurs, so it's something to do with netlink not invoking dev_open()

Did you face such issues? Do you have any idea what might be causing this?

> -----Original Message-----
> From: linux-kernel-owner@...r.kernel.org <linux-kernel-
> owner@...r.kernel.org> On Behalf Of Machulsky, Zorik
> <zorik@...zon.com>
> Sent: Tuesday, March 3, 2020 2:54 AM
> To: Josh Triplett <josh@...htriplett.org>
> Cc: Belgazal, Netanel <netanel@...zon.com>; Kiyanovski, Arthur
> <akiyano@...zon.com>; Tzalik, Guy <gtzalik@...zon.com>; Bshara, Saeed
> <saeedb@...zon.com>; netdev@...r.kernel.org; linux-
> kernel@...r.kernel.org
> Subject: Re: [PATCH] ena: Speed up initialization 90x by reducing poll delays
> 
> 
> 
> On 3/2/20, 4:40 PM, "Josh Triplett" <josh@...htriplett.org> wrote:
> 
> 
>     On Mon, Mar 02, 2020 at 11:16:32PM +0000, Machulsky, Zorik wrote:
>     >
>     > On 2/28/20, 4:29 PM, "Josh Triplett" <josh@...htriplett.org> wrote:
>     >
>     >     Before initializing completion queue interrupts, the ena driver uses
>     >     polling to wait for responses on the admin command queue. The ena driver
>     >     waits 5ms between polls, but the hardware has generally finished long
>     >     before that. Reduce the poll time to 10us.
>     >
>     >     On a c5.12xlarge, this improves ena initialization time from 173.6ms to
>     >     1.920ms, an improvement of more than 90x. This improves server boot time
>     >     and time to network bringup.
>     >
>     > Thanks Josh,
>     > We agree that polling rate should be increased, but prefer not to do it aggressively and blindly.
>     > For example linear backoff approach might be a better choice. Please let us re-work a little this
>     > patch and bring it to review. Thanks!
> 
>     That's fine, as long as it has the same net improvement on boot time.
> 
>     I'd appreciate the opportunity to test any alternate approach you might
>     have.
> 
>     (Also, as long as you're working on this, you might wish to make a
>     similar change to the EFA driver, and to the FreeBSD drivers.)
> 
> Absolutely! Already forwarded this to the owners of these drivers.  Thanks!
> 
>     >     Before:
>     >     [    0.531722] calling  ena_init+0x0/0x63 @ 1
>     >     [    0.531722] ena: Elastic Network Adapter (ENA) v2.1.0K
>     >     [    0.531751] ena 0000:00:05.0: Elastic Network Adapter (ENA) v2.1.0K
>     >     [    0.531946] PCI Interrupt Link [LNKD] enabled at IRQ 11
>     >     [    0.547425] ena: ena device version: 0.10
>     >     [    0.547427] ena: ena controller version: 0.0.1 implementation version 1
>     >     [    0.709497] ena 0000:00:05.0: Elastic Network Adapter (ENA) found at mem febf4000, mac addr 06:c4:22:0e:dc:da, Placement policy: Low Latency
>     >     [    0.709508] initcall ena_init+0x0/0x63 returned 0 after 173616 usecs
>     >
>     >     After:
>     >     [    0.526965] calling  ena_init+0x0/0x63 @ 1
>     >     [    0.526966] ena: Elastic Network Adapter (ENA) v2.1.0K
>     >     [    0.527056] ena 0000:00:05.0: Elastic Network Adapter (ENA) v2.1.0K
>     >     [    0.527196] PCI Interrupt Link [LNKD] enabled at IRQ 11
>     >     [    0.527211] ena: ena device version: 0.10
>     >     [    0.527212] ena: ena controller version: 0.0.1 implementation version 1
>     >     [    0.528925] ena 0000:00:05.0: Elastic Network Adapter (ENA) found at mem febf4000, mac addr 06:c4:22:0e:dc:da, Placement policy: Low Latency
>     >     [    0.528934] initcall ena_init+0x0/0x63 returned 0 after 1920 usecs
> 
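
For reference, the linear backoff idea mentioned in the quoted thread could look roughly like the sketch below. This is only a user-space illustration and not the ena driver's code: every name here (poll_delay_us, fake_dev, poll_with_backoff) is hypothetical, the device is simulated with a counter instead of a real admin queue, and the 10us starting delay and 5ms cap are simply assumed from the two values discussed in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants, not taken from the ena driver. */
#define POLL_DELAY_MIN_US 10u    /* first delay: the 10us proposed in the patch */
#define POLL_DELAY_MAX_US 5000u  /* cap: the original 5ms poll interval */

/* Linear backoff: the delay grows by one 10us step per attempt, up to the cap. */
static uint32_t poll_delay_us(uint32_t attempt)
{
    uint64_t d = (uint64_t)POLL_DELAY_MIN_US * ((uint64_t)attempt + 1);

    return d > POLL_DELAY_MAX_US ? POLL_DELAY_MAX_US : (uint32_t)d;
}

/* Simulated device: reports completion once now_us reaches done_at_us. */
struct fake_dev {
    uint64_t now_us;      /* simulated elapsed time */
    uint64_t done_at_us;  /* simulated completion time */
};

static bool fake_dev_done(const struct fake_dev *dev)
{
    return dev->now_us >= dev->done_at_us;
}

/*
 * Poll until the simulated device is done or the timeout is reached.
 * Returns the simulated elapsed time in microseconds, or -1 on timeout.
 * Advancing now_us stands in for the sleep a real driver would do.
 */
static int64_t poll_with_backoff(struct fake_dev *dev, uint64_t timeout_us)
{
    uint32_t attempt = 0;

    while (!fake_dev_done(dev)) {
        if (dev->now_us >= timeout_us)
            return -1;
        dev->now_us += poll_delay_us(attempt++);
    }
    return (int64_t)dev->now_us;
}
```

With these assumed values, a command that completes within about 100us is noticed after four polls (10 + 20 + 30 + 40 us), while a slow command is polled at most every 5ms, as it was before the patch.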
