Message-ID: <1679d519016c4984b67eeb510d50e4b4@EX13D11EUB003.ant.amazon.com>
Date:   Sun, 12 Apr 2020 09:37:22 +0000
From:   "Jubran, Samih" <sameehj@...zon.com>
To:     Josh Triplett <josh@...htriplett.org>
CC:     "Machulsky, Zorik" <zorik@...zon.com>,
        "Belgazal, Netanel" <netanel@...zon.com>,
        "Kiyanovski, Arthur" <akiyano@...zon.com>,
        "Tzalik, Guy" <gtzalik@...zon.com>,
        "Bshara, Saeed" <saeedb@...zon.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: Re: [PATCH] ena: Speed up initialization 90x by reducing poll delays

Hi Josh,

I wanted to let you know that we are still looking into your patch.
After careful consideration, we have decided to set the value of
ENA_POLL_US to 100us. The rationale behind this choice is that the
device might take up to 1ms to complete the reset operation, and we
don't want to bombard the device. We agree with most of your patch
and will be sending one based on it for review.
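
To make the trade-off concrete, here is a rough back-of-the-envelope sketch
in Python. It assumes a simple sleep-then-check polling loop; the function
names are hypothetical and are not taken from the ena driver, and the 5us
interval in the usage note is only an illustrative comparison point.

```python
ENA_POLL_US = 100            # poll interval proposed in this message
RESET_WORST_CASE_US = 1000   # "up to 1ms" reset time stated in this message


def polls_until_done(completion_us: int, poll_us: int) -> int:
    """Count the device reads a sleep-then-check loop issues before it
    first observes completion (assumption: one read per elapsed interval)."""
    if poll_us <= 0:
        raise ValueError("poll_us must be positive")
    # Integer ceiling division; at least one read always happens.
    return max(1, -(-completion_us // poll_us))


def detection_latency_us(completion_us: int, poll_us: int) -> int:
    """Time at which the loop first notices that the device is done."""
    return polls_until_done(completion_us, poll_us) * poll_us
```

Under these assumptions, a worst-case 1ms reset costs
polls_until_done(RESET_WORST_CASE_US, ENA_POLL_US) = 10 reads at 100us,
versus 200 reads at a hypothetical 5us interval, while completion is
noticed at most one poll interval late.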

Thanks,
Sameeh

> -----Original Message-----
> From: Josh Triplett <josh@...htriplett.org>
> Sent: Friday, March 13, 2020 2:28 PM
> To: Jubran, Samih <sameehj@...zon.com>
> Cc: Machulsky, Zorik <zorik@...zon.com>; Belgazal, Netanel
> <netanel@...zon.com>; Kiyanovski, Arthur <akiyano@...zon.com>;
> Tzalik, Guy <gtzalik@...zon.com>; Bshara, Saeed <saeedb@...zon.com>;
> netdev@...r.kernel.org; linux-kernel@...r.kernel.org
> Subject: RE: [EXTERNAL]Re: [PATCH] ena: Speed up initialization 90x by
> reducing poll delays
> 
> On Wed, Mar 11, 2020 at 01:24:17PM +0000, Jubran, Samih wrote:
> > Hi Josh,
> >
> > Thanks for taking the time to write this patch. While testing it I hit a
> > bug whose root cause I haven't pinpointed yet, but it looks to me like a
> > race in the netlink infrastructure.
> >
> > Here is the bug scenario:
> > 1. Create a c5.24xlarge instance in AWS in the v_virginia region using
> >    the default Amazon Linux 2 AMI.
> > 2. Apply your patch on top of net-next v5.2 and install the kernel
> >    (currently I'm able to boot net-next v5.2 only; higher versions of
> >    net-next suffer from errors during boot).
> > 3. Run "rmmod ena && insmod ena.ko" twice.
> >
> > Result:
> > The interface is not in the up state.
> >
> > Expected result:
> > The interface should be in the up state.
> >
> > What I know so far:
> > * ena_probe() seems to finish with no errors whatsoever
> > * adding prints / delays to ena_probe() causes the bug to vanish or
> > become less likely to occur, depending on how much delay I add
> > * ena_up() is not called at all when the bug occurs, so it has something
> > to do with netlink not invoking dev_open()
> >
> > Did you face such issues? Do you have any idea what might be causing this?
> 
> I haven't observed anything like this. I didn't test with Amazon Linux 2,
> though.
> 
> To rule out some possibilities, could you try disabling *all* userspace
> networking bits, so that userspace does nothing with a newly discovered
> interface, and then testing again? (The interface wouldn't be "up" in that
> case, but it should still have a link detected.)
> 
> If that works, then I wonder if the userspace used in Amazon Linux 2 might
> have some kind of race where it's still using the previous incarnation of the
> device when you rmmod and insmod? Perhaps the previous delays made it
> difficult or impossible to trigger that race?
> 
> - Josh Triplett
