lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170721160937.GA22632@csclub.uwaterloo.ca>
Date:   Fri, 21 Jul 2017 12:09:37 -0400
From:   lsorense@...lub.uwaterloo.ca (Lennart Sorensen)
To:     Benjamin Poirier <bpoirier@...e.com>
Cc:     linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        intel-wired-lan@...ts.osuosl.org,
        Jeff Kirsher <jeffrey.t.kirsher@...el.com>
Subject: Re: commit 16ecba59 breaks 82574L under heavy load.

On Fri, Jul 21, 2017 at 11:27:09AM -0400,  wrote:
> On Thu, Jul 20, 2017 at 04:44:55PM -0700, Benjamin Poirier wrote:
> > Could you please test the following patch and let me know if it:
> > 1) reduces the interrupt rate of the Other msi-x vector
> > 2) avoids the link flaps
> > or
> > 3) logs some dmesg warnings of the form "Other interrupt with unhandled [...]"
> > In this case, please paste icr values printed.
> 
> I will give it a try.

So test looks excellent.  Seems to only get interrupts when link state
actually changes now.

> Another odd behaviour I see is that the driver will hang in
> napi_synchronize on shutdown if there is traffic at the time (at least
> I think that's the trigger, maybe the trigger is if there has been an
> overload of traffic and the backlog in napi was used).
> 
> From doing some searching, this seems to be a problem that has plagued
> some people for years with this driver.
> 
> I am having trouble figuring out exactly what napi_synchronize is waiting
> for and who is supposed to toggle the flag it is waiting on.  The flag
> appears to work backwards from what I would have expected it to do.
> I see lots of places that can set the bit, but only napi_enable seems
> to clear it again, and I don't see how that would get called for all
> the places that potentially set the bit.

I just realized NAPI_STATE_SCHED and NAPIF_STATE_SCHED are the same
thing and I need to look at both of those.

Still something seems odd in some corner case where napi gets stuck and
you can't close the port anymore due to napi_synchronize never being
able to finish.  Some traffic pattern causes that SCHED state bit to
get into the wrong state and nothing ever clears it.  Even managed to
see it get stuck so it never passed traffic again and hung on shutdown.
The napi poll was never called again.

-- 
Len Sorensen

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ