Date:	Wed, 25 Mar 2009 08:33:49 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	David Miller <davem@...emloft.net>
Cc:	herbert@...dor.apana.org.au, r.schwebel@...gutronix.de,
	torvalds@...ux-foundation.org, blaschka@...ux.vnet.ibm.com,
	tglx@...utronix.de, a.p.zijlstra@...llo.nl,
	linux-kernel@...r.kernel.org, kernel@...gutronix.de
Subject: Re: Revert "gro: Fix legacy path napi_complete crash",


* David Miller <davem@...emloft.net> wrote:

> From: Herbert Xu <herbert@...dor.apana.org.au>
> Date: Wed, 25 Mar 2009 08:23:03 +0800
> 
> > On Tue, Mar 24, 2009 at 02:36:22PM -0700, David Miller wrote:
> > >
> > > I think the problem is that we need to do the GRO flush before the
> > > list delete and clearing the NAPI_STATE_SCHED bit.
> > 
> > Well, first of all, GRO shouldn't even be on in Ingo's case, unless
> > he enabled it by hand with ethtool.  Secondly, the only thing that
> > touches the GRO state for the legacy path is process_backlog, and
> > since this is per-cpu, I can't see how another instance can run
> > while the first is still going.
> 
> Right.
> 
> I think the conditions Ingo is running under are that both loopback 
> (using the legacy path) and his NAPI-based device (forcedeth) are 
> processing a lot of packets at the same time.
> 
> Another thing that seems to be critical is that he can only trigger 
> this on UP, which means we don't have the damn APIC potentially 
> moving the CPU target of the forcedeth interrupts around.  It also 
> means that all the processing will be on one CPU's backlog queue 
> only.
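
For reference, the ordering David describes for the legacy path's 
completion step would look roughly like the sketch below. This is an 
illustrative sketch only, not the actual net/core/dev.c code: the 
helpers it calls (napi_gro_flush(), list_del(), clear_bit()) are the 
real 2.6.29 ones, but the function itself is made up purely to show 
the ordering.

/*
 * Illustrative sketch: flush GRO-held packets *before* the napi
 * context is removed from the per-cpu poll list and before
 * NAPI_STATE_SCHED is cleared.
 */
#include <linux/netdevice.h>	/* struct napi_struct, napi_gro_flush() */
#include <linux/list.h>		/* list_del() */
#include <linux/irqflags.h>	/* local_irq_disable()/local_irq_enable() */

static void legacy_napi_complete_sketch(struct napi_struct *napi)
{
	/*
	 * 1) Flush packets still held on the gro_list while
	 *    NAPI_STATE_SCHED is still set, so nothing can re-schedule
	 *    and poll this context under us.
	 */
	napi_gro_flush(napi);

	/*
	 * 2) Only then tear down the scheduling state, with interrupts
	 *    off so the per-cpu poll list is not modified concurrently.
	 */
	local_irq_disable();
	list_del(&napi->poll_list);
	smp_mb__before_clear_bit();	/* as in __napi_complete() */
	clear_bit(NAPI_STATE_SCHED, &napi->state);
	local_irq_enable();
}

The idea, as I read David's point, is that once NAPI_STATE_SCHED is 
cleared the context can be scheduled and polled again, so any packets 
still sitting on the gro_list have to be flushed before that happens.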

I tested the plain revert I sent in the original report overnight 
(with about 12 hours of combined testing time), and all systems held 
up fine. The system that used to reproduce the bug within 10-20 
iterations completed 210 successful iterations; the other systems 
were stable as well.

So if there's no definitive resolution for the real cause of the 
bug, the plain revert looks like an acceptable interim choice for 
.29.1 - at least as far as my systems go.

	Ingo
