Date:	Wed, 25 Mar 2009 08:33:49 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	David Miller <davem@...emloft.net>
Cc:	herbert@...dor.apana.org.au, r.schwebel@...gutronix.de,
	torvalds@...ux-foundation.org, blaschka@...ux.vnet.ibm.com,
	tglx@...utronix.de, a.p.zijlstra@...llo.nl,
	linux-kernel@...r.kernel.org, kernel@...gutronix.de
Subject: Re: Revert "gro: Fix legacy path napi_complete crash",


* David Miller <davem@...emloft.net> wrote:

> From: Herbert Xu <herbert@...dor.apana.org.au>
> Date: Wed, 25 Mar 2009 08:23:03 +0800
> 
> > On Tue, Mar 24, 2009 at 02:36:22PM -0700, David Miller wrote:
> > >
> > > I think the problem is that we need to do the GRO flush before the
> > > list delete and clearing the NAPI_STATE_SCHED bit.
> > 
> > Well, first of all, GRO shouldn't even be on in Ingo's case, unless
> > he enabled it by hand with ethtool.  Secondly, the only thing that
> > touches the GRO state for the legacy path is process_backlog, and
> > since this is per-cpu, I can't see how another instance can run
> > while the first is still going.
> 
> Right.
> 
> I think the conditions Ingo is running under are that both loopback 
> (using the legacy path) and his NAPI-based device (forcedeth) are 
> processing a lot of packets at the same time.
> 
> Another thing that seems to be critical is that he can only trigger 
> this on UP, which means we don't have the damn APIC potentially 
> moving the CPU target of the forcedeth interrupts around.  It also 
> means that all the processing will be on one CPU's backlog queue 
> only.
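
For reference, the ordering David describes for the legacy path's 
completion step would look roughly like the sketch below. This is an 
illustrative sketch only, not the actual net/core/dev.c code: the 
helpers it calls (napi_gro_flush(), list_del(), clear_bit()) are the 
real 2.6.29 ones, but the function itself is made up purely to show 
the ordering.

/*
 * Illustrative sketch: flush GRO-held packets *before* the napi
 * context is removed from the per-cpu poll list and before
 * NAPI_STATE_SCHED is cleared.
 */
#include <linux/netdevice.h>	/* struct napi_struct, napi_gro_flush() */
#include <linux/list.h>		/* list_del() */
#include <linux/irqflags.h>	/* local_irq_disable()/local_irq_enable() */

static void legacy_napi_complete_sketch(struct napi_struct *napi)
{
	/*
	 * 1) Flush packets still held on the gro_list while
	 *    NAPI_STATE_SCHED is still set, so nothing can re-schedule
	 *    and poll this context under us.
	 */
	napi_gro_flush(napi);

	/*
	 * 2) Only then tear down the scheduling state, with interrupts
	 *    off so the per-cpu poll list is not modified concurrently.
	 */
	local_irq_disable();
	list_del(&napi->poll_list);
	smp_mb__before_clear_bit();	/* as in __napi_complete() */
	clear_bit(NAPI_STATE_SCHED, &napi->state);
	local_irq_enable();
}

The idea, as I read David's point, is that once NAPI_STATE_SCHED is 
cleared the context can be scheduled and polled again, so any packets 
still sitting on the gro_list have to be flushed before that happens.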

I tested the plain revert I sent in the original report overnight 
(with about 12 hours of combined testing time), and all systems held 
up fine. The system that used to reproduce the bug within 10-20 
iterations completed 210 successful iterations; the other systems 
were stable as well.

So if there's no definitive resolution for the real cause of the 
bug, the plain revert looks like an acceptable interim choice for 
.29.1 - at least as far as my systems go.

	Ingo
