Date:	Tue, 24 Mar 2009 23:01:11 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	David Miller <davem@...emloft.net>
Cc:	herbert@...dor.apana.org.au, r.schwebel@...gutronix.de,
	torvalds@...ux-foundation.org, blaschka@...ux.vnet.ibm.com,
	tglx@...utronix.de, a.p.zijlstra@...llo.nl,
	linux-kernel@...r.kernel.org, kernel@...gutronix.de
Subject: Re: Revert "gro: Fix legacy path napi_complete crash",


* David Miller <davem@...emloft.net> wrote:

> From: Ingo Molnar <mingo@...e.hu>
> Date: Tue, 24 Mar 2009 21:54:44 +0100
> 
> > * Ingo Molnar <mingo@...e.hu> wrote:
> > 
> > > > Same forcedeth box i reported before. Config below. (note: if 
> > > > you want to use it you need to run it through 'make oldconfig', 
> > > > with all defaults accepted)
> > > 
> > > Hm, i just had a test failure (hung interface) with this too.
> > > 
> > > I'll go back to the original straight revert of "303c6a0: gro: Fix 
> > > legacy path napi_complete crash", and will test it overnight - to 
> > > establish a baseline of stability again. (to make sure there are 
> > > no other bugs interacting)
> > 
> > FYI, this plain revert is holding up fine in my tests so far - 50 
> > random iterations - the previous one failed after 5 iterations.
> 
> Something must be up with respect to letting interrupts in during 
> certain windows of time, or similar.
> 
> I'll take a look at this and hopefully Herbert or myself will be 
> able to figure it out.

It definitely did not show the usual patterns of bug behavior - i'd have 
found it yesterday morning if it had.

I spent most of the time trying to find a reliable reproducer 
.config and system. Sometimes the bug went away with a minor change 
in the .config. Until today i didn't even suspect that a mainline 
change was causing this.

Also, note that i have artificially reduced the probability of UP 
kernels in my randconfigs to about 12.5% (it is 50% upstream). Still, 
despite that measure, the 'best' .config i found was a UP config - 
i don't think that's an accident. Also, i had to fully saturate the 
target CPU over gigabit to hit the bug most reliably.

Which suggests to me (empirically) that it's indeed a race and that 
it needs a saturated system with lots of IRQs to trigger, and 
perhaps that it needs saturated/overloaded network device queues and 
complex userspace/softirq/hardirq interactions.

	Ingo