linux-kernel - Re: [Bug #14141] order 2 page allocation failures in iwlagn

Message-Id: <200910141510.11059.elendil@planet.nl>
Date:	Wed, 14 Oct 2009 15:10:08 +0200
From:	Frans Pop <elendil@...net.nl>
To:	Mel Gorman <mel@....ul.ie>
Cc:	David Rientjes <rientjes@...gle.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Kernel Testers List <kernel-testers@...r.kernel.org>,
	Pekka Enberg <penberg@...helsinki.fi>,
	Reinette Chatre <reinette.chatre@...el.com>,
	Bartlomiej Zolnierkiewicz <bzolnier@...il.com>,
	Karol Lewandowski <karol.k.lewandowski@...il.com>,
	Mohamed Abbas <mohamed.abbas@...el.com>,
	"John W. Linville" <linville@...driver.com>, linux-mm@...ck.org
Subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn

On Wednesday 14 October 2009, Mel Gorman wrote:
> I think this is very significant. Either that change needs to be backed
> out or more likely, __GFP_NOWARN needs to be specified and warnings
> *only* printed when the RX buffers are really low. My expectation would
> be that some GFP_ATOMIC allocations fail during refill but the fact they
> fail wakes kswapd to reclaim order-2 pages while the RX buffers in the
> pool are consumed.

Sorry I did not actually mention this, but the SKB failures I get with .32 
have loads of the "Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 
free buffers remaining." errors. That's why I don't think your patch will 
help anything.

zgrep "Only 0 free buffers remaining" /var/log/kern.log* | wc -l
84

OK, they are all GFP_ATOMIC and not GFP_KERNEL, but they also almost all 
have "0 free buffers"! Besides the 84 warnings for 0 remaining I only have 
one with "3 free buffers" and one with "1 free buffers".

And that does not even count the rate limiting:
Oct 12 20:15:07 aragorn kernel: __ratelimit: 45 callbacks suppressed
Oct 12 20:25:19 aragorn kernel: __ratelimit: 27 callbacks suppressed
Oct 12 20:25:20 aragorn kernel: __ratelimit: 2 callbacks suppressed
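
As a rough illustration of how many messages the rate limiting hides, the 
suppressed counts in lines like the three above can be summed. This is only 
a sketch: the function names suppressed_count() and sum_suppressed() are 
made up for this example, and the line format is assumed to be exactly what 
is shown in the log excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the N from a "__ratelimit: N callbacks suppressed" log line,
 * or -1 if the line does not have that form.
 * (Hypothetical helper, not part of any kernel or driver code.)
 */
long suppressed_count(const char *line)
{
	static const char tag[] = "__ratelimit: ";
	const char *p;
	long n = 0;
	int end = 0;

	assert(line != NULL);

	p = strstr(line, tag);
	if (p == NULL)
		return -1;

	/* %n is only stored if the trailing literal text matched as well. */
	if (sscanf(p + sizeof(tag) - 1, "%ld callbacks suppressed%n",
		   &n, &end) != 1 || end == 0 || n < 0)
		return -1;

	return n;
}

/* Sum the suppressed counts over an array of log lines, skipping others. */
long sum_suppressed(const char *const lines[], size_t count)
{
	long total = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		long n = suppressed_count(lines[i]);

		if (n > 0)
			total += n;
	}
	return total;
}
```

Fed the three lines quoted above, sum_suppressed() returns 45 + 27 + 2 = 74 
messages that were never printed, on top of the ones that were counted.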

Attached is the kernel log for one test I did with .32.

> > In both cases I no longer get SKB errors, but instead (?) I get
> > firmware errors:
> > iwlagn 0000:10:00.0: Microcode SW error detected.  Restarting
> > 0x2000000.
>
> I am no wireless expert, but that looks like an separate problem to me.
> I don't see how an allocation failure could trigger errors in the
> microcode.

Yes, it is a separate problem, but it is still significant that reverting 
that patch triggers them in the extreme swap situation.

> > With your patch on .32-rc4 I still get the SKB errors, so it does not
> > seem to help. The only change there may have been is that the desktop
> > was frozen longer than without the patch, but that is an impression,
> > not a hard fact.
>
> Actually, that's fairly interesting and I think justifies pushing the
> patch. Direct reclaim can stall processes in a user-visible manner which
> kswapd is meant to avoid in the majority of cases but is tricky to
> quantify without instrumenting the kernel to measure direct reclaim
> frequency and latency (I have WIP tracepoints for this but it's still a
> WIP). If you notice shorter stalls with the patch applied, it means that
> kswapd really did need to be informed of the problems.

No, I thought I saw _longer_ stalls with your patch applied...

> There still has not been a mm-change identified that makes fragmentation
> significantly worse.

My bisection shows a very clear point, even if not an individual commit, in 
the 'akpm' merge where SKB errors suddenly become *much* more frequent and 
easy to trigger.
I'm sorry to say this, but the fact that nothing has been identified yet is 
IMO the result of a lack of effort, not because there is no such change.

> The majority of the wireless reports have been in 
> this driver and I think we have the problem commit there. The only other
> is a firmware loading problem in e100 after resume that fails to make an
> atomic order-5 fail.

Not exactly true. Bartlomiej's report was about ipw2200, so there are at 
least 3 different drivers involved, two wireless and one wired. Besides 
that, one report is related to heavy swap, one to resume and one to driver 
reload.
So it's much more likely that there is some common regression (in mm) that 
affected all three than that there are three unrelated regressions.
And although both of the others did extremely high-order allocations, they 
both started appearing in the same timeframe. And Bart's very first report 
linked it to mm changes.

> It's possible that something has changed in resume 
> in the 2.6.31 window there - maybe something like drivers now reload
> during resume where they didn't previously or less memory being pushed
> to swap during resume.

IMO you're sticking your head in the sand here. 
I'm not saying that mm is the only issue here, but I'm convinced that there 
_is_ an mm change that has contributed in a major way to these issues, 
even if we've not yet been able to identify it.

> -			    net_ratelimit())
> +			    net_ratelimit()) {
>  				IWL_CRIT(priv, "Failed to allocate SKB buffer with %s. Only %u free
> buffers remaining.\n", priority == GFP_ATOMIC ?  "GFP_ATOMIC" :
> "GFP_KERNEL",

Haven't you broken the test 'priority == GFP_ATOMIC' here by setting 
priority to GFP_ATOMIC|__GFP_NOWARN?
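
To illustrate the concern, here is a minimal userspace sketch. The DEMO_* 
values are stand-ins chosen for this example only; the real definitions 
live in the kernel's gfp.h and should be checked there. The helper names 
are hypothetical as well. The point is just that an equality test stops 
matching once an extra flag bit is ORed in, while a test that masks the 
extra bit out first still matches.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag values for illustration only, NOT the kernel's gfp.h. */
typedef unsigned int demo_gfp_t;

#define DEMO_GFP_WAIT	0x10u
#define DEMO_GFP_HIGH	0x20u
#define DEMO_GFP_IO	0x40u
#define DEMO_GFP_FS	0x80u
#define DEMO_GFP_NOWARN	0x200u

#define DEMO_GFP_ATOMIC	(DEMO_GFP_HIGH)
#define DEMO_GFP_KERNEL	(DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS)

/* The style of test used in the quoted IWL_CRIT() call. */
bool is_atomic_by_equality(demo_gfp_t priority)
{
	return priority == DEMO_GFP_ATOMIC;
}

/* One possible alternative: ignore the NOWARN bit before comparing. */
bool is_atomic_ignoring_nowarn(demo_gfp_t priority)
{
	return (priority & ~DEMO_GFP_NOWARN) == DEMO_GFP_ATOMIC;
}

/* Walk through the cases discussed above. */
void demo_flag_checks(void)
{
	demo_gfp_t priority = DEMO_GFP_ATOMIC | DEMO_GFP_NOWARN;

	/* Equality no longer matches, so the message would say GFP_KERNEL. */
	assert(!is_atomic_by_equality(priority));

	/* Masking out the extra bit restores the intended result. */
	assert(is_atomic_ignoring_nowarn(priority));
	assert(!is_atomic_ignoring_nowarn(DEMO_GFP_KERNEL | DEMO_GFP_NOWARN));
}
```

Another way around it would be to keep the original priority in its own 
variable for the message and only add the extra flag in the allocation call 
itself.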

Cheers,
FJP


View attachment "kern.log" of type "text/x-log" (114901 bytes)