Message-Id: <200910132238.40867.elendil@planet.nl>
Date:	Tue, 13 Oct 2009 22:38:37 +0200
From:	Frans Pop <elendil@...net.nl>
To:	Mel Gorman <mel@....ul.ie>
Cc:	David Rientjes <rientjes@...gle.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Kernel Testers List <kernel-testers@...r.kernel.org>,
	Pekka Enberg <penberg@...helsinki.fi>,
	Reinette Chatre <reinette.chatre@...el.com>,
	Bartlomiej Zolnierkiewicz <bzolnier@...il.com>,
	Karol Lewandowski <karol.k.lewandowski@...il.com>,
	Mohamed Abbas <mohamed.abbas@...el.com>,
	"John W. Linville" <linville@...driver.com>, linux-mm@...ck.org
Subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn

On Monday 12 October 2009, Frans Pop wrote:
> On Monday 12 October 2009, Mel Gorman wrote:
> > but after some digging around in this general area, I saw this patch
> >
> > 4752c93c30 iwlcore: Allow skb allocation from tasklet
>
> That is v2.6.30-rc6-773-g4752c93, which is part of the first wireless
> merge I tested and where I saw no issues. But see below.
>
> > This patch increases the number of GFP_ATOMIC allocations that can
> > occur by allocating GFP_ATOMIC in some cases and GFP_KERNEL in others.
> > Previously, only GFP_KERNEL was used and I didn't realise this
> > allocation method was so recent. Problems of this sort have cropped up
> > before and while there are later changes that suppress some of these
> > warnings, I believe this is a strong candidate for where the
> > allocation failures started appearing.

I have tried reverting this patch and that does make a significant 
difference, but the results are still not really conclusive.
I tested the revert on top of:
- the first net-next-2.6 merge (2ed0e21), i.e. before the mm merge
- 2.6.31.1

In both cases I no longer get SKB errors, but instead (?) I get firmware 
errors:
iwlagn 0000:10:00.0: Microcode SW error detected.  Restarting 0x2000000.

So on the wireless side it does look as if there is more than one change 
involved. Remember that with .30 I don't get any errors, only relatively 
mild latencies and skips in the music.

> I really do think that v2.6.30-rc6-1032-g7ba10a8 could play a role here.
> That's a fix for v2.6.30-rc1-1131-gaa837e1. So that bug was introduced
> _before_ the merge 82d0481 and may thus well explain both the latencies
> I saw _and_ why that merge tested without problems. And that would also
> go a long way to explain my test results.
> So I'm going to retest 82d0481 with 7ba10a8 cherry-picked on top.
                         ^^^^^^^-- should be 45ea4ea

I've tried this but still don't get any SKB errors, so that bug does not 
seem to make a difference.

> > > BISECTION of akpm (mm) MERGE
> > > ----------------------------
> > While I didn't spot anything too out of the ordinary here, they did
> > occur shortly after a number of other page allocator related patches.
> > One small thing I noticed there is that kswapd is getting woken up
> > less now than it did previously. Generally, I wouldn't have expected
> > it to make a difference but it's possible that kswapd is not being
> > woken up to reclaim at a higher order than it was previously. I have a
> > patch for this below. It'd be nice if you could apply it and see do
> > fewer allocation failures occur on current mainline.
>
> I'll give that patch a try and report back.

With your patch on .32-rc4 I still get the SKB errors, so it does not seem 
to help. The only change there may have been is that the desktop was 
frozen longer than without the patch, but that is an impression, not a 
hard fact.


Although identifying the problem on the wireless side is important, I still
feel that tracing the mm change should have priority, because it affects
much more than just iwlagn, as the other reports show.

> > After that, could you edit drivers/net/wireless/iwlwifi/iwl-rx.c and
> > make the GFP_ATOMIC in iwl_rx_replenish_now() GFP_ATOMIC|__GFP_NOWARN
> > and see do any of the "serious" allocation failure messages appear.

For the above reason I've not yet tried this. It seems to me that this 
change will not really solve anything, but just suppress errors.
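
To illustrate the point, here is a minimal userspace model in C. It is a
sketch, not kernel code: MODEL_GFP_ATOMIC, MODEL_GFP_NOWARN, model_alloc()
and run_model() are hypothetical stand-ins, and the bit values are not the
kernel's real GFP values. The only behaviour assumed is that a no-warn flag
silences the failure message and leaves the allocation result unchanged.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-in flag bits for this userspace model only.
 * These are NOT the kernel's real GFP values. */
#define MODEL_GFP_ATOMIC 0x1u
#define MODEL_GFP_NOWARN 0x2u

struct alloc_stats {
    unsigned int failures; /* attempts that returned NULL */
    unsigned int warnings; /* failures that were also logged */
};

/* Model one allocation attempt. Under "pressure" the attempt fails;
 * the no-warn bit only decides whether the failure is reported. */
static void *model_alloc(unsigned int flags, int pressure,
                         struct alloc_stats *st)
{
    static char buf[64];

    if (!pressure)
        return buf;

    st->failures++;
    if (!(flags & MODEL_GFP_NOWARN)) {
        st->warnings++;
        fprintf(stderr, "model: page allocation failure\n");
    }
    return NULL;
}

/* Run a number of attempts; every odd-numbered one is under pressure. */
static struct alloc_stats run_model(unsigned int flags, int attempts)
{
    struct alloc_stats st = { 0, 0 };

    for (int i = 0; i < attempts; i++)
        (void)model_alloc(flags, i % 2, &st);
    return st;
}
```

Calling run_model(MODEL_GFP_ATOMIC, 10) and
run_model(MODEL_GFP_ATOMIC | MODEL_GFP_NOWARN, 10) gives the same number of
failures in both cases; only the number of logged warnings differs.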

Cheers,
FJP