Message-ID: <20120109172548.GJ27881@csn.ul.ie>
Date: Mon, 9 Jan 2012 17:25:48 +0000
From: Mel Gorman <mel@....ul.ie>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Russell King - ARM Linux <linux@....linux.org.uk>,
KOSAKI Motohiro <kosaki.motohiro@...il.com>,
Gilad Ben-Yossef <gilad@...yossef.com>,
linux-kernel@...r.kernel.org, Chris Metcalf <cmetcalf@...era.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Frederic Weisbecker <fweisbec@...il.com>, linux-mm@...ck.org,
Pekka Enberg <penberg@...nel.org>,
Matt Mackall <mpm@...enic.com>,
Sasha Levin <levinsasha928@...il.com>,
Rik van Riel <riel@...hat.com>,
Andi Kleen <andi@...stfloor.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
linux-fsdevel@...r.kernel.org, Avi Kivity <avi@...hat.com>
Subject: Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
On Thu, Jan 05, 2012 at 03:19:19PM -0800, Andrew Morton wrote:
> On Thu, 5 Jan 2012 22:31:06 +0000
> Mel Gorman <mel@....ul.ie> wrote:
>
> > On Thu, Jan 05, 2012 at 02:06:45PM -0800, Andrew Morton wrote:
> > > On Thu, 5 Jan 2012 16:17:39 +0000
> > > Mel Gorman <mel@....ul.ie> wrote:
> > >
> > > > mm: page allocator: Guard against CPUs going offline while draining per-cpu page lists
> > > >
> > > > While running a CPU hotplug stress test under memory pressure, I
> > > > saw cases where under enough stress the machine would halt although
> > > > it required a machine with 8 cores and plenty of memory. I think the
> > > > problems may be related.
> > >
> > > When we first implemented them, the percpu pages in the page allocator
> > > were of really really marginal benefit. I didn't merge the patches at
> > > all for several cycles, and it was eventually a 49/51 decision.
> > >
> > > So I suggest that our approach to solving this particular problem
> > > should be to nuke the whole thing, then see if that caused any
> > > observable problems. If it did, can we solve those problems by means
> > > other than bringing the dang things back?
> > >
> >
> > Sounds drastic.
>
> Wrong thinking ;)
>
:)

> Simplifying the code should always be the initial proposal. Adding
> more complexity on top is the worst-case when-all-else-failed option.
> Yet we so often reach for that option first :(
>
Enngghh, I really want to agree with you but reducing lock contention
has been such an important goal for a long time that I am really loath
to just rip it out and hope for the best.

> > It would be less controversial to replace this patch
> > with a version that calls get_online_cpus() in drain_all_pages() but
> > removes the drain_all_pages() call from the page allocator on
> > the grounds it is not safe against CPU hotplug and to hell with the
> > slightly elevated allocation failure rates and stalls. That would avoid
> > the try_get_online_cpus() crappiness and be less complex.
>
> If we can come up with a reasonably simple patch which improves or even
> fixes the problem then I suppose there is some value in that, as it
> provides users of earlier kernels with something to backport if they
> hit problems.
>
I'm preparing a patch that is a simpler fix which does not send an IPI
at all. There is also a sysfs fix that is necessary for tests to complete
successfully. The details will be in the series.

> But the social downside of that is that everyone would shuffle off
> towards other bright and shiny things and we'd be stuck with more
> complexity piled on top of dubiously beneficial code.
>
> > If you really want to consider deleting the per-cpu allocator, maybe
> > it could be a LSF/MM topic?
>
> eek, spare me.
>
It was worth a shot.

> Anyway, we couldn't discuss such a topic without data. Such data would
> be obtained by deleting the code and measuring the results. Which is
> what I just said ;)
>
Crap. ok. I've added an item to my TODO list to implement a patch that
removes it. It is at a lower priority than removing lumpy reclaim though -
eventually this TODO list will start shrinking. I'll need to put
some thought into how it can be tested but even then I am probably
not the best person to test it. I don't have regular access to a 2+
socket machine to test NUMA effects for example.

> > Personally I would be wary of deleting
> > it but mostly because I lack regular access to the type of hardware
> > to evaluate whether it was safe to remove or not. Minimally, removing
> > the per-cpu allocator could make the zone lock very hot even though slub
> > probably makes it very hot already.
>
> Much of the testing of the initial code was done on mbligh's weirdass
> NUMAq box: 32-way 386 NUMA which suffered really badly if there were
> contention issues. And even on that box, the code was marginal. So
> I'm hopeful that things will be similar on current machines. Of
> course, it's possible that calling patterns have changed in ways which
> make the code more beneficial than it used to be.
>
Core counts are also higher and some workloads might be more
allocator intensive than they used to be - netperf and network-related
allocations for socket receive might be a problem for example.
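
For anyone following the contention argument, here is a toy userspace
model (not kernel code) of what the per-cpu lists are meant to buy: a
CPU-local cache that is refilled in batches takes the shared zone lock
once per batch instead of once per page. All names (toy_zone, toy_pcp,
PCP_BATCH, locks_for) are hypothetical and the batch size of 16 is an
assumption picked for illustration. The "lock" is only a counter, so
this counts lock acquisitions and says nothing about whether the saving
matters on real hardware, which is the open question here.

```c
#include <assert.h>

#define PCP_BATCH 16            /* assumed refill size, illustration only */

struct toy_zone {
	long free_pages;        /* pages left in the shared pool */
	long lock_acquisitions; /* how often the "zone lock" was taken */
};

struct toy_pcp {
	int count;              /* pages cached on this CPU */
};

/* Take up to n pages from the zone under a single lock acquisition. */
static int zone_take(struct toy_zone *z, int n)
{
	z->lock_acquisitions++; /* stands in for taking zone->lock */
	if (n > z->free_pages)
		n = (int)z->free_pages;
	z->free_pages -= n;
	return n;
}

/* No per-cpu list: every allocation goes to the zone. */
static int alloc_direct(struct toy_zone *z)
{
	return zone_take(z, 1);
}

/* Per-cpu list: only touch the zone when the local cache is empty. */
static int alloc_pcp(struct toy_zone *z, struct toy_pcp *pcp)
{
	if (pcp->count == 0)
		pcp->count = zone_take(z, PCP_BATCH);
	if (pcp->count == 0)
		return 0;       /* zone exhausted */
	pcp->count--;
	return 1;
}

/* Count lock acquisitions needed for n single-page allocations. */
long locks_for(int n, int use_pcp)
{
	struct toy_zone z = { .free_pages = 1L << 20, .lock_acquisitions = 0 };
	struct toy_pcp pcp = { 0 };

	for (int i = 0; i < n; i++) {
		int ok = use_pcp ? alloc_pcp(&z, &pcp) : alloc_direct(&z);
		assert(ok);
	}
	return z.lock_acquisitions;
}
```

Under these assumptions, locks_for(1024, 0) takes the lock 1024 times
and locks_for(1024, 1) takes it 64 times.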
> But this all ties into my proposal yesterday to remove
> mm/swap.c:lru_*_pvecs. Most or all of the heavy one-page-at-a-time
> code can pretty easily be converted to operate on batches of pages.
>
> Following on from that, it should be pretty simple to extend the
> batching down into the page freeing. Look at put_pages_list() and
> weep. And stuff like free_hot_cold_page_list() which could easily free
> the pages directly while batching the locking.
>
> Page freeing should be relatively straightforward. Batching page
> allocation is hard in some cases (anonymous pagefaults).
>
Page faulting would certainly be hard to batch but it would only
really be a big problem if the faults are intensive enough, and on
enough CPUs, to cause problematic zone lock contention.
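
As a rough illustration of the "free the pages directly while batching
the locking" idea quoted above, here is a toy userspace sketch. It is
not how put_pages_list() or free_hot_cold_page_list() are implemented;
toy_zone, toy_page and the helper names are all hypothetical, and the
lock is modelled as a counter. The batched variant keeps the lock held
for as long as consecutive pages belong to the same zone.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_ZONES 2

struct toy_zone {
	long free_pages;
	long lock_acquisitions; /* stands in for taking zone->lock */
};

struct toy_page {
	int zone_id;            /* which zone the page belongs to */
};

/* One lock round trip per page. */
static void free_one_at_a_time(struct toy_zone *zones,
			       const struct toy_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_zone *z = &zones[pages[i].zone_id];

		z->lock_acquisitions++; /* lock */
		z->free_pages++;
		/* unlock */
	}
}

/* Hold the lock across runs of pages that share a zone. */
static void free_batched(struct toy_zone *zones,
			 const struct toy_page *pages, size_t n)
{
	struct toy_zone *locked = NULL;

	for (size_t i = 0; i < n; i++) {
		struct toy_zone *z;

		assert(pages[i].zone_id >= 0 && pages[i].zone_id < TOY_NR_ZONES);
		z = &zones[pages[i].zone_id];
		if (z != locked) {
			/* unlock the previous zone, if any, then lock z */
			z->lock_acquisitions++;
			locked = z;
		}
		z->free_pages++;
	}
	/* unlock the last zone here, if any */
}

/* Free five zone-0 pages then three zone-1 pages; return locks taken. */
long toy_total_locks(int batched)
{
	struct toy_zone zones[TOY_NR_ZONES] = { { 0, 0 }, { 0, 0 } };
	struct toy_page pages[] = {
		{ 0 }, { 0 }, { 0 }, { 0 }, { 0 }, { 1 }, { 1 }, { 1 },
	};
	size_t n = sizeof(pages) / sizeof(pages[0]);

	if (batched)
		free_batched(zones, pages, n);
	else
		free_one_at_a_time(zones, pages, n);

	assert(zones[0].free_pages + zones[1].free_pages == (long)n);
	return zones[0].lock_acquisitions + zones[1].lock_acquisitions;
}
```

For that input the one-at-a-time path takes a lock 8 times and the
batched path takes it twice. A list that alternates between zones would
see no benefit from this simple scheme.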
> Please do note that the above suggestions are only needed if removing
> the pcp lists causes a problem! It may not.
>
True.
--
Mel Gorman
SUSE Labs