Message-ID: <CAOtvUMdRa74aLUufHUSowzvk2mZEGTVT+jXm_vp3OSBLipzW-g@mail.gmail.com>
Date: Fri, 30 Dec 2011 22:16:40 +0200
From: Gilad Ben-Yossef <gilad@...yossef.com>
To: Mel Gorman <mgorman@...e.de>
Cc: linux-kernel@...r.kernel.org, Chris Metcalf <cmetcalf@...era.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Frederic Weisbecker <fweisbec@...il.com>,
Russell King <linux@....linux.org.uk>, linux-mm@...ck.org,
Pekka Enberg <penberg@...nel.org>,
Matt Mackall <mpm@...enic.com>,
Sasha Levin <levinsasha928@...il.com>,
Rik van Riel <riel@...hat.com>,
Andi Kleen <andi@...stfloor.org>
Subject: Re: [PATCH v4 5/5] mm: Only IPI CPUs to drain local pages if they exist
On Fri, Dec 30, 2011 at 5:04 PM, Mel Gorman <mgorman@...e.de> wrote:
>
> On Sun, Dec 25, 2011 at 11:39:59AM +0200, Gilad Ben-Yossef wrote:
>
>
>
> CONFIG_CPUMASK_OFFSTACK is force enabled if CONFIG_MAXSMP on x86. This
> may be the case for some server-orientated distributions. I know
> SLES enables this option for x86-64 at least. Debian does not but
> might in the future. I don't know about RHEL but it should be checked.
> Either way, we cannot depend on CONFIG_CPUMASK_OFFSTACK being disabled
> (it's enabled on my laptop for example due to the .config it is based
> on). That said, breaking the link between MAXSMP and OFFSTACK may be
> an option.
>
Yes, I know, and I believe it is enabled for RHEL as well.
The point is, MAXSMP is enabled in the enterprise distributions in order
to support massively multi-core systems, and reducing cross-CPU
interference is important on exactly those systems.
In fact, since CONFIG_CPUMASK_OFFSTACK has a price of its own, the fact
that distros enable it (via MAXSMP) is proof in my eyes that the distros find
massively multi-core systems important :-)
That being said, the patch only has value if it actually reduces cross-CPU
IPIs without incurring a bigger price elsewhere; otherwise of course it
should be dropped.
>
> > For CONFIG_CPUMASK_OFFSTACK=y, when we get to drain_all_pages from the
> > memory hotplug or the memory failure code path (the other code paths
> > that call drain_all_pages), there is no inherent memory pressure, so
> > we should be OK.
> >
>
> It's the memory failure code path after direct reclaim failed. How
> can you say there is no inherent memory pressure?
>
Bah.. you are right. Memory allocation will cause memory migration to
the remaining active memory areas, so yes, it is memory pressure.
Point taken. My bad.
>
> > The thing is, if you are at CPUMASK_OFFSTACK=y, you are saying
> > that you optimize for the large-number-of-CPUs case, otherwise it doesn't
> > make sense - you can represent 32 CPUs in the space it takes to
> > hold the pointer to the cpumask (on a 32-bit system), etc.
> >
> > If you are at CPUMASK_OFFSTACK=n, you pay (almost) nothing.
> >
> <snip>
>
> It's the CPUMASK_OFFSTACK=y case I worry about as it is enabled on
> at least one server-orientated distribution and probably more.
>
Sure, because they care about performance (or even just plain working) on
massively multi-core systems, which is exactly what this patch set aims to
improve.
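
(For anyone following along, a minimal sketch of what the option changes
for a caller - the function name here is made up for illustration. With
CONFIG_CPUMASK_OFFSTACK=y, alloc_cpumask_var() really allocates the bitmap;
with =n the mask lives on the stack and the call is effectively free.)

#include <linux/cpumask.h>
#include <linux/gfp.h>

/* Illustrative only - not code from the patch. */
static void ipi_some_cpus_sketch(void)
{
	cpumask_var_t mask;

	/* GFP_ATOMIC because we may be called under memory pressure */
	if (!alloc_cpumask_var(&mask, GFP_ATOMIC))
		return;		/* caller would fall back to IPIing everyone */

	cpumask_clear(mask);
	/* ... set only the CPUs that actually need the IPI ... */

	free_cpumask_var(mask);
}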
>
> > I think of it more as a CPU isolation feature than pure performance.
> > If you have a system with a couple of dozen CPUs (Tilera, SGI, Cavium
> > or the various virtual NUMA folks) you tend to want to break up the system
> > into sets of CPUs that work on separate tasks.
> >
>
> Even with the CPUs isolated, how often is it the case that many of
> the CPUs have 0 pages on their per-cpu lists? I checked a bunch of
> random machines just there and in every case all CPUs had at least
> one page on their per-cpu list. In other words I believe that in
> many cases the exact same number of IPIs will be used but with the
> additional cost of allocating a cpumask.
>
A common usage scenario on systems with lots of cores is to isolate
a group of cores and run an (almost) totally CPU-bound task on each CPU
of the set. Those tasks rarely call into the kernel; they just crunch numbers,
and they end up with 0 pages on their per-cpu lists more often than you think.
But you are right that it is a specific use case. The question is what the
cost is in other use cases.
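
Just to make the intent concrete, the logic the patch is after in
drain_all_pages() is roughly the following (a sketch only, assuming the
generic on_each_cpu_mask() helper added earlier in this series with a
mask-first argument order; it is not the exact patch code and skips the
fallback details):

#include <linux/cpumask.h>
#include <linux/gfp.h>
#include <linux/mmzone.h>
#include <linux/percpu.h>
#include <linux/smp.h>

/*
 * Sketch: IPI only the CPUs that actually have pages sitting on their
 * per-cpu pagesets, instead of every online CPU.  Illustrative only.
 */
static void drain_all_pages_sketch(void)
{
	cpumask_var_t cpus_with_pcp;
	int cpu;

	if (!alloc_cpumask_var(&cpus_with_pcp, GFP_ATOMIC)) {
		/* can't get a mask - just IPI everyone as before */
		on_each_cpu(drain_local_pages, NULL, 1);
		return;
	}

	cpumask_clear(cpus_with_pcp);
	for_each_online_cpu(cpu) {
		struct zone *zone;

		for_each_populated_zone(zone) {
			if (per_cpu_ptr(zone->pageset, cpu)->pcp.count) {
				cpumask_set_cpu(cpu, cpus_with_pcp);
				break;
			}
		}
	}

	on_each_cpu_mask(cpus_with_pcp, drain_local_pages, NULL, 1);
	free_cpumask_var(cpus_with_pcp);
}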
>
> <snip>
>
> I'm still generally uncomfortable with the allocator allocating memory
> while it is known memory is tight.
>
Got you.
>
> As a way of mitigating that, I would suggest this is done in two
> passes. The first would check if at least 50% of the CPUs have no pages
> on their per-cpu list. Then and only then allocate the per-cpu mask to
> limit the IPIs. Use a separate patch that counts in /proc/vmstat how
> many times the per-cpu mask was allocated as an approximate measure of
> how often this logic really reduces the number of IPI calls in practice
> and report that number with the patch - i.e. this patch reduces the
> number of times IPIs are globally transmitted by X% for some workload.
>
Great idea. I like it - and I guess the 50% could be configurable.
Will do and report.
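
Roughly something like this, I think (a sketch of the two-pass check only;
the 50% threshold, the helper name and the commented-out vmstat event are
placeholders, not real kernel symbols):

#include <linux/cpumask.h>
#include <linux/mmzone.h>
#include <linux/percpu.h>
#include <linux/types.h>
#include <linux/vmstat.h>

/*
 * Sketch of the two-pass idea: first walk the CPUs without allocating
 * anything and count how many have empty per-cpu lists; only if at least
 * half of them are empty is it worth paying for the cpumask allocation.
 */
static bool worth_allocating_drain_mask(void)
{
	unsigned int empty = 0, total = 0;
	int cpu;

	for_each_online_cpu(cpu) {
		struct zone *zone;
		bool has_pages = false;

		for_each_populated_zone(zone) {
			if (per_cpu_ptr(zone->pageset, cpu)->pcp.count) {
				has_pages = true;
				break;
			}
		}
		if (!has_pages)
			empty++;
		total++;
	}

	if (empty * 2 < total)
		return false;	/* most CPUs need the IPI anyway */

	/* hypothetical counter so /proc/vmstat shows how often we get here */
	/* count_vm_event(DRAIN_PCP_MASK_ALLOC); */
	return true;
}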
Gilad
>
> --
>
> Mel Gorman
> SUSE Labs
--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@...yossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com
"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML