Message-ID: <20070727012214.GB13939@wotan.suse.de>
Date: Fri, 27 Jul 2007 03:22:14 +0200
From: Nick Piggin <npiggin@...e.de>
To: "Siddha, Suresh B" <suresh.b.siddha@...el.com>
Cc: Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
akpm@...ux-foundation.org
Subject: Re: [patch] sched: introduce SD_BALANCE_FORK for ht/mc/smp domains
On Thu, Jul 26, 2007 at 03:34:56PM -0700, Suresh B wrote:
> On Fri, Jul 27, 2007 at 12:18:30AM +0200, Ingo Molnar wrote:
> >
> > * Siddha, Suresh B <suresh.b.siddha@...el.com> wrote:
> >
> > > Introduce SD_BALANCE_FORK for HT/MC/SMP domains.
> > >
> > > For HT/MC, as caches are shared, SD_BALANCE_FORK is the right thing
> > > to do. Given that the NUMA domain already has this flag, and that the
> > > scheduler currently has no concept of keeping the running threads of
> > > a process as close together as possible (i.e., forking may place them
> > > close, but periodic balancing later will likely move them far apart),
> > > introduce SD_BALANCE_FORK for the SMP domain too.
> > >
> > > Signed-off-by: Suresh Siddha <suresh.b.siddha@...el.com>
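
(For context, SD_BALANCE_FORK is a per-domain flag bit; the change
described amounts to OR-ing it into the .flags initializers of the
SMT/MC/CPU domain templates. Below is a rough sketch modelled on the
2.6.2x-era include/linux/topology.h macros - abridged, not the actual
patch, and field lists vary by kernel version.)

	/* Sketch only: the real initializers set many more fields. */
	#define SD_CPU_INIT (struct sched_domain) {		\
		.flags		= SD_LOAD_BALANCE		\
				| SD_BALANCE_NEWIDLE		\
				| SD_BALANCE_EXEC		\
				| SD_BALANCE_FORK /* new */	\
				| SD_WAKE_AFFINE,		\
		/* ... other fields elided ... */		\
	}
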
> >
> > I'm not fundamentally opposed to this, but it would be nice to better
> > map the effects of this change: is there a particular workload under
> > which you've tested this and seen it make a difference? I'd perhaps
> > expect this to improve fork-intensive, half-idle workloads - things
> > like a make -j3 on a 4-core CPU.
>
> They might be doing more execs and would probably be covered by exec
> balancing.
>
> There was a small pthread test case that measured the time to create all
> the threads and how long each thread took to start running. The threads
> appeared to run sequentially, one after another, on a DP system with four
> cores, which led to this SD_BALANCE_FORK observation.
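
(The original test case was not posted; the following is a minimal,
hypothetical reconstruction in C/pthreads of that kind of measurement -
all names are made up - just to make the setup concrete. Build with
"gcc -O2 -pthread".)

	#include <pthread.h>
	#include <stdio.h>
	#include <time.h>

	#define NTHREADS 8

	static struct timespec created[NTHREADS], started[NTHREADS];

	/* Each thread timestamps its first moment of execution, so the
	 * gap from pthread_create() to here is its start-up latency. */
	static void *worker(void *arg)
	{
		long i = (long)arg;
		clock_gettime(CLOCK_MONOTONIC, &started[i]);
		return NULL;
	}

	static double delta_us(struct timespec a, struct timespec b)
	{
		return (b.tv_sec - a.tv_sec) * 1e6 +
		       (b.tv_nsec - a.tv_nsec) / 1e3;
	}

	int main(void)
	{
		pthread_t tid[NTHREADS];
		long i;

		for (i = 0; i < NTHREADS; i++) {
			clock_gettime(CLOCK_MONOTONIC, &created[i]);
			pthread_create(&tid[i], NULL, worker, (void *)i);
		}
		for (i = 0; i < NTHREADS; i++)
			pthread_join(tid[i], NULL);
		for (i = 0; i < NTHREADS; i++)
			printf("thread %ld: start latency %.1f us\n",
			       i, delta_us(created[i], started[i]));
		return 0;
	}
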
If it improves throughput in a non-trivial benchmark, that would be a
point in its favour. I'm not really against it either, but keep in mind
that it can make fork more expensive and less scalable; the reason we do
it for the NUMA domain is that today we're basically screwed WRT the
existing working set if we have to migrate processes across nodes. It
is really important to try to minimise that any way we possibly can.
When (if) we get NUMA page replication and automatic migration going,
I will be looking at whether we can make NUMA migration more
aggressive (and potentially remove SD_BALANCE_FORK). Note that neither
replication nor migration helps with kernel allocations, nor are they
cheap, so NUMA placement will always be worth spending more cycles on
to get right.
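
(For reference, the fork-time balancing being discussed happens in
sched_fork(); below is a heavily abridged sketch modelled on the
2.6.2x-era kernel/sched.c - not an exact quote, and details vary by
version.)

	/* Abridged sketch, modelled on 2.6.2x kernel/sched.c. */
	void sched_fork(struct task_struct *p, int clone_flags)
	{
		int cpu = get_cpu();
	#ifdef CONFIG_SMP
		/* sched_balance_self() walks up the domain tree to the
		 * highest domain with SD_BALANCE_FORK set, then searches
		 * it for the idlest group/CPU - the extra fork-path cost
		 * mentioned above. */
		cpu = sched_balance_self(cpu, SD_BALANCE_FORK);
	#endif
		set_task_cpu(p, cpu);
		/* ... rest of sched_fork() elided ... */
		put_cpu();
	}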