Date:	Tue, 4 Nov 2008 08:40:17 -0600
From:	Dimitri Sivanich <sivanich@....com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Gregory Haskins <ghaskins@...ell.com>,
	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>
Subject: Re: RT sched: cpupri_vec lock contention with def_root_domain and no load balance

On Tue, Nov 04, 2008 at 03:36:33PM +0100, Peter Zijlstra wrote:
> On Tue, 2008-11-04 at 09:34 -0500, Gregory Haskins wrote:
> > Gregory Haskins wrote:
> > > Peter Zijlstra wrote:
> > >   
> > >> On Mon, 2008-11-03 at 15:07 -0600, Dimitri Sivanich wrote:
> > >>
> > >>> When load balancing gets switched off for a set of cpus via the
> > >>> sched_load_balance flag in cpusets, those cpus wind up with the
> > >>> globally defined def_root_domain attached.  The def_root_domain is
> > >>> attached when partition_sched_domains() calls detach_destroy_domains().
> > >>> A new root_domain is never allocated or attached, since
> > >>> __build_sched_domains() will never attach a sched domain for the
> > >>> non-load-balanced processors.
> > >>>
> > >>> The problem with this scenario is that on systems with a large number
> > >>> of processors with load balancing switched off, we start to see the
> > >>> cpupri->pri_to_cpu->lock in the def_root_domain becoming contended.
> > >>> This starts to become much more apparent above 8 waking RT threads
> > >>> (with each RT thread running on its own cpu, blocking and waking up
> > >>> continuously).
> > >>>
> > >>> I'm wondering whether this is, in fact, the way things were meant
> > >>> to work, or whether we should have a root domain allocated for each
> > >>> cpu that is not to be part of a sched domain.  Note that the
> > >>> def_root_domain spans all of the non-load-balanced cpus in this
> > >>> case.  Having it attached to cpus that should not be load balancing
> > >>> doesn't quite make sense to me.
> > >>
> > >> It shouldn't be like that: each load-balance domain (in your case a
> > >> single cpu) should get its own root domain.  Gregory?
> > >
> > > Yeah, this sounds broken.  I know that the root-domain code was being
> > > developed coincident with some upheaval in the cpuset code, so I suspect
> > > something may have been broken from the original intent.  I will take a
> > > look.
> > >
> > > -Greg
> > >
> > 
> > After thinking about it some more, I am not quite sure what to do here. 
> > The root-domain code was really designed to be 1:1 with a disjoint
> > cpuset.  In this case, it sounds like all the non-balanced cpus are
> > still in one default cpuset, in which case the code is correct to place
> > all of them in the singleton def_root_domain.  The question really
> > is: How do we support the sched_load_balance flag better?
> > 
> > I suppose we could go through the scheduler code and have it check that
> > flag before consulting the root-domain.  Another alternative is to have
> > the sched_load_balance=false flag create a disjoint cpuset.  Any thoughts?
> 
> Hmm, but you cannot disable load-balance on a cpu without placing it in
> a cpuset first, right?
> 
> Or are folks disabling load-balance bottom-up, instead of top-down?
> 
> In that case, I think we should disallow that.

When I see this behavior, I am creating cpusets containing these
non-load-balanced cpus.  Whether I create a single cpuset for each one
or one cpuset for all of them, the root domain ends up being the
def_root_domain, with no sched domain attached, once I set the
sched_load_balance flag to 0 on both the root cpuset and the created
cpuset(s).
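
For reference, the setup I am describing looks roughly like the
following (a minimal sketch, assuming the v1 cpuset filesystem is
already mounted at /dev/cpuset; the path, cpu numbers, and helper are
illustrative only, not taken from any existing tool):

/* cpuset_repro.c: create one cpuset per cpu and switch load balancing
 * off in them and in the root cpuset.  Assumes the cpuset filesystem
 * is mounted at /dev/cpuset (illustrative path). */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

#define CPUSET_ROOT "/dev/cpuset"

static void write_file(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		exit(1);
	}
	fputs(val, f);
	fclose(f);
}

int main(void)
{
	char path[128], buf[16];
	int cpu;

	for (cpu = 1; cpu <= 2; cpu++) {
		/* One cpuset per cpu... */
		snprintf(path, sizeof(path), CPUSET_ROOT "/cpu%d", cpu);
		mkdir(path, 0755);

		snprintf(path, sizeof(path), CPUSET_ROOT "/cpu%d/cpus", cpu);
		snprintf(buf, sizeof(buf), "%d", cpu);
		write_file(path, buf);

		/* ...with load balancing switched off. */
		snprintf(path, sizeof(path),
			 CPUSET_ROOT "/cpu%d/sched_load_balance", cpu);
		write_file(path, "0");
	}

	/* Switching it off in the root cpuset as well is what leaves
	 * these cpus on the def_root_domain. */
	write_file(CPUSET_ROOT "/sched_load_balance", "0");
	return 0;
}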

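To make the path concrete: as I understand it, the teardown flow named
above looks roughly like this (an illustrative sketch, not the exact
kernel code, which varies by version):

/* Sketch: when load balancing is switched off for a set of cpus,
 * partition_sched_domains() tears down their old domains via
 * detach_destroy_domains().  No per-cpu root_domain is allocated;
 * every cpu in the map falls back to the one global def_root_domain. */
static void detach_destroy_domains(const cpumask_t *cpu_map)
{
	int i;

	for_each_cpu_mask(i, *cpu_map)
		cpu_attach_domain(NULL, &def_root_domain, i);
}

Since every RT wakeup on any of those cpus then updates the same shared
cpupri, the pri_to_cpu lock contention described above follows directly.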
