linux-kernel - Re: [RFC] sched/fair: hard lockup in sched_cfs_period


Message-ID: <20190306162313.GB8786@pauld.bos.csb>
Date:   Wed, 6 Mar 2019 11:23:13 -0500
From:   Phil Auld <pauld@...hat.com>
To:     bsegall@...gle.com
Cc:     mingo@...hat.com, peterz@...radead.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC]  sched/fair: hard lockup in sched_cfs_period_timer

On Tue, Mar 05, 2019 at 12:45:34PM -0800 bsegall@...gle.com wrote:
> Phil Auld <pauld@...hat.com> writes:
> 
> > Interestingly, if I limit the number of child cgroups to the number of 
> > them I'm actually putting processes into (16 down from 2500) the problem
> > does not reproduce.
> 
> That is indeed interesting, and definitely not something we'd want to
> matter. (Particularly if it's not root->a->b->c...->throttled_cgroup or
> root->throttled->a->...->thread vs root->throttled_cgroup, which is what
> I was originally thinking of)
> 

The locking may be a red herring.

The setup is root->throttled->a, where a is one of 2500 child groups
(numbered 1-2500). There are 4 threads in each of the first 16 a groups.
The parent, throttled, is where cfs_period_us and cfs_quota_us are set.

I wonder if the problem is the walk_tg_tree_from() call in unthrottle_cfs_rq(). 

distribute_cfs_runtime() looks to be O(n * m), where n is the number of
throttled cfs_rqs and m is the number of child cgroups. But I'm not
completely clear on how the hierarchical cgroups play together here.
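
To make the O(n * m) guess concrete, here is a toy userspace model in
Python. It is only a sketch, not kernel code: TaskGroup, build_setup()
and the visit counter are made up for illustration, the function names
merely mirror the kernel ones, and the count of 8 throttled cfs_rqs is
an assumed number, not something measured. It assumes each unthrottle
walks the whole subtree under the throttled group once.

```python
class TaskGroup:
    """Toy stand-in for a task group: a name plus child groups."""

    def __init__(self, name):
        self.name = name
        self.children = []


def walk_tg_tree_from(tg, visit):
    """Call visit() on tg and on every descendant of tg."""
    stack = [tg]
    while stack:
        node = stack.pop()
        visit(node)
        stack.extend(node.children)


def distribute_cfs_runtime(tg, throttled_rqs):
    """Count tree-node visits when each throttled rq triggers one walk."""
    visits = 0

    def count(_node):
        nonlocal visits
        visits += 1

    # Assumption: one unthrottle, and so one full subtree walk, per
    # throttled cfs_rq of the parent group.
    for _ in range(throttled_rqs):
        walk_tg_tree_from(tg, count)
    return visits


def build_setup(num_children):
    """Build root->throttled->a with num_children sibling a groups."""
    parent = TaskGroup("throttled")
    parent.children = [TaskGroup(f"a{i}") for i in range(1, num_children + 1)]
    return parent


if __name__ == "__main__":
    for m in (16, 2500):
        print(m, distribute_cfs_runtime(build_setup(m), throttled_rqs=8))
```

In this model the work per timer expiry grows as n * (m + 1), so going
from 16 to 2500 children multiplies the visits by roughly 150 even
though only 16 groups have threads in them. Whether the real walk costs
that much per node is exactly what still needs checking.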

I'll pull on this thread some. 

Thanks for your input.

Cheers,
Phil

--