Message-ID: <20140519103247.GX30445@twins.programming.kicks-ass.net>
Date: Mon, 19 May 2014 12:32:47 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Roman Gushchin <klamm@...dex-team.ru>
Cc: bsegall@...gle.com, linux-kernel@...r.kernel.org, pjt@...gle.com,
chris.j.arges@...onical.com, gregkh@...uxfoundation.org
Subject: Re: [PATCH] sched: tg_set_cfs_bandwidth() causes rq->lock deadlock
On Fri, May 16, 2014 at 12:38:21PM +0400, Roman Gushchin wrote:
> I still think, there is a deadlock. I'll try to explain.
> Three CPUs must be involved:
> CPU0                      CPU1                        CPU2
> take rq->lock             period timer fired
> ...                       take cfs_b lock
> ...                       ...                         tg_set_cfs_bandwidth()
> throttle_cfs_rq()         release cfs_b lock          take cfs_b lock
> ...                       distribute_cfs_runtime()    timer_active = 0
> take cfs_b->lock          wait for rq->lock           ...
> __start_cfs_bandwidth()
> {wait for timer callback
>  break if timer_active == 1}
>
> So, CPU0 and CPU1 are deadlocked.
OK, so can someone explain this ->timer_active thing? Especially, what's the
'obvious' difference from hrtimer_active()?
Ideally we'd change the lot to not have this, but if we have to keep it
we'll need to make it visible to lockdep, because all this stinks.