linux-kernel - Re: [PATCH] sched/deadline: Unthrottle PI boosted threads while enqueuing

Message-ID: <20200918060026.GC261845@localhost.localdomain>
Date:   Fri, 18 Sep 2020 08:00:26 +0200
From:   Juri Lelli <juri.lelli@...hat.com>
To:     Daniel Bristot de Oliveira <bristot@...hat.com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Mark Simmons <msimmons@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/deadline: Unthrottle PI boosted threads while
 enqueuing

Hi Daniel,

On 16/09/20 09:06, Daniel Bristot de Oliveira wrote:
> stress-ng has a test (stress-ng --cyclic) that creates a set of threads
> under SCHED_DEADLINE with the following parameters:
> 
>     dl_runtime   =  10000 (10 us)
>     dl_deadline  = 100000 (100 us)
>     dl_period    = 100000 (100 us)
> 
> These parameters are very aggressive. When using a system without HRTICK
> set, these threads can easily execute longer than the dl_runtime because
> the throttling happens with 1/HZ resolution.
> 
> During the main part of the test, the system works just fine because
> the workload does not try to run over the 10 us. The problem happens at
> the end of the test, on the exit() path. During exit(), the threads need
> to do some cleanups that require real-time mutex locks, mainly those
> related to memory management, resulting in this scenario:
> 
> Note: locks are rt_mutexes...
>  ------------------------------------------------------------------------
>     TASK A:		TASK B:				TASK C:
>     activation
> 							activation
> 			activation
> 
>     lock(a): OK!	lock(b): OK!
>     			<overrun runtime>
>     			lock(a)
>     			-> block (task A owns it)
> 			  -> self notice/set throttled
>  +--<			  -> arm replenishment timer
>  |    			switch-out
>  |    							lock(b)
>  |    							-> <C prio > B prio>
>  |    							-> boost TASK B
>  |  unlock(a)						switch-out
>  |  -> hand lock a to B
>  |    -> wakeup(B)
>  |      -> B is throttled:
>  |        -> do not enqueue
>  |     switch-out
>  |
>  |
>  +---------------------> replenishment timer
> 			-> TASK B is boosted:
> 			  -> do not enqueue
>  ------------------------------------------------------------------------
> 
> BOOM: TASK B is runnable but !enqueued, holding TASK C: the system
> crashes with hung task C.
> 
> This problem is avoided by removing the throttle state from the boosted
> thread while boosting it (by TASK A in the example above), allowing it to
> be queued and run boosted.
> 
> The next replenishment will take care of the runtime overrun, pushing
> the deadline further away. See the "while (dl_se->runtime <= 0)" loop in
> replenish_dl_entity() for more information.
> 
> Signed-off-by: Daniel Bristot de Oliveira <bristot@...hat.com>
> Reported-by: Mark Simmons <msimmons@...hat.com>
> Reviewed-by: Juri Lelli <juri.lelli@...hat.com>
> Tested-by: Mark Simmons <msimmons@...hat.com>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Juri Lelli <juri.lelli@...hat.com>
> Cc: Vincent Guittot <vincent.guittot@...aro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@....com>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Ben Segall <bsegall@...gle.com>
> Cc: Mel Gorman <mgorman@...e.de>
> Cc: Daniel Bristot de Oliveira <bristot@...hat.com>
> Cc: linux-kernel@...r.kernel.org
> 
> ---
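
The scenario quoted above can be modelled in a few lines of userspace C.
This is only a sketch under stated assumptions, not the patch itself:
struct dl_model and the model_*() helpers are hypothetical names made up
for illustration, and the three flags stand in for the throttled, boosted
and enqueued states shown in the diagram. The replenish loop mirrors the
"while (dl_se->runtime <= 0)" condition mentioned in the changelog.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a deadline task (not kernel code). */
struct dl_model {
	int64_t  runtime;    /* remaining runtime in ns, negative after an overrun */
	uint64_t deadline;   /* absolute deadline in ns */
	int64_t  dl_runtime; /* budget granted per period, in ns */
	uint64_t dl_period;  /* period, in ns */
	bool throttled;
	bool boosted;
	bool enqueued;
};

/* Behaviour in the diagram: a throttled task is not enqueued on wakeup. */
static void model_enqueue_old(struct dl_model *t)
{
	if (t->throttled)
		return;
	t->enqueued = true;
}

/* Replenishment timer in the diagram: a boosted task is left alone. */
static void model_timer(struct dl_model *t)
{
	if (t->boosted)
		return;
	t->throttled = false;
	t->enqueued = true;
}

/* Idea of the fix: at enqueue time, the boost overrides the throttle. */
static void model_enqueue_fixed(struct dl_model *t)
{
	if (t->boosted && t->throttled)
		t->throttled = false;
	if (t->throttled)
		return;
	t->enqueued = true;
}

/* Overrun accounting: push the deadline until the runtime is positive. */
static void model_replenish(struct dl_model *t)
{
	while (t->runtime <= 0) {
		t->deadline += t->dl_period;
		t->runtime += t->dl_runtime;
	}
}
```

With a task that is both throttled and boosted, calling
model_enqueue_old() and then model_timer() leaves it off the runqueue,
which is the hang described above; model_enqueue_fixed() enqueues it, and
model_replenish() later charges the overrun by moving the deadline out.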

Thanks for this fix.

Acked-by: Juri Lelli <juri.lelli@...hat.com>

Best,
Juri