linux-kernel - Re: [PATCH v3 2/5] sched/deadline: Fix reclaim inaccuracy with SMP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230520115857.3838a7ea@nowhere>
Date:   Sat, 20 May 2023 11:58:57 +0200
From:   luca abeni <luca.abeni@...tannapisa.it>
To:     Vineeth Remanan Pillai <vineeth@...byteword.org>
Cc:     Juri Lelli <juri.lelli@...hat.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Joel Fernandes <joel@...lfernandes.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Valentin Schneider <vschneid@...hat.com>,
        Jonathan Corbet <corbet@....net>, linux-kernel@...r.kernel.org,
        linux-doc@...r.kernel.org
Subject: Re: [PATCH v3 2/5] sched/deadline: Fix reclaim inaccuracy with SMP

Hi again,

On Fri, 19 May 2023 12:12:50 -0400
Vineeth Remanan Pillai <vineeth@...byteword.org> wrote:
[...]
> - cpu util may go to 100% when we have tasks with large bandwidth
> close to Umax
> 
> As an eg. for issue 1, three tasks - (7,10) (3,10) and (1,10):
> TID[590]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.20
> TID[591]: RECLAIM=1, (r=3ms, d=10ms, p=10ms), Util: 81.94
> TID[592]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 27.19
> 
> re. issue 2, four tasks with same reservation (7,10), tasks tries
> to reclaim leading to 100% cpu usage on all three cpus and leads to
> system hang.

I just tried to repeat this test on a VM with 3 CPUs, and I can
reproduce the stall (100% of CPU time reclaimed by SCHED_DEADLINE
tasks, with no possibility for the other tasks to execute) when I use
	dq = -(max{u_i / Umax, (Umax - Uinact - Uextra)}) * dt

But when I use
	dq = -(max{u_i, (Umax - Uinact - Uextra)} / Umax) * dt
everything works as expected, the 4 tasks reclaim 95% of the CPU
time and my shell is still active...
(so, I cannot reproduce the starvation issue with this equation)

So, I now think the second one is the correct equation to be used.



				Luca