Message-ID: <CAO7JXPgXi8q02HBeBR_RLWmODd9uQBH_UMCYgVQwbf+FX=Qxkg@mail.gmail.com>
Date:   Mon, 22 May 2023 15:22:52 -0400
From:   Vineeth Remanan Pillai <vineeth@...byteword.org>
To:     luca abeni <luca.abeni@...tannapisa.it>
Cc:     Juri Lelli <juri.lelli@...hat.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Joel Fernandes <joel@...lfernandes.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Valentin Schneider <vschneid@...hat.com>,
        Jonathan Corbet <corbet@....net>, linux-kernel@...r.kernel.org,
        linux-doc@...r.kernel.org
Subject: Re: [PATCH v3 2/5] sched/deadline: Fix reclaim inaccuracy with SMP

Hi Luca,

Merging the last two mails in this reply :-)

> So, we are wasting 181.3 - 95 = 86.3% of CPU time, which 590 cannot
> reclaim (because it cannot execute simultaneously on 2 CPUs).
>
Correct. Thanks for explaining it in detail; I was tracing the scheduler
and verified the exact pattern you describe.

> Now that the problem is more clear to me, I am trying to understand a
> possible solution (as you mention, moving some extra bandwidth from the
> 590's CPU will fix this problem... But I am not sure if this dynamic
> extra bandwidth migration is feasible in practice without introducing
> too much overhead)
>
> I'll look better at your new proposal.
>
The idea that I mentioned tries to solve this problem in a best-effort
way: if the global load is high, use the global values
"Uextra = rq->dl.extra_bw" and "Umax = rq->dl.this_bw + rq->dl.extra_bw";
otherwise use the local values "Umax = rq->dl.max_bw" and
"Uextra = rq->dl.max_bw - rq->dl.this_bw". This is still not perfect,
but it almost always reclaims very close to the maximum allowed limit.
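
Just to illustrate the selection (a sketch only, not the actual patch):
this_bw, extra_bw and max_bw stand for the rq->dl fields named above,
and "globally_loaded" is a hypothetical placeholder for whatever "global
load is high" test ends up being used.

	#include <stdint.h>

	/*
	 * Illustrative sketch only: pick Uextra/Umax as described above.
	 * "globally_loaded" is a hypothetical stand-in for the real
	 * "global load is high" check.
	 */
	static void dl_pick_reclaim_bw(int globally_loaded,
				       uint64_t this_bw, uint64_t extra_bw,
				       uint64_t max_bw,
				       uint64_t *u_extra, uint64_t *u_max)
	{
		if (globally_loaded) {
			/* High global load: use the global extra bandwidth. */
			*u_extra = extra_bw;
			*u_max = this_bw + extra_bw;
		} else {
			/* Low load: reclaim up to this CPU's own maximum. */
			*u_extra = max_bw - this_bw;
			*u_max = max_bw;
		}
	}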

Please have a look when you get a chance :-).

>
> I just tried to repeat this test on a VM with 3 CPUs, and I can
> reproduce the stall (100% of CPU time reclaimed by SCHED_DEADLINE
> tasks, with no possibility for the other tasks to execute) when I use
>         dq = -(max{u_i / Umax, (Umax - Uinact - Uextra)}) * dt
>
> But when I use
>         dq = -(max{u_i, (Umax - Uinact - Uextra)} / Umax) * dt
> everything works as expected, the 4 tasks reclaim 95% of the CPU
> time and my shell is still active...
> (so, I cannot reproduce the starvation issue with this equation)
>
Sorry about the confusion; yes, you are right, there is no stall with
this equation. The only remaining issue is the reduced reclaiming when
the load is low and tasks have different bandwidth requirements.
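
For clarity, this is how I read the agreed equation in code form (a
self-contained sketch only, not the actual patch); all utilizations are
in the usual fixed-point bandwidth format and delta is the runtime
consumed since the last update:

	#include <stdint.h>

	/*
	 * Sketch of dq = -(max{u_i, Umax - Uinact - Uextra} / Umax) * dt:
	 * returns the scaled amount of runtime to charge for "delta".
	 */
	static uint64_t dl_scaled_delta(uint64_t delta, uint64_t u_i,
					uint64_t u_max, uint64_t u_inact,
					uint64_t u_extra)
	{
		/* Bandwidth left for reclaiming on this CPU; clamp at zero. */
		uint64_t reclaimable = (u_inact + u_extra < u_max) ?
				       u_max - u_inact - u_extra : 0;
		uint64_t u_act = (u_i > reclaimable) ? u_i : reclaimable;

		/* Scale delta by u_act / u_max; multiply first for precision. */
		return delta * u_act / u_max;
	}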

> So, I now think the second one is the correct equation to be used.
>
Thanks for confirming.

I think it makes sense to get the fix for the equation in as a first
step, and then we can investigate the second issue (reduced reclaiming
under low load with differing bandwidths) and fix it separately. What do
you think? I shall send the next iteration with the fix for the equation
alone, if it's okay with you.

Thanks,
Vineeth
