linux-kernel - Re: [PATCH v3 2/5] sched/deadline: Fix reclaim inaccuracy with SMP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAO7JXPhVcFJ2+8dpw8Ns4xVr622G8puqdcbSjy23THpahJTmUA@mail.gmail.com>
Date:   Fri, 19 May 2023 22:15:35 -0400
From:   Vineeth Remanan Pillai <vineeth@...byteword.org>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     luca.abeni@...tannapisa.it, Juri Lelli <juri.lelli@...hat.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Joel Fernandes <joel@...lfernandes.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Valentin Schneider <vschneid@...hat.com>,
        Jonathan Corbet <corbet@....net>, linux-kernel@...r.kernel.org,
        linux-doc@...r.kernel.org, youssefesmat@...gle.com
Subject: Re: [PATCH v3 2/5] sched/deadline: Fix reclaim inaccuracy with SMP

Hi Dietmar,

On Fri, May 19, 2023 at 1:56 PM Dietmar Eggemann
<dietmar.eggemann@....com> wrote:

> > TID[730]: RECLAIM=1, (r=8ms, d=10ms, p=10ms), Util: 95.05
> > TID[731]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 31.34
> > TID[732]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 3.16
>
> What does this 'Util: X' value stand for? I assume it's the utilization
> of the task? How do you obtain it?
>
Yes, it is the utilization of the task. I calculate it by dividing the
cputime with elapsed time(using clock_gettime(2)).

> I see that e.g. TID[731] should run 1ms each 10ms w/o grub and with grub
> the runtime could be potentially longer since 'scaled_delta_exec < delta'.
>
Yes correct. GRUB(Greedy Reclamation of Unused Bandwidth) algorithm
is used here for deadline tasks that needs to run longer than their
runtime when needed. sched_setattr allows a flag SCHED_FLAG_RECLAIM
to indicate that the task would like to reclaim unused bandwidth of a
cpu if available. For those tasks, 'runtime' is depreciated using the
GRUB formula and it allows it to run for longer and reclaim the free
bandwidth of the cpu. The GRUB implementation in linux allows a task
to reclaim upto RT capacity(95%) and depends on the free bandwidth
of the cpu. So TID[731] theoretically should run for 95ms as it is
the only task in the cpu, but it doesn't get to run that long.

> I don't get this comment in update_curr_dl():
>
> 1325    /*
> 1326     * For tasks that participate in GRUB, we implement GRUB-PA: the
> 1327     * spare reclaimed bandwidth is used to clock down frequency.
> 1328     *
>
> It looks like dl_se->runtime is affected and with 'scaled_delta_exec <
> delta' the task runs longer than dl_se->dl_runtime?
>
Yes. As mentioned above, GRUB allows the task to run longer by slowing
down the depreciation of "dl_se->dl_runtime". scaled_delta_exec is
calculated by the GRUB formula explained in the paper [1] & [2].

> I did the test discussed later in this thread with:
>
> 3 [3/100] tasks (dl_se->dl_bw = (3 << 20)/100 = 31457) on 3 CPUs
>
> factor = scaled_delta_exec/delta
>
> - existing grub
>
> rq->dl.bw_ratio = ( 100 << 8 ) / 95 = 269
> rq->dl.extra_bw = ( 95 << 20 ) / 100 = 996147
>
> cpu=2 curr->[thread0-2 1715] delta=2140100 this_bw=31457
> running_bw=31457 extra_bw=894788 u_inact=0 u_act_min=33054 u_act=153788
> scaled_delta_exec=313874 factor=0.14
>
> - your solution patch [1-2]
>
> cpu=2 curr->[thread0-0 1676] delta=157020 running_bw=31457 max_bw=996147
> res=4958 factor=0.03
>
> You say that GRUB calculation is inaccurate and that this inaccuracy
> gets larger as the bandwidth of tasks becomes smaller.
>
> Could you explain this inaccuracy on this example?
>
According to GRUB, we should be able to reclaim the unused bandwidth
for the running task upto RT limits(95%). In this example we have a
task with 3ms runtime and 100ms runtime on a cpu. So it is supposed
to run for 95ms before it is throttled.
Existing implementation's factor = 0.14 and 3ms is depreciated by
this factor. So it gets to run for "3 / 0.14 ~= 22ms". This is the
inaccuracy that the patch is trying to solve. With the patch, the
factor is .03166 and runtime = "3 / 0.03166 ~= 95ms"

Hope this clarifies.

Thanks,
Vineeth

[1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
     (2015). Parallel and sequential reclaiming in multicore
     real-time global scheduling.

[2]: G. Lipari and S. Baruah, "Greedy reclamation of unused bandwidth
     in constant-bandwidth servers," Proceedings 12th Euromicro
     Conference on Real-Time Systems. Euromicro RTS 2000, Stockholm,
     Sweden, 2000, pp. 193-200, doi: 10.1109/EMRTS.2000.854007.