Message-ID: <20230510090735.68f62cd4@nowhere>
Date:   Wed, 10 May 2023 09:07:35 +0200
From:   luca abeni <luca.abeni@...tannapisa.it>
To:     Vineeth Remanan Pillai <vineeth@...byteword.org>
Cc:     Juri Lelli <juri.lelli@...hat.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Joel Fernandes <joel@...lfernandes.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Valentin Schneider <vschneid@...hat.com>,
        Jonathan Corbet <corbet@....net>, linux-kernel@...r.kernel.org,
        linux-doc@...r.kernel.org
Subject: Re: [PATCH 1/2] sched/deadline: accurate reclaim bandwidth for GRUB

Hi,

after thinking more about this (and re-reading the code and your
patch :), let me try to comment again:

On Tue, 9 May 2023 23:53:28 -0400
Vineeth Remanan Pillai <vineeth@...byteword.org> wrote:

> On Tue, May 9, 2023 at 4:54 PM luca abeni
> <luca.abeni@...tannapisa.it> wrote:
> 
> > > Yes, this is the approximation I was mentioning... Instead of
> > > using a division, I approximated it with a different equation
> > > using a sum.  
> >
> > Sorry, ignore this comment (and the following); I misinterpreted the
> > code (and my old notes).
> >
> > I do not understand why the "max{}" does not work well; I need to
> > think about it some more.
> >  
> I was thinking more about this and was doing some more digging into
> it. I was also wrong about min{}. Giving it some more thought, I
> think (U/Umax) is indeed the only equation we need, and it will take
> care of capping the reclaiming at Umax.

Uhm... I am not sure about this: for a single task on one CPU, yes,
u/Umax does the right thing. But when there are multiple tasks on the
CPU, I think it can cause issues (because every task ends up trying to
execute for Umax).
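
(To make the concern concrete, with made-up numbers of mine: two
reclaiming tasks with u = 0.3 each and Umax = 0.95, each depleting its
runtime at rate u/Umax ~= 0.316, can each run for 0.3/0.316 ~= 95% of
the CPU - so together they would try to use about 190% of it.)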

The "max{}" comes from the original multi-processor GRUB algorithm:
https://arxiv.org/pdf/1512.01984.pdf (see Equation (13) - in that
equation, the part we call u_extra is included in Uinact[p])

the "1 - u_inact - u_extra" part is needed to make sure that the
real-time guarantees are not broken by the reclaiming mechanism... But
it can end up with a task trying to consume too much time on a single
CPU, hence the "u/Umax" term in the "max{}" is needed to make sure that
a task will not consume more than Umax of a CPU.
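
To make the two terms concrete, here is a minimal user-space sketch of
the rule as described above (plain doubles and names of my choosing, so
no fixed-point issues - this is not the actual grub_reclaim() code):

#include <stdio.h>

/*
 * GRUB-style depletion: the runtime is not decreased as dq = -dt,
 * but as dq = -max{ u/Umax, 1 - u_inact - u_extra } dt.
 * "u/Umax" caps a single task at Umax of one CPU; the other term
 * protects the guarantees of the remaining reservations.
 */
static double depletion_rate(double u, double umax,
			     double u_inact, double u_extra)
{
	double cap = u / umax;			/* per-task cap */
	double guard = 1.0 - u_inact - u_extra;	/* guarantee term */

	return cap > guard ? cap : guard;
}

int main(void)
{
	/* lots of inactive utilization to protect: guard wins (0.600000) */
	printf("%f\n", depletion_rate(0.5, 0.95, 0.4, 0.0));
	/* enough slack: u/Umax wins, capping the task at Umax (0.526316) */
	printf("%f\n", depletion_rate(0.5, 0.95, 0.3, 0.2));
	return 0;
}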

Now, if we have one single task on a CPU, u/Umax will always be larger
than the other term... But when we have multiple tasks, the other term
is needed too.


> The reason it was not working is the loss of precision when we did
> the inverse.

I agree


> I tried replacing (delta * running_bw * bw_ratio) with
> div64_u64(delta * running_bw, Umax) and it worked as expected,
> reclaiming only up to Umax with only SCHED_FLAG_RECLAIM tasks. As an
> example, for a task with reservation (1, 100), RT capacity 95%, and
> delta = 4ms, we get scaled_delta as
> delta * running_bw * bw_ratio ~= .040000 (roughly)
> div64_u64(delta * running_bw, Umax) ~= .04210526 (roughly)
> 
> This caused the inverse logic to consume ~99% bw, while the other
> one consumed ~95% as expected.
> 
> I still could not figure out why min{} works. As you mentioned in
> the previous thread, it's the loss of precision that's the culprit,
> and I think we only need U/Umax if we have enough precision. This,
> along with accounting for both types of tasks, will be the solution.

I agree with this.
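
(FWIW, the effect is easy to see in user space too. The following is
only a toy model of the two expressions - it borrows the kernel's
BW_SHIFT/RATIO_SHIFT constants but none of the real grub_reclaim()
logic, and the error it shows is smaller than the one you measured -
but the direction is the same:)

#include <stdio.h>
#include <stdint.h>

#define BW_SHIFT	20
#define BW_UNIT		(1ULL << BW_SHIFT)
#define RATIO_SHIFT	8

int main(void)
{
	uint64_t delta = 4000000ULL;		/* 4 ms, in ns */
	uint64_t running_bw = BW_UNIT / 100;	/* u = 0.01 (1ms / 100ms) */
	uint64_t umax = BW_UNIT * 95 / 100;	/* Umax = 0.95 */

	/* 1/Umax pre-computed with only 8 fractional bits */
	uint64_t bw_ratio = (BW_UNIT << RATIO_SHIFT) / umax;

	/* multiplying by the coarse inverse loses precision... */
	uint64_t inv = (((delta * running_bw) >> BW_SHIFT) * bw_ratio)
			>> RATIO_SHIFT;
	/* ...vs dividing directly, as div64_u64() would */
	uint64_t dir = delta * running_bw / umax;

	printf("inverse multiply: %llu ns\n", (unsigned long long)inv);
	printf("direct division:  %llu ns\n", (unsigned long long)dir);
	return 0;
}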


(BTW, when considering multiple tasks on multiple CPUs, another
potential problem is given by u_extra... Now that I remember all the
details, u_extra is not "Umax - this_bw" - that is true only when we
consider one CPU; in general it is "Umax - sum(u_i)/m" (where "sum(u_i)"
is the sum of the bandwidths of all the SCHED_DEADLINE tasks in the root
domain, and "m" is the number of CPUs in the root domain)... So, the
reclaimable CPU time is distributed uniformly over all the CPUs, and
this could create some issues. But let's see what happens after the
div64 fix and the SCHED_FLAG_RECLAIM fix)
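
(To put some made-up numbers of mine on this: with m = 2 CPUs,
Umax = 0.95 and two tasks with u_1 = 0.6 and u_2 = 0.2, we get
u_extra = 0.95 - (0.6 + 0.2)/2 = 0.55 on *each* CPU - even though the
CPU running the u_1 = 0.6 task has much less spare bandwidth locally
than the one running u_2 = 0.2.)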


			Thanks,
				Luca

> 
> I will look deeper into any performance issues with using div64_u64
> over multiplication and shall let you know soon.
> 
> Thanks,
> Vineeth
