linux-kernel - Re: [RFC v4 0/6] CPU reclaiming for SCHED


Message-ID: <20170111133926.7ec0a5b0@luca>
Date:   Wed, 11 Jan 2017 13:39:26 +0100
From:   Luca Abeni <luca.abeni@...tannapisa.it>
To:     Juri Lelli <juri.lelli@....com>
Cc:     Daniel Bristot de Oliveira <bristot@...hat.com>,
        linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Claudio Scordino <claudio@...dence.eu.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Tommaso Cucinotta <tommaso.cucinotta@...up.it>
Subject: Re: [RFC v4 0/6] CPU reclaiming for SCHED_DEADLINE

Hi Juri,
(I am replying from my new email address)

On Wed, 11 Jan 2017 12:19:51 +0000
Juri Lelli <juri.lelli@....com> wrote:
[...]
> > > For example, with my taskset, with a hypothetical perfect balance
> > > of the whole runqueue, one possible scenario is:
> > >
> > >    CPU    0    1     2     3
> > > # TASKS   3    3     3     2
> > >
> > > In this case, CPUs 0, 1 and 2 are at 100% local utilization.
> > > Thus, the current tasks on these CPUs will have their runtime
> > > decreased by GRUB. Meanwhile, the lucky tasks on CPU 3 would
> > > use additional time that they "globally" do not have, because
> > > the system, globally, has a load higher than the 66.6...% of the
> > > local runqueue. Actually, part of the time taken from the tasks
> > > on [0-2] is being used by the tasks on 3, until the next
> > > migration of any task, which will change the lucky tasks... but
> > > without any guarantee that every task will be a lucky one on every
> > > activation, causing the problem.
> > >
> > > Does it make sense?  
> > 
> > Yes; but my impression is that gEDF will migrate tasks so that the
> > distribution of the reclaimed CPU bandwidth is almost uniform...
> > Instead, you saw huge differences in the utilisations (and I do not
> > think that "compressing" the utilisations from 100% to 95% can
> > decrease the utilisation of a task from 33% to 25% / 26%... :)
> >  
> 
> I tried to replicate Daniel's experiment, but I don't see such a
> skewed allocation. They get a reasonably uniform bandwidth and the
> trace looks fairly good as well (all processes get to run on the
> different processors at some time).

With some effort, I replicated the issue noticed by Daniel... I think
it also depends on the CPU speed (and on good or bad luck :), but the
"unfair" CPU allocation can actually happen.
I am working on a fix (based on the m-grub modifications proposed at
last April's SAC - in my original patchset, I over-simplified the
algorithm).


> > I suspect there is something more going on here (might be some bug
> > in one of my patches). I am trying to better understand what
> > happened. 
> 
> However, playing with this a bit further, I found out one thing that
> looks counter-intuitive (at least to me :).
> 
> Simplifying Daniel's example, let's say that we have one 10/30 task
> running on a CPU with a 500/1000 global limit. Applying the
> grub_reclaim() formula, we have:
> 
>  delta_exec = delta * (0.5 + 0.333) = delta * 0.833
> 
> Which in practice means that 1ms of real delta (at 1000HZ) corresponds
> to 0.833ms of virtual delta. Considering this, a 10ms (over 30ms)
> reservation gets "extended" to ~12ms (over 30ms), that is to say the
> task consumes 0.4 of the CPU's bandwidth. top seems to back what I'm
> saying, but am I still talking nonsense? :)

You are right; my "Do not reclaim the whole CPU bandwidth" patch is an
approximation... I had hoped that this approximation would be more
precise than it really is.
I used the "Uact + unreclaimable utilization" equation to avoid
divisions in grub_reclaim(), but the equation should really be "Uact /
reclaimable utilization"... So, in your example it is
	delta * 0.3333 / 0.5 = delta * 0.6666
which results in 15ms over 30ms, as expected.

I'll fix that patch for the next submission.
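
To make the difference between the two equations concrete, here is a
minimal floating-point sketch (not the patch code). The helper names
grub_scale_additive(), grub_scale_ratio() and wall_time_ms() are
hypothetical and only serve this illustration; the 10/30 task and the
0.5 reclaimable fraction come from the example above.

```c
#include <assert.h>

/*
 * Additive approximation: Uact + unreclaimable utilization,
 * where unreclaimable = 1 - reclaimable (illustrative helper).
 */
static double grub_scale_additive(double u_act, double u_reclaimable)
{
	return u_act + (1.0 - u_reclaimable);
}

/* Division-based form: Uact / reclaimable utilization. */
static double grub_scale_ratio(double u_act, double u_reclaimable)
{
	assert(u_reclaimable > 0.0);
	return u_act / u_reclaimable;
}

/*
 * Wall-clock time needed to deplete runtime_ms of budget when the
 * budget is decreased at rate "scale" per unit of real time.
 */
static double wall_time_ms(double runtime_ms, double scale)
{
	assert(scale > 0.0);
	return runtime_ms / scale;
}
```

With u_act = 10.0 / 30.0 and u_reclaimable = 0.5, the additive form
gives a scale of about 0.833 and a 10ms budget lasts about 12ms, while
the ratio form gives about 0.667 and the same budget lasts about 15ms,
i.e. half of the 30ms period.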

> I was expecting that the task could consume 0.5 worth of bandwidth
> with the given global limit. Is the current behaviour intended?
> 
> If we want to change this behaviour maybe something like the following
> might work?
> 
>  delta_exec = (delta * to_ratio((1ULL << 20) - rq->dl.non_deadline_bw,
>                                 rq->dl.running_bw)) >> 20

My current patch does
	(delta * rq->dl.running_bw * rq->dl.deadline_bw_inv) >> 20 >> 8;
where rq->dl.deadline_bw_inv has been set to
	to_ratio(global_rt_runtime(), global_rt_period()) >> 12;

This seems to work fine, and should introduce less overhead than
calling to_ratio() in grub_reclaim(), since the division is only done
when deadline_bw_inv is set.
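
As a rough user-space sketch of the fixed-point arithmetic above (not
the patch itself): to_ratio_sketch() is assumed to mirror the kernel's
to_ratio(period, runtime), i.e. (runtime << 20) / period, and the names
deadline_bw_inv_sketch() and grub_reclaim_sketch() are hypothetical.
The inverse keeps 8 fractional bits (20 - 12), which is why the final
result is shifted by 20 and then by 8.

```c
#include <assert.h>
#include <stdint.h>

#define BW_SHIFT 20	/* utilisations use 20 fractional bits */

/* Assumed to mirror to_ratio(): runtime / period in 20-bit fixed point. */
static uint64_t to_ratio_sketch(uint64_t period, uint64_t runtime)
{
	assert(period > 0);
	return (runtime << BW_SHIFT) / period;
}

/*
 * Inverse of the reclaimable fraction with 8 fractional bits.
 * Passing the runtime as "period" and the period as "runtime"
 * yields period / runtime, i.e. 1 / (reclaimable utilization).
 */
static uint64_t deadline_bw_inv_sketch(uint64_t rt_runtime, uint64_t rt_period)
{
	return to_ratio_sketch(rt_runtime, rt_period) >> 12;
}

/*
 * delta scaled by Uact / reclaimable utilization, using one
 * multiplication by the precomputed inverse instead of a division.
 * The product is assumed to fit in 64 bits for the deltas used here.
 */
static uint64_t grub_reclaim_sketch(uint64_t delta, uint64_t running_bw,
				    uint64_t bw_inv)
{
	return (delta * running_bw * bw_inv) >> 20 >> 8;
}
```

For the 10/30 task with a 500/1000 limit, running_bw would be
to_ratio_sketch(30, 10), the inverse would be
deadline_bw_inv_sketch(500, 1000) = 512 (2.0 with 8 fractional bits),
and a delta of 1000000ns is scaled to roughly two thirds of that value.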


		Thanks,
			Luca