[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKknFTCFp-PZ7_5PO7G=S2HmHhBPCj7m9eboEJhuspToLa9wRA@mail.gmail.com>
Date: Wed, 4 Jan 2017 19:30:20 +0100
From: Luca Abeni <luca.abeni@...tn.it>
To: Daniel Bristot de Oliveira <bristot@...hat.com>
Cc: linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@....com>,
Claudio Scordino <claudio@...dence.eu.com>,
Steven Rostedt <rostedt@...dmis.org>,
Tommaso Cucinotta <tommaso.cucinotta@...up.it>
Subject: Re: [RFC v4 0/6] CPU reclaiming for SCHED_DEADLINE
2017-01-04 19:00 GMT+01:00, Daniel Bristot de Oliveira <bristot@...hat.com>:
[...]
>>>>> Some tasks start to use more CPU time, while others seems to use less
>>>>> CPU than it was reserved for them. See the task 14926, it is using
>>>>> only 23.8 % of the CPU, which is less than its 10/30 reservation.
>>>>
>>>> What happened here is that some runqueues have an active utilisation
>>>> larger than 0.95. So, GRUB is decreasing the amount of time received by
>>>> the tasks on those runqueues to consume less than 95%... This is the
>>>> reason for the effect you noticed below:
>>>
>>> I see. But, AFAIK, the Linux's sched deadline measures the load
>>> globally, not locally. So, it is not a problem having a load > than 95%
>>> in the local queue if the global queue is < 95%.
>>>
>>> Am I missing something?
>>
>> The version of GRUB reclaiming implemented in my patches tracks a
>> per-runqueue "active utilization", and uses it for reclaiming.
>
> I _think_ that this might be (one of) the source(s) of the problem...
I agree that this can cause some problems, but I am not sure if it
justifies the huge difference in utilisations you observed
> Just exercising...
>
> For example, with my taskset, with a hypothetical perfect balance of the
> whole runqueue, one possible scenario is:
>
> CPU 0 1 2 3
> # TASKS 3 3 3 2
>
> In this case, CPUs 0 1 2 are with 100% of local utilization. Thus, the
> current task on these CPUs will have their runtime decreased by GRUB.
> Meanwhile, the luck tasks in the CPU 3 would use an additional time that
> they "globally" do not have - because the system, globally, has a load
> higher than the 66.6...% of the local runqueue. Actually, part of the
> time decreased from tasks on [0-2] are being used by the tasks on 3,
> until the next migration of any task, which will change the luck
> tasks... but without any guaranty that all tasks will be the luck one on
> every activation, causing the problem.
>
> Does it make sense?
Yes; but my impression is that gEDF will migrate tasks so that the
distribution of the reclaimed CPU bandwidth is almost uniform...
Instead, you saw huge differences in the utilisations (and I do not
think that "compressing" the utilisations from 100% to 95% can
decrease the utilisation of a task from 33% to 25% / 26%... :)
I suspect there is something more going on here (might be some bug in
one of my patches). I am trying to better understand what happened.
> If it does, this let me think that only with the global track of
> utilization we will achieve the correct result... but I may be missing
> something... :-).
Of course tracking the global active utilisation can be a solution,
but I also want to better understand what is wrong with the current
approach.
Thanks,
Luca
Powered by blists - more mailing lists