Message-ID: <7790c176-6af3-458c-9995-8bceb73b64bc@arm.com>
Date: Tue, 14 Nov 2023 13:59:02 +0100
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Hongyan Xia <hongyan.xia2@....com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>
Cc: Qais Yousef <qyousef@...alina.io>,
Morten Rasmussen <morten.rasmussen@....com>,
Lukasz Luba <lukasz.luba@....com>,
Christian Loehle <christian.loehle@....com>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 1/6] sched/uclamp: Track uclamped util_avg in
sched_avg
On 09/11/2023 17:05, Hongyan Xia wrote:
> On 31/10/2023 15:52, Dietmar Eggemann wrote:
>> On 04/10/2023 11:04, Hongyan Xia wrote:
>>> From: Hongyan Xia <hongyan.xia2@....com>
>>
>> [...]
>>
>>> @@ -6445,6 +6450,21 @@ static int sched_idle_cpu(int cpu)
>>> }
>>> #endif
>>> +void ___update_util_avg_uclamp(struct sched_avg *avg, struct sched_entity *se);
>>
>> IMHO, `struct sched_avg *avg` can only be the one of a cfs_rq. So
>> passing a cfs_rq would eliminate the question whether this can be from
>> another se.
>
> At the moment, yes, the avg can only come from cfs_rq. The reason why I
> kept sched_avg is that once we have sum aggregation for RT tasks, it's
> very likely we will end up calling this function on rt_rq->avg, so
> having just sched_avg here will work for RT in the future.
Ah, OK. IMHO it would be better to use cfs_rq for now and widen this
interface once RT is covered.
[...]
>> Question for me is why can't you integrate the util_avg_uclamp signals
>> for se's and cfs_rq's/rq's much closer into existing PELT functions?
>
> So the problem is that when we enqueue the task (say with UCLAMP_MIN of
> 200), at that exact moment se->on_rq is false, so we only decay it and
> do not enforce UCLAMP_MIN. Further up the hierarchy we do
> update_load_avg(), but the se of the task has already been processed,
> so UCLAMP_MIN has not taken any effect. To make sure UCLAMP_MIN is
> immediately effective, I just re-evaluate the whole hierarchy from
> bottom to top.
>
> I probably didn't quite catch what you said here. Could you elaborate a
> bit on what 'much closer' means?
I see. But can we not use DO_ATTACH (and DO_DETACH) for this?

(flags & DO_ATTACH) characterizes the enqueuing of the task, so with
DO_ATTACH (and hence !se->on_rq and cfs_rq->curr != se) we could (a)
decay the task and (b) add it to the cfs_rq in enqueue_entity() ->
update_load_avg(..., | DO_ATTACH).
Like we do for PELT on a wakeup-migration enqueue, (a):

  update_load_avg()
    __update_load_avg_se()
      if (___update_load_sum(..., cfs_rq->curr == se))
                                  ^^^^^^^^^^^^^^^^^^
                                  'int running' for utilization
        ___update_load_avg()
    if (!se->avg.last_update_time && (flags & DO_ATTACH))
        ^^^^^^^^^^^^^^^^^^^^^^^^^
        wakeup migration
      attach_entity_load_avg()
Notice, for PELT we attach/detach to/from the cfs_rq, which gives us
blocked contribution. For util_avg_uclamp we do enqueue/dequeue, so
only runnable contribution.