linux-kernel - Re: Re: [PATCH 1/4] sched/eevdf: Fix vruntime adjustment on reweight

Message-ID: <7f179261-72a7-41d9-aa79-6ba8cc3c4286@bytedance.com>
Date:   Thu, 16 Nov 2023 15:11:04 +0800
From:   Abel Wu <wuyun.abel@...edance.com>
To:     Yiwei Lin <s921975628@...il.com>
Cc:     Barry Song <21cnbao@...il.com>,
        Benjamin Segall <bsegall@...gle.com>,
        Chen Yu <yu.c.chen@...el.com>,
        Daniel Jordan <daniel.m.jordan@...cle.com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>,
        Joel Fernandes <joel@...lfernandes.org>,
        K Prateek Nayak <kprateek.nayak@....com>,
        Mike Galbraith <efault@....de>,
        Qais Yousef <qyousef@...alina.io>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Yicong Yang <yangyicong@...wei.com>,
        Youssef Esmat <youssefesmat@...omium.org>,
        linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Valentin Schneider <valentin.schneider@....com>
Subject: Re: Re: [PATCH 1/4] sched/eevdf: Fix vruntime adjustment on reweight

On 11/16/23 2:51 PM, Yiwei Lin Wrote:
> 
> On 11/16/23 13:07, Abel Wu wrote:
>> On 11/16/23 12:48 PM, Abel Wu Wrote:
>>> On 11/15/23 11:36 PM, Yiwei Lin Wrote:
>>>>
>>>>> @@ -3712,8 +3811,17 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
>>>>>       enqueue_load_avg(cfs_rq, se);
>>>>>       if (se->on_rq) {
>>>>>           update_load_add(&cfs_rq->load, se->load.weight);
>>>>> -        if (cfs_rq->curr != se)
>>>>> -            avg_vruntime_add(cfs_rq, se);
>>>>> +        if (!curr) {
>>>>> +            /*
>>>>> +             * The entity's vruntime has been adjusted, so let's check
>>>>> +             * whether the rq-wide min_vruntime needs updated too. Since
>>>>> +             * the calculations above require stable min_vruntime rather
>>>>> +             * than up-to-date one, we do the update at the end of the
>>>>> +             * reweight process.
>>>>> +             */
>>>>> +            __enqueue_entity(cfs_rq, se);
>>>>> +            update_min_vruntime(cfs_rq);
>>>>> +        }
>>>>>       }
>>>>>   }
>>>> Sorry if this is a stupid question... It looks like reweight_entity()
>>>> may change the weight of the cfs_rq->curr entity, but we never call
>>>> update_min_vruntime() when reweighting it. Is there a reason we can
>>>> skip update_min_vruntime() in this case?
>>>
>>> No, you are right!
>>
>> I intended to update_min_vruntime() if se->on_rq, no matter whether
>> it is curr or not, just as you suggested. But on second thought
>> I wonder if it is necessary to update *NOW*, since we always
>> update_curr() before making any change to cfs_rq. Thoughts?
> I overlooked the fact that we update_min_vruntime() every time we
> update_curr(). Given that, we can indeed wait until we need the correct
> min_vruntime and call update_min_vruntime() then. The only concern I can
> think of is that sched_debug may not reflect an accurate min_vruntime in
> time, but that is probably not a big problem.
> 
> Going further, I think we could remove the update_min_vruntime() in
> reweight_entity() entirely to save more time. My reasoning is that
> min_vruntime is no longer used to normalize vruntime, as CFS required,
> so we will always update_curr() to get the latest min_vruntime before
> using it. For the same reason, the update_min_vruntime() in
> dequeue_entity() could also be removed, i.e. just do
> update_min_vruntime() in update_curr() to simplify. What do you think?

Yes, this is exactly what I have been thinking about too. Task placement
now uses a lag-based solution that does not depend on min_vruntime, and
min_vruntime is only used as a base offset when calculating avg_vruntime
(in order to avoid overflow), so we can probably update it in a more
relaxed way, e.g. in ticks. If relaxed updates work, there still seems
to be some work to do first:

   1) the priority used for core pick under core scheduling needs to
      change to a deadline-based solution;
   2) we need to make sure there is no overflow in NOHZ_FULL mode.
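
To illustrate the base-offset point above, here is a minimal userspace
sketch, not kernel code. The names toy_entity and toy_avg_vruntime are
hypothetical, and the round-down step is my assumption about how the
division should be made consistent. It computes the weighted average
vruntime relative to a base v0 and shows that the result does not depend
on which v0 is chosen; the base only keeps the multiplied keys small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a runnable entity; NOT struct sched_entity. */
struct toy_entity {
	uint64_t vruntime;	/* virtual runtime, can be a large absolute value */
	uint64_t weight;	/* load weight, assumed non-zero */
};

/*
 * Weighted average vruntime computed relative to an arbitrary base v0:
 *
 *     V = v0 + (\Sum (v_i - v0) * w_i) / (\Sum w_i)
 *
 * Only the signed keys (v_i - v0) are multiplied by the weights, so the
 * products stay small as long as v0 is close to the entities' vruntimes.
 */
static uint64_t toy_avg_vruntime(const struct toy_entity *se, size_t nr,
				 uint64_t v0)
{
	int64_t sum = 0;
	int64_t load = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		int64_t key = (int64_t)(se[i].vruntime - v0);

		sum += key * (int64_t)se[i].weight;
		load += (int64_t)se[i].weight;
	}

	assert(load > 0);

	/*
	 * C division truncates toward zero. Bias negative sums so that the
	 * quotient is always rounded down, which makes the result the same
	 * for every choice of v0.
	 */
	if (sum < 0)
		sum -= load - 1;

	return v0 + (uint64_t)(sum / load);
}
```

Whether v0 is the smallest or the largest vruntime in the set, the
function returns the same average. The keys do grow as the base falls
behind the entities, which is presumably why a more relaxed update needs
the overflow check in 2).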

Just some first thoughts that came to mind :)

Thanks,
	Abel