linux-kernel - Re: [PATCH] sched/fair: simplify __calc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG5MgCpW3xwSGgF6VBPiSMkwOz793NO4_nLNhJeqyBhs+jN30w@mail.gmail.com>
Date: Tue, 12 Mar 2024 16:25:31 -0700
From: Dawei Li <daweilics@...il.com>
To: Pierre Gondois <pierre.gondois@....com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, 
	Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, 
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Daniel Bristot de Oliveira <bristot@...hat.com>, Valentin Schneider <vschneid@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/fair: simplify __calc_delta()

Hi Pierre,
Thank you for the review!

On Tue, Mar 12, 2024 at 6:18 AM Pierre Gondois <pierre.gondois@....com> wrote:
>
> Hello Dawei,
>
> On 3/6/24 23:28, Dawei Li wrote:
> > Based on how __calc_delta() is called now, the input parameter, weight
> > is always NICE_0_LOAD. I think we don't need it as an input parameter
> > now?
>
> Maybe
>    5e963f2bd4654a202a8a05aa3a86cb0300b10e6c ("sched/fair: Commit to EEVDF")
> should be referenced to explain that the case where (weight =< lw.weight)
> doesn't exist anymore and that NICE_0_LOAD could be incorporated in
> __calc_delta() directly.
>
>
> Also I think indirect forms are preferred in general:
> "I think we don't need it as an input parameter now ?" ->
> "The 'weight' parameter doesn't seem to be required anymore"
> (same note for the whole commit message)
>
> >
> > Also, when weight is always NICE_0_LOAD, the initial fact value is
> > always 2^10, and the first fact_hi will always be 0. Thus, we can get
> > rid of the first if bock.
> >
> > The previous comment "(delta_exec * (weight * lw->inv_weight)) >>
> > WMULT_SHIFT" seems to be assuming that lw->weight * lw->inv_weight is
> > always (approximately) equal to 2^WMULT_SHIFT. However, when
> > CONFIG_64BIT is set, lw->weight * lw->inv_weight is (approximately)
> > equal to 2^WMULT_SHIFT * 2^10. What remains true for both CONFIG_32BIT
> > and CONFIG_64BIT is: scale_load_down(lw->weight) * lw->inv_weight is
> > (approximately) equal to 2^WMULT_SHIFT. (Correct me if I am wrong.)
>
> I think the comment is more about explaining that:
>    X * lw.weight
> equals:
>    X * lw->inv_weight >> WMULT_SHIFT
>
I assume you mean
X / lw->weight
equals:
X * lw->inv_weight >> WMULT_SHIFT
However, this is not always true, and that's why I'd like to revise
it. It is true for
CONFIG_32BIT. However, For CONFIG_64BIT, we have lw->weight * lw->inv_weight =
2**WMULT_SHIFT * 2**10. Thus,
X / lw->weight
equals:
X * lw->inv_weight >> (WMULT_SHIFT + 10)


> Also, if CONFIG_64BIT is set, we should have:
>    weight / lw.weight == scale_load_down(lw->weight) * 2**10 * lw->inv_weight
>

weight / lw->weight should be equal to scale_load_down(weight) /
scale_load_down(lw->weight)
= scale_load_down(weight) * lw->inv_weight / (2**WMULT_SHIFT)
Right?

> So IIUC, either both lines should be update, either none.
> (meaning that:
>    delta_exec * NICE_0_LOAD / lw->weight
> should be changed to
>    delta_exec * scale_load_down(NICE_0_LOAD) / lw->weight

I think this is not correct? scale_load_down(NICE_0_LOAD) is the true
weight, as mapped
directly from the task's nice/priority value, while lw->weight is the
scaled_up load.
Their units/scales don't match.

I am quite new to the source code. I could be wrong. But would like to
see more clarifications
on this.

> )
>
> I assume it's better to let the comment as is.
>
>
> >
> > Also updated the comment for calc_delta_fair() to make it more
> > accurate.
> >
> > Signed-off-by: Dawei Li <daweilics@...il.com>
> > ---
> >   kernel/sched/fair.c | 29 ++++++++++-------------------
> >   1 file changed, 10 insertions(+), 19 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 6a16129f9a5c..c5cdb15f7d62 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -252,32 +252,23 @@ static void __update_inv_weight(struct load_weight *lw)
> >   }
> >
> >   /*
> > - * delta_exec * weight / lw.weight
> > + * delta_exec * NICE_0_LOAD / lw->weight
> >    *   OR
> > - * (delta_exec * (weight * lw->inv_weight)) >> WMULT_SHIFT
> > + * (delta_exec * scale_load_down(NICE_0_LOAD) * lw->inv_weight) >> WMULT_SHIFT
> >    *
> > - * Either weight := NICE_0_LOAD and lw \e sched_prio_to_wmult[], in which case
> > - * we're guaranteed shift stays positive because inv_weight is guaranteed to
> > - * fit 32 bits, and NICE_0_LOAD gives another 10 bits; therefore shift >= 22.
> > - *
> > - * Or, weight =< lw.weight (because lw.weight is the runqueue weight), thus
> > - * weight/lw.weight <= 1, and therefore our shift will also be positive.
> > + * We're guaranteed shift stays positive because inv_weight is guaranteed to
> > + * fit 32 bits, and scale_load_down(NICE_0_LOAD) gives another 10 bits;
> > + * therefore shift >= 22.
> >    */
> > -static u64 __calc_delta(u64 delta_exec, unsigned long weight, struct load_weight *lw)
> > +static u64 __calc_delta(u64 delta_exec, struct load_weight *lw)
> >   {
> > -     u64 fact = scale_load_down(weight);
> > -     u32 fact_hi = (u32)(fact >> 32);
> > +     u64 fact = scale_load_down(NICE_0_LOAD);
> > +     int fact_hi;
> >       int shift = WMULT_SHIFT;
> >       int fs;
>
> NIT: maybe re-ordering the variables to have a reverse tree
>
> Otherwise, the patch looks good to me,
> Regards,
> Pierre