linux-kernel - Re: [RFC PATCH v2 1/2] sched/fair: Introduce UTIL_FITS

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZTjBbgiSXj6bWE2O@chenyu5-mobl2.ccr.corp.intel.com>
Date:   Wed, 25 Oct 2023 15:19:10 +0800
From:   Chen Yu <yu.c.chen@...el.com>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
CC:     Dietmar Eggemann <dietmar.eggemann@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        "Vincent Guittot" <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Swapnil Sapkal <Swapnil.Sapkal@....com>,
        Aaron Lu <aaron.lu@...el.com>,
        "Tim Chen" <tim.c.chen@...el.com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>, <x86@...nel.org>
Subject: Re: [RFC PATCH v2 1/2] sched/fair: Introduce UTIL_FITS_CAPACITY
 feature (v2)

On 2023-10-24 at 10:49:37 -0400, Mathieu Desnoyers wrote:
> On 2023-10-24 02:10, Chen Yu wrote:
> > On 2023-10-23 at 11:04:49 -0400, Mathieu Desnoyers wrote:
> > > On 2023-10-23 10:11, Dietmar Eggemann wrote:
> > > > On 19/10/2023 18:05, Mathieu Desnoyers wrote:
> > > 
> > > [...]
> > > > > +static unsigned long scale_rt_capacity(int cpu);
> > > > > +
> > > > > +/*
> > > > > + * Returns true if adding the task utilization to the estimated
> > > > > + * utilization of the runnable tasks on @cpu does not exceed the
> > > > > + * capacity of @cpu.
> > > > > + *
> > > > > + * This considers only the utilization of _runnable_ tasks on the @cpu
> > > > > + * runqueue, excluding blocked and sleeping tasks. This is achieved by
> > > > > + * using the runqueue util_est.enqueued.
> > > > > + */
> > > > > +static inline bool task_fits_remaining_cpu_capacity(unsigned long task_util,
> > > > > +						    int cpu)
> > > > 
> > > > Or like find_energy_efficient_cpu() (feec(), used in
> > > > Energy-Aware-Scheduling (EAS)) which uses cpu_util(cpu, p, cpu, 0) to get:
> > > > 
> > > >     max(util_avg(CPU + p), util_est(CPU + p))
> > > 
> > > I've tried using cpu_util(), but unfortunately anything that considers
> > > blocked/sleeping tasks in its utilization total does not work for my
> > > use-case.
> > > 
> > >  From cpu_util():
> > > 
> > >   * CPU utilization is the sum of running time of runnable tasks plus the
> > >   * recent utilization of currently non-runnable tasks on that CPU.
> > > 
> > 
> > I thought cpu_util() indicates the utilization decay sum of task that was once
> > "running" on this CPU, but will not sum up the "util/load" of the blocked/sleeping
> > task?
> > 
> > accumulate_sum()
> >      /* only the running task's util will be sum up */
> >      if (running)
> >         sa->util_sum += contrib << SCHED_CAPACITY_SHIFT;
> > 
> > WRITE_ONCE(sa->util_avg, sa->util_sum / divider);
> 
> The accumulation into the cfs_rq->avg.util_sum indeed only happens when the task
> is running, which means that the task does not actively contribute to increment
> the util_sum when it is blocked/sleeping.
> 
> However, when the task is blocked/sleeping, the task is still attached to the
> runqueue, and therefore its historic util_sum still contributes to the cfs_rq
> util_sum/util_avg. This completely differs from what happens when the task is
> migrated to a different runqueue, in which case its util_sum contribution is
> entirely removed from the cfs_rq util_sum:
> 
> static void
> enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> {
> [...]
>         update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH)
> [...]
> 
> static void
> dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> {
> [...]
>         if (entity_is_task(se) && task_on_rq_migrating(task_of(se)))
>                 action |= DO_DETACH;
> [...]
> 
> static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> {
> [...]
>         if (!se->avg.last_update_time && (flags & DO_ATTACH)) {
> 
>                 /*
>                  * DO_ATTACH means we're here from enqueue_entity().
>                  * !last_update_time means we've passed through
>                  * migrate_task_rq_fair() indicating we migrated.
>                  *
>                  * IOW we're enqueueing a task on a new CPU.
>                  */
>                 attach_entity_load_avg(cfs_rq, se);
>                 update_tg_load_avg(cfs_rq);
> 
>         } else if (flags & DO_DETACH) {
>                 /*
>                  * DO_DETACH means we're here from dequeue_entity()
>                  * and we are migrating task out of the CPU.
>                  */
>                 detach_entity_load_avg(cfs_rq, se);
>                 update_tg_load_avg(cfs_rq);
> [...]
> 
> In comparison, util_est_enqueue()/util_est_dequeue() are called from enqueue_task_fair()
> and dequeue_task_fair(), which include blocked/sleeping tasks scenarios. Therefore, util_est
> only considers runnable tasks in its cfs_rq->avg.util_est.enqueued.
>
> The current rq utilization total used for rq selection should not include historic
> utilization of all blocked/sleeping tasks, because we are taking a decision to bring
> back a recently blocked/sleeping task onto a runqueue at that point. Considering
> the historic util_sum from the set of other blocked/sleeping tasks still attached to that
> runqueue in the current utilization mistakenly makes the rq selection think that the rq is
> busier than it really is.
> 

Thanks for the description in detail, it is very helpful! Now I understand that using cpu_util()
could overestimate the busyness of the CPU in UTIL_FITS_CAPACITY.

> I suspect that cpu_util_without() is an half-successful attempt at solving this by removing
> the task p from the considered utilization, but it does not take into account scenarios where many
> other tasks happen to be blocked/sleeping as well.

Agree, those non-migrated tasks could contribute to cfs_rq's util_avg.

thanks,
Chenyu