linux-kernel - Re: [PATCH v2 2/4] powercap: idle_inject: Add prepare/complete callbacks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3e59c5216fad003f079224cd08a7da9b30f6365d.camel@linux.intel.com>
Date:   Wed, 21 Dec 2022 12:58:07 -0800
From:   srinivas pandruvada <srinivas.pandruvada@...ux.intel.com>
To:     Daniel Lezcano <daniel.lezcano@...aro.org>, rafael@...nel.org
Cc:     linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
        rui.zhang@...el.com, amitk@...nel.org
Subject: Re: [PATCH v2 2/4] powercap: idle_inject: Add prepare/complete
 callbacks

Hi Daniel,

On Wed, 2022-12-21 at 15:52 +0100, Daniel Lezcano wrote:
> 
> Hi Srinivas,
> 
> On 30/11/2022 00:34, Srinivas Pandruvada wrote:
> > The actual idle percentage can be less than the desired because of
> > interrupts. Since the objective for CPU Idle injection is for
> > thermal
> > control, there should be some way to compensate for lost idle
> > percentage.
> > Some architectures provide interface to get actual idle percent
> > observed
> > by the hardware. So, the idle percent can be adjusted using the
> > hardware
> > feedback. For example, Intel CPUs provides package idle counters,
> > which
> > is currently used by intel powerclamp driver to adjust idle time.
> Can you provide an example in terms of timings?
> 
> I'm not getting how 'prepare' would do by returning a positive value
> to 
> skip the play_idle_precise() and what will do 'complete' ?
> 
intel_powerclamp has a logic where if the current idle percentage
observed from hardware is more than the desired target inject percent,
it skips calling play_idle().

For example if you want to inject 50% idle and system is naturally idle
for 60%, there is no use of calling play_idle in the idle injection
framework to induce more idle. In this way a workload can run
immediately.

So trying to emulate the same logic by using powercap/idle_inject
framework. So prepare() callback in the intel_powerclamp driver calls
the existing function to check if idle-inject should skip for this time
or not.

The complete() callback is to do just to adjust run duration. I don't
have a use case, but to add complementary callback to prepare() if
required.

Thanks,
Srinivas

> 
> > The only way this can be done currently is by monitoring hardware
> > idle
> > percent from a different software thread. This can be avoided by
> > adding
> > callbacks.
> > 
> > Add a capability to register a prepare and complete callback during
> > idle
> > inject registry. Add a new register function
> > idle_inject_register_full()
> > which also allows to register callbacks.
> > 
> > If they are not NULL, then prepare callback is called before
> > calling
> > play_idle_precise() and complete callback is called after calling
> > play_idle_precise().
> > 
> > If prepare callback is present and returns non 0 value then
> > play_idle_precise() is not called to avoid over compensation.
> > 
> > Signed-off-by: Srinivas Pandruvada
> > <srinivas.pandruvada@...ux.intel.com>
> > ---
> > v2
> > - Replace begin/end with prepare/complete
> > - Add new interface idle_inject_register_full with callbacks
> > - Update kernel doc
> > - Update commit description
> > 
> >   drivers/powercap/idle_inject.c | 62
> > +++++++++++++++++++++++++++++++---
> >   include/linux/idle_inject.h    |  4 +++
> >   2 files changed, 62 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/powercap/idle_inject.c
> > b/drivers/powercap/idle_inject.c
> > index dfa989182e71..f48e71501429 100644
> > --- a/drivers/powercap/idle_inject.c
> > +++ b/drivers/powercap/idle_inject.c
> > @@ -63,13 +63,31 @@ struct idle_inject_thread {
> >    * @idle_duration_us: duration of CPU idle time to inject
> >    * @run_duration_us: duration of CPU run time to allow
> >    * @latency_us: max allowed latency
> > + * @prepare: Callback function which is called before calling
> > + *             play_idle_precise()
> > + * @complete: Callback function which is called after calling
> > + *             play_idle_precise()
> >    * @cpumask: mask of CPUs affected by idle injection
> > + *
> > + * This structure is used to define per instance idle inject
> > device data. Each
> > + * instance has an idle duration, a run duration and mask of CPUs
> > to inject
> > + * idle.
> > + * Actual idle is injected by calling kernel scheduler interface
> > + * play_idle_precise(). There are two optional callbacks which the
> > caller can
> > + * register by calling idle_inject_register_full():
> > + * prepare() - This callback is called just before calling
> > play_idle_precise()
> > + *             If this callback returns non zero value then
> > + *             play_idle_precise() is not called. This means skip
> > injecting
> > + *             idle during this period.
> > + * complete() - This callback is called after calling
> > play_idle_precise().
> >    */
> >   struct idle_inject_device {
> >         struct hrtimer timer;
> >         unsigned int idle_duration_us;
> >         unsigned int run_duration_us;
> >         unsigned int latency_us;
> > +       int (*prepare)(unsigned int cpu);
> > +       void (*complete)(unsigned int cpu);
> >         unsigned long cpumask[];
> >   };
> >   
> > @@ -132,6 +150,7 @@ static void idle_inject_fn(unsigned int cpu)
> >   {
> >         struct idle_inject_device *ii_dev;
> >         struct idle_inject_thread *iit;
> > +       int ret;
> >   
> >         ii_dev = per_cpu(idle_inject_device, cpu);
> >         iit = per_cpu_ptr(&idle_inject_thread, cpu);
> > @@ -141,8 +160,18 @@ static void idle_inject_fn(unsigned int cpu)
> >          */
> >         iit->should_run = 0;
> >   
> > +       if (ii_dev->prepare) {
> > +               ret = ii_dev->prepare(cpu);
> > +               if (ret)
> > +                       goto skip;
> > +       }
> > +
> >         play_idle_precise(READ_ONCE(ii_dev->idle_duration_us) *
> > NSEC_PER_USEC,
> >                           READ_ONCE(ii_dev->latency_us) *
> > NSEC_PER_USEC);
> > +
> > +skip:
> > +       if (ii_dev->complete)
> > +               ii_dev->complete(cpu);
> >   }
> >   
> >   /**
> > @@ -295,17 +324,23 @@ static int idle_inject_should_run(unsigned
> > int cpu)
> >   }
> >   
> >   /**
> > - * idle_inject_register - initialize idle injection on a set of
> > CPUs
> > + * idle_inject_register_full - initialize idle injection on a set
> > of CPUs
> >    * @cpumask: CPUs to be affected by idle injection
> > + * @prepare: callback called before calling play_idle_precise()
> > + * @complete: callback called after calling play_idle_precise()
> >    *
> >    * This function creates an idle injection control device
> > structure for the
> > - * given set of CPUs and initializes the timer associated with
> > it.  It does not
> > - * start any injection cycles.
> > + * given set of CPUs and initializes the timer associated with it.
> > This
> > + * function also allows to register prepare() and complete()
> > callbacks.
> > + * It does not start any injection cycles.
> >    *
> >    * Return: NULL if memory allocation fails, idle injection
> > control device
> >    * pointer on success.
> >    */
> > -struct idle_inject_device *idle_inject_register(struct cpumask
> > *cpumask)
> > +
> > +struct idle_inject_device *idle_inject_register_full(struct
> > cpumask *cpumask,
> > +                                                    int
> > (*prepare)(unsigned int cpu),
> > +                                                    void
> > (*complete)(unsigned int cpu))
> >   {
> >         struct idle_inject_device *ii_dev;
> >         int cpu, cpu_rb;
> > @@ -318,6 +353,8 @@ struct idle_inject_device
> > *idle_inject_register(struct cpumask *cpumask)
> >         hrtimer_init(&ii_dev->timer, CLOCK_MONOTONIC,
> > HRTIMER_MODE_REL);
> >         ii_dev->timer.function = idle_inject_timer_fn;
> >         ii_dev->latency_us = UINT_MAX;
> > +       ii_dev->prepare = prepare;
> > +       ii_dev->complete = complete;
> >   
> >         for_each_cpu(cpu, to_cpumask(ii_dev->cpumask)) {
> >   
> > @@ -342,6 +379,23 @@ struct idle_inject_device
> > *idle_inject_register(struct cpumask *cpumask)
> >   
> >         return NULL;
> >   }
> > +EXPORT_SYMBOL_NS_GPL(idle_inject_register_full, IDLE_INJECT);
> > +
> > +/**
> > + * idle_inject_register - initialize idle injection on a set of
> > CPUs
> > + * @cpumask: CPUs to be affected by idle injection
> > + *
> > + * This function creates an idle injection control device
> > structure for the
> > + * given set of CPUs and initializes the timer associated with
> > it.  It does not
> > + * start any injection cycles.
> > + *
> > + * Return: NULL if memory allocation fails, idle injection control
> > device
> > + * pointer on success.
> > + */
> > +struct idle_inject_device *idle_inject_register(struct cpumask
> > *cpumask)
> > +{
> > +       return idle_inject_register_full(cpumask, NULL, NULL);
> > +}
> >   EXPORT_SYMBOL_NS_GPL(idle_inject_register, IDLE_INJECT);
> >   
> >   /**
> > diff --git a/include/linux/idle_inject.h
> > b/include/linux/idle_inject.h
> > index fb88e23a99d3..e18d89793490 100644
> > --- a/include/linux/idle_inject.h
> > +++ b/include/linux/idle_inject.h
> > @@ -13,6 +13,10 @@ struct idle_inject_device;
> >   
> >   struct idle_inject_device *idle_inject_register(struct cpumask
> > *cpumask);
> >   
> > +struct idle_inject_device *idle_inject_register_full(struct
> > cpumask *cpumask,
> > +                                                    int
> > (*prepare)(unsigned int cpu),
> > +                                                    void
> > (*complete)(unsigned int cpu));
> > +
> >   void idle_inject_unregister(struct idle_inject_device *ii_dev);
> >   
> >   int idle_inject_start(struct idle_inject_device *ii_dev);
>