Message-ID: <CAJZ5v0jmFLoPFTg9GLKt--iWThG4ezzWM7MH69=Q2BtCBP+Giw@mail.gmail.com>
Date:   Thu, 12 Jan 2023 22:03:13 +0100
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     srinivas pandruvada <srinivas.pandruvada@...ux.intel.com>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>, linux-pm@...r.kernel.org,
        linux-kernel@...r.kernel.org, daniel.lezcano@...aro.org,
        rui.zhang@...el.com, amitk@...nel.org,
        kernel test robot <lkp@...el.com>
Subject: Re: [PATCH v2 3/4] thermal/drivers/intel_powerclamp: Use powercap
 idle-inject framework

On Thu, Jan 12, 2023 at 9:23 PM srinivas pandruvada
<srinivas.pandruvada@...ux.intel.com> wrote:
>
> On Thu, 2023-01-12 at 19:32 +0100, Rafael J. Wysocki wrote:
> > On Wed, Nov 30, 2022 at 12:34 AM Srinivas Pandruvada
> > <srinivas.pandruvada@...ux.intel.com> wrote:
> > >
> > > > There are two idle injection implementations in the Linux kernel:
> > > > one via intel_powerclamp and the other using powercap/idle_inject.
> > > > Both implementations end up calling a play_idle* function from a
> > > > FIFO priority thread. Both can't be used at the same time.
> > >
> > > > Currently, per-core idle injection (cpuidle_cooling) uses
> > > > powercap/idle_inject, which is not used on platforms where
> > > > intel_powerclamp is used for system-wide idle injection, so there
> > > > is no conflict. But there are some use cases where per-core idle
> > > > injection is beneficial on the same system where system-wide idle
> > > > injection is also used via intel_powerclamp. To avoid conflict,
> > > > only one of the idle injection types must be in use at a time.
> > > > This requires a common framework which both per-core and
> > > > system-wide idle injection can use.
> > >
> > > > Here powercap/idle_inject can be used for both per-core and
> > > > system-wide idle injection. This framework has a well-defined
> > > > interface which allows registration for a single core or for all
> > > > CPUs (system wide). If a particular CPU is already participating
> > > > in idle injection, the call to register fails. Here the
> > > > registration can be done when user space changes the current
> > > > cooling device state.
> > >
> > > > Also, one framework for idle injection is easier to maintain, as
> > > > there is a single loop calling play_idle* instead of multiple.
> > >
> > > > So, reuse powercap/idle_inject calls in intel_powerclamp. This
> > > > simplifies the code, as all per-CPU kthreads which call play_idle*
> > > > can be removed.
> > >
> > > > The changes include:
> > > > - Remove unneeded include files
> > > > - Remove per-CPU kthread workers: balancing_work and
> > > >   idle_injection_work
> > > > - Reuse the compensation-related code by moving it from the
> > > >   previous worker thread to idle_injection callbacks
> > > > - Adjust the idle_duration and runtime by using the
> > > >   powercap/idle_inject interface
> > > > - Remove all variables which are not required once
> > > >   powercap/idle_inject is used
> > > > - Add a mutex to avoid a race between removal of idle injection
> > > >   during module unload and a user action to change the idle
> > > >   inject percent
> > > > - Use READ_ONCE and WRITE_ONCE for data accessed from multiple CPUs
> > >
> > > Signed-off-by: Srinivas Pandruvada
> > > <srinivas.pandruvada@...ux.intel.com>
> > > ---
> > > v2:
> > > - Use idle_inject_register_full instead of idle_inject_register
> > > - Also fix dependency issue with POWERCAP config
> > > Reported-by: kernel test robot <lkp@...el.com>
> > >
> > >  drivers/thermal/intel/Kconfig            |   2 +
> > > >  drivers/thermal/intel/intel_powerclamp.c | 292 ++++++++++-------------
> > >  2 files changed, 126 insertions(+), 168 deletions(-)
> > >
> > > > diff --git a/drivers/thermal/intel/Kconfig b/drivers/thermal/intel/Kconfig
> > > index f0c845679250..6c2a95f41c81 100644
> > > --- a/drivers/thermal/intel/Kconfig
> > > +++ b/drivers/thermal/intel/Kconfig
> > > @@ -3,6 +3,8 @@ config INTEL_POWERCLAMP
> > >         tristate "Intel PowerClamp idle injection driver"
> > >         depends on X86
> > >         depends on CPU_SUP_INTEL
> > > +       select POWERCAP
> > > +       select IDLE_INJECT
> > >         help
> > > >           Enable this to enable Intel PowerClamp idle injection driver. This
> > > >           enforce idle time which results in more package C-state residency. The
> > > > diff --git a/drivers/thermal/intel/intel_powerclamp.c b/drivers/thermal/intel/intel_powerclamp.c
> > > index b80e25ec1261..3f2b20ae8f68 100644
> > > --- a/drivers/thermal/intel/intel_powerclamp.c
> > > +++ b/drivers/thermal/intel/intel_powerclamp.c
> > > @@ -2,7 +2,7 @@
> > >  /*
> > >   * intel_powerclamp.c - package c-state idle injection
> > >   *
> > > - * Copyright (c) 2012, Intel Corporation.
> > > + * Copyright (c) 2022, Intel Corporation.
> >
> > Nit: I would retain the original year of introduction, so 2012 -
> > 2022.
> OK
>
> >
> > >   *
> > >
>
> [...]
>
> > > +
> > > +static int idle_inject_begin(unsigned int cpu)
> >
> > So this would be the ->prepare() callback to be invoked on each CPU
> > from idle_inject_fn() IIUC.
> >
> Yes
>
> > >  {
> > > > -       struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu);
> > > -       struct kthread_worker *worker;
> > > +       /*
> > > > +        * only elected controlling cpu can collect stats and update
> > > > +        * control parameters.
> > > +        */
> > > +       if (cpu == control_cpu) {
> > > +               bool update = READ_ONCE(target_ratio_updated);
> > > +
> > > > +               if (!(powerclamp_data.count % powerclamp_data.window_size_now)) {
> > > > +                       bool skip = powerclamp_adjust_controls(powerclamp_data.target_ratio,
> > > > +                                                              powerclamp_data.guard,
> > > > +                                                              powerclamp_data.window_size_now);
> > > +                       WRITE_ONCE(should_skip, skip);
> > > +                       update = true;
> > > +               }
> > >
> > > > -       worker = kthread_create_worker_on_cpu(cpu, 0, "kidle_inj/%ld", cpu);
> > > -       if (IS_ERR(worker))
> > > -               return;
> > > +               if (update) {
> > > +                       unsigned int runtime;
> > > +
> > > +                       runtime = get_run_time();
> > > > +                       idle_inject_set_duration(ii_dev, runtime, duration);
> > > +                       WRITE_ONCE(target_ratio_updated, false);
> > > +               }
> > > +               powerclamp_data.count++;
> > > +       }
> > > +
> > > +       if (READ_ONCE(should_skip))
> > > +               return -EAGAIN;
> >
> > > This has a bit of a synchronization issue, because the control CPU is
> > > not guaranteed to run this code before any other CPUs in the given
> > > cycle, so at least some of them may see a stale value of should_skip
> > > and they will still inject idle in this cycle.  Or else, they may skip
> > > idle injection when it should be done.
> > This is a correct observation. This is true even in the current
> > implementation. The per-thread timer in the existing implementation has
> > this sync issue, so I tried to just mimic the current implementation as is.

I see, but I don't think that the new implementation has to be bug
compatible with the old one.
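
To make the ordering problem concrete, here is a minimal userspace sketch
(not kernel code). All names in it (should_skip_model,
simulate_prepare_cycle) are hypothetical, and the "order" array is an
assumed stand-in for whatever order the per-CPU threads happen to reach
the prepare callback in a given cycle. It counts the CPUs that act on the
previous cycle's value because they ran before the control CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Models the shared should_skip flag; carries over between cycles. */
static bool should_skip_model;

/*
 * Simulate one injection cycle. "order" lists CPU ids in the order in
 * which they run the prepare callback. Only control_cpu writes the flag;
 * every CPU then reads it. Returns the number of CPUs whose read differs
 * from the decision made for this cycle (new_skip), i.e. stale reads.
 */
static unsigned int simulate_prepare_cycle(const unsigned int *order,
					   size_t ncpus,
					   unsigned int control_cpu,
					   bool new_skip)
{
	unsigned int stale = 0;

	assert(order != NULL);

	for (size_t i = 0; i < ncpus; i++) {
		/* The control CPU publishes this cycle's decision. */
		if (order[i] == control_cpu)
			should_skip_model = new_skip;

		/* Every CPU acts on whatever value it sees right now. */
		if (should_skip_model != new_skip)
			stale++;
	}

	return stale;
}
```

With the flag initially false and the order {2, 1, 0, 3}, CPUs 2 and 1 run
before control CPU 0 and act on the old value; with the control CPU first,
no CPU does.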

> >
> > I think that it would be better to run the callback from
> > idle_inject_timer_fn() where it would decide whether or not to call
> > idle_inject_wakeup(), in which case the control CPU would not be
> > needed any more (which would be a plus), because the "control" could
> > be done by the CPU running the timer function, whichever it is.
> >
> > Does this sound viable?
>
> Yes, it is. In this case the prepare() callback from the idle_inject core
> is not per CPU, but per device.

Right.

BTW, I also would call it "update" and make it return bool, so
idle_inject_wakeup() would be called when it returned 'true'.
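
A rough userspace sketch of that shape, under stated assumptions: the
struct and the *_sketch functions below are hypothetical stand-ins for the
idle_inject device, idle_inject_timer_fn() and idle_inject_wakeup(), not
the actual kernel code, and the "skip every third cycle" policy is
arbitrary. The point is only that the decision is made once per cycle, by
whichever CPU runs the timer function, before any injection thread is
woken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the idle_inject device. */
struct ii_dev_sketch {
	bool (*update)(void);	/* per-device callback, returns true to inject */
	unsigned int wakeups;	/* cycles in which injection was started */
	unsigned int skipped;	/* cycles in which injection was skipped */
};

/* Stand-in for idle_inject_wakeup(): would wake the injection threads. */
static void wakeup_sketch(struct ii_dev_sketch *dev)
{
	dev->wakeups++;
}

/* Stand-in for idle_inject_timer_fn(): runs once per cycle. */
static void timer_fn_sketch(struct ii_dev_sketch *dev)
{
	assert(dev != NULL);

	/* No callback means "always inject"; otherwise ask the driver. */
	if (!dev->update || dev->update())
		wakeup_sketch(dev);
	else
		dev->skipped++;
}

/* Example driver-side policy (arbitrary): skip every third cycle. */
static unsigned int cycle_count_sketch;

static bool powerclamp_update_sketch(void)
{
	cycle_count_sketch++;
	return cycle_count_sketch % 3 != 0;
}

/* Run n timer cycles from a clean state and return the counters. */
static struct ii_dev_sketch run_timer_cycles(unsigned int n)
{
	struct ii_dev_sketch dev = { .update = powerclamp_update_sketch };

	cycle_count_sketch = 0;
	for (unsigned int i = 0; i < n; i++)
		timer_fn_sketch(&dev);

	return dev;
}
```

Because the callback runs before the wakeup, every CPU either injects or
skips in a given cycle, and no dedicated control CPU is needed.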