linux-kernel - Re: [PATCH V2 3/3] cpufreq: CPPC: Add support for frequency invariance

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210617032416.r2gfp25xxvhc5t4x@vireshk-i7>
Date:   Thu, 17 Jun 2021 08:54:16 +0530
From:   Viresh Kumar <viresh.kumar@...aro.org>
To:     Ionela Voinescu <ionela.voinescu@....com>
Cc:     Rafael Wysocki <rjw@...ysocki.net>,
        Sudeep Holla <sudeep.holla@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        linux-pm@...r.kernel.org, Qian Cai <quic_qiancai@...cinc.com>,
        linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V2 3/3] cpufreq: CPPC: Add support for frequency
 invariance

On 16-06-21, 13:48, Ionela Voinescu wrote:
> I was looking forward to the complete removal of stop_cpu() :).

No one wants to remove more code than I do :)

> I'll only comment on this for now as I should know the rest.
> 
> Let's assume we don't have these, what happens now is the following:
> 
> 1. We hotplug out the last CPU in a policy, we call the
>    .stop_cpu()/exit() function which will free the cppc_cpudata structure.
> 
>    The only vulnerability is if we have a last tick on that last CPU,
>    after the above callback was called.
> 
> 2. When the CPU at 1. gets hotplugged back in, the cppc_fi->cpu_data is
>    stale.
> 
> We do not have a problem when removing the CPPC cpufreq module as we're
> doing cppc_freq_invariance_exit() before unregistering the driver and
> freeing the data.
> 
> Are 1. and 2 the only problems we have, or have I missed any?

There is more to it. For simplicity, lets assume a quad-core setup,
with all 4 CPUs sharing the cpufreq policy. And here is what happens
without the new changes:

- On CPPC cpufreq driver insertion, we start 4 kthreads/irq-works (1
  for each CPU as it fires from tick) from the ->init() callback.

- Now lets say we offline CPU3. The CPPC cpufreq driver isn't notified
  by anyone and it hasn't registered itself to hotplug notifier as
  well. So, the irq-work/kthread isn't stopped. This results in the
  issue reported by Qian earlier.

  The very same thing happens with schedutil governor too, which uses
  very similar mechanisms, and the cpufreq core takes care of it there
  by stopping the governor before removing the CPU from policy->cpus
  and starting it again. So there we stop irq-work/kthread for all 4
  CPUs, then start them only for remaining 3.

  I thought about that approach as well, but it was too heavy to stop
  everyone and start again in this case. And so I invented start_cpu()
  and stop_cpu() callbacks.

- In this case, because the CPU is going away, we need to make sure we
  don't queue any more irq-work or kthread to it and this is one of
  the main reasons for adding synchronization in the topology core,
  because we need a hard guarantee here that irq-work won't fire
  again, as the CPU won't be there or will not be in a sane state.

- The same sequence of events is true for the case where the last CPU
  of a policy goes away (not in this example, but lets say quad-core
  setup with separate policies for each CPU).

- Not just the policy, but the CPU may be going away as well.

I hope I was able to clarify a bit here.

-- 
viresh