lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <29bae535-6292-400e-813d-063498adbfce@quicinc.com>
Date:   Mon, 7 Aug 2023 08:52:11 +0530
From:   Pavan Kondeti <quic_pkondeti@...cinc.com>
To:     David Dai <davidai@...gle.com>
CC:     Pavan Kondeti <quic_pkondeti@...cinc.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Rob Herring <robh+dt@...nel.org>,
        Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>,
        Conor Dooley <conor+dt@...nel.org>,
        Sudeep Holla <sudeep.holla@....com>,
        Saravana Kannan <saravanak@...gle.com>,
        Quentin Perret <qperret@...gle.com>,
        Masami Hiramatsu <mhiramat@...gle.com>,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Vincent Guittot" <vincent.guittot@...aro.org>,
        Marc Zyngier <maz@...nel.org>,
        "Oliver Upton" <oliver.upton@...ux.dev>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Gupta Pankaj <pankaj.gupta@....com>,
        Mel Gorman <mgorman@...e.de>, <kernel-team@...roid.com>,
        <linux-pm@...r.kernel.org>, <devicetree@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 2/2] cpufreq: add virtual-cpufreq driver

On Fri, Aug 04, 2023 at 04:46:11PM -0700, David Dai wrote:
> Hi Pavan,
> 
> Thanks for reviewing!
> 
> On Wed, Aug 2, 2023 at 9:18 PM Pavan Kondeti <quic_pkondeti@...cinc.com> wrote:
> >
> > On Mon, Jul 31, 2023 at 10:46:09AM -0700, David Dai wrote:
> > > Introduce a virtualized cpufreq driver for guest kernels to improve
> > > performance and power of workloads within VMs.
> > >
> > > This driver does two main things:
> > >
> > > 1. Sends the frequency of vCPUs as a hint to the host. The host uses the
> > > hint to schedule the vCPU threads and decide physical CPU frequency.
> > >
> > > 2. If a VM does not support a virtualized FIE(like AMUs), it queries the
> > > host CPU frequency by reading a MMIO region of a virtual cpufreq device
> > > to update the guest's frequency scaling factor periodically. This enables
> > > accurate Per-Entity Load Tracking for tasks running in the guest.
> > >
> > > Co-developed-by: Saravana Kannan <saravanak@...gle.com>
> > > Signed-off-by: Saravana Kannan <saravanak@...gle.com>
> > > Signed-off-by: David Dai <davidai@...gle.com>
> >
> > [...]
> >
> > > +static void virt_scale_freq_tick(void)
> > > +{
> > > +     struct cpufreq_policy *policy = cpufreq_cpu_get(smp_processor_id());
> > > +     struct virt_cpufreq_drv_data *data = policy->driver_data;
> > > +     u32 max_freq = (u32)policy->cpuinfo.max_freq;
> > > +     u64 cur_freq;
> > > +     u64 scale;
> > > +
> > > +     cpufreq_cpu_put(policy);
> > > +
> > > +     cur_freq = (u64)data->ops->get_freq(policy);
> > > +     cur_freq <<= SCHED_CAPACITY_SHIFT;
> > > +     scale = div_u64(cur_freq, max_freq);
> > > +
> > > +     this_cpu_write(arch_freq_scale, (unsigned long)scale);
> > > +}
> > > +
> >
> > We expect the host to provide the frequency in kHz, can you please add a
> > comment about it. It is not very obvious when you look at the
> > REG_CUR_FREQ_OFFSET register name.
> 
> I’ll include a KHZ in the offset names.
> 

Sure, that would help. Also, can you limit the scale to
SCHED_CAPACITY_SCALE? It may be possible that host may be running at a
higher frequency than max_freq advertised on this guest.

> >
> > > +
> > > +static unsigned int virt_cpufreq_fast_switch(struct cpufreq_policy *policy,
> > > +             unsigned int target_freq)
> > > +{
> > > +     virt_cpufreq_set_perf(policy);
> > > +     return target_freq;
> > > +}
> > > +
> > > +static int virt_cpufreq_target_index(struct cpufreq_policy *policy,
> > > +             unsigned int index)
> > > +{
> > > +     return virt_cpufreq_set_perf(policy);
> > > +}
> > > +
> > > +static int virt_cpufreq_cpu_init(struct cpufreq_policy *policy)
> > > +{
> > > +     struct virt_cpufreq_drv_data *drv_data = cpufreq_get_driver_data();
> > > +     struct cpufreq_frequency_table *table;
> > > +     struct device *cpu_dev;
> > > +     int ret;
> > > +
> > > +     cpu_dev = get_cpu_device(policy->cpu);
> > > +     if (!cpu_dev)
> > > +             return -ENODEV;
> > > +
> > > +     ret = dev_pm_opp_of_add_table(cpu_dev);
> > > +     if (ret)
> > > +             return ret;
> > > +
> > > +     ret = dev_pm_opp_get_opp_count(cpu_dev);
> > > +     if (ret <= 0) {
> > > +             dev_err(cpu_dev, "OPP table can't be empty\n");
> > > +             return -ENODEV;
> > > +     }
> > > +
> > > +     ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &table);
> > > +     if (ret) {
> > > +             dev_err(cpu_dev, "failed to init cpufreq table: %d\n", ret);
> > > +             return ret;
> > > +     }
> > > +
> > > +     policy->freq_table = table;
> > > +     policy->dvfs_possible_from_any_cpu = false;
> > > +     policy->fast_switch_possible = true;
> > > +     policy->driver_data = drv_data;
> > > +
> > > +     /*
> > > +      * Only takes effect if another FIE source such as AMUs
> > > +      * have not been registered.
> > > +      */
> > > +     topology_set_scale_freq_source(&virt_sfd, policy->cpus);
> > > +
> > > +     return 0;
> > > +
> > > +}
> > > +
> >
> > Do we need to register as FIE source even with the below commit? By
> > registering as a source, we are not supplying any accurate metric. We
> > still fallback on the same source that cpufreq implements it.
> 
> The arch_set_freq_scale() done at cpufreq driver’s frequency updates
> at cpufreq_freq_transition_end() and cpufreq_driver_fast_switch() only
> represent the guest’s frequency request. However, this does not
> accurately represent the physical CPU’s frequency that the vCPU is
> running on. E.g. There may be other processes sharing the same
> physical CPU that results in a much higher CPU frequency than what’s
> requested by the vCPU.
> 

understood that policy->cur may not reflect the actual frequency. Is this
something needs to be advertised to cpufreq core so that it query the
underlying cpufreq driver and use it for frequency scale updates. This
also gives userspace to read the actual frequency when read from sysfs.

In fact, cpufreq_driver_fast_switch() comment says that
cpufreq_driver::fast_switch() should return the *actual* frequency and
the same is used to update frequency scale updates. I understand that it
depends on other things like if host defer the frequency switch, the
value read from REG_CUR_FREQ_OFFSET may reflect the old value..

May be a comment in code would help.

Thanks,
Pavan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ