Date:   Wed, 4 Nov 2020 10:57:20 +0000
From:   Lukasz Luba <lukasz.luba@....com>
To:     Daniel Lezcano <daniel.lezcano@...aro.org>
Cc:     rafael@...nel.org, srinivas.pandruvada@...ux.intel.com,
        linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
        rui.zhang@...el.com, "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Palmer Dabbelt <palmerdabbelt@...gle.com>,
        Anup Patel <anup.patel@....com>,
        Atish Patra <atish.patra@....com>,
        Marc Zyngier <maz@...nel.org>,
        Andrew Jones <drjones@...hat.com>,
        Michael Kelley <mikelley@...rosoft.com>,
        Mike Leach <mike.leach@...aro.org>,
        Kajol Jain <kjain@...ux.ibm.com>,
        Daniel Jordan <daniel.m.jordan@...cle.com>,
        Steven Price <steven.price@....com>
Subject: Re: [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based
 support



On 11/4/20 10:47 AM, Daniel Lezcano wrote:
> 
> Hi Lukasz,
> 
> 
> On 23/10/2020 15:27, Lukasz Luba wrote:
>> Hi Daniel,
>>
>>
>> On 10/6/20 1:20 PM, Daniel Lezcano wrote:
>>> With the powercap dtpm controller, we are able to plug devices with
>>> power limitation features in the tree.
>>>
>>> The following patch introduces the CPU power limitation based on the
>>> energy model and the performance states.
>>>
>>> The power limitation is done at the performance domain level. If some
>>> CPUs are unplugged, the corresponding power will be subtracted from
>>> the performance domain total power.
>>>
>>> It is up to the platform to initialize the dtpm tree and add the CPU.
>>>
>>> Here is an example to create a simple tree with one root node called
>>> "pkg" and the cpu's performance domains.
> 
> [ ... ]
> 
>>> +static int set_pd_power_limit(struct powercap_zone *pcz, int cid,
>>> +                  u64 power_limit)
>>> +{
>>> +    struct dtpm *dtpm = to_dtpm(pcz);
>>> +    struct dtpm_cpu *dtpm_cpu = dtpm->private;
>>> +    struct em_perf_domain *pd;
>>> +    unsigned long freq;
>>> +    int i, nr_cpus;
>>> +
>>> +    spin_lock(&dtpm->lock);
>>> +
>>> +    power_limit = clamp_val(power_limit, dtpm->power_min,
>>> +                            dtpm->power_max);
>>> +
>>> +    pd = em_cpu_get(dtpm_cpu->cpu);
>>> +
>>> +    nr_cpus = cpumask_weight(to_cpumask(pd->cpus));
>>> +
>>> +    for (i = 0; i < pd->nr_perf_states; i++) {
>>> +
>>> +        u64 power = pd->table[i].power * MICROWATT_PER_MILLIWATT;
>>> +
>>> +        if ((power * nr_cpus) > power_limit)
>>
>> We have one node in that DTPM hierarchy tree, which represents all the
>> CPUs in the 'related_cpus' mask. I saw below that we just remove the
>> node on hotplug.
> 
> The last CPU hotplugged will remove the node.
> 
>> I have put a comment below asking if we could change the registration,
>> which would affect the power calculation.
>>
>>
>>> +            break;
>>> +    }
>>> +
>>> +    freq = pd->table[i - 1].frequency;
>>> +
>>> +    freq_qos_update_request(&dtpm_cpu->qos_req, freq);
>>> +
>>> +    dtpm->power_limit = power_limit;
>>> +
>>> +    spin_unlock(&dtpm->lock);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int get_pd_power_limit(struct powercap_zone *pcz, int cid, u64 *data)
>>> +{
>>> +    struct dtpm *dtpm = to_dtpm(pcz);
>>> +
>>> +    spin_lock(&dtpm->lock);
>>> +    *data = dtpm->power_max;
>>> +    spin_unlock(&dtpm->lock);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int get_pd_power_uw(struct powercap_zone *pcz, u64 *power_uw)
>>> +{
>>> +    struct dtpm *dtpm = to_dtpm(pcz);
>>> +    struct dtpm_cpu *dtpm_cpu = dtpm->private;
>>> +    struct em_perf_domain *pd;
>>> +    unsigned long freq;
>>> +    int i, nr_cpus;
>>> +
>>> +    freq = cpufreq_quick_get(dtpm_cpu->cpu);
>>> +    pd = em_cpu_get(dtpm_cpu->cpu);
>>> +    nr_cpus = cpumask_weight(to_cpumask(pd->cpus));
>>> +
>>> +    for (i = 0; i < pd->nr_perf_states; i++) {
>>> +
>>> +        if (pd->table[i].frequency < freq)
>>> +            continue;
>>> +
>>> +        *power_uw = pd->table[i].power *
>>> +            MICROWATT_PER_MILLIWATT * nr_cpus;
>>
>> Same here, we have 'nr_cpus'.
>>
>>> +
>>> +        return 0;
>>> +    }
>>> +
>>> +    return -EINVAL;
>>> +}
>>> +
>>> +static int cpu_release_zone(struct powercap_zone *pcz)
>>> +{
>>> +    struct dtpm *dtpm = to_dtpm(pcz);
>>> +    struct dtpm_cpu *dtpm_cpu = dtpm->private;
>>> +
>>> +    freq_qos_remove_request(&dtpm_cpu->qos_req);
>>> +
>>> +    return dtpm_release_zone(pcz);
>>> +}
>>> +
>>> +static struct powercap_zone_constraint_ops pd_constraint_ops = {
>>> +    .set_power_limit_uw = set_pd_power_limit,
>>> +    .get_power_limit_uw = get_pd_power_limit,
>>> +};
>>> +
>>> +static struct powercap_zone_ops pd_zone_ops = {
>>> +    .get_power_uw = get_pd_power_uw,
>>> +    .release = cpu_release_zone,
>>> +};
>>> +
>>> +static int cpuhp_dtpm_cpu_offline(unsigned int cpu)
>>> +{
>>> +    struct cpufreq_policy *policy;
>>> +    struct em_perf_domain *pd;
>>> +    struct dtpm *dtpm;
>>> +
>>> +    policy = cpufreq_cpu_get(cpu);
>>> +
>>> +    if (!policy)
>>> +        return 0;
>>> +
>>> +    pd = em_cpu_get(cpu);
>>> +    if (!pd)
>>> +        return -EINVAL;
>>> +
>>> +    dtpm = per_cpu(dtpm_per_cpu, cpu);
>>> +
>>> +    power_sub(dtpm, pd);
>>> +
>>> +    if (cpumask_weight(policy->cpus) != 1)
>>> +        return 0;
>>> +
>>> +    for_each_cpu(cpu, policy->related_cpus)
>>> +        per_cpu(dtpm_per_cpu, cpu) = NULL;
>>
>> Hotplugging one CPU would affect the others. I would keep them
>> all, but mark somehow that the CPU is offline.
> 
> No, the last one will remove the node. This is checked in the test above:
> cpumask_weight(policy->cpus) != 1 ...
> 
>>> +
>>> +    dtpm_unregister(dtpm);
>>
>> Could we keep the node in the hierarchy on CPU hotplug?
> 
> [ ... ]
> 
>>> diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h
>>> index 6696bdcfdb87..b62215a13baa 100644
>>> --- a/include/linux/dtpm.h
>>> +++ b/include/linux/dtpm.h
>>> @@ -70,4 +70,7 @@ int dtpm_register_parent(const char *name, struct dtpm *dtpm,
>>>    int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent,
>>>              struct powercap_zone_ops *ops, int nr_constraints,
>>>              struct powercap_zone_constraint_ops *const_ops);
>>> +
>>> +int dtpm_register_cpu(struct dtpm *parent);
>>> +
>>>    #endif
>>>
>>
>> I have a few comments for this DTPM CPU.
>>
>> 1. Maybe we can register these CPUs differently. First register
>> the parent node as a separate dtpm based on 'policy->related_cpus'. Then
>> register new child nodes, one for each CPU. When the CPU is up, mark
>> it as 'active'.
>>
>> 2. We don't remove the node when the CPU is hotplugged out, but we mark
>> it '!active' or 'offline'. The power calculation could be done in the
>> upper node, which takes that flag into account for the children.
>>
>> 3. We would only remove the node when its module is unloaded (e.g. GPU)
>>
>> That would make the tree more stable and also more detailed.
>> We would also account the power properly when one CPU goes offline but
>> the others are still there.
>>
>> What do you think?
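
(For reference, the structure I had in mind was roughly the following.
A hypothetical sketch only, not code from the series, and given your
explanation below it is probably not needed.)

/* One parent per 'policy->related_cpus', one child node per CPU, with an
 * online flag the parent would consult when summing up the power.
 */
struct dtpm_cpu_node {
	struct dtpm dtpm;	/* embedded powercap zone, one per CPU */
	int cpu;
	bool online;		/* cleared on hotplug instead of removing the node */
};
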
> 
> The paradigm of the DTPM is the intermediate nodes (have children), are
> aggregating the power of their children and do not represent the real
> devices. The leaves are the real devices which are power manageable.

OK, I see, it makes sense. Thanks for the explanation.

> 
> In our case, the CPU DTPM is based on the performance domain, which is a
> group of CPUs, hence it is a leaf of the tree.
> 
> I think you misunderstood: the power is recomputed when a CPU is
> switched on/off, and the node is removed when the last CPU is unplugged.

Yes, you are right. I misunderstood the hotplug handling and thus the
power calculation.

> 
> eg. 1000mW max per CPU, a performance domain with 4 CPUs.
> 
> With all CPUs on, max power is 4000mW
> With 3 CPUs on, and 1 CPU off, max power is 3000mW
> 
> etc...
> 
> With 4 CPUs off, the node is removed.
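
Just to spell the arithmetic out (illustrative values only, matching your
example, not code from the patch):

/* The max power of the performance domain scales with the number of
 * online CPUs.
 */
static u64 pd_max_power_mw(unsigned int nr_online_cpus)
{
	const u64 per_cpu_max_mw = 1000;	/* 1000 mW per CPU, as in the example */

	return per_cpu_max_mw * nr_online_cpus;	/* 4 -> 4000 mW, 3 -> 3000 mW */
}
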
> 
> If the hardware evolves with a performance domain per CPU, we will end
> up with a leaf per CPU and a "cluster" on top of them.
> 
> 

Let me go through the patches again and then I will add my Reviewed-by.

I will also run LTP hotplug or LISA hotplug torture tests on this tree,
just to check that it's fine.

Regards,
Lukasz
