linux-kernel - Re: [PATCH v1] cpufreq: qcom: Read voltage LUT and populate OPP

Message-ID: <20181226193246.GG261387@google.com>
Date:   Wed, 26 Dec 2018 11:32:46 -0800
From:   Matthias Kaehlcke <mka@...omium.org>
To:     Taniya Das <tdas@...eaurora.org>
Cc:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
        Stephen Boyd <sboyd@...nel.org>,
        Rajendra Nayak <rnayak@...eaurora.org>,
        devicetree@...r.kernel.org, robh@...nel.org,
        skannan@...eaurora.org, linux-arm-msm@...r.kernel.org,
        amit.kucheria@...aro.org, evgreen@...gle.com
Subject: Re: [PATCH v1] cpufreq: qcom: Read voltage LUT and populate OPP

Hi Taniya,

On Mon, Dec 24, 2018 at 12:29:18AM +0530, Taniya Das wrote:
> Hello Matthias,
> 
> Thanks for your review comments.
> 
> On 12/22/2018 2:27 AM, Matthias Kaehlcke wrote:
> > Hi Taniya,
> > 
> > On Fri, Dec 21, 2018 at 11:36:48PM +0530, Taniya Das wrote:
> > > Add support to read the voltage look up table and populate OPP for all
> > > corresponding CPUS.
> > > 
> > > Signed-off-by: Taniya Das <tdas@...eaurora.org>
> > > ---
> > >   drivers/cpufreq/qcom-cpufreq-hw.c | 32 ++++++++++++++++++++++++++++++--
> > >   1 file changed, 30 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> > > index d83939a..7559b87 100644
> > > --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> > > +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> > > @@ -10,18 +10,21 @@
> > >   #include <linux/module.h>
> > >   #include <linux/of_address.h>
> > >   #include <linux/of_platform.h>
> > > +#include <linux/pm_opp.h>
> > >   #include <linux/slab.h>
> > > 
> > >   #define LUT_MAX_ENTRIES			40U
> > >   #define LUT_SRC				GENMASK(31, 30)
> > >   #define LUT_L_VAL			GENMASK(7, 0)
> > >   #define LUT_CORE_COUNT			GENMASK(18, 16)
> > > +#define LUT_VOLT			GENMASK(11, 0)
> > >   #define LUT_ROW_SIZE			32
> > >   #define CLK_HW_DIV			2
> > > 
> > >   /* Register offsets */
> > >   #define REG_ENABLE			0x0
> > > -#define REG_LUT_TABLE			0x110
> > > +#define REG_FREQ_LUT_TABLE		0x110
> > > +#define REG_VOLT_LUT_TABLE		0x114
> > 
> > The new names suggest that there is a LUT for frequencies and another
> > one for voltages. I don't have access to hardware documentation, but
> > from the code and offsets in this driver it seems there is a single
> > table at offset 0x110, with a 'row' of 32 bytes per OPP. Within this
> > row the frequency (and other values) is located at offset 0, the
> > voltage at offset 4.
> > 
> > I'd suggest to keep REG_LUT_TABLE, add a define LUT_OFFSET_VOLTAGE/MV
> > (or similar) and adjust the math in qcom_cpufreq_hw_read_lut() to use
> > > REG_LUT_TABLE as base offset.
> > 
> 
> These names are as per HW documentation and the math is kept as per the
> documentation for reading the voltage.

Then the HW documentation is confusing, and I'm not convinced its
naming should be carried over 1:1 to the driver. In any case, this
documentation is only available to a limited audience, so why make the
code harder to follow for everyone else?

I think something like this would be preferable (removed _TABLE suffix,
since that's already part of LUT):

#define OFFSET_LUT		0x110
#define REG_FREQ_LUT		0x00
#define REG_VOLT_LUT		0x04

freq = read(OFFSET_LUT + (LUT_ROW_SIZE * i) + REG_FREQ_LUT);
volt = read(OFFSET_LUT + (LUT_ROW_SIZE * i) + REG_VOLT_LUT);

or probably better:

row_addr = OFFSET_LUT + (LUT_ROW_SIZE * i);
freq = read(row_addr + REG_FREQ_LUT);
volt = read(row_addr + REG_VOLT_LUT);
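
To make the addressing concrete, here is a minimal userspace sketch of
the same math. It assumes the layout inferred from the driver code (one
table at 0x110, 32-byte rows, frequency word at +0, voltage word at +4),
which I could not check against HW documentation. fake_readl() and the
lut_*() helpers are hypothetical names used only for illustration; they
stand in for readl_relaxed() on a plain buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the patch under review. */
#define LUT_MAX_ENTRIES	40U
#define LUT_ROW_SIZE	32U

/* Naming proposed above: one table base plus per-row field offsets. */
#define OFFSET_LUT	0x110U
#define REG_FREQ_LUT	0x00U
#define REG_VOLT_LUT	0x04U

/* Byte offset of row i, relative to the start of the register block. */
static inline uint32_t lut_row_offset(uint32_t i)
{
	assert(i < LUT_MAX_ENTRIES);
	return OFFSET_LUT + LUT_ROW_SIZE * i;
}

/*
 * Hypothetical stand-in for readl_relaxed(): reads one 32-bit word from
 * a plain buffer. Assumes byte_off is 4-byte aligned.
 */
static inline uint32_t fake_readl(const uint32_t *regs, uint32_t byte_off)
{
	return regs[byte_off / sizeof(uint32_t)];
}

/* Raw frequency word of row i (LUT_SRC, LUT_L_VAL, LUT_CORE_COUNT). */
static inline uint32_t lut_read_freq_word(const uint32_t *regs, uint32_t i)
{
	return fake_readl(regs, lut_row_offset(i) + REG_FREQ_LUT);
}

/* Raw voltage word of row i (LUT_VOLT lives in bits 11:0). */
static inline uint32_t lut_read_volt_word(const uint32_t *regs, uint32_t i)
{
	return fake_readl(regs, lut_row_offset(i) + REG_VOLT_LUT);
}
```

For row 1 this gives 0x130 for the frequency word and 0x134 for the
voltage word, i.e. the same addresses as REG_FREQ_LUT_TABLE + 32 and
REG_VOLT_LUT_TABLE + 32 in the patch, just without suggesting that
there are two separate tables.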

> > >   #define REG_PERF_STATE			0x920
> > > 
> > >   static unsigned long cpu_hw_rate, xo_rate;
> > > @@ -75,19 +78,26 @@ static int qcom_cpufreq_hw_read_lut(struct device *dev,
> > >   				    void __iomem *base)
> > >   {
> > >   	u32 data, src, lval, i, core_count, prev_cc = 0, prev_freq = 0, freq;
> > > +	u32 volt;
> > >   	unsigned int max_cores = cpumask_weight(policy->cpus);
> > >   	struct cpufreq_frequency_table	*table;
> > > +	unsigned long cpu_r;
> > 
> > nit: why 'cpu_r' and not just 'cpu'?
> > 
> > (if it is needed at all, see my comment below)
> > 
> > > 
> > >   	table = kcalloc(LUT_MAX_ENTRIES + 1, sizeof(*table), GFP_KERNEL);
> > >   	if (!table)
> > >   		return -ENOMEM;
> > > 
> > >   	for (i = 0; i < LUT_MAX_ENTRIES; i++) {
> > > -		data = readl_relaxed(base + REG_LUT_TABLE + i * LUT_ROW_SIZE);
> > > +		data = readl_relaxed(base + REG_FREQ_LUT_TABLE +
> > > +				      i * LUT_ROW_SIZE);
> > >   		src = FIELD_GET(LUT_SRC, data);
> > >   		lval = FIELD_GET(LUT_L_VAL, data);
> > >   		core_count = FIELD_GET(LUT_CORE_COUNT, data);
> > > 
> > > +		data = readl_relaxed(base + REG_VOLT_LUT_TABLE +
> > > +				      i * LUT_ROW_SIZE);
> > > +		volt = FIELD_GET(LUT_VOLT, data) * 1000;
> > > +
> > >   		if (src)
> > >   			freq = xo_rate * lval / 1000;
> > >   		else
> > > @@ -123,6 +133,10 @@ static int qcom_cpufreq_hw_read_lut(struct device *dev,
> > > 
> > >   		prev_cc = core_count;
> > >   		prev_freq = freq;
> > > +
> > > +		freq *= 1000;
> > > +		for_each_cpu(cpu_r, policy->cpus)
> > > +			dev_pm_opp_add(get_cpu_device(cpu_r), freq, volt);
> > 
> > Are you sure we want to duplicate the OPP entries for all CPUs in the
> > cluster? IIUC the frequencies of the cores in a cluster can't be
> > changed individually, hence the cores should have a shared table. I
> > think dev_pm_opp_get_sharing_cpus() does what you need.
> > 
> > You currently also add OPPs for invalid frequencies. From my SDM845
> > device:
> > 
> > cat /sys/devices/system/cpu/cpufreq/policy4/scaling_available_frequencies
> >    => 825600 902400 979200 1056000 1209600 1286400 1363200 1459200
> >    1536000 1612800 1689600 1766400 1843200 1920000 1996800 2092800
> >    2169600 2246400 2323200 2400000 2476800 2553600 2649600
> > 
> > cat /sys/devices/system/cpu/cpufreq/policy4/scaling_boost_frequencies
> > 2803200
> > 
> > ls /sys/kernel/debug/opp/cpu4/
> > opp:1056000000  opp:1612800000  opp:2092800000  opp:2553600000  opp:825600000
> > opp:1209600000  opp:1689600000  opp:2169600000  opp:2649600000  opp:902400000
> > opp:1286400000  opp:1766400000  opp:2246400000  opp:2707200000  opp:979200000
> > opp:1363200000  opp:1843200000  opp:2323200000  opp:2764800000
> > opp:1459200000  opp:1920000000  opp:2400000000  opp:2784000000
> > opp:1536000000  opp:1996800000  opp:2476800000  opp:2803200000
> > 
> > There are OPP entries for 2707200000, 2764800000 and 2784000000 Hz,
> > however these frequencies appear neither in available_frequencies nor
> > boost_frequencies.
> > 
> > >   	}
> > > 
> 
> Could you help validating with the patch below?
> 
> > >   	table[i].frequency = CPUFREQ_TABLE_END;
> > > @@ -159,10 +173,18 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
> > >   	struct device *dev = &global_pdev->dev;
> > >   	struct of_phandle_args args;
> > >   	struct device_node *cpu_np;
> > > +	struct device *cpu_dev;
> > >   	struct resource *res;
> > >   	void __iomem *base;
> > >   	int ret, index;
> > > 
> > > +	cpu_dev = get_cpu_device(policy->cpu);
> > > +	if (!cpu_dev) {
> > > +		pr_err("%s: failed to get cpu%d device\n", __func__,
> > > +		       policy->cpu);
> > > +		return -ENODEV;
> > > +	}
> > > +
> > >   	cpu_np = of_cpu_device_node_get(policy->cpu);
> > >   	if (!cpu_np)
> > >   		return -EINVAL;
> > > @@ -205,6 +227,12 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
> > >   		goto error;
> > >   	}
> > > 
> > > +	ret = dev_pm_opp_get_opp_count(cpu_dev);
> > > +	if (ret <= 0) {
> > > +		dev_err(cpu_dev, "OPP table is not ready\n");
> > > +		goto error;
> > > +	}
> > > +
> > >   	policy->fast_switch_possible = true;
> > > 
> > >   	return 0;
> > 
> > I suppose we want to remove the OPPs when the cpufreq driver is
> > unloaded, looks like dev_pm_opp_cpumask_remove_table() should do the
> > trick.
> > 
> > Cheers
> > 
> > Matthias
> > 
> 
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c
> b/drivers/cpufreq/qcom-cpufreq-hw.c
> index 7559b87..23338b2 100644
> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -81,7 +81,6 @@ static int qcom_cpufreq_hw_read_lut(struct device *dev,
>         u32 volt;
>         unsigned int max_cores = cpumask_weight(policy->cpus);
>         struct cpufreq_frequency_table  *table;
> -       unsigned long cpu_r;
> 
>         table = kcalloc(LUT_MAX_ENTRIES + 1, sizeof(*table), GFP_KERNEL);
>         if (!table)
> @@ -110,6 +109,8 @@ static int qcom_cpufreq_hw_read_lut(struct device *dev,
>                         table[i].frequency = freq;
>                         dev_dbg(dev, "index=%d freq=%d, core_count %d\n", i,
>                                 freq, core_count);
> +                       dev_pm_opp_add(get_cpu_device(policy->cpu),
> +                                       freq * 1000, volt);

nit: I'd suggest putting this before dev_dbg(), or assigning the table
entry after dev_dbg(), to keep the actual actions together instead of
splitting them unnecessarily with a debug log.

>                 }
> 
>                 /*
> @@ -126,6 +127,8 @@ static int qcom_cpufreq_hw_read_lut(struct device *dev,
>                         if (prev_cc != max_cores) {
>                                 prev->frequency = prev_freq;
>                                 prev->flags = CPUFREQ_BOOST_FREQ;
> +                               dev_pm_opp_add(get_cpu_device(policy->cpu),
> +                                               prev_freq * 1000, volt);
>                         }
> 
>                         break;
> @@ -133,12 +136,9 @@ static int qcom_cpufreq_hw_read_lut(struct device *dev,
> 
>                 prev_cc = core_count;
>                 prev_freq = freq;
> -
> -               freq *= 1000;
> -               for_each_cpu(cpu_r, policy->cpus)
> -                       dev_pm_opp_add(get_cpu_device(cpu_r), freq, volt);
>         }
> 
> +       dev_pm_opp_set_sharing_cpus(get_cpu_device(policy->cpu),
> policy->cpus);

nit: since the loop above first initializes the table entry and then
adds the OPP, it would be slightly more consistent to also finish the
table business first here and then handle the OPPs. It shouldn't make
a functional difference though, just a suggestion.

>         table[i].frequency = CPUFREQ_TABLE_END;
>         policy->freq_table = table;
> 
> @@ -245,6 +245,7 @@ static int qcom_cpufreq_hw_cpu_exit(struct
> cpufreq_policy *policy)
>  {
>         void __iomem *base = policy->driver_data - REG_PERF_STATE;
> 
> +       dev_pm_opp_cpumask_remove_table(policy->cpus);
>         kfree(policy->freq_table);
>         devm_iounmap(&global_pdev->dev, base);
>

Looks good to me except for the nits.
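
In case it helps with validating, the check I did by hand on v1 (OPP
entries that show up in neither scaling_available_frequencies nor
scaling_boost_frequencies) can be sketched roughly as below. This is a
plain userspace illustration, not driver code; find_stray_opps() and
in_khz_list() are hypothetical helpers, and the caller is assumed to
have collected the rates from debugfs (Hz) and sysfs (kHz) already.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if freq_khz appears in the cpufreq list (values in kHz). */
static int in_khz_list(uint64_t freq_khz, const uint32_t *khz, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (khz[i] == freq_khz)
			return 1;
	return 0;
}

/*
 * Copies every OPP rate (Hz) that has no matching cpufreq entry (kHz)
 * into stray[] and returns how many were found. stray[] must have room
 * for at least n_opp entries.
 */
static size_t find_stray_opps(const uint64_t *opp_hz, size_t n_opp,
			      const uint32_t *khz, size_t n_khz,
			      uint64_t *stray)
{
	size_t found = 0;

	for (size_t i = 0; i < n_opp; i++) {
		/* cpufreq reports kHz, the OPP core stores Hz. */
		if (opp_hz[i] % 1000 != 0 ||
		    !in_khz_list(opp_hz[i] / 1000, khz, n_khz))
			stray[found++] = opp_hz[i];
	}

	return found;
}
```

Fed with the v1 numbers from my SDM845 it flags 2707200000, 2764800000
and 2784000000 Hz, the three entries I mentioned in my earlier mail.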

Unbinding the device ("echo 17d43000.cpufreq > /sys/bus/platform/drivers/qcom-cpufreq-hw/unbind")
results in a lockdep splat similar to the one reported earlier by
Stephen (https://lore.kernel.org/patchwork/patch/1024546/#1209031),
this time involving 'dev_pm_opp_cpumask_remove_table'. However, I don't
think this is an issue introduced by this patch.

Thanks

Matthias