Message-ID: <b5e35f904174905d8f90df3f49944b22389126c7.camel@linux.intel.com>
Date: Tue, 23 May 2023 05:19:49 -0700
From: srinivas pandruvada <srinivas.pandruvada@...ux.intel.com>
To: "Rafael J. Wysocki" <rafael@...nel.org>,
Fieah Lim <kweifat@...il.com>
Cc: lenb@...nel.org, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] cpufreq: intel_pstate: Avoid initializing variables prematurely
On Tue, 2023-05-23 at 13:08 +0200, Rafael J. Wysocki wrote:
> On Tue, May 23, 2023 at 10:51 AM Fieah Lim <kweifat@...il.com> wrote:
> >
> > We should avoid initializing some rather expensive data
> > when the function may bail out before actually using it.
> > This applies to the following initializations:
> >
> > - cpudata *cpu = all_cpu_data; (in everywhere)
> > - this_cpu = smp_processor_id(); (in notify_hwp_interrupt)
> > - hwp_cap = READ_ONCE(cpu->hwp_cap_cached); (in intel_pstate_hwp_boost_up)
> >
> > These initializations are premature because there is a chance
> > that the function will bail out before actually using the data.
> > I think this qualifies as a micro-optimization,
> > especially in such a hot path.
> >
> > While at it, tidy up how and when we initialize
> > all of the cpu_data pointers, for the sake of consistency.
> >
> > A side note on the intel_pstate_cpu_online change:
> > we don't need to initialize cpudata just for the
> > pr_debug(), since policy->cpu is already available.
> >
> > Signed-off-by: Fieah Lim <kweifat@...il.com>
> > ---
> > V1 -> V2: Rewrite changelog for better explanation.
> >
[...]
> > void notify_hwp_interrupt(void)
> > {
> > - unsigned int this_cpu = smp_processor_id();
> > + unsigned int this_cpu;
> > struct cpudata *cpudata;
> > unsigned long flags;
> > u64 value;
> > @@ -1591,6 +1593,8 @@ void notify_hwp_interrupt(void)
> > if (!(value & 0x01))
> > return;
> >
> > + this_cpu = smp_processor_id();
> > +
> > spin_lock_irqsave(&hwp_notify_lock, flags);
> >
> > if (!cpumask_test_cpu(this_cpu, &hwp_intr_enable_mask))
>
> This is a place where it may really matter for performance, but how
> much? Can you actually estimate this?
If DEBUG_PREEMPT is defined, smp_processor_id() costs ~12 extra
instructions (most of them with latency = 1 in the dependency chain).
Thanks,
Srinivas
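[Editor's note: a minimal user-space sketch of the pattern the patch
applies. expensive_init() is a hypothetical stand-in for work such as
smp_processor_id() under DEBUG_PREEMPT; this is not kernel code.]

```c
#include <assert.h>

/* Stand-in for a costly read; counts how many times it actually runs. */
static int init_count;

static int expensive_init(void)
{
	init_count++;
	return 42;
}

/*
 * The pattern from the patch: declare the variable up front, but defer
 * the expensive initialization until after the early-bailout checks,
 * so the bail-out path never pays for it.
 */
static int handle_event(int enabled)
{
	int cap;			/* declared, not yet initialized */

	if (!enabled)
		return -1;		/* bail out: expensive_init() never runs */

	cap = expensive_init();		/* initialized only where it is used */
	return cap;
}
```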
>
> > @@ -2024,8 +2028,8 @@ static int hwp_boost_hold_time_ns = 3 * NSEC_PER_MSEC;
> >
> > static inline void intel_pstate_hwp_boost_up(struct cpudata *cpu)
> > {
> > + u64 hwp_cap;
> > u64 hwp_req = READ_ONCE(cpu->hwp_req_cached);
> > - u64 hwp_cap = READ_ONCE(cpu->hwp_cap_cached);
> > u32 max_limit = (hwp_req & 0xff00) >> 8;
> > u32 min_limit = (hwp_req & 0xff);
> > u32 boost_level1;
> > @@ -2052,6 +2056,7 @@ static inline void intel_pstate_hwp_boost_up(struct cpudata *cpu)
> > cpu->hwp_boost_min = min_limit;
> >
> > /* level at half way mark between min and guranteed */
> > + hwp_cap = READ_ONCE(cpu->hwp_cap_cached);
> > boost_level1 = (HWP_GUARANTEED_PERF(hwp_cap) + min_limit)
> > >> 1;
> >
> > if (cpu->hwp_boost_min < boost_level1)
>
> For clarity, the original code is much better than the new one and the
> only case where hwp_cap is not used is when that single read doesn't
> matter.  Moreover, the compiler is free to optimize it too.
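[Editor's note: the arithmetic in the hunk above can be checked in
isolation. A user-space sketch, assuming the usual IA32_HWP_CAPABILITIES
layout with guaranteed performance in bits 15:8; the macro mirrors the
kernel's HWP_GUARANTEED_PERF().]

```c
#include <stdint.h>
#include <assert.h>

/* Guaranteed perf lives in bits 15:8 of the cached HWP capabilities. */
#define HWP_GUARANTEED_PERF(x)	(((x) >> 8) & 0xff)

/*
 * boost_level1 from the quoted hunk: the level halfway between the
 * minimum limit (hwp_req bits 7:0) and the guaranteed perf level.
 */
static uint32_t boost_level1(uint64_t hwp_req, uint64_t hwp_cap)
{
	uint32_t min_limit = hwp_req & 0xff;

	return (HWP_GUARANTEED_PERF(hwp_cap) + min_limit) >> 1;
}
```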
>
> > @@ -2389,9 +2394,7 @@ static const struct x86_cpu_id intel_pstate_cpu_ee_disable_ids[] = {
> >
> > static int intel_pstate_init_cpu(unsigned int cpunum)
> > {
> > - struct cpudata *cpu;
> > -
> > - cpu = all_cpu_data[cpunum];
> > + struct cpudata *cpu = all_cpu_data[cpunum];
> >
> > if (!cpu) {
> > cpu = kzalloc(sizeof(*cpu), GFP_KERNEL);
> > @@ -2431,11 +2434,13 @@ static int intel_pstate_init_cpu(unsigned int cpunum)
> >
> > static void intel_pstate_set_update_util_hook(unsigned int cpu_num)
> > {
> > - struct cpudata *cpu = all_cpu_data[cpu_num];
> > + struct cpudata *cpu;
> >
> > if (hwp_active && !hwp_boost)
> > return;
> >
> > + cpu = all_cpu_data[cpu_num];
> > +
> > if (cpu->update_util_set)
> > return;
> >
>
> This isn't a hot path.
>
> > @@ -2638,9 +2643,7 @@ static int intel_cpufreq_cpu_offline(struct cpufreq_policy *policy)
> >
> > static int intel_pstate_cpu_online(struct cpufreq_policy *policy)
> > {
> > - struct cpudata *cpu = all_cpu_data[policy->cpu];
> > -
> > - pr_debug("CPU %d going online\n", cpu->cpu);
> > + pr_debug("CPU %d going online\n", policy->cpu);
> >
> > intel_pstate_init_acpi_perf_limits(policy);
> >
> > @@ -2649,6 +2652,8 @@ static int intel_pstate_cpu_online(struct cpufreq_policy *policy)
> > * Re-enable HWP and clear the "suspended" flag to let "resume"
> > * know that it need not do that.
> > */
> > + struct cpudata *cpu = all_cpu_data[policy->cpu];
> > +
> > intel_pstate_hwp_reenable(cpu);
> > cpu->suspended = false;
>
> Same here and I don't see why the change matters.
>
> > }
> > --
>
> There is one piece in this patch that may be regarded as useful.
> Please send it separately.