[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1585734367.oqwn7dzljo.naveen@linux.ibm.com>
Date: Wed, 01 Apr 2020 15:28:48 +0530
From: "Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com>
To: "Gautham R. Shenoy" <ego@...ux.vnet.ibm.com>,
Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>,
Michael Ellerman <mpe@...erman.id.au>,
Nathan Lynch <nathanl@...ux.ibm.com>,
Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
Tyrel Datwyler <tyreld@...ux.ibm.com>
Cc: linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH v4 6/6] pseries/sysfs: Minimise IPI noise while reading
[idle_][s]purr
Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" <ego@...ux.vnet.ibm.com>
>
> Currently purr, spurr, idle_purr, idle_spurr are exposed for every CPU
> via the sysfs interface
> /sys/devices/system/cpu/cpuX/[idle_][s]purr. Each sysfs read currently
> generates an IPI to obtain the desired value from the target CPU X.
> Since these aforementioned sysfs are typically read one after another,
> we end up generating 4 IPIs per CPU in a short duration.
>
> In order to minimize the IPI noise, this patch caches the values of
> all the four entities whenever one of them is read. If subsequently
> any of these are read within the next 10ms, the cached value is
> returned. With this, we will generate at most one IPI every 10ms for
> every CPU.
>
> Test-results: While reading the four sysfs files back-to-back for a
> given CPU every second for 100 seconds.
>
> Without the patch:
> 16 [XICS 2 Edge IPI] = 422 times
> DBL [Doorbell interrupts] = 13 times
> Total : 435 IPIs.
>
> With the patch:
> 16 [XICS 2 Edge IPI] = 111 times
> DBL [Doorbell interrupts] = 17 times
> Total : 128 IPIs.
>
> Signed-off-by: Gautham R. Shenoy <ego@...ux.vnet.ibm.com>
> ---
> arch/powerpc/kernel/sysfs.c | 117 ++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 97 insertions(+), 20 deletions(-)
>
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 571b325..bd92023 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -586,8 +586,6 @@ void ppc_enable_pmcs(void)
> * SPRs which are not related to PMU.
> */
> #ifdef CONFIG_PPC64
> -SYSFS_SPRSETUP(purr, SPRN_PURR);
> -SYSFS_SPRSETUP(spurr, SPRN_SPURR);
> SYSFS_SPRSETUP(pir, SPRN_PIR);
> SYSFS_SPRSETUP(tscr, SPRN_TSCR);
>
> @@ -596,8 +594,6 @@ void ppc_enable_pmcs(void)
> enable write when needed with a separate function.
> Lets be conservative and default to pseries.
> */
> -static DEVICE_ATTR(spurr, 0400, show_spurr, NULL);
> -static DEVICE_ATTR(purr, 0400, show_purr, store_purr);
> static DEVICE_ATTR(pir, 0400, show_pir, NULL);
> static DEVICE_ATTR(tscr, 0600, show_tscr, store_tscr);
> #endif /* CONFIG_PPC64 */
> @@ -761,22 +757,110 @@ static void create_svm_file(void)
> }
> #endif /* CONFIG_PPC_SVM */
>
> +#ifdef CONFIG_PPC64
> +/*
> + * The duration (in ms) from the last IPI to the target CPU until
> + * which a cached value of purr, spurr, idle_purr, idle_spurr can be
> + * reported to the user on a corresponding sysfs file read. Beyond
> + * this duration, fresh values need to be obtained by sending IPIs to
> + * the target CPU when the sysfs files are read.
> + */
> +static unsigned long util_stats_staleness_tolerance_ms = 10;
This is a nice optimization for our use in lparstat, though I have a
concern below.
> +struct util_acct_stats {
> + u64 latest_purr;
> + u64 latest_spurr;
> +#ifdef CONFIG_PPC_PSERIES
> + u64 latest_idle_purr;
> + u64 latest_idle_spurr;
> +#endif
You can probably drop the 'latest_' prefix.
> + unsigned long last_update_jiffies;
> +};
> +
> +DEFINE_PER_CPU(struct util_acct_stats, util_acct_stats);
Per snowpatch, this should be static, and so should get_util_stats_ptr()
below:
https://openpower.xyz/job/snowpatch/job/snowpatch-linux-sparse/16601//artifact/linux/report.txt
> +
> +static void update_util_acct_stats(void *ptr)
> +{
> + struct util_acct_stats *stats = ptr;
> +
> + stats->latest_purr = mfspr(SPRN_PURR);
> + stats->latest_spurr = mfspr(SPRN_SPURR);
> #ifdef CONFIG_PPC_PSERIES
> -static void read_idle_purr(void *val)
> + stats->latest_idle_purr = read_this_idle_purr();
> + stats->latest_idle_spurr = read_this_idle_spurr();
> +#endif
> + stats->last_update_jiffies = jiffies;
> +}
> +
> +struct util_acct_stats *get_util_stats_ptr(int cpu)
> +{
> + struct util_acct_stats *stats = per_cpu_ptr(&util_acct_stats, cpu);
> + unsigned long delta_jiffies;
> +
> + delta_jiffies = jiffies - stats->last_update_jiffies;
> +
> + /*
> + * If we have a recent enough data, reuse that instead of
> + * sending an IPI.
> + */
> + if (jiffies_to_msecs(delta_jiffies) < util_stats_staleness_tolerance_ms)
> + return stats;
> +
> + smp_call_function_single(cpu, update_util_acct_stats, stats, 1);
> + return stats;
> +}
> +
> +static ssize_t show_purr(struct device *dev,
> + struct device_attribute *attr, char *buf)
> {
> - u64 *ret = val;
> + struct cpu *cpu = container_of(dev, struct cpu, dev);
> + struct util_acct_stats *stats;
>
> - *ret = read_this_idle_purr();
> + stats = get_util_stats_ptr(cpu->dev.id);
> + return sprintf(buf, "%llx\n", stats->latest_purr);
This alters the behavior of the current sysfs purr file. I am not sure
if it is reasonable to return the same PURR value across a 10ms window.
I wonder if we should introduce a sysctl interface to control
thresholding. It can default to 0, which disables thresholding so that
the existing behavior continues. Applications (lparstat) can optionally
set it to suit their use.
- Naveen
Powered by blists - more mailing lists