linux-kernel - Re: [PATCH 1/1] sched: Make schedstats a runtime tunable that is disabled by default v4


Message-ID: <20160203124921.GA28953@gmail.com>
Date:	Wed, 3 Feb 2016 13:49:21 +0100
From:	Ingo Molnar <mingo@...nel.org>
To:	Mel Gorman <mgorman@...hsingularity.net>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Matt Fleming <matt@...eblueprint.co.uk>,
	Mike Galbraith <mgalbraith@...e.de>,
	Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/1] sched: Make schedstats a runtime tunable that is
 disabled by default v4


* Mel Gorman <mgorman@...hsingularity.net> wrote:

> On Wed, Feb 03, 2016 at 12:28:49PM +0100, Ingo Molnar wrote:
> > 
> > * Mel Gorman <mgorman@...hsingularity.net> wrote:
> > 
> > > Changelog since v3
> > > o Force enable stats during profiling and latencytop
> > > 
> > > Changelog since V2
> > > o Print stats that are not related to schedstat
> > > o Reintroduce a static inline for update_stats_dequeue
> > > 
> > > Changelog since V1
> > > o Introduce schedstat_enabled and address Ingo's feedback
> > > o More schedstat-only paths eliminated, particularly ttwu_stat
> > > 
> > > schedstats is very useful during debugging and performance tuning but it
> > > incurs overhead. As such, even though it can be disabled at build time,
> > > it is often enabled as the information is useful.  This patch adds a
> > > kernel command-line and sysctl tunable to enable or disable schedstats on
> > > demand. It is disabled by default as someone who knows they need it can
> > > also learn to enable it when necessary.
> > > 
> > > The benefits are workload-dependent but when it gets down to it, the
> > > difference will be whether cache misses are incurred updating the shared
> > > stats or not. [...]
> > 
> > Hm, which shared stats are those?
> 
> Extremely poor phrasing on my part. The stats share a cache line and the impact 
> partially depends on whether unrelated stats share a cache line or not during 
> updates.

Yes, but the question is, are there true cross-CPU cache-misses? I.e. are there 
any 'global' (or per node) counters that we keep touching and which keep 
generating cache-misses?

> > I think we should really fix those as well: those shared stats should be 
> > percpu collected as well, with no extra cache misses in any scheduler fast 
> > path.
> 
> I looked into that but converting those stats to per-cpu counters would incur 
> sizable memory overhead. There are a *lot* of them and the basic structure for 
> the generic percpu-counter is
> 
> struct percpu_counter {
>         raw_spinlock_t lock;
>         s64 count;
> #ifdef CONFIG_HOTPLUG_CPU
>         struct list_head list;  /* All percpu_counters are on a list */
> #endif
>         s32 __percpu *counters;
> };

We don't have to reuse struct percpu_counter.

> That's not taking into account the associated runtime overhead, such as synchronising them. 

Why do we have to synchronize them in the kernel? User-space can recover them on a 
percpu basis and add them up if it wishes to. We can update the schedstat utility 
to handle the more spread out fields as well.

Thanks,

	Ingo
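
The approach suggested in the reply (plain per-CPU fields that are only summed
by the reader) can be illustrated with a small user-space sketch. This is not
kernel code and not the patch under discussion: NR_CPUS_SKETCH,
CACHE_LINE_BYTES, struct cpu_stats and the function names are all hypothetical,
the 64-byte cache line is an assumption, and the alignment attribute is a
GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH   4   /* hypothetical CPU count for the sketch */
#define CACHE_LINE_BYTES 64  /* assumed cache line size */

/*
 * One slot per CPU. The alignment pads each slot to a full cache line,
 * so updates from different CPUs never touch the same line.
 */
struct cpu_stats {
	uint64_t wait_count;
	uint64_t wait_sum_ns;
} __attribute__((aligned(CACHE_LINE_BYTES)));

static struct cpu_stats stats[NR_CPUS_SKETCH];

/* Fast path: a CPU writes only its own slot; no lock, no shared total. */
static void account_wait(int cpu, uint64_t delta_ns)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	stats[cpu].wait_count++;
	stats[cpu].wait_sum_ns += delta_ns;
}

/* Slow path: the reader walks the slots and adds them up on demand. */
static uint64_t total_wait_count(void)
{
	uint64_t sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += stats[cpu].wait_count;
	return sum;
}

static uint64_t total_wait_ns(void)
{
	uint64_t sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += stats[cpu].wait_sum_ns;
	return sum;
}
```

The trade-off in this sketch is that the total is only as consistent as the
moment each slot is read, which is usually acceptable for statistics. The
padding costs memory per CPU per structure, which is the overhead concern
raised earlier in the thread.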