linux-kernel - Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)

Message-ID: <20160125201319.GA19020@dhcp22.suse.cz>
Date:	Mon, 25 Jan 2016 21:13:20 +0100
From:	Michal Hocko <mhocko@...nel.org>
To:	Christoph Lameter <cl@...ux.com>
Cc:	Mike Galbraith <umgwanakikbuti@...il.com>,
	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable
 again and shut down on idle)

On Mon 25-01-16 12:02:06, Christoph Lameter wrote:
> On Mon, 25 Jan 2016, Michal Hocko wrote:
> 
> > On Sat 23-01-16 17:21:55, Mike Galbraith wrote:
> > > Hi Christoph,
> > >
> > > While you're fixing that commit up, can you perhaps find a better home
> > > for quiet_vmstat()?  It not only munches cycles when switching cross
> > > -core mightily, for -rt it injects a sleeping lock into the idle task.
> > >
> > >     12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
> > >      4.75%  [kernel]       [k] __schedule
> > >      4.70%  [kernel]       [k] mutex_unlock
> > >      3.14%  [kernel]       [k] __switch_to
> >
> > Hmm, I wouldn't have expected that refresh_cpu_vm_stats could have
> > such a large footprint. I guess this would be just an expensive noop
> > because we have to check all the zones*counters and do an expensive
> > this_cpu_xchg. Is the whole deferred thing worth this overhead?
> 
> Why would the deferring cause this overhead?

I guess the profile speaks for itself, doesn't it?

> Also there is no cross core activity from quiet_vmstat(). It simply
> disables the local vmstat updates.

It doesn't go cross core but it still does nr_zones * counters atomic
ops.
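
To illustrate the cost being described, here is a minimal user-space
sketch, not the kernel code. It assumes made-up sizes (NR_ZONES,
NR_COUNTERS) and hypothetical names (cpu_diff, zone_stat, fold_diffs),
and models this_cpu_xchg with C11 atomic_exchange. Even when every
differential is already zero, the loop still performs one exchange per
zone and counter:

```c
#include <stdatomic.h>

#define NR_ZONES    4   /* assumed value, for illustration only */
#define NR_COUNTERS 32  /* assumed value, for illustration only */

/* Hypothetical stand-ins for the per-cpu differentials and zone counters. */
static atomic_int  cpu_diff[NR_ZONES][NR_COUNTERS];
static atomic_long zone_stat[NR_ZONES][NR_COUNTERS];

/*
 * Fold all differentials into the zone counters.
 * Returns the number of exchange operations performed and stores the
 * number of counters that actually carried a non-zero delta in *changed.
 */
static int fold_diffs(int *changed)
{
	int xchgs = 0;

	*changed = 0;
	for (int z = 0; z < NR_ZONES; z++) {
		for (int i = 0; i < NR_COUNTERS; i++) {
			/* Paid for every counter, dirty or not. */
			int v = atomic_exchange(&cpu_diff[z][i], 0);

			xchgs++;
			if (v) {
				atomic_fetch_add(&zone_stat[z][i], v);
				(*changed)++;
			}
		}
	}
	return xchgs;
}
```

Calling fold_diffs() on a CPU with nothing to fold returns
NR_ZONES * NR_COUNTERS with *changed == 0, which is the "expensive noop"
case.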

> > Unless there is a clear and huge win from doing the vmstat update
> > deferrable then I think a revert is more appropriate IMHO.
> 
> It reduces the OS events that the application experiences by folding it
> into the tick events. If its not deferrable then a timer event will be
> generated in addition to the tick. We do not want that.

Yes, this is what I have read in the changelog. But the "how much" part
is really missing. Is this even quantifiable?

> Workqueues are used in many places. If RT can sleep within workqueue
> management functions then spinlocks cannot be taken anymore and there may
> be issues with preemption.

RT can sleep in _any_ spinlock except for raw spin locks. That the !RT
kernel does not sleep there doesn't really matter much, because
cancel_delayed_work is quite a heavy function which shouldn't be called
from the idle context AFAIU. Sure, most of the time it will boil down to
del_timer, but it can hit the slowpath as well if the timer got migrated
to a different CPU and we have to race with the WQ pool management IIUC.

Maybe this overhead can be reduced by outsourcing the functionality to
vmstat_shepherd, which can check idle CPUs, cancel the timer for them,
update the differentials and put them into cpu_stat_off?
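
A rough user-space sketch of that idea, under heavy assumptions: the
struct cpu_model, shepherd_pass and the bitmask below are hypothetical
stand-ins, not the real vmstat_shepherd or cpu_stat_off cpumask, and the
sketch ignores the locking a remote fold would need in the kernel. It
only shows the intended flow, with the work moved out of the idle path:

```c
#include <stdbool.h>

#define NR_CPUS  4  /* assumed value, for illustration only */
#define NR_ITEMS 8  /* assumed value, for illustration only */

/* Hypothetical per-CPU state; the kernel keeps this elsewhere. */
struct cpu_model {
	bool idle;          /* CPU is currently idle */
	bool timer_armed;   /* its vmstat update timer is pending */
	int  diff[NR_ITEMS];/* unfolded differentials */
};

static struct cpu_model cpus[NR_CPUS];
static long global_stat[NR_ITEMS];
static unsigned long cpu_stat_off_mask; /* bit set: updates are off */

/*
 * One shepherd pass: for every idle CPU that is still updating,
 * cancel its timer, fold its differentials and mark it as off.
 * Returns the number of CPUs quieted in this pass.
 */
static int shepherd_pass(void)
{
	int quieted = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		struct cpu_model *c = &cpus[cpu];

		if (!c->idle || (cpu_stat_off_mask & (1UL << cpu)))
			continue;

		c->timer_armed = false;          /* cancel the timer */
		for (int i = 0; i < NR_ITEMS; i++) {
			global_stat[i] += c->diff[i];/* update differentials */
			c->diff[i] = 0;
		}
		cpu_stat_off_mask |= 1UL << cpu; /* put into cpu_stat_off */
		quieted++;
	}
	return quieted;
}
```

In this model the idle task itself does nothing; busy CPUs are skipped
and already-quieted CPUs are not touched again.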

> The regression that I know of (independent of "RT") is due as far as I
> know due to the switch of the parameters of some vmstat functions to 64
> bit instead of 32 bit.

I am not sure I am following.

-- 
Michal Hocko
SUSE Labs