linux-kernel - Re: [PATCH 08/32] nohz: Try not to give the timekeeping duty to an adaptive tickless cpu

Message-ID: <1332923983.2528.12.camel@twins>
Date:	Wed, 28 Mar 2012 10:39:43 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Christoph Lameter <cl@...ux.com>
Cc:	Frederic Weisbecker <fweisbec@...il.com>,
	LKML <linux-kernel@...r.kernel.org>,
	linaro-sched-sig@...ts.linaro.org,
	Alessio Igor Bogani <abogani@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Avi Kivity <avi@...hat.com>,
	Chris Metcalf <cmetcalf@...era.com>,
	Daniel Lezcano <daniel.lezcano@...aro.org>,
	Geoff Levand <geoff@...radead.org>,
	Gilad Ben Yossef <gilad@...yossef.com>,
	Ingo Molnar <mingo@...nel.org>,
	Max Krasnyansky <maxk@...lcomm.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Stephen Hemminger <shemminger@...tta.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Sven-Thorsten Dietrich <thebigcorporation@...il.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Zen Lin <zen@...nhuawei.org>,
	Dimitri Sivanich <sivanich@....com>
Subject: Re: [PATCH 08/32] nohz: Try not to give the timekeeping duty to an
 adaptive tickless cpu

On Tue, 2012-03-27 at 20:12 -0500, Christoph Lameter wrote:
> On Tue, 27 Mar 2012, Peter Zijlstra wrote:
> 
> > On Tue, 2012-03-27 at 11:08 -0500, Christoph Lameter wrote:
> > >
> > > I wish you would disentangle the nohz work from the cpusets. Cpusets is
> > > aged and being replaced by cgroups. And the cgroup work is something that
> > > is not suitable for many loads given the VM overhead added.
> >
> > What VM overhead? Are you talking about the memcg nonsense? That's
> > entirely optional, you don't need to either build that or enable it.
> 
> cgroups in general cause a much more complex VM processing with multiple
> LRUs and additional checks in various places.

Uhm, not if you don't have the memcg thing enabled; the controllers are
separate.

> Even just adding cpusets enables the group scheduler functionality f.e.
> which creates significantly larger scheduling latencies. Also complicates
> key allocation VM paths etc etc.

No, you're mistaken.

It's perfectly possible to compile a kernel with:

CONFIG_CGROUPS=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_MEM_RES_CTLR=n
CONFIG_CGROUP_SCHED=n

That will give you cpusets, but not the cpu (sched) controller crap and
not the memcg (vm) controller muck.
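
A minimal sketch for checking a given kernel config against the option set
above. The helper names (enabled_options, cpusets_only) and the SAMPLE text
are hypothetical and only for illustration; the one thing it relies on is
that a generated .config records a disabled symbol as a
"# CONFIG_FOO is not set" comment rather than as "=n".

```python
def enabled_options(config_text):
    """Return the set of CONFIG_* symbols set to y or m in .config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled symbols appear as "# CONFIG_FOO is not set" comments,
        # so skipping comments and blank lines skips them too.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if value in ("y", "m"):
            enabled.add(name)
    return enabled


def cpusets_only(config_text):
    """True if cpusets are built in but the cpu and memcg controllers are not."""
    opts = enabled_options(config_text)
    wanted = {"CONFIG_CGROUPS", "CONFIG_CPUSETS"}
    unwanted = {"CONFIG_CGROUP_MEM_RES_CTLR", "CONFIG_CGROUP_SCHED"}
    return wanted <= opts and not (unwanted & opts)


# Hypothetical .config excerpt matching the options listed above.
SAMPLE = """\
CONFIG_CGROUPS=y
CONFIG_CPUSETS=y
# CONFIG_CGROUP_MEM_RES_CTLR is not set
# CONFIG_CGROUP_SCHED is not set
"""
```

Calling cpusets_only() on the text of your own .config tells you whether you
got cpusets without the other two controllers.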

> > And if we ever get rid of that multiple hierarchy nonsense I don't see a
> > reason to get rid of cpuset at all. The only reason to want to replace
> > it is to avoid the dis-joint-ness it has with the cpu controller (and
> > possibly the memcg one).
> 
> I like cpusets much more than cgroups. I agree with you.

cpusets is a cgroup controller..

> But I am not sure that cpusets are needed for nohz. We already have an
> isolcpu set and it sounds to me that nohz is generally useful.

I really, really want to kill isolcpus in favour of cpusets; the amount of
disparity and overlap in features is driving me insane.

isolcpus only creates isolated cpus; you can do the same with cpusets by
creating single-cpu sets and disabling sched_load_balance on the root set.
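
A rough sketch of that cpuset setup, written as a dry-run plan rather than
something that touches the system. Assumptions, none of which come from this
thread: the cpuset hierarchy is mounted at /sys/fs/cgroup/cpuset, the control
files carry the "cpuset." prefix (the legacy cpuset filesystem mount drops
it), memory node 0 exists, and the per-cpu directory names are made up.
isolation_plan is a hypothetical helper.

```python
import posixpath


def isolation_plan(cpus, root="/sys/fs/cgroup/cpuset", mems="0"):
    """Return the (action, path, value) steps for one single-cpu set per cpu.

    Nothing is written; the caller decides whether to apply the steps.
    """
    # Turn off load balancing on the root set so the scheduler stops
    # balancing across all cpus as one domain.
    plan = [("write", posixpath.join(root, "cpuset.sched_load_balance"), "0")]
    for cpu in cpus:
        cpuset_dir = posixpath.join(root, "cpu%d" % cpu)  # hypothetical name
        plan.append(("mkdir", cpuset_dir, ""))
        # A cpuset needs both cpus and mems set before tasks can be attached.
        plan.append(("write", posixpath.join(cpuset_dir, "cpuset.cpus"), str(cpu)))
        plan.append(("write", posixpath.join(cpuset_dir, "cpuset.mems"), mems))
    return plan
```

Printing isolation_plan([2, 3]) lists the directories to create and the values
to write; applying them needs root and a mounted cpuset hierarchy.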

The only difference is that an isolcpus cpu will never have had a task
running on it, and hence its timer lists etc. will be guaranteed empty. So
once we add an interface to push work away from a cpu and/or wait for it to
quiesce, we should end up with the same state.

At that point I'll rip isolcpus out.

> It would seem that the nohz patches would be much simpler if it would not
> require cpusets to administer. The only thing that would be needed is to
> have one cpu that is not subject to nohz. The logical choice is a
> timekeeper cpu (which is usually cpu 0). Having that configurable would be
> an extra bonus.

As Frederic has been saying, the nohz stuff adds syscall overhead: it
needs to timestamp on kernel entry/exit etc. Making it unconditional
would add this overhead for everybody, and that might not be acceptable.


