Message-ID: <20140520162433.GE17741@localhost.localdomain>
Date:	Tue, 20 May 2014 18:24:36 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	Mike Galbraith <umgwanakikbuti@...il.com>,
	Paul Gortmaker <paul.gortmaker@...driver.com>,
	linux-kernel@...r.kernel.org, linux-rt-users@...r.kernel.org,
	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is
 futile

On Tue, May 20, 2014 at 08:53:24AM -0700, Paul E. McKenney wrote:
> On Tue, May 20, 2014 at 04:53:52PM +0200, Frederic Weisbecker wrote:
> > I'm not sure that I really understand what you want here.
> > 
> > The current state of the art is that when you enable CONFIG_NO_HZ_FULL=y, full dynticks
> > is actually off by default. This is only overridden by the "nohz_full=" boot parameter.
> 
> If I understand correctly, if there is no nohz_full= boot parameter,
> then the context-tracking code takes the early exit via the
> context_tracking_is_enabled() check in context_tracking_user_enter().

Exactly. It's even jump labeled. So, on architectures with good jump label support,
it should boil down to a single unconditional jump when it's off.
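
To make the early exit concrete, here is a minimal user-space sketch of the
shape of that check. It is an illustration under stated assumptions, not the
kernel code: the jump label is replaced by a plain boolean, and the
slow_path_calls counter is a hypothetical placeholder for the real tracking
work.

```c
#include <stdbool.h>

/* Assumption: a plain flag stands in for the kernel's jump label.
 * In the kernel the test is patched into the instruction stream instead. */
static bool context_tracking_enabled;  /* set only when nohz_full= is given */
static int slow_path_calls;            /* hypothetical: counts real tracking work */

static bool context_tracking_is_enabled(void)
{
    return context_tracking_enabled;
}

static void context_tracking_user_enter(void)
{
    if (!context_tracking_is_enabled())
        return;            /* early exit: the off case costs one test */
    slow_path_calls++;     /* placeholder for the actual tracking */
}
```

With the flag clear, every call returns before touching any tracking state,
which is the behaviour Paul describes above.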

> I would not expect this to cause much in the way of syscall performance
> degradation.

Now the jump label concerns all cases but syscalls (that is, exceptions and irqs).
Syscalls are even better optimized for the off case, with a TIF_NOHZ flag. So the
check folds into the single all-in-one slow path condition. At least on x86.
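
A rough sketch of what "all-in-one condition" means: the syscall entry path
already tests the thread flags against a combined mask, so adding TIF_NOHZ to
that mask costs nothing extra when the flag is clear. The bit positions and
the SYSCALL_WORK_MASK name below are assumptions made for illustration, not
the real x86 definitions.

```c
#include <stdbool.h>

/* Hypothetical bit positions, chosen for illustration only. */
#define TIF_SYSCALL_TRACE  (1u << 0)
#define TIF_SYSCALL_AUDIT  (1u << 1)
#define TIF_NOHZ           (1u << 2)

/* Hypothetical name: every reason to leave the syscall fast path, in one mask. */
#define SYSCALL_WORK_MASK  (TIF_SYSCALL_TRACE | TIF_SYSCALL_AUDIT | TIF_NOHZ)

/* One test covers tracing, auditing and context tracking together. */
bool syscall_needs_slow_path(unsigned int thread_flags)
{
    return (thread_flags & SYSCALL_WORK_MASK) != 0;
}
```

A task without any of these flags set stays on the fast path with the same
single test it would have executed anyway.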

> However, it looks like having even one CPU in nohz_full
> mode causes all CPUs to enable context tracking.

True, unfortunately. It's necessary in order to track syscall and exception
entry/exit across CPUs.

So if CPU 1 is full nohz and a task enters userspace on CPU 0 and then migrates
to CPU 1, we must know there that it's resuming in userspace in order to stop the tick
confidently. So CPU 0 must do context tracking as well.
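
The migration problem can be modelled in a few lines. This is a deliberately
simplified toy, not the kernel's data layout: the state is stored per task for
brevity, and struct task, task_enter_user() and can_stop_tick() are
hypothetical names.

```c
#include <stdbool.h>

enum ctx_state { IN_KERNEL, IN_USER };

struct task {
    enum ctx_state tracked_state;  /* what context tracking last recorded */
    enum ctx_state real_state;     /* ground truth, not visible to the kernel */
};

/* The task returns to userspace; only a tracking CPU records the transition. */
void task_enter_user(struct task *t, bool cpu_tracks)
{
    t->real_state = IN_USER;
    if (cpu_tracks)
        t->tracked_state = IN_USER;
}

/* The nohz_full CPU stops the tick only if it knows the task is in userspace. */
bool can_stop_tick(const struct task *t)
{
    return t->tracked_state == IN_USER;
}
```

If CPU 0 does not track, a task that entered userspace there arrives on CPU 1
with a stale IN_KERNEL record, and CPU 1 cannot safely stop the tick.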

Of course one can argue that we could find out that the task is resuming in userspace from
CPU 0's scheduler entry without the need for previous context tracking, but I couldn't find a safe
solution for that. This is because probing on user/kernel boundaries can only be done
in the soft way, through explicit function calls. So there is an inevitable shift
between soft and hard boundaries, between what we probe and what we can guess.

> 
> My guess is that Mike wants to have (say) half of his CPUs running
> nohz_full, and the other half having fast system calls.  So my guess
> also is that he would like some way of having the non-nohz_full CPUs
> to opt out of the context-tracking overhead, including the memory
> barriers and atomic ops in rcu_user_enter() and rcu_user_exit().  ;-)

I see. So we could possibly restrict the context tracking to a bunch of
CPUs, but only if the tasks running there can't run on non-tracking CPUs.

Ah, one possible thing is to rely on the NOHZ flag for that and check which
tasks need to be tracked.
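
As a sketch of that condition, assuming affinity is a simple bitmask: a task
is only a problem if it can run on both tracking and non-tracking CPUs. The
cpu_set64 type and both helper names are hypothetical, and nothing here
reflects an existing kernel interface.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t cpu_set64;  /* hypothetical: one bit per CPU, up to 64 CPUs */

/* The task may run on at least one tracking CPU, so it needs the flag. */
bool task_needs_tracking(cpu_set64 allowed, cpu_set64 tracking)
{
    return (allowed & tracking) != 0;
}

/* The task can cross between tracking and non-tracking CPUs: the unsafe case. */
bool task_may_straddle(cpu_set64 allowed, cpu_set64 tracking)
{
    return (allowed & tracking) != 0 && (allowed & ~tracking) != 0;
}
```

For example, with tracking CPUs 2 and 3 (mask 0xC), a task allowed on CPUs 1
and 2 (mask 0x6) straddles the boundary and would defeat the restriction.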

> > Now if what you need is to enable or disable it at runtime instead of boottime,
> > I must warn you that this is going to complicate the nohz code a lot (and also perhaps sched
> > and RCU).
> 
> What Frederic said!  Making RCU deal with this is possible, but a bit on
> the complicated side.  Given that I haven't heard too many people complaining
> that RCU is too simple, I would like to opt out of runtime changes to the
> nohz_full mask.

Agreed.

> 
> > I've already been eyed by vulturous frozen sharks flying in circles above me lately
> > after a few overengineering visions.
> 
> Nothing like the icy glare of a frozen shark, is there?  ;-)

I think they were even three-eyed!!!

> 
> > And given that the full nohz code is still in a baby shape, it's probably not the right
> > time to expand it that way. I haven't even yet heard about users who crossed the testing
> > stage of full nohz.
> > 
> > We'll probably extend it that way in the future. But likely not in the near future.
> 
> My guess is that Mike would be OK with making nohz_full choice of CPUs
> still at boot time, but that he would like the CPUs that are not to be
> in nohz_full state be able to opt out of the context-tracking overhead.

Ok, that might be possible, although it would still require a bit of complication.
Let's wait for Mike's input.

Thanks.
