Message-ID: <20141110202655.GB29741@lerouge>
Date:	Mon, 10 Nov 2014 21:26:59 +0100
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Christoph Lameter <cl@...ux.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
	Gilad Ben-Yossef <gilad@...yossef.com>,
	Tejun Heo <tj@...nel.org>,
	John Stultz <john.stultz@...aro.org>,
	Mike Frysinger <vapier@...too.org>,
	Minchan Kim <minchan.kim@...il.com>,
	Hakan Akkan <hakanakkan@...il.com>,
	Max Krasnyansky <maxk@....qualcomm.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Hugh Dickins <hughd@...gle.com>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [NOHZ] Remove scheduler_tick_max_deferment

On Sat, Nov 01, 2014 at 04:52:13PM -0500, Christoph Lameter wrote:
> On Sat, 1 Nov 2014, Thomas Gleixner wrote:
> 
> > On Fri, 31 Oct 2014, Christoph Lameter wrote:
> > > The reasoning behind this function is not clear to me and removal seems
> >
> > The comment above the function is clear enough.
> 
> I looked around into the functions called by the timer interrupt for
> accounting etc. They have measures to compensate if the HZ is not
> occurring for some time.

Not very well. They handle dynticks idle correctly, but not full dynticks.
Check out update_cpu_load_active() -> __update_cpu_load() for example.

There is a pending_update argument that takes care of the tickless delta, but
decay_load_miss() catches up on the missing cpu load assuming it was all 0 (idle)
all that time.
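
To make the problem concrete, here is a minimal userspace sketch of that kind
of catch-up. It is a simplified model, not the kernel code: avg_step(),
catch_up_as_idle() and catch_up_as_busy() are hypothetical names, and the real
decay_load_miss() is implemented differently. It only shows what replaying the
missed ticks as load 0 does to an average that was actually fully busy.

```c
#include <assert.h>

/*
 * One tick of exponential averaging, with a weight that depends on idx:
 *   avg = (avg * (2^idx - 1) + sample) / 2^idx
 * Hypothetical helper, not a kernel function.
 */
static unsigned long avg_step(unsigned long avg, unsigned long sample, int idx)
{
	unsigned long scale = 1UL << idx;

	return (avg * (scale - 1) + sample) >> idx;
}

/* Catch up on 'missed' ticks, assuming the CPU was idle (load 0) throughout. */
static unsigned long catch_up_as_idle(unsigned long avg, unsigned long missed,
				      int idx)
{
	assert(idx >= 0 && idx < 5);	/* arbitrary bound for this sketch */
	while (missed--)
		avg = avg_step(avg, 0, idx);
	return avg;
}

/* What the same catch-up would give if the CPU had really run 'load'. */
static unsigned long catch_up_as_busy(unsigned long avg, unsigned long load,
				      unsigned long missed, int idx)
{
	assert(idx >= 0 && idx < 5);	/* arbitrary bound for this sketch */
	while (missed--)
		avg = avg_step(avg, load, idx);
	return avg;
}
```

With an average of 1024 and 10 missed ticks at idx 1, the idle assumption
decays the value to 1, while a CPU that was busy the whole time should have
stayed at 1024.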

Generally speaking, the scheduler assumes that dynticks means idle dynticks. That
applies to the above example and probably to much of the other accounting.

Now the issue with update_cpu_load_active() is there whether we keep the 1 Hz tick
or not: any delta of full dynticks workload makes it buggy because it's accounted
as idle load.

But removing the 1 Hz residual tick is dangerous because much of the accounting in
the scheduler tick assumes regular updates. It's mostly ok as long as the accounting
is exclusively updated and read locally. But some accounting is updated locally
and read remotely. So if CPU 0 is full dynticks and runs for 1 hour in userspace and
CPU 1 reads its stats, those will be buggy because of the missing updates. At best
in this scenario CPU 1 may consider that CPU 0 has been idle for 1 hour; at worst
the stats can be junk and there can be crashes. Also, a lot of the scheduler's
decisions are based on this accounting. Load balancing, at the very least.
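
Here is a rough sketch of that "updated locally, read remotely" pattern. The
struct and both functions are made up for illustration and do not correspond
to any real scheduler stat. The point is only that a naive remote reader
counts every tick the owner missed as idle time.

```c
#include <assert.h>

/* Hypothetical per-CPU stat, refreshed only from the owning CPU's tick. */
struct cpu_stat {
	unsigned long last_update;	/* tick of the last local refresh */
	unsigned long busy_ticks;	/* ticks accounted as busy so far */
};

/* Runs on the owning CPU, from its tick. */
static void local_tick_update(struct cpu_stat *s, unsigned long now, int busy)
{
	if (busy)
		s->busy_ticks += now - s->last_update;
	s->last_update = now;
}

/*
 * Runs on another CPU. It trusts whatever is stored, so any time the
 * owner spent without a tick silently ends up counted as idle.
 */
static unsigned long remote_read_idle_ticks(const struct cpu_stat *s,
					    unsigned long now)
{
	assert(now >= s->busy_ticks);
	return now - s->busy_ticks;
}
```

If the owner ticks at 100 while busy, then runs tickless in userspace until
3700, the remote reader sees 3600 idle ticks even though the CPU was never
idle. Only once the owner updates again does the value go back to 0.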

So we have two possible solutions:

1) Make the scheduler more full-dynticks aware. That means any remote read of
stat accounting must handle out-of-date results. That's going to be tricky: if
you check scheduler_tick() and sched_class::task_tick(), even simply sorting out
which stats are updated, which can handle busy dynticks load, which are read only
locally or can be read remotely, which handle overflow, etc. is enough work for an
army of ants.

2) Offload scheduler_tick() to the housekeeping CPU. It looks like many of the
updaters there can easily take a remote rq argument. There doesn't seem to be much
of a local rq assumption. So that's the easiest solution.
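
To illustrate what solution 2 could look like, here is a toy userspace model.
Everything in it is hypothetical (the struct rq fields, tick_update_rq(),
housekeeping_tick(), the NR_CPUS value), and it ignores the locking that a
real remote rq update would need. It only shows the shape: the updater takes
an rq argument instead of assuming the local one, and the housekeeper runs it
on behalf of the full dynticks CPUs.

```c
#include <assert.h>

#define NR_CPUS 4	/* arbitrary size for this sketch */

/* Toy runqueue, not the kernel's struct rq. */
struct rq {
	unsigned long clock;		/* time of the last tick update */
	unsigned long nr_updates;	/* number of tick updates received */
};

static struct rq runqueues[NR_CPUS];

/* Tick-time updater taking an explicit rq rather than the local one. */
static void tick_update_rq(struct rq *rq, unsigned long now)
{
	rq->clock = now;
	rq->nr_updates++;
}

/*
 * Tick of the housekeeping CPU: update its own rq, then the rq of every
 * CPU flagged as full dynticks, which no longer ticks for itself.
 */
static void housekeeping_tick(int housekeeper, const int *full_dynticks,
			      unsigned long now)
{
	int cpu;

	assert(housekeeper >= 0 && housekeeper < NR_CPUS);
	tick_update_rq(&runqueues[housekeeper], now);

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu != housekeeper && full_dynticks[cpu])
			tick_update_rq(&runqueues[cpu], now);
	}
}
```

With CPUs 1 and 2 flagged as full dynticks and CPU 0 as housekeeper, one call
refreshes rq 0, 1 and 2 and leaves rq 3 to its own tick.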

But we can't just remove scheduler_tick_max_deferment() without fixing what depends
on it. The result will be unpredictably insane and dangerous. The only predictable
thing that's going to happen if we do that is that nobody will ever fix it properly.