Message-ID: <alpine.DEB.2.11.1409031226001.24691@gentwo.org>
Date: Wed, 3 Sep 2014 12:43:15 -0500 (CDT)
From: Christoph Lameter <cl@...ux.com>
To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
cc: Frederic Weisbecker <fweisbec@...il.com>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC] dynticks: dynticks_idle is only modified locally use
this_cpu ops
On Wed, 3 Sep 2014, Paul E. McKenney wrote:
> > Well, a shared data structure would be cleaner in general but there are
> > certainly other approaches.
>
> Per-CPU variables -are- a shared data structure.
No, the intent is for them to belong to the particular cpu, and therefore
there is only limited support for sharing. It's not a shared data structure
in the classic sense.
The code in the rcu subsystem operates like other percpu code: local
variables are modified by the current processor, and other processors
inspect the state once in a while. The other percpu code does not need
atomics and barriers. RCU, for some reason that is not clear to me, does.
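
To make that concrete, here is a minimal sketch of the usual percpu
pattern (hypothetical counter name, not actual RCU code): the owning cpu
updates its own copy with this_cpu ops, and a remote cpu just loads the
value when it wants a snapshot.

#include <linux/percpu.h>
#include <linux/compiler.h>

DEFINE_PER_CPU(unsigned long, my_state);	/* hypothetical */

/* The owning cpu updates its own copy: no atomics, no barriers. */
static void local_update(void)
{
	this_cpu_inc(my_state);
}

/* A remote cpu occasionally inspects another cpu's state. */
static unsigned long remote_inspect(int cpu)
{
	return ACCESS_ONCE(per_cpu(my_state, cpu));
}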
> > But let's focus on the dynticks_idle case we are discussing here rather
> > than tackling the more difficult other atomics. What is checked in the
> > loop over the remote cpus is the dynticks_idle value plus
> > dynticks_idle_jiffies. So it seems that memory ordering is only used to
> > ensure that the jiffies are seen correctly.
> >
> > In that case both the dynticks_idle and dynticks_idle_jiffies could be
> > placed in one 64 bit value. If this is stored and retrieved as one then
> > there is no issue with ordering anymore and the barriers would no longer
> > be needed.
>
> If there was an upper bound on the propagation of values through a system,
> I could buy this.
What is different in the propagation speeds? The atomic read in the
function that checks whether the quiescent period has passed is a regular
read anyway. Does the atomic_inc make the cacheline propagate faster
through the system? I understand that it evicts the cacheline from other
processors' caches (a cacheline containing other percpu data, by the way).
Is that the desired effect?
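
To illustrate the packing suggested above (field layout and names are
hypothetical, just a sketch): on a 64-bit architecture a naturally aligned
64-bit store and load are single-copy atomic, so the counter and the
jiffies snapshot stay consistent without barriers.

#include <linux/percpu.h>
#include <linux/compiler.h>
#include <linux/jiffies.h>

/* Low 32 bits: idle transition counter; high 32 bits: jiffies snapshot. */
DEFINE_PER_CPU(u64, dynticks_idle_combined);	/* hypothetical */

static void idle_transition(void)
{
	u64 old = __this_cpu_read(dynticks_idle_combined);
	u64 new = ((u64)(u32)jiffies << 32) | ((u32)old + 1);

	/* One 64-bit store publishes both fields together. */
	this_cpu_write(dynticks_idle_combined, new);
}

static void remote_sample(int cpu, u32 *ctr, u32 *jif)
{
	u64 v = ACCESS_ONCE(per_cpu(dynticks_idle_combined, cpu));

	*ctr = (u32)v;
	*jif = (u32)(v >> 32);
}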
> But Mike Galbraith checked the overhead of ->dynticks_idle and found
> it to be too small to measure. So doesn't seem to be a problem worth
> extraordinary efforts, especially given that many systems can avoid
> it simply by leaving CONFIG_NO_HZ_SYSIDLE=n.
The code looks fragile and bound to have issues in the future given the
barriers/atomics etc. It's going to be cleaner without that.
And we are right now focusing on the simplest case. The atomics scheme is
used multiple times in the RCU subsystem. There is more weird-looking code
there, like atomic_add of zero, etc.
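
For reference, the pattern I mean looks like the following (paraphrased,
names approximate): an atomic_add_return() of zero is used purely as a
read that also acts as a full memory barrier on both sides, unlike a
plain atomic_read().

#include <linux/atomic.h>

/* Read v with full-barrier semantics before and after the access. */
static unsigned int barrier_read(atomic_t *v)
{
	return (unsigned int)atomic_add_return(0, v);
}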