linux-kernel - Re: question about RCU dynticks

Message-ID: <5547C1DC.10802@redhat.com>
Date:	Mon, 04 May 2015 15:00:44 -0400
From:	Rik van Riel <riel@...hat.com>
To:	Paolo Bonzini <pbonzini@...hat.com>,
	Ingo Molnar <mingo@...nel.org>,
	Andy Lutomirski <luto@...capital.net>
CC:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	X86 ML <x86@...nel.org>, williams@...hat.com,
	Andrew Lutomirski <luto@...nel.org>, fweisbec@...hat.com,
	Peter Zijlstra <peterz@...radead.org>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"Paul E. McKenney" <paulmck@...ibm.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: question about RCU dynticks_nesting

On 05/04/2015 11:59 AM, Rik van Riel wrote:

> However, currently the RCU code seems to use a much more
> complex counting scheme, with a different increment for
> kernel/task use, and irq use.
> 
> This counter seems to be modeled on the task preempt_counter,
> where we do care about whether we are in task context, irq
> context, or softirq context.
> 
> On the other hand, the RCU code only seems to care about
> whether or not a CPU is in an extended quiescent state,
> or is potentially in an RCU critical section.
> 
> Paul, what is the reason for RCU using a complex counter,
> instead of a simple increment for each potential kernel/RCU
> entry, like rcu_read_lock() does with CONFIG_PREEMPT_RCU
> enabled?

Looking at the code for a while more, I have not found
any reason why the rcu dynticks counter is so complex.

The rdtp->dynticks atomic seems to be used as a serial
number. Even means the cpu is in an extended rcu quiescent
state, odd means it is not.

This test is used to verify whether or not a CPU is
in rcu quiescent state. Presumably the atomic_add_return
is used to add a memory barrier.

	(atomic_add_return(0, &rdtp->dynticks) & 0x1)
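
To make the serial-number idea concrete, here is a minimal userspace
sketch using C11 atomics. It is not the kernel code: the struct and
the model_* names are hypothetical, and the parity convention (even
while in the extended quiescent state, odd otherwise) is an assumption
of the sketch. The read-modify-write that adds zero stands in for
atomic_add_return(0, ...), which reads the value with full ordering.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of one CPU's dynticks counter.
 * Assumption of this sketch: an even value means the CPU is in an
 * extended quiescent state, an odd value means it is not.
 */
struct dynticks_model {
	atomic_int dynticks;
};

static void model_init(struct dynticks_model *d)
{
	atomic_init(&d->dynticks, 1);	/* start odd: not quiescent */
}

/* Entering the extended quiescent state: odd -> even. */
static void model_eqs_enter(struct dynticks_model *d)
{
	atomic_fetch_add(&d->dynticks, 1);
}

/* Leaving the extended quiescent state: even -> odd. */
static void model_eqs_exit(struct dynticks_model *d)
{
	atomic_fetch_add(&d->dynticks, 1);
}

/* Ordered read: a seq_cst read-modify-write that adds zero. */
static int model_snapshot(struct dynticks_model *d)
{
	return atomic_fetch_add(&d->dynticks, 0);
}

static bool model_in_eqs(struct dynticks_model *d)
{
	return (model_snapshot(d) & 0x1) == 0;
}

/*
 * Serial-number use: the CPU has been through a quiescent state since
 * 'snap' was taken if it was quiescent at that time (even snapshot),
 * or if the counter has moved at all since then.
 */
static bool model_passed_eqs(struct dynticks_model *d, int snap)
{
	return (snap & 0x1) == 0 || model_snapshot(d) != snap;
}
```

A remote observer would take model_snapshot() once and later call
model_passed_eqs() with that value, so it never has to interrupt
the observed CPU.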

> In fact, would we be able to simply use tsk->rcu_read_lock_nesting
> as an indicator of whether or not we should bother waiting on that
> task or CPU when doing synchronize_rcu?

We seem to have two variants of __rcu_read_lock().

One increments current->rcu_read_lock_nesting, the other
calls preempt_disable().

In case of the non-preemptible RCU, we could easily also
increase current->rcu_read_lock_nesting at the same time
we increase the preempt counter, and use that as the
indicator to test whether the cpu is in an extended
rcu quiescent state. That way there would be no extra
overhead at syscall entry or exit at all. The trick
would be getting the preempt count and the rcu read
lock nesting count in the same cache line for each task.
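
A rough sketch of what that could look like for the non-preemptible
case, as a userspace model rather than a patch. The struct layout,
the 64-byte cache line size and the model_* names are all assumptions
made for illustration; the real task_struct and preempt count live
elsewhere and are laid out differently.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CACHE_LINE 64	/* assumed cache line size */

/*
 * Hypothetical per-task state: both counters start on a cache line
 * boundary, so touching one brings the other into cache as well.
 */
struct task_model {
	alignas(MODEL_CACHE_LINE) int preempt_count;
	int rcu_read_lock_nesting;
};

/* Both counters must end within the first cache line of the struct. */
_Static_assert(offsetof(struct task_model, rcu_read_lock_nesting)
	       + sizeof(int) <= MODEL_CACHE_LINE,
	       "counters do not share a cache line");

/* Non-preemptible variant: bump the nesting count next to the preempt count. */
static void model_rcu_read_lock(struct task_model *t)
{
	t->preempt_count++;		/* stands in for preempt_disable() */
	t->rcu_read_lock_nesting++;
}

static void model_rcu_read_unlock(struct task_model *t)
{
	t->rcu_read_lock_nesting--;
	t->preempt_count--;		/* stands in for preempt_enable() */
}

/* What a grace-period waiter would test instead of a per-cpu counter. */
static bool model_task_may_be_in_reader(const struct task_model *t)
{
	return t->rcu_read_lock_nesting != 0;
}
```

The sketch leaves out compiler and memory barriers entirely; it only
shows the bookkeeping and the cache line placement.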

In case of the preemptible RCU scheme, we would have to
examine the per-task state (under the runqueue lock)
to get the current task info of all CPUs, and in
addition wait for the blkd_tasks list to empty out
when doing a synchronize_rcu().

That does not appear to require special per-cpu
counters; examining the per-cpu rdp and the lists
inside it, with the rnp->lock held if doing any
list manipulation, looks like it would be enough.
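
As a sketch of the check described above for the preemptible case,
again a userspace model and not kernel code: MODEL_NR_CPUS, the
structs and the function name are hypothetical, the blkd_tasks list
is reduced to a count, and the runqueue lock and rnp->lock that would
have to be held around these reads are left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4		/* assumed CPU count for the model */

/* Hypothetical per-task state: only the read-side nesting depth. */
struct ptask_model {
	int rcu_read_lock_nesting;
};

/*
 * Hypothetical global view. curr[cpu] is the task currently running
 * on that CPU (NULL if the CPU is idle); nr_blkd_tasks counts tasks
 * that were preempted inside a read-side critical section.
 */
struct rcu_state_model {
	const struct ptask_model *curr[MODEL_NR_CPUS];
	size_t nr_blkd_tasks;
};

/*
 * Conservative test: true only if no blocked readers remain and no
 * currently running task is inside a read-side critical section.
 * Locking is omitted; the caller is assumed to hold whatever locks
 * make this snapshot consistent.
 */
static bool model_grace_period_may_end(const struct rcu_state_model *s)
{
	if (s->nr_blkd_tasks != 0)
		return false;

	for (size_t cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		const struct ptask_model *t = s->curr[cpu];

		if (t && t->rcu_read_lock_nesting != 0)
			return false;
	}
	return true;
}
```

This is stricter than it needs to be, since it asks for all CPUs to be
outside a reader at the same instant, but it shows that the inputs are
per-task state plus the blocked list, with no per-cpu counter.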

However, the current code is a lot more complicated
than that. Am I overlooking something obvious, Paul?
Maybe something non-obvious? :)

-- 
All rights reversed