linux-kernel - Re: [PATCH 1/7] rcu: Fix preempt-unsafe debug check of rcu extended quiescent state

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Tue, 27 Sep 2011 23:44:56 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Pavel Ivanov <paivanof@...il.com>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Thomas Gleixner <tglx@...utronix.de>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Ingo Molnar <mingo@...hat.com>
Subject: Re: [PATCH 1/7] rcu: Fix preempt-unsafe debug check of rcu extended
 quiescent state

On Tue, Sep 27, 2011 at 11:16:02AM -0400, Pavel Ivanov wrote:
> > What matters is that we do that check by ensuring we are really
> > checking the value of the cpu var in the CPU we are currently
> > running and not some other random one that can change its dynticks
> > value at the same time.
> 
> Define the "CPU we are currently running on" in this context. Is it
> CPU executing call to rcu_check_extended_qs() or is it CPU executing
> return from rcu_check_extended_qs() ? These CPUs can be different both
> before your patch and after that. And function can return extended_qs
> state from either of these CPUs again both before and after the patch.
> If the calling code wants these CPUs to be the same it has to disable
> preemption before making the call. And if it does so then additional
> preemption disabling inside the function is pointless.

So, like Paul said rcu_read_lock() doesn't necessary imply to disable
preemption.

Hence by the time we call rcu_check_extended_qs() the current task
can be migrated anytime before, while in the function (except
a little part) or after.

The CPU I was referring to when I talked about "CPU we are currently
running on" is the CPU we are running between the call to get_cpu_var()
and put_cpu_var(). This one can not be changed because get_cpu_var()
disables preemption.

So consider this piece of code:

        struct rcu_dynticks *rdtp =
               &get_cpu_var(rcu_dynticks);
        bool ext_qs = true;

        if (atomic_read(&rdtp->dynticks) & 0x1)
                ext_qs = false;

        put_cpu_var(rcu_dynticks);	

What I expect from preemption disabled is that when I read the local
CPU variable rdtp->dyntick, I'm sure this is the CPU var of the local
CPU and the rdtp->dyntick from another CPU.

If I don't disable preemption, like it was without my patch:

0	struct rcu_dynticks *rdtp =
1               &__raw_get_cpu_var(rcu_dynticks);
2
3        if (atomic_read(&rdtp->dynticks) & 0x1)
4                ext_qs = false;
5
6        put_cpu_var(rcu_dynticks);	

I can fetch rdtp of CPU 1 in line 0. Then the task migrates on CPU 2.
So on line 3 I'm reading rdtp->dynticks of CPU 1 from CPU 2 and this is
racy because CPU 1 can change the value of rdtp->dynticks concurrently.

Now indeed it's weird because we can migrate anytime outside that preempt
disabled section.

So let's explore the two cases where this function can be called:

- From the idle task. For now this is the only place where we can
run sections of code in RCU extended quiescent state. If any use
of RCU is made on such section, it will hit our check.
Here there is no head-scratching about the role of disabling preemption
because the idle tasks can't be migrated. There is one per cpu so
the rcu_dynticks variable we look at is always the same inside a
given idle task.

- From a normal task. We assume it can be migrated anytime. But
normal tasks aren't supposed in RCU extended quiescent state.
Still the check can be useful there and spot for example cases where
we exit the idle task without calling rcu_exit_nohz().

Now from a normal task, when we call rcu_read_lock(), we assume
we can read the value dynticks from any CPU, wherever we migrate
to. So for example if we are running idle in CPU 1, then we exit
idle without calling rcu_exit_nohz(), the next task running on this
CPU is about to call rcu_read_lock(), but of on the last time before
we do our check it migrates to CPU 2. It won't detect the issue in CPU 1
then. But it doesn't matter much, soon or later there are fair
chances there will be a call to rcu_read_lock() on CPU 1 that
will report the issue.

That's also an anticipation for future development where we may
call rcu_enter_nohz() in more place than just idle. Like in
the Nohz cpusets for example.

Right?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/