Message-ID: <20090807151529.2806b8b4@skybase>
Date:	Fri, 7 Aug 2009 15:15:29 +0200
From:	Martin Schwidefsky <schwidefsky@...ibm.com>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	linux-kernel@...r.kernel.org
Cc:	Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
	Gerald Schaefer <gerald.schaefer@...ibm.com>
Subject: [BUG] race of RCU vs NOHZ

Hi Paul,
I analysed a dump of a hanging 2.6.30 system and found what I think is
a bug in the interaction of RCU and NOHZ. There are a number of patches
on top of that kernel, but they should be independent of the bug.

The system has 4 cpus and uses classic RCU. Cpus #0, #2 and #3 woke up
recently; cpu #1 has been sleeping for 5 minutes, but there is a pending
rcu batch. The timer wheel for cpu #1 is empty, so it will continue to
sleep for NEXT_TIMER_MAX_DELTA ticks.

Now if I look at the RCU data structures I find this:

rcu_ctrlblk
>> px *(struct rcu_ctrlblk *) 0x810000
struct rcu_ctrlblk {
        cur = 0xffffffffffffff99
        completed = 0xffffffffffffff98
        pending = 0xffffffffffffff99
        signaled = 0x0
        lock = spinlock_t {
                raw_lock = raw_spinlock_t {
                        owner_cpu = 0x0
                }
                break_lock = 0x0
                magic = 0xdead4ead
                owner_cpu = 0xffffffff
                owner = 0xffffffffffffffff
                dep_map = struct lockdep_map {
                        key = 0x810118
                        class_cache = 0xcbcff0
                        name = 0x63e944
                        cpu = 0x0
                        ip = 0x1a7f64
                }
        }
        cpumask = {
                [0] 0x2
        }
}

rcu_data cpu #0
>> px *(struct rcu_data *) 0x872f8430
struct rcu_data {
        quiescbatch = 0xffffffffffffff99
        passed_quiesc = 0x1
        qs_pending = 0x0
        batch = 0xffffffffffffff97
        nxtlist = (nil)
        nxttail = {
                [0]         0x872f8448
                [1]         0x872f8448
                [2]         0x872f8448
        }
        qlen = 0x0
        donelist = (nil)
        donetail = 0x872f8470
        blimit = 0xa
        cpu = 0x0
        barrier = struct rcu_head {
                next = (nil)
                func = 0x0
        }
}

rcu_data cpu #1
>> px *(struct rcu_data *) 0x874be430
struct rcu_data {
        quiescbatch = 0xffffffffffffff98
        passed_quiesc = 0x1
        qs_pending = 0x0
        batch = 0xffffffffffffff97
        nxtlist = (nil)
        nxttail = {
                [0]         0x874be448
                [1]         0x874be448
                [2]         0x874be448
        }
        qlen = 0x0
        donelist = (nil)
        donetail = 0x874be470
        blimit = 0xa
        cpu = 0x1
        barrier = struct rcu_head {
                next = (nil)
                func = 0x0
        }
}

rcu_data cpu #2
>> px *(struct rcu_data *) 0x87684430
struct rcu_data {
        quiescbatch = 0xffffffffffffff99
        passed_quiesc = 0x1
        qs_pending = 0x0
        batch = 0xffffffffffffff99
        nxtlist = 0xffc1fc18
        nxttail = {
                [0]         0x87684448
                [1]         0x87684448
                [2]         0xffc1fc18
        }
        qlen = 0x1
        donelist = (nil)
        donetail = 0x87684470
        blimit = 0xa
        cpu = 0x2
        barrier = struct rcu_head {
                next = (nil)
                func = 0x0
        }
}

rcu_data cpu #3
>> px *(struct rcu_data *) 0x8784a430
struct rcu_data {
        quiescbatch = 0xffffffffffffff99
        passed_quiesc = 0x1
        qs_pending = 0x0
        batch = 0xffffffffffffff63
        nxtlist = (nil)
        nxttail = {
                [0]         0x8784a448
                [1]         0x8784a448
                [2]         0x8784a448
        }
        qlen = 0x0
        donelist = (nil)
        donetail = 0x8784a470
        blimit = 0xa
        cpu = 0x3
        barrier = struct rcu_head {
                next = (nil)
                func = 0x0
        }
}

At the time cpu #1 went to sleep, rcu_needs_cpu must have answered false,
otherwise a 1 tick delay would have been programmed. rcu_pending compares
rcu_ctrlblk.cur with rcu_data.quiescbatch for cpu #1, so these two must
have been equal; otherwise rcu_needs_cpu would have answered true.
That means the rcu_needs_cpu check completed before rcu_start_batch
for batch 0xffffffffffffff99. The bit for cpu #1 is still set in
rcu_ctrlblk.cpumask, therefore the bit for cpu #1 in nohz_cpu_mask
cannot have been set at the time rcu_start_batch completed. That gives
the following race (cpu 0 is starting the batch, cpu 1 is going to
sleep):

cpu 1: tick_nohz_stop_sched_tick: rcu_needs_cpu();
cpu 0: rcu_start_batch: rcp->cur++;
cpu 0: rcu_start_batch: cpumask_andnot(to_cpumask(rcp->cpumask),
					cpu_online_mask, nohz_cpu_mask);
cpu 1: tick_nohz_stop_sched_tick: cpumask_set_cpu(1, nohz_cpu_mask);

The order of i) setting the bit in nohz_cpu_mask and ii) the rcu_needs_cpu()
check in tick_nohz_stop_sched_tick is wrong, no? Or did I miss some subtle
check that comes afterwards?

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
