linux-kernel - tasks-trace RCU: question about grace period forward progress

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <354598689.4868.1614262968890.JavaMail.zimbra@efficios.com>
Date:   Thu, 25 Feb 2021 09:22:48 -0500 (EST)
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     paulmck <paulmck@...nel.org>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        rcu <rcu@...r.kernel.org>, Peter Zijlstra <peterz@...radead.org>,
        Josh Triplett <josh@...htriplett.org>,
        rostedt <rostedt@...dmis.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        "Joel Fernandes, Google" <joel@...lfernandes.org>
Subject: tasks-trace RCU: question about grace period forward progress

Hi Paul,

Answering a question from Peter on IRC got me to look at rcu_read_lock_trace(), and I see this:

static inline void rcu_read_lock_trace(void)
{
        struct task_struct *t = current;

        WRITE_ONCE(t->trc_reader_nesting, READ_ONCE(t->trc_reader_nesting) + 1);
        barrier();
        if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) &&
            t->trc_reader_special.b.need_mb)
                smp_mb(); // Pairs with update-side barriers
        rcu_lock_acquire(&rcu_trace_lock_map);
}

static inline void rcu_read_unlock_trace(void)
{
        int nesting;
        struct task_struct *t = current;

        rcu_lock_release(&rcu_trace_lock_map);
        nesting = READ_ONCE(t->trc_reader_nesting) - 1;
        barrier(); // Critical section before disabling.
        // Disable IPI-based setting of .need_qs.
        WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
        if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
                WRITE_ONCE(t->trc_reader_nesting, nesting);
                return;  // We assume shallow reader nesting.
        }
        rcu_read_unlock_trace_special(t, nesting);
}

AFAIU, each thread keeps track of whether it is nested within a RCU read-side critical
section with a counter, and grace periods iterate over all threads to make sure they
are not within a read-side critical section before they can complete:

# define rcu_tasks_trace_qs(t)                                          \
        do {                                                            \
                if (!likely(READ_ONCE((t)->trc_reader_checked)) &&      \
                    !unlikely(READ_ONCE((t)->trc_reader_nesting))) {    \
                        smp_store_release(&(t)->trc_reader_checked, true); \
                        smp_mb(); /* Readers partitioned by store. */   \
                }                                                       \
        } while (0)

It reminds me of the liburcu urcu-mb flavor which also deals with per-thread
state to track whether threads are nested within a critical section:

https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L90
https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L125

static inline void _urcu_mb_read_lock_update(unsigned long tmp)
{
	if (caa_likely(!(tmp & URCU_GP_CTR_NEST_MASK))) {
		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, _CMM_LOAD_SHARED(urcu_mb_gp.ctr));
		cmm_smp_mb();
	} else
		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, tmp + URCU_GP_COUNT);
}

static inline void _urcu_mb_read_lock(void)
{
	unsigned long tmp;

	urcu_assert(URCU_TLS(urcu_mb_reader).registered);
	cmm_barrier();
	tmp = URCU_TLS(urcu_mb_reader).ctr;
	urcu_assert((tmp & URCU_GP_CTR_NEST_MASK) != URCU_GP_CTR_NEST_MASK);
	_urcu_mb_read_lock_update(tmp);
}

The main difference between the two algorithm is that task-trace within the
kernel lacks the global "urcu_mb_gp.ctr" state snapshot, which is either
incremented or flipped between 0 and 1 by the grace period. This allow RCU readers
outermost nesting starting after the beginning of the grace period not to prevent
progress of the grace period.

Without this, a steady flow of incoming tasks-trace-RCU readers can prevent the
grace period from ever completing.

Or is this handled in a clever way that I am missing here ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com