Date:	Mon, 20 Feb 2012 15:15:33 +0800
From:	Lai Jiangshan <laijs@...fujitsu.com>
To:	paulmck@...ux.vnet.ibm.com
CC:	linux-kernel@...r.kernel.org, mingo@...e.hu, dipankar@...ibm.com,
	akpm@...ux-foundation.org, mathieu.desnoyers@...ymtl.ca,
	josh@...htriplett.org, niv@...ibm.com, tglx@...utronix.de,
	peterz@...radead.org, rostedt@...dmis.org, Valdis.Kletnieks@...edu,
	dhowells@...hat.com, eric.dumazet@...il.com, darren@...art.com,
	fweisbec@...il.com, patches@...aro.org
Subject: Re: [PATCH RFC tip/core/rcu] rcu: direct algorithmic SRCU implementation

On 02/13/2012 10:09 AM, Paul E. McKenney wrote:

>  /*
>   * Helper function for synchronize_srcu() and synchronize_srcu_expedited().
>   */
> -static void __synchronize_srcu(struct srcu_struct *sp, void (*sync_func)(void))
> +static void __synchronize_srcu(struct srcu_struct *sp, bool expedited)
>  {
>  	int idx;
>  
> @@ -178,90 +316,51 @@ static void __synchronize_srcu(struct srcu_struct *sp, void (*sync_func)(void))
>  			   !lock_is_held(&rcu_sched_lock_map),
>  			   "Illegal synchronize_srcu() in same-type SRCU (or RCU) read-side critical section");
>  
> -	idx = sp->completed;
> +	idx = ACCESS_ONCE(sp->completed);
>  	mutex_lock(&sp->mutex);
>  
>  	/*
>  	 * Check to see if someone else did the work for us while we were
> -	 * waiting to acquire the lock.  We need -two- advances of
> +	 * waiting to acquire the lock.  We need -three- advances of
>  	 * the counter, not just one.  If there was but one, we might have
>  	 * shown up -after- our helper's first synchronize_sched(), thus
>  	 * having failed to prevent CPU-reordering races with concurrent
> -	 * srcu_read_unlock()s on other CPUs (see comment below).  So we
> -	 * either (1) wait for two or (2) supply the second ourselves.
> +	 * srcu_read_unlock()s on other CPUs (see comment below).  If there
> +	 * was only two, we are guaranteed to have waited through only one
> +	 * full index-flip phase.  So we either (1) wait for three or
> +	 * (2) supply the additional ones we need.
>  	 */
>  
> -	if ((sp->completed - idx) >= 2) {
> +	if (sp->completed == idx + 2)
> +		idx = 1;
> +	else if (sp->completed == idx + 3) {
>  		mutex_unlock(&sp->mutex);
>  		return;
> -	}
> -
> -	sync_func();  /* Force memory barrier on all CPUs. */
> +	} else
> +		idx = 0;


Hi, Paul

I don't think this check-and-return path is needed, since we will be introducing call_srcu(). For now we just need correct code that shows how the algorithm works and can serve for a while; the coming call_srcu() will be implemented on top of this correct code, and the check-and-return path can then be removed.

And I think this unneeded check-and-return path is actually incorrect. See the following:

Reader			Updater						Helper thread
			old_ptr = rcu_ptr;
			/* rcu_ptr = NULL; but be reordered to (1) */
			start synchronize_srcu()
			idx = ACCESS_ONCE(sp->completed);(2)
									synchronize_srcu()
									synchronize_srcu()
srcu_read_lock();
old_ptr = rcu_ptr;
			rcu_ptr = NULL;(1)
			mutex_lock() and read sp->completed
			and return from synchronize_srcu()
			free(old_ptr);
use freed old_ptr
srcu_read_unlock();


So we need an smp_mb() between (1) and (2) to enforce the ordering.

__synchronize_srcu() {
	smp_mb(); /* F */
	idx = ACCESS_ONCE(sp->completed); /* (2) */
	....
}

And this smp_mb() F is paired with the helper's smp_mb() D. So if the updater sees X advances of
->completed, the updater must have seen X *full* flip_and_wait() phases. Therefore we need to see
only -two- advances of ->completed from the helper, not -three-.

        if (sp->completed == idx + 1)
                idx = 1;
        else if (sp->completed == idx + 2) {
                mutex_unlock(&sp->mutex);
                return;
        } else
                idx = 0;


Or simpler:

__synchronize_srcu() {
	unsigned int idx;   /* <-------- unsigned */

	/* comments for smp_mb() F */
	smp_mb(); /* F */
	idx = ACCESS_ONCE(sp->completed);

	mutex_lock(&sp->mutex);
	idx = sp->completed - idx;

	/* original comments */
	for (; idx < 2; idx++)
		flip_idx_and_wait(sp, expedited);
	mutex_unlock(&sp->mutex);
}

Finally, I can't understand the comments on this check-and-return path,
so maybe my reply above is totally wrong. But those comments do not
describe the code well (especially the ordering), and they still mention
the old "synchronize_sched()", which confuses me.

My conclusion: we can simply remove the check-and-return path to reduce
the complexity, since we will be introducing call_srcu().

This new SRCU is great, especially the SRCU_USAGE_COUNT added on every
lock/unlock, which forces any increment/decrement pair to change the
counter.


Thanks,
Lai
