linux-kernel - Re: [RFC 2/2] srcu: Remove memory barrier "E" as it is not required


Message-ID: <20221218214243.GA1990383@lothringen>
Date:   Sun, 18 Dec 2022 22:42:43 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     "Joel Fernandes (Google)" <joel@...lfernandes.org>
Cc:     linux-kernel@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        "Paul E. McKenney" <paulmck@...nel.org>, rcu@...r.kernel.org,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC 2/2] srcu: Remove memory barrier "E" as it is not required

On Sun, Dec 18, 2022 at 07:13:09PM +0000, Joel Fernandes (Google) wrote:
> During a flip, we have a full memory barrier before idx is incremented.
> 
> The effect of this seems to be to guarantee that, if a READER sees srcu_idx
> updates (srcu_flip), then prior scans would not see its updates to counters on
> that index.
> 
> That does not matter because of the following reason: If a prior scan did see
> counter updates on the new index, that means the prior scan would wait
> for the reader when it probably did not need to.

I'm confused: isn't that actually what we want to prevent?
The point of the barrier here is to make sure that the inactive index that
we just scanned is guaranteed to remain seen as inactive during the whole scan
(minus the possible two residual increments from a given task that we debated
on Paul's patch, but we want the guarantee that the inactive index won't be
incremented three or more times by a given task while we are scanning it).

If some readers see the new index and increment the lock count, and we see that
while we are scanning it, there is a risk that the GP is going to be delayed
indefinitely.

> @@ -982,14 +982,6 @@ static bool try_check_zero(struct srcu_struct *ssp, int idx, int trycount)
>   */
>  static void srcu_flip(struct srcu_struct *ssp)
>  {
> -	/*
> -	 * Ensure that if a given reader sees the new value of ->srcu_idx, this
> -	 * updater's earlier scans cannot have seen that reader's increments
> -	 * (which is OK, because this grace period need not wait on that
> -	 * reader).
> -	 */
> -	smp_mb(); /* E */  /* Pairs with B and C. */

That said, I've been staring at this very barrier for the whole day, and I'm
wondering what exactly it pairs with on the other end.

      UPDATER                               READER
      -------                               ------
      idx = ssp->srcu_idx;                  idx = srcu_idx;
      READ srcu_unlock_count[srcu_idx ^ 1]  srcu_lock_count[idx]++
      smp_mb();                             smp_mb();
      READ srcu_lock_count[srcu_idx ^ 1]    srcu_unlock_count[old_idx]++
      smp_mb()
      srcu_idx++;

For a true match, I would expect a barrier between srcu_idx read and
srcu_lock_count write. I'm not used to ordering writes after reads.
So what is the pattern here? I would expect something like the below
but that doesn't match the above:

C rwrw

{}

P0(int *X, int *Y)
{
	int x;

	x = READ_ONCE(*X);
	smp_mb();
	WRITE_ONCE(*Y, 1);
}

P1(int *X, int *Y)
{
	int y;

	y = READ_ONCE(*Y);
	smp_mb();
	WRITE_ONCE(*X, 1);
}

exists (0:x=1 /\ 1:y=1)
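As a rough sanity check of the litmus test above without herd7, the sketch
below enumerates every interleaving of P0 and P1 that preserves program order
and collects the resulting (x, y) pairs. It assumes that, with smp_mb() between
the two accesses of each process, sequentially consistent interleaving is a
fair model for this particular test. It is not the LKMM, and the helper names
(interleavings, run, outcomes) are made up for illustration.

```python
from itertools import combinations

# Each process is a list of (op, variable, arg) steps in program order.
# For a read, arg is the register name; for a write, arg is the value stored.
# The smp_mb() in each process is modelled only by keeping program order.
P0 = [("read", "X", "x"), ("write", "Y", 1)]
P1 = [("read", "Y", "y"), ("write", "X", 1)]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each process's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        # Positions listed in `slots` run the next step of process 0,
        # all other positions run the next step of process 1.
        yield [(0, next(ia)) if i in slots else (1, next(ib)) for i in range(n)]


def run(schedule):
    """Execute one interleaving against shared memory initialised to zero."""
    mem = {"X": 0, "Y": 0}
    regs = {}
    for pid, (op, var, arg) in schedule:
        if op == "read":
            regs[(pid, arg)] = mem[var]
        else:
            mem[var] = arg
    return regs


def outcomes():
    """Return the set of reachable (0:x, 1:y) pairs over all interleavings."""
    return {(r[(0, "x")], r[(1, "y")]) for r in map(run, interleavings(P0, P1))}


if __name__ == "__main__":
    result = outcomes()
    print(sorted(result))
    print("exists clause reachable:", (1, 1) in result)
```

Under this simplified model the exists clause (0:x=1 /\ 1:y=1) is never
reached. The model keeps program order unconditionally, so it cannot tell
whether the reader side of the UPDATER/READER diagram needs a barrier between
its srcu_idx read and its srcu_lock_count write.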

> -
>  	WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
>  
>  	/*
> -- 
> 2.39.0.314.g84b9a713c41-goog
>