linux-kernel - Re: [PATCH -next v2] locking/osq_lock: annotate a data race in osq

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200511155812.GB22270@willie-the-truck>
Date:   Mon, 11 May 2020 16:58:13 +0100
From:   Will Deacon <will@...nel.org>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     Qian Cai <cai@....pw>, Elver Marco <elver@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>
Subject: Re: [PATCH -next v2] locking/osq_lock: annotate a data race in
 osq_lock

On Sat, May 09, 2020 at 02:36:54PM -0700, Paul E. McKenney wrote:
> On Sat, May 09, 2020 at 12:53:38PM -0400, Qian Cai wrote:
> > 
> > 
> > > On May 9, 2020, at 12:12 PM, Paul E. McKenney <paulmck@...nel.org> wrote:
> > > 
> > > Ah, and I forgot to ask.  Why "if (data_race(prev->next == node)" instead
> > > of "if (data_race(prev->next) == node"?
> > 
> > I think the one you suggested is slightly better to point out the exact race. Do you want me to resend or you could squash it instead?
> 
> The patch was still at the top of my stack, so I just amended it.  Here
> is the updated version.
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 13e69ca01ce1621ce74248bda86cfad47fa5a0fa
> Author: Qian Cai <cai@....pw>
> Date:   Tue Feb 11 08:54:15 2020 -0500
> 
>     locking/osq_lock: Annotate a data race in osq_lock
>     
>     The prev->next pointer can be accessed concurrently as noticed by KCSAN:
>     
>      write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
>       osq_lock+0x25f/0x350
>       osq_wait_next at kernel/locking/osq_lock.c:79
>       (inlined by) osq_lock at kernel/locking/osq_lock.c:185
>       rwsem_optimistic_spin
>       <snip>
>     
>      read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
>       osq_lock+0x196/0x350
>       osq_lock at kernel/locking/osq_lock.c:157
>       rwsem_optimistic_spin
>       <snip>
>     
>     Since the write only stores NULL to prev->next and the read tests if
>     prev->next equals to this_cpu_ptr(&osq_node). Even if the value is
>     shattered, the code is still working correctly. Thus, mark it as an
>     intentional data race using the data_race() macro.
>     
>     Signed-off-by: Qian Cai <cai@....pw>
>     Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> 
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 1f77349..1de006e 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  	 */
>  
>  	for (;;) {
> -		if (prev->next == node &&
> +		/*
> +		 * cpu_relax() below implies a compiler barrier which would
> +		 * prevent this comparison being optimized away.
> +		 */
> +		if (data_race(prev->next) == node &&
>  		    cmpxchg(&prev->next, node, NULL) == node)
>  			break;

I'm fine with the data_race() placement, but I don't find the comment
very helpful. We assign the result of a READ_ONCE() to 'prev' in the
loop, so I don't think that the cpu_relax() is really relevant.

The reason we don't need READ_ONCE() here is because if we race with
the writer then either we'll go round the loop again after accidentally
thinking prev->next != node, or we'll erroneously attempt the cmpxchg()
because we thought they were equal and that will fail.

Make sense?

Will