netdev - Re: [PATCH net-next] unix: Guarantee sk_state relevance in case of it was assigned by a task on other cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230126214744.GN2948950@paulmck-ThinkPad-P17-Gen-1>
Date:   Thu, 26 Jan 2023 13:47:44 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Kirill Tkhai <tkhai@...ru>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        davem@...emloft.net, edumazet@...gle.com, pabeni@...hat.com,
        kuniyu@...zon.com, gorcunov@...il.com
Subject: Re: [PATCH net-next] unix: Guarantee sk_state relevance in case of
 it was assigned by a task on other cpu

On Thu, Jan 26, 2023 at 01:33:22PM -0800, Jakub Kicinski wrote:
> On Thu, 26 Jan 2023 12:25:11 -0800 Paul E. McKenney wrote:
> > > Me trying to prove that memory ordering is transitive would be 100%
> > > speculation. Let's ask Paul instead - is the above valid? Or the fact
> > > that CPU1 observes state from CPU0 and is strongly ordered with CPU2
> > > implies that CPU2 will also observe CPU0's state?  
> > 
> > Hmmm...  What is listen() doing?  There seem to be a lot of them
> > in the kernel.
> > 
> > But proceeding on first principles...
> > 
> > Sometimes.  Memory ordering is transitive only when the ordering is
> > sufficiently strong.
> > 
> > In this case, I do not see any ordering between CPU 0 and anything else.
> > If the listen() function were to acquire the same mutex as CPU1 and CPU2
> > did, and if it acquired it first, then CPU2 would be guaranteed to see
> > anything CPU0 did while holding that mutex.
> 
> The fuller picture would be:
> 
> [CPU0]                     [CPU1]                [CPU2]
> WRITE_ONCE(sk->sk_state,
>            TCP_LISTEN);
>                            val = READ_ONCE(sk->sk_state) 
>                            mutex_lock()
>                            shared_mem_var = val
>                            mutex_unlock()
>                                                   mutex_lock()
>                                                   if (shared_mem_var == TCP_LISTEN)
>                                                      BUG_ON(READ_ONCE(sk->sk_state)
>                                                             != TCP_LISTEN)
>                                                   mutex_unlock()

And just to make sure that I represented your example correctly, please
see the litmus test below.

LKMM says that the bad outcome cannot happen, that is, if CPU1 sees
CPU0's write to sk->sk_state ("ss" in the litmus test) and if CPU2
sees CPU1's write to shared_mem_var ("smv" in the litmus test), then
it cannot be the case that CPU2 sees the old value of sk->sk_state that
was overwritten by CPU0.

And as you note below, it is no surprise that LKMM has this opinion.  ;-)

But I need to make sure that I didn't misrepresent your diagram above.

> > Alternatively, if CPU0 wrote to some memory, and CPU1 read that value
> > before releasing the mutex (including possibly before acquiring that
> > mutex), then CPU2 would be guaranteed to see that value (or the value
> > written by some later write to that same memory) after acquiring that
> > mutex.
> 
> Which I believe is exactly what happens in the example.
> 
> > So here are some things you can count on transitively:
> > 
> > 1.	After acquiring a given lock (or mutex or whatever), you will
> > 	see any values written or read prior to any earlier conflicting
> > 	release of that same lock.
> > 
> > 2.	After an access with acquire semantics (for example,
> > 	smp_load_acquire()) you will see any values written or read
> > 	prior to any earlier access with release semantics (for example,
> > 	smp_store_release()).
> > 
> > Or in all cases, you might see later values, in case someone else also
> > did a write to the location in question.
> > 
> > Does that help, or am I missing a turn in there somewhere?
> 
> Very much so, thank you!

------------------------------------------------------------------------

C C-Jakub-listen

(*
 * Result: Never
 *
 * Message-ID: <20230126133322.3bfab5e0@...nel.org>
 *)

{
}

P0(int *ss, int *smv, spinlock_t *gbl)
{
	WRITE_ONCE(*ss, 1);
}

P1(int *ss, int *smv, spinlock_t *gbl)
{
        int r1;

	r1 = READ_ONCE(*ss);
	spin_lock(gbl);
	WRITE_ONCE(*smv, 1);
	spin_unlock(gbl);
}

P2(int *ss, int *smv, spinlock_t *gbl)
{
	int r1;
        int r2;

	spin_lock(gbl);
	r1 = READ_ONCE(*smv);
	r2 = READ_ONCE(*ss);
	spin_unlock(gbl);
}

exists
(1:r1=1 /\ 2:r1=1 /\ 2:r2=0)