Message-ID: <20240510091138.23367-1-kuniyu@amazon.com>
Date: Fri, 10 May 2024 18:11:38 +0900
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <pabeni@...hat.com>
CC: <billy@...rlabs.sg>, <davem@...emloft.net>, <edumazet@...gle.com>,
<kuba@...nel.org>, <kuni1840@...il.com>, <kuniyu@...zon.com>,
<netdev@...r.kernel.org>
Subject: Re: [PATCH v1 net] af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock.
From: Paolo Abeni <pabeni@...hat.com>
Date: Fri, 10 May 2024 09:53:25 +0200
> On Fri, 2024-05-10 at 14:03 +0900, Kuniyuki Iwashima wrote:
> > From: Paolo Abeni <pabeni@...hat.com>
> > Date: Thu, 09 May 2024 11:12:38 +0200
> > > On Tue, 2024-05-07 at 10:00 -0700, Kuniyuki Iwashima wrote:
> > > > Billy Jheng Bing-Jhong reported a race between __unix_gc() and
> > > > queue_oob().
> > > >
> > > > __unix_gc() tries to garbage-collect close()d inflight sockets,
> > > > and if the socket has MSG_OOB in unix_sk(sk)->oob_skb, GC
> > > > will drop the reference and set it to NULL locklessly.
> > > >
> > > > However, the peer socket can still send an MSG_OOB message to the
> > > > GC candidate, and queue_oob() can update unix_sk(sk)->oob_skb
> > > > concurrently, resulting in a NULL pointer dereference. [0]
> > > >
> > > > To avoid the race, let's update unix_sk(sk)->oob_skb under the
> > > > sk_receive_queue's lock.
> > >
> > > I'm sorry to delay this fix but...
> > >
> > > AFAICS every time AF_UNIX touches the oob_skb, it's under the receiver's
> > > unix_state_lock. The only exception is __unix_gc. What about just
> > > acquiring that lock there?
> >
> > In the new GC, there is a unix_state_lock -> gc_lock ordering, so
> > taking unix_state_lock there would need another fix.
> >
> > That's why I chose to lock the recvq in the old GC too.
> > https://lore.kernel.org/netdev/20240507172606.85532-1-kuniyu@amazon.com/
> >
> > Also, Linus says:
> >
> > I really get the feeling that 'sb->oob_skb' should actually be forced
> > to always be in sync with the receive queue by always doing the
> > accesses under the receive_queue lock.
> >
> > ( That's in the security@ thread I added you to, but I just noticed
> > Linus replied to the previous mail. I'll forward those mails to you. )
> >
> >
> > > Otherwise there are other chunks touching the oob_skb where
> > > this patch does not add the receive queue spin lock protection, e.g. in
> > > unix_stream_recv_urg(), making the code a bit inconsistent.
> >
> > Yes, with this patch the receive path is protected by unix_state_lock()
> > and the send path by unix_state_lock() and the recvq lock.
> >
> > Ideally, as Linus suggested, we should acquire the recvq lock everywhere
> > oob_skb is touched and remove the additional refcount taken by skb_get(),
> > but I thought that was too much for a fix and planned to do the
> > refactoring in the next cycle.
> >
> > What do you think ?
>
> I missed/forgot the unix_state_lock -> gc_lock ordering on net-next.
>
> What about using the receive queue lock, and acquiring that everywhere
> oob_skb is touched, without the additional refcount refactor?
>
> It would be more consistent and reasonably small. It should work on the
> new GC, too.
>
> The refcount refactor could later come on net-next, and will be less
> complex with the lock already in place.
Yeah, sounds good.
I'll post v2 with additional recvq locks; for the send path I'm thinking
of something like the sketch below.
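A minimal sketch of the queue_oob() side, using the names in af_unix.c
("other" is the receiver socket, "ousk" its unix_sock); the exact placement
may differ in v2:

	spin_lock(&other->sk_receive_queue.lock);
	WRITE_ONCE(ousk->oob_skb, skb);		/* publish the new OOB skb */
	__skb_queue_tail(&other->sk_receive_queue, skb);
	spin_unlock(&other->sk_receive_queue.lock);

This way oob_skb and the queue tail are updated atomically with respect to
anyone else holding the recvq lock, so GC can never see oob_skb pointing at
an skb it has already purged.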
Thanks!
>
> Incremental patch on top of yours, completely untested:
> ---
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 9a6ad5974dff..a489f2aef29d 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -2614,8 +2614,10 @@ static int unix_stream_recv_urg(struct unix_stream_read_state *state)
>
> mutex_lock(&u->iolock);
> unix_state_lock(sk);
> + spin_lock(&sk->sk_receive_queue.lock);
>
> if (sock_flag(sk, SOCK_URGINLINE) || !u->oob_skb) {
> + spin_unlock(&sk->sk_receive_queue.lock);
> unix_state_unlock(sk);
> mutex_unlock(&u->iolock);
> return -EINVAL;
> @@ -2627,6 +2629,7 @@ static int unix_stream_recv_urg(struct unix_stream_read_state *state)
> WRITE_ONCE(u->oob_skb, NULL);
> else
> skb_get(oob_skb);
> + spin_unlock(&sk->sk_receive_queue.lock);
> unix_state_unlock(sk);
>
> chunk = state->recv_actor(oob_skb, 0, chunk, state);
> @@ -2655,6 +2658,7 @@ static struct sk_buff *manage_oob(struct sk_buff *skb, struct sock *sk,
> consume_skb(skb);
> skb = NULL;
> } else {
> + spin_lock(&sk->sk_receive_queue.lock);
> if (skb == u->oob_skb) {
> if (copied) {
> skb = NULL;
> @@ -2673,6 +2677,7 @@ static struct sk_buff *manage_oob(struct sk_buff *skb, struct sock *sk,
> skb = skb_peek(&sk->sk_receive_queue);
> }
> }
> + spin_unlock(&sk->sk_receive_queue.lock);
> }
> return skb;
> }
>
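
( FWIW, the __unix_gc() side of my patch takes the same lock before
dropping the reference. Roughly the idea, not the exact hunk -- here sk is
the GC-candidate socket, u = unix_sk(sk), and the skb is freed outside the
lock:

	struct sk_buff *oob_skb;

	spin_lock(&sk->sk_receive_queue.lock);
	oob_skb = u->oob_skb;			/* detach under the recvq lock */
	u->oob_skb = NULL;
	spin_unlock(&sk->sk_receive_queue.lock);

	kfree_skb(oob_skb);			/* kfree_skb(NULL) is a no-op */

With both sides holding the recvq lock, queue_oob() and GC always agree on
what oob_skb points to. )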