netdev - Re: [PATCH net] net: implement lockless setsockopt(SO_PEEK

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240408183114.76329-1-kuniyu@amazon.com>
Date: Mon, 8 Apr 2024 11:31:14 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <aha310510@...il.com>
CC: <daan.j.demeyer@...il.com>, <davem@...emloft.net>, <dsahern@...nel.org>,
	<edumazet@...gle.com>, <eric.dumazet@...il.com>, <kuba@...nel.org>,
	<kuniyu@...zon.com>, <martin.lau@...nel.org>, <netdev@...r.kernel.org>,
	<pabeni@...hat.com>, <willemdebruijn.kernel@...il.com>
Subject: Re: [PATCH net] net: implement lockless setsockopt(SO_PEEK_OFF)

From: Jeongjun Park <aha310510@...il.com>
Date: Tue,  9 Apr 2024 00:43:54 +0900
> Eric Dumazet wrote:
> > On Mon, Apr 8, 2024 at 4:30 PM Jeongjun Park <aha310510@...il.com> wrote:
> > >
> > > Eric Dumazet wrote:
> > > > syzbot reported a lockdep violation [1] involving af_unix
> > > > support of SO_PEEK_OFF.
> > > >
> > > > Since SO_PEEK_OFF is inherently not thread safe (it uses a per-socket
> > > > sk_peek_off field), there is really no point to enforce a pointless
> > > > thread safety in the kernel.
> > > >
> > > > After this patch :
> > > >
> > > > - setsockopt(SO_PEEK_OFF) no longer acquires the socket lock.
> > > >
> > > > - skb_consume_udp() no longer has to acquire the socket lock.
> > > >
> > > > - af_unix no longer needs a special version of sk_set_peek_off(),
> > > >   because it does not lock u->iolock anymore.
> > >
> > > The method employed in this patch, which avoids locking u->iolock in
> > > SO_PEEK_OFF, appears to have effectively remedied the immediate vulnerability,
> > > and the patch itself seems robust.
> > >
> > > However, if a future scenario arises where mutex_lock(&u->iolock) is required
> > > after sk_setsockopt(sk), this patch would become ineffective.
> > >
> > > In practical testing within my environment, I observed that reintroducing
> > > mutex_lock(&u->iolock) within sk_setsockopt() triggered the vulnerability once again.
> > >
> > > Therefore, I believe it's crucial to address the fundamental cause triggering this vulnerability
> > > alongside the current patch.
> > >
> > > [   30.537400] ======================================================
> > > [   30.537765] WARNING: possible circular locking dependency detected
> > > [   30.538237] 6.9.0-rc1-00058-g4076fa161217-dirty #8 Not tainted
> > > [   30.538541] ------------------------------------------------------
> > > [   30.538791] poc/209 is trying to acquire lock:
> > > [   30.539008] ffff888007a8cd58 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: __unix_dgram_recvmsg+0x37e/0x550
> > > [   30.540060]
> > > [   30.540060] but task is already holding lock:
> > > [   30.540482] ffff888007a8d070 (&u->iolock){+.+.}-{3:3}, at: __unix_dgram_recvmsg+0xec/0x550
> > > [   30.540871]
> > > [   30.540871] which lock already depends on the new lock.
> > > [   30.540871]
> > > [   30.541341]
> > > [   30.541341] the existing dependency chain (in reverse order) is:
> > > [   30.541816]
> > > [   30.541816] -> #1 (&u->iolock){+.+.}-{3:3}:
> > > [   30.542411]        lock_acquire+0xc0/0x2e0
> > > [   30.542650]        __mutex_lock+0x91/0x4b0
> > > [   30.542830]        sk_setsockopt+0xae2/0x1510
> > > [   30.543009]        do_sock_setsockopt+0x14e/0x180
> > > [   30.543443]        __sys_setsockopt+0x73/0xc0
> > > [   30.543635]        __x64_sys_setsockopt+0x1a/0x30
> > > [   30.543859]        do_syscall_64+0xc9/0x1e0
> > > [   30.544057]        entry_SYSCALL_64_after_hwframe+0x6d/0x75
> > > [   30.544652]
> > > [   30.544652] -> #0 (sk_lock-AF_UNIX){+.+.}-{0:0}:
> > > [   30.544987]        check_prev_add+0xeb/0xa20
> > > [   30.545174]        __lock_acquire+0x12fb/0x1740
> > > [   30.545516]        lock_acquire+0xc0/0x2e0
> > > [   30.545692]        lock_sock_nested+0x2d/0x80
> > > [   30.545871]        __unix_dgram_recvmsg+0x37e/0x550
> > > [   30.546066]        sock_recvmsg+0xbf/0xd0
> > > [   30.546419]        ____sys_recvmsg+0x85/0x1d0
> > > [   30.546653]        ___sys_recvmsg+0x77/0xc0
> > > [   30.546971]        __sys_recvmsg+0x55/0xa0
> > > [   30.547149]        do_syscall_64+0xc9/0x1e0
> > > [   30.547428]        entry_SYSCALL_64_after_hwframe+0x6d/0x75
> > > [   30.547740]
> > > [   30.547740] other info that might help us debug this:
> > > [   30.547740]
> > > [   30.548217]  Possible unsafe locking scenario:
> > > [   30.548217]
> > > [   30.548502]        CPU0                    CPU1
> > > [   30.548713]        ----                    ----
> > > [   30.548926]   lock(&u->iolock);
> > > [   30.549234]                                lock(sk_lock-AF_UNIX);
> > > [   30.549535]                                lock(&u->iolock);
> > > [   30.549798]   lock(sk_lock-AF_UNIX);
> > > [   30.549970]
> > > [   30.549970]  *** DEADLOCK ***
> > > [   30.549970]
> > > [   30.550504] 1 lock held by poc/209:
> > > [   30.550681]  #0: ffff888007a8d070 (&u->iolock){+.+.}-{3:3}, at: __unix_dgram_recvmsg+0xec/0x550
> > > [   30.551100]
> > > [   30.551100] stack backtrace:
> > > [   30.551532] CPU: 1 PID: 209 Comm: poc Not tainted 6.9.0-rc1-00058-g4076fa161217-dirty #8
> > > [   30.551910] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> > > [   30.552539] Call Trace:
> > > [   30.552788]  <TASK>
> > > [   30.552987]  dump_stack_lvl+0x68/0xa0
> > > [   30.553429]  check_noncircular+0x135/0x150
> > > [   30.553626]  check_prev_add+0xeb/0xa20
> > > [   30.553811]  __lock_acquire+0x12fb/0x1740
> > > [   30.553993]  lock_acquire+0xc0/0x2e0
> > > [   30.554234]  ? __unix_dgram_recvmsg+0x37e/0x550
> > > [   30.554543]  ? __skb_try_recv_datagram+0xb2/0x190
> > > [   30.554752]  lock_sock_nested+0x2d/0x80
> > > [   30.554912]  ? __unix_dgram_recvmsg+0x37e/0x550
> > > [   30.555097]  __unix_dgram_recvmsg+0x37e/0x550
> > > [   30.555498]  sock_recvmsg+0xbf/0xd0
> > > [   30.555661]  ____sys_recvmsg+0x85/0x1d0
> > > [   30.555826]  ? __import_iovec+0x177/0x1d0
> > > [   30.555998]  ? import_iovec+0x1a/0x20
> > > [   30.556401]  ? copy_msghdr_from_user+0x68/0xa0
> > > [   30.556676]  ___sys_recvmsg+0x77/0xc0
> > > [   30.556856]  ? __fget_files+0xc8/0x1a0
> > > [   30.557612]  ? lock_release+0xbd/0x290
> > > [   30.557799]  ? __fget_files+0xcd/0x1a0
> > > [   30.557969]  __sys_recvmsg+0x55/0xa0
> > > [   30.558284]  do_syscall_64+0xc9/0x1e0
> > > [   30.558455]  entry_SYSCALL_64_after_hwframe+0x6d/0x75
> > > [   30.558740] RIP: 0033:0x7f3c14632dad
> > > [   30.559329] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 6a ef ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2f 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 9e ef f8
> > > [   30.560156] RSP: 002b:00007f3c12c43e60 EFLAGS: 00000293 ORIG_RAX: 000000000000002f
> > > [   30.560582] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3c14632dad
> > > [   30.560933] RDX: 0000000000000000 RSI: 00007f3c12c44eb0 RDI: 0000000000000005
> > > [   30.562935] RBP: 00007f3c12c44ef0 R08: 0000000000000000 R09: 00007f3c12c45700
> > > [   30.565833] R10: fffffffffffff648 R11: 0000000000000293 R12: 00007ffe93a2bfde
> > > [   30.566161] R13: 00007ffe93a2bfdf R14: 00007f3c12c44fc0 R15: 0000000000802000
> > > [   30.569456]  </TASK>
> > >
> > >
> > >
> > >
> > > What are your thoughts on this?
> > 
> > You are talking about some unreleased code ?
> > 
> > I can not comment, obviously.
> 
> 
> 
> I think it would be prudent to patch __unix_dgram_recvmsg() as 
> depicted below to ensure its proper functionality, even 
> in the event of a later addition of mutex_lock in sk_setsockopt(). 
> 
> By implementing this patch, we mitigate the risk of potential deadlock scenarios 
> in the future by eliminating the condition that could lead to them.

It might mitigate your risk, but it does not exist upstream.

We don't accept such a patch that adds unnecessary locks just
for a future possible issue.  It should be fixed when such an
option requiring u->iolock is added.


> 
> ---
>  net/unix/af_unix.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 5b41e2321209..f102f08f649f 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -2458,11 +2458,14 @@ int __unix_dgram_recvmsg(struct sock *sk, struct msghdr *msg, size_t size,
>                                                 EPOLLWRBAND);
>  
>         if (msg->msg_name) {
> +               mutex_unlock(&u->iolock);
> +
>                 unix_copy_addr(msg, skb->sk);
> -
> +
>                 BPF_CGROUP_RUN_PROG_UNIX_RECVMSG_LOCK(sk,
>                                                       msg->msg_name,
>                                                       &msg->msg_namelen);
> +               mutex_lock(&u->iolock);
>         }
>  
>         if (size > skb->len - skip)
> @@ -2814,6 +2817,8 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
>  
>                 /* Copy address just once */
>                 if (state->msg && state->msg->msg_name) {
> +                       mutex_unlock(&u->iolock);
> +
>                         DECLARE_SOCKADDR(struct sockaddr_un *, sunaddr,
>                                          state->msg->msg_name);
>                         unix_copy_addr(state->msg, skb->sk);
> @@ -2823,6 +2828,8 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
>                                                               &state->msg->msg_namelen);
>  
>                         sunaddr = NULL;
> +
> +                       mutex_lock(&u->iolock);
>                 }
>  
>                 chunk = min_t(unsigned int, unix_skb_len(skb) - skip, size);
> -- 
> 2.34.1