[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230420215407.48720-1-kuniyu@amazon.com>
Date: Thu, 20 Apr 2023 14:54:07 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <willemdebruijn.kernel@...il.com>
CC: <davem@...emloft.net>, <edumazet@...gle.com>, <kuba@...nel.org>,
<kuni1840@...il.com>, <kuniyu@...zon.com>,
<netdev@...r.kernel.org>, <pabeni@...hat.com>,
<syzkaller@...glegroups.com>, <willemb@...gle.com>
Subject: RE: [PATCH v3 net] tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Date: Thu, 20 Apr 2023 15:21:19 -0400
> Kuniyuki Iwashima wrote:
> > syzkaller reported [0] memory leaks of an UDP socket and ZEROCOPY
> > skbs. We can reproduce the problem with these sequences:
> >
> > sk = socket(AF_INET, SOCK_DGRAM, 0)
> > sk.setsockopt(SOL_SOCKET, SO_TIMESTAMPING, SOF_TIMESTAMPING_TX_SOFTWARE)
> > sk.setsockopt(SOL_SOCKET, SO_ZEROCOPY, 1)
> > sk.sendto(b'', MSG_ZEROCOPY, ('127.0.0.1', 53))
> > sk.close()
> >
> > sendmsg() calls msg_zerocopy_alloc(), which allocates a skb, sets
> > skb->cb->ubuf.refcnt to 1, and calls sock_hold(). Here, struct
> > ubuf_info_msgzc indirectly holds a refcnt of the socket. When the
> > skb is sent, __skb_tstamp_tx() clones it and puts the clone into
> > the socket's error queue with the TX timestamp.
> >
> > When the original skb is received locally, skb_copy_ubufs() calls
> > skb_unclone(), and pskb_expand_head() increments skb->cb->ubuf.refcnt.
> > This additional count is decremented while freeing the skb, but struct
> > ubuf_info_msgzc still has a refcnt, so __msg_zerocopy_callback() is
> > not called.
> >
> > The last refcnt is not released unless we retrieve the TX timestamped
> > skb by recvmsg(). Since we clear the error queue in inet_sock_destruct()
> > after the socket's refcnt reaches 0, there is a circular dependency.
> > If we close() the socket holding such skbs, we never call sock_put()
> > and leak the count, sk, and skb.
> >
> > TCP has the same problem, and commit e0c8bccd40fc ("net: stream:
> > purge sk_error_queue in sk_stream_kill_queues()") tried to fix it
> > by calling skb_queue_purge() during close(). However, there is a
> > small chance that skb queued in a qdisc or device could be put
> > into the error queue after the skb_queue_purge() call.
>
> I'd remove this part. If there is an issue in TCP, it is a separate
> issue and deserves a separate patch.
I don't think it's a separate issue. The issue resides in the common
zerocopy infra, and each blamed commit introduces the issue to each
protocol.
>
> The UDP part looks great to me. Thanks for fixing that.
>
> > In __skb_tstamp_tx(), the cloned skb should not have a reference
> > to the ubuf to remove the circular dependency, but skb_clone() does
> > not call skb_copy_ubufs() for zerocopy skb. So, we need to call
> > skb_orphan_frags_rx() for the cloned skb to call skb_copy_ubufs().
> >
> > [0]:
> > BUG: memory leak
> > unreferenced object 0xffff88800c6d2d00 (size 1152):
> > comm "syz-executor392", pid 264, jiffies 4294785440 (age 13.044s)
> > hex dump (first 32 bytes):
> > 00 00 00 00 00 00 00 00 cd af e8 81 00 00 00 00 ................
> > 02 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 ...@............
> > backtrace:
> > [<0000000055636812>] sk_prot_alloc+0x64/0x2a0 net/core/sock.c:2024
> > [<0000000054d77b7a>] sk_alloc+0x3b/0x800 net/core/sock.c:2083
> > [<0000000066f3c7e0>] inet_create net/ipv4/af_inet.c:319 [inline]
> > [<0000000066f3c7e0>] inet_create+0x31e/0xe40 net/ipv4/af_inet.c:245
> > [<000000009b83af97>] __sock_create+0x2ab/0x550 net/socket.c:1515
> > [<00000000b9b11231>] sock_create net/socket.c:1566 [inline]
> > [<00000000b9b11231>] __sys_socket_create net/socket.c:1603 [inline]
> > [<00000000b9b11231>] __sys_socket_create net/socket.c:1588 [inline]
> > [<00000000b9b11231>] __sys_socket+0x138/0x250 net/socket.c:1636
> > [<000000004fb45142>] __do_sys_socket net/socket.c:1649 [inline]
> > [<000000004fb45142>] __se_sys_socket net/socket.c:1647 [inline]
> > [<000000004fb45142>] __x64_sys_socket+0x73/0xb0 net/socket.c:1647
> > [<0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > [<0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
> > [<0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >
> > BUG: memory leak
> > unreferenced object 0xffff888017633a00 (size 240):
> > comm "syz-executor392", pid 264, jiffies 4294785440 (age 13.044s)
> > hex dump (first 32 bytes):
> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > 00 00 00 00 00 00 00 00 00 2d 6d 0c 80 88 ff ff .........-m.....
> > backtrace:
> > [<000000002b1c4368>] __alloc_skb+0x229/0x320 net/core/skbuff.c:497
> > [<00000000143579a6>] alloc_skb include/linux/skbuff.h:1265 [inline]
> > [<00000000143579a6>] sock_omalloc+0xaa/0x190 net/core/sock.c:2596
> > [<00000000be626478>] msg_zerocopy_alloc net/core/skbuff.c:1294 [inline]
> > [<00000000be626478>] msg_zerocopy_realloc+0x1ce/0x7f0 net/core/skbuff.c:1370
> > [<00000000cbfc9870>] __ip_append_data+0x2adf/0x3b30 net/ipv4/ip_output.c:1037
> > [<0000000089869146>] ip_make_skb+0x26c/0x2e0 net/ipv4/ip_output.c:1652
> > [<00000000098015c2>] udp_sendmsg+0x1bac/0x2390 net/ipv4/udp.c:1253
> > [<0000000045e0e95e>] inet_sendmsg+0x10a/0x150 net/ipv4/af_inet.c:819
> > [<000000008d31bfde>] sock_sendmsg_nosec net/socket.c:714 [inline]
> > [<000000008d31bfde>] sock_sendmsg+0x141/0x190 net/socket.c:734
> > [<0000000021e21aa4>] __sys_sendto+0x243/0x360 net/socket.c:2117
> > [<00000000ac0af00c>] __do_sys_sendto net/socket.c:2129 [inline]
> > [<00000000ac0af00c>] __se_sys_sendto net/socket.c:2125 [inline]
> > [<00000000ac0af00c>] __x64_sys_sendto+0xe1/0x1c0 net/socket.c:2125
> > [<0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > [<0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
> > [<0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >
> > Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY")
> > Fixes: b5947e5d1e71 ("udp: msg_zerocopy")
> > Reported-by: syzbot <syzkaller@...glegroups.com>
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.com>
> > ---
> > v3:
> > * Call skb_orphan_frags_rx() instead of adding locking rule and skb_queue_purge()
> >
> > v2: https://lore.kernel.org/netdev/20230418180832.81430-1-kuniyu@amazon.com/
> > * Move skb_queue_purge() after setting SOCK_DEAD in udp_destroy_sock()
> > * Check SOCK_DEAD in sock_queue_err_skb() with sk_error_queue.lock
> > * Add Fixes tag for TCP
> >
> > v1: https://lore.kernel.org/netdev/20230417171155.22916-1-kuniyu@amazon.com/
> > ---
> > net/core/skbuff.c | 3 +++
> > net/core/stream.c | 6 ------
> > 2 files changed, 3 insertions(+), 6 deletions(-)
> >
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 4c0879798eb8..2f9bb98170ab 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5162,6 +5162,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
> > skb = alloc_skb(0, GFP_ATOMIC);
> > } else {
> > skb = skb_clone(orig_skb, GFP_ATOMIC);
> > +
> > + if (skb_orphan_frags_rx(skb, GFP_ATOMIC))
> > + return;
> > }
> > if (!skb)
> > return;
> > diff --git a/net/core/stream.c b/net/core/stream.c
> > index 434446ab14c5..e6dd1a68545f 100644
> > --- a/net/core/stream.c
> > +++ b/net/core/stream.c
> > @@ -196,12 +196,6 @@ void sk_stream_kill_queues(struct sock *sk)
> > /* First the read buffer. */
> > __skb_queue_purge(&sk->sk_receive_queue);
> >
> > - /* Next, the error queue.
> > - * We need to use queue lock, because other threads might
> > - * add packets to the queue without socket lock being held.
> > - */
> > - skb_queue_purge(&sk->sk_error_queue);
> > -
>
> Why include this?
As mentioned in 24bcbe1cc69f ("net: stream: don't purge sk_error_queue in
sk_stream_kill_queues()"), we don't need this, but e0c8bccd40fc ("net:
stream: purge sk_error_queue in sk_stream_kill_queues()") added it back
to avoid the same issue. It's most likely to remove the issue, but we
concluded this way was insufficient.
Now we have the proper fix and should remove the workaround.
>
> > /* Next, the write queue. */
> > WARN_ON_ONCE(!skb_queue_empty(&sk->sk_write_queue));
> >
> > --
> > 2.30.2
Powered by blists - more mailing lists