lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 21 May 2024 17:49:32 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Jason Xing <kerneljasonxing@...il.com>
Cc: dsahern@...nel.org, kuba@...nel.org, pabeni@...hat.com, 
	davem@...emloft.net, kuniyu@...zon.com, netdev@...r.kernel.org, 
	Jason Xing <kernelxing@...cent.com>, 
	syzbot+2eca27bdcb48ed330251@...kaller.appspotmail.com
Subject: Re: [PATCH net] Revert "rds: tcp: Fix use-after-free of net in reqsk_timer_handler()."

On Tue, May 21, 2024 at 4:49 PM Jason Xing <kerneljasonxing@...il.com> wrote:
>
> From: Jason Xing <kernelxing@...cent.com>
>
> This reverts commit 2a750d6a5b365265dbda33330a6188547ddb5c24.
>
> Syzbot[1] reported the drecrement of reference count hits leaking memory.
>
> If we failed in setup_net() and try to undo the setup process, the
> reference now is 1 which shouldn't be decremented. However, it happened
> actually.
>
> After applying this patch which allows us to check the reference first,
> it will not hit zero anymore in tcp_twsk_purge() without calling
> inet_twsk_purge() one more time.
>
> [1]
> refcount_t: decrement hit 0; leaking memory.
> WARNING: CPU: 3 PID: 1396 at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> Modules linked in:
> CPU: 3 PID: 1396 Comm: syz-executor.3 Not tainted 6.9.0-syzkaller-07370-g33e02dc69afb #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> RSP: 0018:ffffc9000480fa70 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9002ce28000
> RDX: 0000000000040000 RSI: ffffffff81505406 RDI: 0000000000000001
> RBP: ffff88804d8b3f80 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000002 R12: ffff88804d8b3f80
> R13: ffff888031c601c0 R14: ffffc900013c04f8 R15: 000000002a3e5567
> FS:  00007f56d897c6c0(0000) GS:ffff88806b300000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000001b3182b000 CR3: 0000000034ed6000 CR4: 0000000000350ef0
> Call Trace:
>  <TASK>
>  __refcount_dec include/linux/refcount.h:336 [inline]
>  refcount_dec include/linux/refcount.h:351 [inline]
>  inet_twsk_kill+0x758/0x9c0 net/ipv4/inet_timewait_sock.c:70
>  inet_twsk_deschedule_put net/ipv4/inet_timewait_sock.c:221 [inline]
>  inet_twsk_purge+0x725/0x890 net/ipv4/inet_timewait_sock.c:304
>  tcp_twsk_purge+0x115/0x150 net/ipv4/tcp_minisocks.c:402
>  tcp_sk_exit_batch+0x1c/0x170 net/ipv4/tcp_ipv4.c:3522
>  ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
>  setup_net+0x714/0xb40 net/core/net_namespace.c:375
>  copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508
>  create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
>  unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
>  ksys_unshare+0x419/0x970 kernel/fork.c:3323
>  __do_sys_unshare kernel/fork.c:3394 [inline]
>  __se_sys_unshare kernel/fork.c:3392 [inline]
>  __x64_sys_unshare+0x31/0x40 kernel/fork.c:3392
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0xcf/0x260 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f56d7c7cee9
>
> Fixes: 2a750d6a5b36 ("rds: tcp: Fix use-after-free of net in reqsk_timer_handler().")
> Reported-by: syzbot+2eca27bdcb48ed330251@...kaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=2eca27bdcb48ed330251
> Signed-off-by: Jason Xing <kernelxing@...cent.com>
> ---
> The reverted patch trying to solve another issue causes unexpected error as above. I
> think that issue can be properly analyzed and handled later. So can we revert it first?
> ---
>  net/ipv4/tcp_minisocks.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index b93619b2384b..46e6f9db4227 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -399,6 +399,10 @@ void tcp_twsk_purge(struct list_head *net_exit_list)
>                         /* Even if tw_refcount == 1, we must clean up kernel reqsk */
>                         inet_twsk_purge(net->ipv4.tcp_death_row.hashinfo);
>                 } else if (!purged_once) {
> +                       /* The last refcount is decremented in tcp_sk_exit_batch() */
> +                       if (refcount_read(&net->ipv4.tcp_death_row.tw_refcount) == 1)
> +                               continue;
> +
>                         inet_twsk_purge(&tcp_hashinfo);
>                         purged_once = true;
>                 }
> --

This can not be a fix for a race condition.

By definition a TW has a refcount on tcp_death_row.tw_refcount   only
if its timer is armed.

And inet_twsk_deschedule_put() does

if (del_timer_sync(&tw->tw_timer))
    inet_twsk_kill(tw);

I think you need to provide a full explanation, instead of a shot in the dark.

Before releasing this syzbot, I thought that maybe the refcount_inc()
was done too late,
but considering other invariants, this should not matter.

diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index e28075f0006e333897ad379ebc8c87fc3f9643bd..d8f4d93c45331be64d70af96de33f783870e1dcc
100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -255,8 +255,8 @@ void __inet_twsk_schedule(struct
inet_timewait_sock *tw, int timeo, bool rearm)

                __NET_INC_STATS(twsk_net(tw), kill ? LINUX_MIB_TIMEWAITKILLED :
                                                     LINUX_MIB_TIMEWAITED);
-               BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
                refcount_inc(&tw->tw_dr->tw_refcount);
+               BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
        } else {
                mod_timer_pending(&tw->tw_timer, jiffies + timeo);
        }

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ