[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoAKEv2WOV-4k5kDa2EJMGp_h0bW3jhYZrQ9aiK+s4AcOQ@mail.gmail.com>
Date: Wed, 22 May 2024 14:27:30 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: dsahern@...nel.org, kuba@...nel.org, pabeni@...hat.com,
davem@...emloft.net, kuniyu@...zon.com, netdev@...r.kernel.org,
Jason Xing <kernelxing@...cent.com>,
syzbot+2eca27bdcb48ed330251@...kaller.appspotmail.com
Subject: Re: [PATCH net] Revert "rds: tcp: Fix use-after-free of net in reqsk_timer_handler()."
Hello Eric,
On Tue, May 21, 2024 at 11:49 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, May 21, 2024 at 4:49 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> >
> > From: Jason Xing <kernelxing@...cent.com>
> >
> > This reverts commit 2a750d6a5b365265dbda33330a6188547ddb5c24.
> >
> > Syzbot[1] reported the drecrement of reference count hits leaking memory.
> >
> > If we failed in setup_net() and try to undo the setup process, the
> > reference now is 1 which shouldn't be decremented. However, it happened
> > actually.
> >
> > After applying this patch which allows us to check the reference first,
> > it will not hit zero anymore in tcp_twsk_purge() without calling
> > inet_twsk_purge() one more time.
> >
> > [1]
> > refcount_t: decrement hit 0; leaking memory.
> > WARNING: CPU: 3 PID: 1396 at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> > Modules linked in:
> > CPU: 3 PID: 1396 Comm: syz-executor.3 Not tainted 6.9.0-syzkaller-07370-g33e02dc69afb #0
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > RIP: 0010:refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> > RSP: 0018:ffffc9000480fa70 EFLAGS: 00010282
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9002ce28000
> > RDX: 0000000000040000 RSI: ffffffff81505406 RDI: 0000000000000001
> > RBP: ffff88804d8b3f80 R08: 0000000000000001 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000002 R12: ffff88804d8b3f80
> > R13: ffff888031c601c0 R14: ffffc900013c04f8 R15: 000000002a3e5567
> > FS: 00007f56d897c6c0(0000) GS:ffff88806b300000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000001b3182b000 CR3: 0000000034ed6000 CR4: 0000000000350ef0
> > Call Trace:
> > <TASK>
> > __refcount_dec include/linux/refcount.h:336 [inline]
> > refcount_dec include/linux/refcount.h:351 [inline]
> > inet_twsk_kill+0x758/0x9c0 net/ipv4/inet_timewait_sock.c:70
> > inet_twsk_deschedule_put net/ipv4/inet_timewait_sock.c:221 [inline]
> > inet_twsk_purge+0x725/0x890 net/ipv4/inet_timewait_sock.c:304
> > tcp_twsk_purge+0x115/0x150 net/ipv4/tcp_minisocks.c:402
> > tcp_sk_exit_batch+0x1c/0x170 net/ipv4/tcp_ipv4.c:3522
> > ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
> > setup_net+0x714/0xb40 net/core/net_namespace.c:375
> > copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508
> > create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
> > unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
> > ksys_unshare+0x419/0x970 kernel/fork.c:3323
> > __do_sys_unshare kernel/fork.c:3394 [inline]
> > __se_sys_unshare kernel/fork.c:3392 [inline]
> > __x64_sys_unshare+0x31/0x40 kernel/fork.c:3392
> > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > do_syscall_64+0xcf/0x260 arch/x86/entry/common.c:83
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7f56d7c7cee9
> >
> > Fixes: 2a750d6a5b36 ("rds: tcp: Fix use-after-free of net in reqsk_timer_handler().")
> > Reported-by: syzbot+2eca27bdcb48ed330251@...kaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=2eca27bdcb48ed330251
> > Signed-off-by: Jason Xing <kernelxing@...cent.com>
> > ---
> > The reverted patch trying to solve another issue causes unexpected error as above. I
> > think that issue can be properly analyzed and handled later. So can we revert it first?
> > ---
> > net/ipv4/tcp_minisocks.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > index b93619b2384b..46e6f9db4227 100644
> > --- a/net/ipv4/tcp_minisocks.c
> > +++ b/net/ipv4/tcp_minisocks.c
> > @@ -399,6 +399,10 @@ void tcp_twsk_purge(struct list_head *net_exit_list)
> > /* Even if tw_refcount == 1, we must clean up kernel reqsk */
> > inet_twsk_purge(net->ipv4.tcp_death_row.hashinfo);
> > } else if (!purged_once) {
> > + /* The last refcount is decremented in tcp_sk_exit_batch() */
> > + if (refcount_read(&net->ipv4.tcp_death_row.tw_refcount) == 1)
> > + continue;
> > +
> > inet_twsk_purge(&tcp_hashinfo);
> > purged_once = true;
> > }
> > --
>
> This can not be a fix for a race condition.
>
> By definition a TW has a refcount on tcp_death_row.tw_refcount only
> if its timer is armed.
>
> And inet_twsk_deschedule_put() does
>
> if (del_timer_sync(&tw->tw_timer))
> inet_twsk_kill(tw);
>
> I think you need to provide a full explanation, instead of a shot in the dark.
>
> Before releasing this syzbot, I thought that maybe the refcount_inc()
> was done too late,
> but considering other invariants, this should not matter.
>
> diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
> index e28075f0006e333897ad379ebc8c87fc3f9643bd..d8f4d93c45331be64d70af96de33f783870e1dcc
> 100644
> --- a/net/ipv4/inet_timewait_sock.c
> +++ b/net/ipv4/inet_timewait_sock.c
> @@ -255,8 +255,8 @@ void __inet_twsk_schedule(struct
> inet_timewait_sock *tw, int timeo, bool rearm)
>
> __NET_INC_STATS(twsk_net(tw), kill ? LINUX_MIB_TIMEWAITKILLED :
> LINUX_MIB_TIMEWAITED);
> - BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
> refcount_inc(&tw->tw_dr->tw_refcount);
> + BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
> } else {
> mod_timer_pending(&tw->tw_timer, jiffies + timeo);
> }
Thanks for your information.
What you wrote is right, I think for a while. In
inet_twsk_deschedule_put(), there are two possible cases:
1) if it finds the timer is armed, then it can kill the tw socket by
decrementing the refcount in the right way. So it's a good/lucky thing
for us.
or
2) if it misses the point, then the tw socket arms the timer which
will expire in 60 seconds in the initialization phase. The tw socket
would have a chance to be destroyed when the timer expires.
It seems that you don't think your code could solve the race issue?
Thanks,
Jason
Powered by blists - more mailing lists