lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 22 May 2024 14:27:30 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: dsahern@...nel.org, kuba@...nel.org, pabeni@...hat.com, 
	davem@...emloft.net, kuniyu@...zon.com, netdev@...r.kernel.org, 
	Jason Xing <kernelxing@...cent.com>, 
	syzbot+2eca27bdcb48ed330251@...kaller.appspotmail.com
Subject: Re: [PATCH net] Revert "rds: tcp: Fix use-after-free of net in reqsk_timer_handler()."

Hello Eric,

On Tue, May 21, 2024 at 11:49 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, May 21, 2024 at 4:49 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> >
> > From: Jason Xing <kernelxing@...cent.com>
> >
> > This reverts commit 2a750d6a5b365265dbda33330a6188547ddb5c24.
> >
> > Syzbot[1] reported the drecrement of reference count hits leaking memory.
> >
> > If we failed in setup_net() and try to undo the setup process, the
> > reference now is 1 which shouldn't be decremented. However, it happened
> > actually.
> >
> > After applying this patch which allows us to check the reference first,
> > it will not hit zero anymore in tcp_twsk_purge() without calling
> > inet_twsk_purge() one more time.
> >
> > [1]
> > refcount_t: decrement hit 0; leaking memory.
> > WARNING: CPU: 3 PID: 1396 at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> > Modules linked in:
> > CPU: 3 PID: 1396 Comm: syz-executor.3 Not tainted 6.9.0-syzkaller-07370-g33e02dc69afb #0
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > RIP: 0010:refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> > RSP: 0018:ffffc9000480fa70 EFLAGS: 00010282
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9002ce28000
> > RDX: 0000000000040000 RSI: ffffffff81505406 RDI: 0000000000000001
> > RBP: ffff88804d8b3f80 R08: 0000000000000001 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000002 R12: ffff88804d8b3f80
> > R13: ffff888031c601c0 R14: ffffc900013c04f8 R15: 000000002a3e5567
> > FS:  00007f56d897c6c0(0000) GS:ffff88806b300000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000001b3182b000 CR3: 0000000034ed6000 CR4: 0000000000350ef0
> > Call Trace:
> >  <TASK>
> >  __refcount_dec include/linux/refcount.h:336 [inline]
> >  refcount_dec include/linux/refcount.h:351 [inline]
> >  inet_twsk_kill+0x758/0x9c0 net/ipv4/inet_timewait_sock.c:70
> >  inet_twsk_deschedule_put net/ipv4/inet_timewait_sock.c:221 [inline]
> >  inet_twsk_purge+0x725/0x890 net/ipv4/inet_timewait_sock.c:304
> >  tcp_twsk_purge+0x115/0x150 net/ipv4/tcp_minisocks.c:402
> >  tcp_sk_exit_batch+0x1c/0x170 net/ipv4/tcp_ipv4.c:3522
> >  ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
> >  setup_net+0x714/0xb40 net/core/net_namespace.c:375
> >  copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508
> >  create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
> >  unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
> >  ksys_unshare+0x419/0x970 kernel/fork.c:3323
> >  __do_sys_unshare kernel/fork.c:3394 [inline]
> >  __se_sys_unshare kernel/fork.c:3392 [inline]
> >  __x64_sys_unshare+0x31/0x40 kernel/fork.c:3392
> >  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >  do_syscall_64+0xcf/0x260 arch/x86/entry/common.c:83
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7f56d7c7cee9
> >
> > Fixes: 2a750d6a5b36 ("rds: tcp: Fix use-after-free of net in reqsk_timer_handler().")
> > Reported-by: syzbot+2eca27bdcb48ed330251@...kaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=2eca27bdcb48ed330251
> > Signed-off-by: Jason Xing <kernelxing@...cent.com>
> > ---
> > The reverted patch trying to solve another issue causes unexpected error as above. I
> > think that issue can be properly analyzed and handled later. So can we revert it first?
> > ---
> >  net/ipv4/tcp_minisocks.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > index b93619b2384b..46e6f9db4227 100644
> > --- a/net/ipv4/tcp_minisocks.c
> > +++ b/net/ipv4/tcp_minisocks.c
> > @@ -399,6 +399,10 @@ void tcp_twsk_purge(struct list_head *net_exit_list)
> >                         /* Even if tw_refcount == 1, we must clean up kernel reqsk */
> >                         inet_twsk_purge(net->ipv4.tcp_death_row.hashinfo);
> >                 } else if (!purged_once) {
> > +                       /* The last refcount is decremented in tcp_sk_exit_batch() */
> > +                       if (refcount_read(&net->ipv4.tcp_death_row.tw_refcount) == 1)
> > +                               continue;
> > +
> >                         inet_twsk_purge(&tcp_hashinfo);
> >                         purged_once = true;
> >                 }
> > --
>
> This can not be a fix for a race condition.
>
> By definition a TW has a refcount on tcp_death_row.tw_refcount   only
> if its timer is armed.
>
> And inet_twsk_deschedule_put() does
>
> if (del_timer_sync(&tw->tw_timer))
>     inet_twsk_kill(tw);
>
> I think you need to provide a full explanation, instead of a shot in the dark.
>
> Before releasing this syzbot, I thought that maybe the refcount_inc()
> was done too late,
> but considering other invariants, this should not matter.
>
> diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
> index e28075f0006e333897ad379ebc8c87fc3f9643bd..d8f4d93c45331be64d70af96de33f783870e1dcc
> 100644
> --- a/net/ipv4/inet_timewait_sock.c
> +++ b/net/ipv4/inet_timewait_sock.c
> @@ -255,8 +255,8 @@ void __inet_twsk_schedule(struct
> inet_timewait_sock *tw, int timeo, bool rearm)
>
>                 __NET_INC_STATS(twsk_net(tw), kill ? LINUX_MIB_TIMEWAITKILLED :
>                                                      LINUX_MIB_TIMEWAITED);
> -               BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
>                 refcount_inc(&tw->tw_dr->tw_refcount);
> +               BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
>         } else {
>                 mod_timer_pending(&tw->tw_timer, jiffies + timeo);
>         }

Thanks for your information.

What you wrote is right, I think for a while. In
inet_twsk_deschedule_put(), there are two possible cases:
1) if it finds the timer is armed, then it can kill the tw socket by
decrementing the refcount in the right way. So it's a good/lucky thing
for us.
or
2) if it misses the point, then the tw socket arms the timer which
will expire in 60 seconds in the initialization phase. The tw socket
would have a chance to be destroyed when the timer expires.

It seems that you don't think your code could solve the race issue?

Thanks,
Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ