[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoD+8A+eJ2M2mgTw0HSaNEa7YDvakO3Q_CFNn-eeUmVzHQ@mail.gmail.com>
Date: Mon, 12 Aug 2024 23:49:40 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Florian Westphal <fw@...len.de>
Cc: Kuniyuki Iwashima <kuniyu@...zon.com>, davem@...emloft.net, dsahern@...nel.org,
edumazet@...gle.com, kuba@...nel.org, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, pabeni@...hat.com,
syzbot+8ea26396ff85d23a8929@...kaller.appspotmail.com,
syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [net?] WARNING: refcount bug in inet_twsk_kill
On Mon, Aug 12, 2024 at 11:03 PM Florian Westphal <fw@...len.de> wrote:
>
> Jason Xing <kerneljasonxing@...il.com> wrote:
> > > I don't see how this helps, we need to wait until 'stolen' twsk
> > > has gone through inet_twsk_kill() and decremented tw_refcount.
> > > Obviously It would be a bit simpler if we had a reliable reproducer :-)
> >
> > Allow me to say something irrelevant to this bug report.
> >
> > Do you think that Kuniyuki's patch can solve the race between two
> > 'killers' calling inet_twsk_deschedule_put()->inet_twsk_kill()
> > concurrently at two cores, say, inet_twsk_purge() and tcp_abort()?
>
> I don't think its possible, tcp_abort() calls inet_twsk_deschedule_put,
> which does:
>
> if (timer_shutdown_sync(&tw->tw_timer))
> inet_twsk_kill(tw);
>
> So I don't see how two concurrent callers, working on same tw address,
> would both be able to shut down the timer.
>
> One will shut it down and calls inet_twsk_kill(), other will wait until
> the callback has completed, but it doesn't call inet_twsk_kill().
Oh, thanks. Since timer_shutdown_sync() can be used as a lock, there
is indeed no way to call inet_twsk_kill() concurrently.
>
> > It at least does help avoid decrementing tw_refcount twice in the
> > above case if I understand correctly.
>
> I don't think the refcount is decremented twice.
>
> Problem is one thread is already at the 'final' decrement of
> WARN_ON_ONCE(!refcount_dec_and_test(&net->ipv4.tcp_death_row.tw_refcount));
>
> in tcp_sk_exit_batch(), while other thread has not yet called
> refcount_dec() on it (inet_twsk_kill still executing).
>
> So we get two splats, refcount_dec_and_test() returns 1 not expected 0
> and refcount_dec() coming right afterwards from other task observes the
> transition to 0, while it should have dropped down to 1.
I see.
Thanks,
Jason
Powered by blists - more mailing lists