Message-ID: <YZtncGsgIbo+q390@shredder>
Date: Mon, 22 Nov 2021 11:48:32 +0200
From: Ido Schimmel <idosch@...sch.org>
To: Nikolay Aleksandrov <nikolay@...dia.com>
Cc: Nikolay Aleksandrov <razor@...ckwall.org>, netdev@...r.kernel.org,
davem@...emloft.net, kuba@...nel.org, dsahern@...il.com
Subject: Re: [PATCH net 0/3] net: nexthop: fix refcount issues when replacing
groups
On Sun, Nov 21, 2021 at 08:17:49PM +0200, Nikolay Aleksandrov wrote:
> On 21/11/2021 19:55, Ido Schimmel wrote:
> > On Sun, Nov 21, 2021 at 05:24:50PM +0200, Nikolay Aleksandrov wrote:
> >> From: Nikolay Aleksandrov <nikolay@...dia.com>
> >>
> >> Hi,
> >> This set fixes a refcount bug when replacing nexthop groups and
> >> modifying routes. It is complex because the objects look valid when
> >> debugging memory dumps, but we end up with a refcount dependency between
> >> unlinked objects that can never be released, so in turn they cannot
> >> free their resources or drop the references they hold. The problem happens
> >> because we can have stale IPv6 per-cpu dsts in nexthops that were removed
> >> from a group. Even though the IPv6 gen is bumped, the dsts won't be released
> >> until traffic passes through them or the nexthop is freed, which can take
> >> an arbitrarily long time; even worse, we can create a scenario[1] where they
> >> can never be released. The fix is to release the IPv6 per-cpu dsts of
> >> replaced nexthops after an RCU grace period so that no new ones can be
> >> created. To do that we add a new IPv6 stub, fib6_nh_release_dsts, which
> >> is used by the nexthop code only when necessary. We could further optimize
> >> group replacement, but that is better suited for net-next, as these patches
> >> would have to be backported to stable releases.
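A toy model may help make the cycle concrete. The sketch below is C written under stated assumptions: all names (`route`, `nexthop`, `dst`, `nexthop_release_dsts`, `run_scenario`) are hypothetical and are not kernel structures or functions, and the only dependency modeled is the one discussed in this thread, where a route holds its nexthop and a dst cached on that nexthop holds a reference back on the route. Dropping the route's last external reference then frees nothing; explicitly releasing the cached dst, which is the idea behind fib6_nh_release_dsts, breaks the cycle. The RCU grace period the real fix waits for is deliberately left out of this single-threaded model.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model, not kernel code. A route holds a reference on
 * its nexthop; "traffic" caches a dst on the nexthop, and that dst holds
 * a reference back on the route. */
struct route;

struct dst {
    struct route *from;          /* back-reference that pins the route */
};

struct nexthop {
    int refcnt;
    struct dst *cached;          /* stands in for the per-cpu dst cache */
};

struct route {
    int refcnt;
    struct nexthop *nh;
};

static int objects_freed;        /* counts freed routes and nexthops */

static void nexthop_put(struct nexthop *nh);

static void route_put(struct route *rt)
{
    assert(rt->refcnt > 0);
    if (--rt->refcnt == 0) {
        nexthop_put(rt->nh);
        free(rt);
        objects_freed++;
    }
}

/* Drop the cached dst and the route reference it holds. The real fix
 * does this only after an RCU grace period so that no new dst can be
 * cached concurrently; this single-threaded model skips that step. */
static void nexthop_release_dsts(struct nexthop *nh)
{
    struct dst *d = nh->cached;

    if (!d)
        return;
    nh->cached = NULL;
    route_put(d->from);          /* may free nh as a side effect */
    free(d);
}

static void nexthop_put(struct nexthop *nh)
{
    assert(nh->refcnt > 0);
    if (--nh->refcnt == 0) {
        nexthop_release_dsts(nh); /* freeing a nexthop drops its dsts */
        free(nh);
        objects_freed++;
    }
}

/* Build route -> nexthop -> cached dst -> route, drop the creator's
 * route reference, and optionally release the cached dst explicitly.
 * Returns how many of the two objects (route, nexthop) were freed. */
int run_scenario(int release_dsts)
{
    struct nexthop *nh = calloc(1, sizeof(*nh));
    struct route *rt = calloc(1, sizeof(*rt));
    struct dst *d = calloc(1, sizeof(*d));

    if (!nh || !rt || !d)
        abort();
    objects_freed = 0;

    nh->refcnt = 1;              /* held by the route */
    rt->refcnt = 1;              /* held by its creator */
    rt->nh = nh;

    d->from = rt;                /* traffic caches a dst on the nexthop */
    rt->refcnt++;
    nh->cached = d;

    route_put(rt);               /* route deleted: refcnt 2 -> 1, cycle remains */
    if (release_dsts)
        nexthop_release_dsts(nh); /* breaks the cycle */
    return objects_freed;        /* without the release, all three objects leak */
}
```

`run_scenario(0)` returns 0 and intentionally leaks all three objects, mirroring the stuck state; `run_scenario(1)` returns 2 once both the route and the nexthop have been freed.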
> >
> > Will run regression with these patches tonight and report back tomorrow.
> >
>
> Thank you, I've prepared v2 with the selftest mausezahn check and will hold
> off on sending it to see how the tests go and whether any comments show up
> in the meantime. :)
>
> By the way, I've been running a torture test all day with IPv6 forwarding over
> multiple routes plus local traffic through different CPUs, while also replacing
> multiple nh groups that reference multiple nexthops. So far it looks good.
Regression looks good. Later today I will also have results from a debug
kernel, but I think it should be fine.
Regarding patch #2, can you add a comment (or edit the commit message)
to explain why the fix is only relevant for IPv6? I made this comment
earlier, but I think it was missed:
"This problem is specific to IPv6 because IPv4 dst entries do not hold
references on routes / FIB info, thereby avoiding the circular dependency
described in the commit message?"
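The IPv4/IPv6 asymmetry in the quoted comment can be illustrated with an even smaller toy model in C. It assumes, as the comment suggests, that the only relevant difference is whether a cached dst holds a reference on its route: deleting the route frees everything when the dst does not pin it (IPv4-like) and frees nothing when it does (IPv6-like). All names here (`route`, `nexthop`, `dst`, `delete_route`) are hypothetical and do not correspond to kernel structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model, not kernel code: the two cases differ only in
 * whether the cached dst holds a reference on the route. */
struct route;
struct dst { struct route *from; };             /* NULL: no back-reference */
struct nexthop { int refcnt; struct dst *cached; };
struct route { int refcnt; struct nexthop *nh; };

static int objects_freed;                       /* freed routes + nexthops */

static void route_put(struct route *rt);

static void nexthop_put(struct nexthop *nh)
{
    assert(nh->refcnt > 0);
    if (--nh->refcnt == 0) {
        struct dst *d = nh->cached;

        nh->cached = NULL;
        if (d && d->from)
            route_put(d->from);                 /* drop the dst's route ref */
        free(d);
        free(nh);
        objects_freed++;
    }
}

static void route_put(struct route *rt)
{
    assert(rt->refcnt > 0);
    if (--rt->refcnt == 0) {
        nexthop_put(rt->nh);
        free(rt);
        objects_freed++;
    }
}

/* Delete a route whose nexthop has a cached dst. Returns how many of
 * the two objects (route, nexthop) were freed. */
int delete_route(int dst_pins_route)
{
    struct nexthop *nh = calloc(1, sizeof(*nh));
    struct route *rt = calloc(1, sizeof(*rt));
    struct dst *d = calloc(1, sizeof(*d));

    if (!nh || !rt || !d)
        abort();
    objects_freed = 0;
    nh->refcnt = 1;                             /* held by the route */
    rt->refcnt = 1;                             /* held by its creator */
    rt->nh = nh;
    nh->cached = d;
    if (dst_pins_route) {                       /* IPv6-like: dst references the route */
        d->from = rt;
        rt->refcnt++;
    }
    route_put(rt);                              /* creator drops its reference */
    return objects_freed;
}
```

`delete_route(0)` returns 2, while `delete_route(1)` returns 0 and leaks the objects, matching the circular dependency described in the cover letter.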
Thanks!