Message-ID: <YZtncGsgIbo+q390@shredder>
Date:   Mon, 22 Nov 2021 11:48:32 +0200
From:   Ido Schimmel <idosch@...sch.org>
To:     Nikolay Aleksandrov <nikolay@...dia.com>
Cc:     Nikolay Aleksandrov <razor@...ckwall.org>, netdev@...r.kernel.org,
        davem@...emloft.net, kuba@...nel.org, dsahern@...il.com
Subject: Re: [PATCH net 0/3] net: nexthop: fix refcount issues when replacing
 groups

On Sun, Nov 21, 2021 at 08:17:49PM +0200, Nikolay Aleksandrov wrote:
> On 21/11/2021 19:55, Ido Schimmel wrote:
> > On Sun, Nov 21, 2021 at 05:24:50PM +0200, Nikolay Aleksandrov wrote:
> >> From: Nikolay Aleksandrov <nikolay@...dia.com>
> >>
> >> Hi,
> >> This set fixes a refcount bug when replacing nexthop groups and
> >> modifying routes. It is complex because the objects look valid when
> >> debugging memory dumps, but we end up having refcount dependency between
> >> unlinked objects which can never be released, so in turn they cannot
> >> free their resources and refcounts. The problem happens because we can
> >> have stale IPv6 per-cpu dsts in nexthops which were removed from a
> >> group. Even though the IPv6 gen is bumped, the dsts won't be released
> >> until traffic passes through them or the nexthop is freed, that can take
> >> arbitrarily long time, and even worse we can create a scenario[1] where it
> >> can never be released. The fix is to release the IPv6 per-cpu dsts of
> >> replaced nexthops after an RCU grace period so no new ones can be
> >> created. To do that we add a new IPv6 stub - fib6_nh_release_dsts, which
> >> is used by the nexthop code only when necessary. We can further optimize
> >> group replacement, but that is more suited for net-next as these patches
> >> would have to be backported to stable releases.
> > 
> > Will run regression with these patches tonight and report tomorrow
> > 
> 
> Thank you, I've prepared v2 with the selftest mausezahn check and will hold
> it off to see how the tests would go. Also if any comments show up in the
> meantime. :)
> 
> By the way I've been running a torture test all day for multiple IPv6 route
> forwarding + local traffic through different CPUs while also replacing multiple
> nh groups referencing multiple nexthops, so far it looks good.

Regression looks good. Later today I will also have results from a debug
kernel, but I think it should be fine.

Regarding patch #2, can you add a comment (or edit the commit message)
to explain why the fix is only relevant for IPv6? I made this comment,
but I think it was missed:

"This problem is specific to IPv6 because IPv4 dst entries do not hold
references on routes / FIB info thereby avoiding the circular dependency
described in the commit message?"
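
To make the asymmetry concrete, here is a toy model of the dependency. It
is only a sketch and not kernel code: the Obj class, the "cache" list and
the helper names are made up for illustration, and the real object graph
is more involved. It assumes a route holds a reference on its nexthop,
the nexthop owns a cached dst, and only in the IPv6-like case does the
dst hold a reference back on the route.

```python
class Obj:
    """Toy refcounted object; not a model of any real kernel struct."""

    def __init__(self, name):
        self.name = name
        self.refs = 1        # initial reference held by the owner
        self.freed = False
        self.holds = []      # references this object took on others
        self.cache = []      # objects whose initial reference we own

    def hold(self, other):
        other.refs += 1
        self.holds.append(other)

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            # Freeing an object drops everything it was holding.
            self.freed = True
            for other in self.holds + self.cache:
                other.put()
            self.holds, self.cache = [], []


def release_dsts(nh):
    """Drop the cached dsts of a nexthop (stand-in for the proposed fix)."""
    cached, nh.cache = nh.cache, []
    for dst in cached:
        dst.put()


def unlink(dst_holds_route):
    """Build route -> nexthop -> cached dst, then unlink route and nexthop."""
    nh, route, dst = Obj("nexthop"), Obj("route"), Obj("dst")
    route.hold(nh)            # the route references its nexthop
    nh.cache.append(dst)      # the nexthop owns the cached dst
    if dst_holds_route:       # IPv6-like: dst pins the route
        dst.hold(route)
    route.put()               # route removed from the table
    nh.put()                  # nexthop removed from the group
    return nh, route, dst


# IPv6-like: nothing is freed until the cached dst is released explicitly.
nh6, route6, dst6 = unlink(dst_holds_route=True)
stuck = not (nh6.freed or route6.freed or dst6.freed)
release_dsts(nh6)

# IPv4-like: the dst holds no route reference, so unlinking frees everything.
nh4, route4, dst4 = unlink(dst_holds_route=False)
```

In the IPv6-like case the three objects wait on each other after the
unlink, and releasing the cached dst breaks the loop. In the IPv4-like
case there is no loop to begin with.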

Thanks!
