Message-ID: <d902fd06-00c2-fbff-1df2-4db3e890724a@nvidia.com>
Date:   Sun, 21 Nov 2021 20:17:49 +0200
From:   Nikolay Aleksandrov <nikolay@...dia.com>
To:     Ido Schimmel <idosch@...sch.org>,
        Nikolay Aleksandrov <razor@...ckwall.org>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, kuba@...nel.org,
        dsahern@...il.com
Subject: Re: [PATCH net 0/3] net: nexthop: fix refcount issues when replacing
 groups

On 21/11/2021 19:55, Ido Schimmel wrote:
> On Sun, Nov 21, 2021 at 05:24:50PM +0200, Nikolay Aleksandrov wrote:
>> From: Nikolay Aleksandrov <nikolay@...dia.com>
>>
>> Hi,
>> This set fixes a refcount bug when replacing nexthop groups and
>> modifying routes. It is complex because the objects look valid when
>> debugging memory dumps, but we end up having refcount dependency between
>> unlinked objects which can never be released, so in turn they cannot
>> free their resources and refcounts. The problem happens because we can
>> have stale IPv6 per-cpu dsts in nexthops which were removed from a
>> group. Even though the IPv6 gen is bumped, the dsts won't be released
>> until traffic passes through them or the nexthop is freed, that can take
>> arbitrarily long time, and even worse we can create a scenario[1] where it
>> can never be released. The fix is to release the IPv6 per-cpu dsts of
>> replaced nexthops after an RCU grace period so no new ones can be
>> created. To do that we add a new IPv6 stub - fib6_nh_release_dsts, which
>> is used by the nexthop code only when necessary. We can further optimize
>> group replacement, but that is more suited for net-next as these patches
>> would have to be backported to stable releases.
> 
> Will run regression with these patches tonight and report tomorrow
> 

Thank you. I've prepared v2 with the selftest mausezahn check and will hold
it back until your regression results are in, and to see whether any
comments show up in the meantime. :)

By the way, I've been running a torture test all day: forwarded and local
IPv6 traffic over multiple routes on different CPUs, while also replacing
multiple nh groups that reference multiple nexthops. So far it looks good.
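
For anyone following the thread without the patches at hand, the idea from
the cover letter can be sketched as a small user-space model. This is only a
toy illustration, not the kernel code: the class and function names below
(Counted, Dst, Nexthop, release_dsts, replace_group, demo) are all
hypothetical, the "route" object stands in for whatever the cached dsts pin,
and the RCU grace period is reduced to a comment because the model is
single-threaded.

```python
class Counted:
    """Minimal reference-counted object (hypothetical stand-in)."""

    def __init__(self, name):
        self.name = name
        self.refs = 1          # the creator's reference
        self.freed = False

    def hold(self):
        self.refs += 1

    def put(self):
        assert self.refs > 0, "refcount underflow"
        self.refs -= 1
        if self.refs == 0:
            self.freed = True


class Dst:
    """Cached per-CPU entry that pins another object while it exists."""

    def __init__(self, pinned):
        self.pinned = pinned
        pinned.hold()

    def release(self):
        self.pinned.put()


class Nexthop:
    def __init__(self, name, ncpus):
        self.name = name
        self.pcpu_dst = [None] * ncpus   # one cache slot per CPU

    def cache_dst(self, cpu, pinned):
        """Model traffic on `cpu` populating that CPU's cache slot."""
        old = self.pcpu_dst[cpu]
        self.pcpu_dst[cpu] = Dst(pinned)
        if old is not None:
            old.release()

    def release_dsts(self):
        """Toy analogue of the fix: drop every cached per-CPU dst."""
        for cpu, dst in enumerate(self.pcpu_dst):
            if dst is not None:
                self.pcpu_dst[cpu] = None
                dst.release()


def replace_group(old_members, new_members, release_stale):
    """Replace a group's members, optionally flushing removed nexthops."""
    removed = [nh for nh in old_members if nh not in new_members]
    if release_stale:
        # The real fix does this after an RCU grace period so that no new
        # dsts can be created; this single-threaded model has no readers.
        for nh in removed:
            nh.release_dsts()
    return list(new_members)


def demo(release_stale):
    route = Counted("route")
    nh1, nh2 = Nexthop("nh1", 2), Nexthop("nh2", 2)
    group = [nh1]
    nh1.cache_dst(0, route)              # traffic seen on CPU 0
    nh1.cache_dst(1, route)              # traffic seen on CPU 1
    group = replace_group(group, [nh2], release_stale)
    route.put()                          # owner deletes the route
    return route.freed, route.refs
```

Calling demo(False) leaves the route pinned by the two stale cached entries
on nh1, which is no longer in any group and sees no traffic. Calling
demo(True) flushes them at replace time, so the final put frees the route.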
