Date:   Mon, 22 Nov 2021 11:53:59 +0200
From:   Nikolay Aleksandrov <razor@...ckwall.org>
To:     Ido Schimmel <idosch@...sch.org>,
        Nikolay Aleksandrov <nikolay@...dia.com>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, kuba@...nel.org,
        dsahern@...il.com
Subject: Re: [PATCH net 0/3] net: nexthop: fix refcount issues when replacing
 groups

On 22/11/2021 11:48, Ido Schimmel wrote:
> On Sun, Nov 21, 2021 at 08:17:49PM +0200, Nikolay Aleksandrov wrote:
>> On 21/11/2021 19:55, Ido Schimmel wrote:
>>> On Sun, Nov 21, 2021 at 05:24:50PM +0200, Nikolay Aleksandrov wrote:
>>>> From: Nikolay Aleksandrov <nikolay@...dia.com>
>>>>
>>>> Hi,
>>>> This set fixes a refcount bug when replacing nexthop groups and
>>>> modifying routes. It is complex because the objects look valid when
>>>> debugging memory dumps, but we end up having refcount dependency between
>>>> unlinked objects which can never be released, so in turn they cannot
>>>> free their resources and refcounts. The problem happens because we can
>>>> have stale IPv6 per-cpu dsts in nexthops which were removed from a
>>>> group. Even though the IPv6 gen is bumped, the dsts won't be released
>>>> until traffic passes through them or the nexthop is freed, that can take
>>>> arbitrarily long time, and even worse we can create a scenario[1] where it
>>>> can never be released. The fix is to release the IPv6 per-cpu dsts of
>>>> replaced nexthops after an RCU grace period so no new ones can be
>>>> created. To do that we add a new IPv6 stub - fib6_nh_release_dsts, which
>>>> is used by the nexthop code only when necessary. We can further optimize
>>>> group replacement, but that is more suited for net-next as these patches
>>>> would have to be backported to stable releases.
>>>
>>> Will run regression with these patches tonight and report tomorrow
>>>
>>
>> Thank you, I've prepared v2 with the selftest mausezahn check and will hold
>> it off to see how the tests would go. Also if any comments show up in the
>> meantime. :)
>>
>> By the way I've been running a torture test all day for multiple IPv6 route
>> forwarding + local traffic through different CPUs while also replacing multiple
>> nh groups referencing multiple nexthops, so far it looks good.
> 
> Regression looks good. Later today I will also have results from a debug
> kernel, but I think it should be fine.
> 
> Regarding patch #2, can you add a comment (or edit the commit message)
> to explain why the fix is only relevant for IPv6? I made this comment,
> but I think it was missed:
> 

I saw it, I've updated the commit msg to reflect why IPv4 isn't affected.

> "This problem is specific to IPv6 because IPv4 dst entries do not hold
> references on routes / FIB info thereby avoiding the circular dependency
> described in the commit message?"
> 
> Thanks!
> 

Cheers,
 Nik
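
Editorial illustration (not part of the original message): the circular dependency discussed in this thread can be pictured with a toy reference-counting model. This is a minimal sketch, not kernel code. The names `Obj`, `hold`, `drop`, `build_cycle` and `release_dsts` are hypothetical, and the edges (route holds the nexthop, the nexthop caches a per-cpu dst, the dst holds the route) are simplifying assumptions drawn from the description above.

```python
class Obj:
    """Toy refcounted object; 'holds' lists the objects it references."""
    def __init__(self, name, kind):
        self.name = name
        self.kind = kind
        self.refs = 0
        self.freed = False
        self.holds = []


def hold(owner, target):
    """owner takes one reference on target."""
    target.refs += 1
    owner.holds.append(target)


def drop(owner, target):
    """owner releases its reference; free target when the count hits zero."""
    owner.holds.remove(target)
    target.refs -= 1
    if target.refs == 0:
        target.freed = True
        # A freed object releases everything it was holding.
        for held in list(target.holds):
            drop(target, held)


def build_cycle():
    """Assumed shape: route -> nexthop -> cached dst -> route."""
    table = Obj("table", "root")    # stands in for user configuration
    nh = Obj("nexthop", "nexthop")
    rt = Obj("route", "route")
    dst = Obj("pcpu_dst", "dst")
    hold(table, nh)   # nexthop is configured
    hold(table, rt)   # route is configured
    hold(rt, nh)      # route uses the nexthop
    hold(nh, dst)     # nexthop caches a per-cpu dst
    hold(dst, rt)     # IPv6-style dst holds a reference on its route
    return table, nh, rt, dst


def release_dsts(nh):
    """Toy analogue of explicitly releasing a nexthop's cached dsts."""
    for d in [h for h in nh.holds if h.kind == "dst"]:
        drop(nh, d)


if __name__ == "__main__":
    table, nh, rt, dst = build_cycle()
    drop(table, rt)   # route deleted
    drop(table, nh)   # nexthop unlinked
    print("after unlink:", [(o.name, o.refs, o.freed) for o in (nh, rt, dst)])
    release_dsts(nh)  # breaking the cycle lets everything unwind
    print("after release:", [(o.name, o.refs, o.freed) for o in (nh, rt, dst)])
```

In this model, unlinking the route and the nexthop leaves all three objects alive, because each one is still pinned by another member of the cycle. Dropping the cached dst is enough to let the whole chain unwind. Removing the `hold(dst, rt)` edge gives the IPv4-like case from the quoted comment, in which no cycle forms.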

