netdev - Re: [PATCH net-next v2] Do not invoke addrconf_verify

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bdf0eec2-e168-48f7-9fdf-178cce4ee18b@redhat.com>
Date: Mon, 16 Dec 2024 08:52:06 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Gilad Naaman <gnaaman@...venets.com>
Cc: davem@...emloft.net, dsahern@...nel.org, edumazet@...gle.com,
 horms@...nel.org, kuba@...nel.org, netdev@...r.kernel.org
Subject: Re: [PATCH net-next v2] Do not invoke addrconf_verify_rtnl
 unnecessarily

On 12/13/24 16:13, Gilad Naaman wrote:
>>> If the address IS perishable, and IS the soonest-to-be-expired address,
>>> calling or not-calling "verify" for a single address deletion is
>>> equivalent in cost.
>>
>> This last statement is not obvious to me, could you please expand the
>> reasoning?
> 
> Sorry, it does seem a bit vague when I am re-reading it now.
> 
> What I meant is that calling addrconf_verify_rtnl when no upkeep needs to be
> done has some cost K (in seconds) which is roughly a function of the
> total amount of addresses.
> 
> Let's say you've configured some addresses, 4 of which are perishable:
> 
> 	   |                
> 	T0-+- <---We are here
> 	   |                
> 	   |                
> 	T1-+- A <---Timer      
> 	   |                
> 	   |                
> 	   |                
> 	T2-+- B              
> 	   |                
> 	   |                
> 	   |                
> 	T3-+- C              
> 	   |                
> 	   |                
> 	   |                
> 	T4-+- D              
> 	   |                
> 	   |                
> 	   |                
> 	   v                
> 
> The timer is scheduled to expire in T1, because this is when address A
> perishes.
> 
> If you delete a non-perishable address, running addrconf_verify_rtnl is
> redundant, since it won't change the fact that the timer expires in T1.
> 
> If you delete A specifically, which is the cause of scheduling the timer
> to T1, you have 2 options:
> 
>  1. Pay K now, in T0, to reschedule the timer to T2
>  2. Pay nothing now, let the timer expire, pay the K in T1, and then reschedule

It looks like in this specific case option 1 will be cheaper, as it will
avoid an unneeded timer expiration.

> If we're talking about a deleting A, it seems equivalent in cost and result.
> Either way, exactly one K is paid, and the time eventually gets rescheduled to T2.
> 
> If we're talking about deleting an arbitrary address, using option 2 is
> better, since you do not lose functionaility, but you might be saving some
> Ks. (If you deleted B, the timer won't be rescheduled anyway)
> 
> If we're talking about deleting multiple/many address in a short time,
> option 2 is greatly preferable, since paying K for each address can get
> costly as the hash-table grows.

Makes sense to me.

>>> But calling "verify" immediately will result in a performance hit when
>>> deleting many addresses.
>>
>> Since this is about (control plane) performances, please include the
>> relevant test details (or even better, please add a small/fast self-test
>> covering the use-case).
> 
> Is it common to add scale-test to selftests?

AFAIK, not common at all. Note that the argument "so self-test for this
kind of thing" is actually a very good argument to add a self-tests.

> From my limited experience these tend to fail in automation for no good reason,
> though I feel I may have misunderstood the text in parens.
> I've added a link to the benchmark below.
> 
> Regarding the original test case:
> 
> We're developing a core-router and trying to scale-up to around 12K VLANs.
> Considering GUA+LLA this means 24K address in this table.
> In practice it's a bit more than that, due to other interfaces in the same
> namespace.
> 
> This makes addrconf_verify_rtnl very very very expensive for us.
> When initially setting the system up after boot, or when applying big
> configuration changes,
> adding addresses quickly slows down, as each added address has to pay
> for its predecesors. (all of our addresses are static)
> 
> On the reverse, when the VLANs' parent link goes down, the VLANs
> go down with it, causing us to pay for a lot of addrconf_verify_rtnl calls,
> during which rtnl_lock is held for a single long stretch of time.
> 
> I've ran some perf on an upatched kernel to demonstrate it:
> 
> 	https://github.com/gnaaman-dn/perf-addrconf-verify-rtnl
> 
> Turned out to be 13% of time when creating static addresses, and 18%
> when flushing them.
> 
> (In our original bug the VLANs were deleted, it is just easier to perf
> one iproute command if it's a flush)

Nice, so you already have the test infra ready :)

>>> @@ -3148,7 +3164,6 @@ static int inet6_addr_del(struct net *net, int ifindex, u32 ifa_flags,
>>>  			    (ifp->flags & IFA_F_MANAGETEMPADDR))
>>>  				delete_tempaddrs(idev, ifp);
>>>  
>>> -			addrconf_verify_rtnl(net);
>>
>> With an additional 'addrconf_perishable' check here protecting the (here
>> removed) addrconf_verify_rtnl(), the patch will be IMHO much less prone
>> to unintended side-effects.
> 
> I hope my explanation will be convincing that that is not needed.
> (or just coherent enough to point out a mistake in my understanding)
> 
> If not, I will send V3 with this condition added, as in practice most
> of our addresses are static.

>From my PoV, the main trouble is that this kind of change has the
potential of breaking things in subtle way. Your explanation above makes
sense to me, but I have the gut feeling I'm missing something.

A more verbose description will help for future memory.

Making the patch more straight-forward will give IMHO additional assurances.

A self-test could help tracking/detecting side-effects introduced here.

I think at least one of the above is needed, the more the better ;) Note
that the more verbose description could come in a form of a Link tag
pointing to this conversation.

Thanks,

Paolo