[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <95a98eea-a6bf-423e-9ee6-9be784138b60@suse.com>
Date: Fri, 7 Jun 2024 10:33:33 +0200
From: Petr Pavlu <petr.pavlu@...e.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: Kuifeng Lee <sinquersw@...il.com>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] net/ipv6: Fix the RT cache flush via sysctl using a
previous delay
On 6/4/24 10:30, Paolo Abeni wrote:
> On Fri, 2024-05-31 at 10:53 +0200, Petr Pavlu wrote:
>> [Added back netdev@...r.kernel.org and linux-kernel@...r.kernel.org
>> which seem to be dropped by accident.]
>>
>> On 5/30/24 17:59, Kuifeng Lee wrote:
>>> On Wed, May 29, 2024 at 6:53 AM Petr Pavlu <petr.pavlu@...e.com> wrote:
>>>>
>>>> The net.ipv6.route.flush system parameter takes a value which specifies
>>>> a delay used during the flush operation for aging exception routes. The
>>>> written value is however not used in the currently requested flush and
>>>> instead utilized only in the next one.
>>>>
>>>> A problem is that ipv6_sysctl_rtcache_flush() first reads the old value
>>>> of net->ipv6.sysctl.flush_delay into a local delay variable and then
>>>> calls proc_dointvec() which actually updates the sysctl based on the
>>>> provided input.
>>>
>>> If the problem we are trying to fix is using the old value, should we move
>>> the line reading the value to a place after updating it instead of a
>>> local copy of
>>> the whole ctl_table?
>>
>> Just moving the read of net->ipv6.sysctl.flush_delay after the
>> proc_dointvec() call was actually my initial implementation. I then
>> opted for the proposed version because it looked useful to me to save
>> memory used to store net->ipv6.sysctl.flush_delay.
>
> Note that due to alignment, the struct netns_sysctl_ipv6 size is not
> going to change on 64 bits build.
>
> And if the layout would change, that could have subtle performance side
> effects (moving later fields in netns_sysctl_ipv6 in different
> cachelines) that we want to avoid for a net patch.
>
>> Another minor aspect is that these sysctl writes are not serialized. Two
>> invocations of ipv6_sysctl_rtcache_flush() could in theory occur at the
>> same time. It can then happen that they both first execute
>> proc_dointvec(). One of them ends up slower and thus its value gets
>> stored in net->ipv6.sysctl.flush_delay. Both runs then return to
>> ipv6_sysctl_rtcache_flush(), read the stored value and execute
>> fib6_run_gc(). It means one of them calls this function with a value
>> different that it was actually given on input. By having a purely local
>> variable, each write is independent and fib6_run_gc() is executed with
>> the right input delay.
>>
>> The cost of making a copy of ctl_table is a few instructions and this
>> isn't on any hot path. The same pattern is used, for example, in
>> net/ipv6/addrconf.c, function addrconf_sysctl_forward().
>>
>> So overall, the proposed version looked marginally better to me than
>> just moving the read of net->ipv6.sysctl.flush_delay later in
>> ipv6_sysctl_rtcache_flush().
>
> All in all the increased complexity vs the simple solution does not
> look worth to me.
>
> Please revert to the initial/simpler implementation for this fix,
> thanks!
Fair enough, I'll post v2 with the initial/simpler version.
Thanks,
Petr
Powered by blists - more mailing lists