[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <111ad356a137d0b69550cd73ff0cdef915c16e2e.camel@redhat.com>
Date: Tue, 04 Jun 2024 10:30:55 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Petr Pavlu <petr.pavlu@...e.com>, Kuifeng Lee <sinquersw@...il.com>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] net/ipv6: Fix the RT cache flush via sysctl using a
previous delay
On Fri, 2024-05-31 at 10:53 +0200, Petr Pavlu wrote:
> [Added back netdev@...r.kernel.org and linux-kernel@...r.kernel.org
> which seem to be dropped by accident.]
>
> On 5/30/24 17:59, Kuifeng Lee wrote:
> > On Wed, May 29, 2024 at 6:53 AM Petr Pavlu <petr.pavlu@...e.com> wrote:
> > >
> > > The net.ipv6.route.flush system parameter takes a value which specifies
> > > a delay used during the flush operation for aging exception routes. The
> > > written value is however not used in the currently requested flush and
> > > instead utilized only in the next one.
> > >
> > > A problem is that ipv6_sysctl_rtcache_flush() first reads the old value
> > > of net->ipv6.sysctl.flush_delay into a local delay variable and then
> > > calls proc_dointvec() which actually updates the sysctl based on the
> > > provided input.
> >
> > If the problem we are trying to fix is using the old value, should we move
> > the line reading the value to a place after updating it instead of a
> > local copy of
> > the whole ctl_table?
>
> Just moving the read of net->ipv6.sysctl.flush_delay after the
> proc_dointvec() call was actually my initial implementation. I then
> opted for the proposed version because it looked useful to me to save
> memory used to store net->ipv6.sysctl.flush_delay.
Note that due to alignment, the struct netns_sysctl_ipv6 size is not
going to change on 64 bits build.
And if the layout would change, that could have subtle performance side
effects (moving later fields in netns_sysctl_ipv6 in different
cachelines) that we want to avoid for a net patch.
> Another minor aspect is that these sysctl writes are not serialized. Two
> invocations of ipv6_sysctl_rtcache_flush() could in theory occur at the
> same time. It can then happen that they both first execute
> proc_dointvec(). One of them ends up slower and thus its value gets
> stored in net->ipv6.sysctl.flush_delay. Both runs then return to
> ipv6_sysctl_rtcache_flush(), read the stored value and execute
> fib6_run_gc(). It means one of them calls this function with a value
> different that it was actually given on input. By having a purely local
> variable, each write is independent and fib6_run_gc() is executed with
> the right input delay.
>
> The cost of making a copy of ctl_table is a few instructions and this
> isn't on any hot path. The same pattern is used, for example, in
> net/ipv6/addrconf.c, function addrconf_sysctl_forward().
>
> So overall, the proposed version looked marginally better to me than
> just moving the read of net->ipv6.sysctl.flush_delay later in
> ipv6_sysctl_rtcache_flush().
All in all the increased complexity vs the simple solution does not
look worth to me.
Please revert to the initial/simpler implementation for this fix,
thanks!
Paolo
Powered by blists - more mailing lists