Open Source and information security mailing list archives
 
Message-ID: <cbd56289-c9e9-4cd1-87d8-623ae7e39347@suse.com>
Date: Fri, 31 May 2024 10:53:11 +0200
From: Petr Pavlu <petr.pavlu@...e.com>
To: Kuifeng Lee <sinquersw@...il.com>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] net/ipv6: Fix the RT cache flush via sysctl using a
 previous delay

[Added back netdev@...r.kernel.org and linux-kernel@...r.kernel.org
which seem to have been dropped by accident.]

On 5/30/24 17:59, Kuifeng Lee wrote:
> On Wed, May 29, 2024 at 6:53 AM Petr Pavlu <petr.pavlu@...e.com> wrote:
>>
>> The net.ipv6.route.flush system parameter takes a value which specifies
>> a delay used during the flush operation for aging exception routes. The
>> written value is however not used in the currently requested flush and
>> instead utilized only in the next one.
>>
>> A problem is that ipv6_sysctl_rtcache_flush() first reads the old value
>> of net->ipv6.sysctl.flush_delay into a local delay variable and then
>> calls proc_dointvec() which actually updates the sysctl based on the
>> provided input.
> 
> If the problem we are trying to fix is using the old value, should we move
> the line reading the value to a place after updating it instead of making a
> local copy of the whole ctl_table?

Just moving the read of net->ipv6.sysctl.flush_delay after the
proc_dointvec() call was actually my initial implementation. I then
opted for the proposed version because it looked useful to me to also
save the memory used to store net->ipv6.sysctl.flush_delay.
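
To illustrate the ordering difference, here is a minimal user-space
sketch. It is not kernel code: flush_delay, model_dointvec(),
flush_read_before() and flush_read_after() are hypothetical stand-ins
for net->ipv6.sysctl.flush_delay, proc_dointvec() and the two possible
orderings in ipv6_sysctl_rtcache_flush(). The value returned through
gc_delay models what fib6_run_gc() would be given.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for net->ipv6.sysctl.flush_delay. */
static int flush_delay;

/* Hypothetical stand-in for proc_dointvec(): parse input, store via data. */
static int model_dointvec(int *data, const char *input)
{
	assert(data && input);
	*data = atoi(input);
	return 0;
}

/* Ordering before the fix: the delay is sampled before the update. */
static int flush_read_before(const char *input, int *gc_delay)
{
	int delay = flush_delay;	/* stale: value from the previous write */
	int ret = model_dointvec(&flush_delay, input);

	if (ret)
		return ret;
	*gc_delay = delay;		/* what the flush would actually use */
	return 0;
}

/* Alternative ordering: sample the delay only after the update. */
static int flush_read_after(const char *input, int *gc_delay)
{
	int ret = model_dointvec(&flush_delay, input);

	if (ret)
		return ret;
	*gc_delay = flush_delay;	/* value written by this request */
	return 0;
}
```

With flush_delay starting at 0, writing "7" through
flush_read_before() yields a delay of 0, and only a second write sees
the 7. flush_read_after() yields 7 on the first write.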

Another minor aspect is that these sysctl writes are not serialized. Two
invocations of ipv6_sysctl_rtcache_flush() could in theory occur at the
same time. It can then happen that they both first execute
proc_dointvec(). One of them ends up slower and thus its value gets
stored in net->ipv6.sysctl.flush_delay. Both runs then return to
ipv6_sysctl_rtcache_flush(), read the stored value and execute
fib6_run_gc(). This means one of them calls this function with a value
different from the one it was actually given as input. By having a purely local
variable, each write is independent and fib6_run_gc() is executed with
the right input delay.
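
The interleaving described above can be modelled deterministically in
user space. This is only a sketch under assumptions: shared_delay,
model_store(), interleaved_shared() and interleaved_local() are
hypothetical names, and the two "writers" are simply run step by step
in the problematic order rather than on real threads.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared net->ipv6.sysctl.flush_delay. */
static int shared_delay;

/* Hypothetical stand-in for proc_dointvec(): parse input, store via data. */
static void model_store(int *data, const char *input)
{
	assert(data && input);
	*data = atoi(input);
}

/*
 * Shared field: both writers finish their store before either one
 * reads the value back, so both see whichever store landed last.
 */
static void interleaved_shared(const char *in_a, const char *in_b,
			       int *gc_a, int *gc_b)
{
	model_store(&shared_delay, in_a);	/* writer A stores */
	model_store(&shared_delay, in_b);	/* writer B stores, overwrites A */
	*gc_a = shared_delay;			/* A reads B's value */
	*gc_b = shared_delay;
}

/* Local variables: the same interleaving, but each writer owns its delay. */
static void interleaved_local(const char *in_a, const char *in_b,
			      int *gc_a, int *gc_b)
{
	int delay_a, delay_b;

	model_store(&delay_a, in_a);
	model_store(&delay_b, in_b);
	*gc_a = delay_a;
	*gc_b = delay_b;
}
```

For inputs "5" and "9", the shared variant hands 9 to both flushes,
while the local variant hands 5 and 9 to the respective callers.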

The cost of making a copy of ctl_table is a few instructions and this
isn't on any hot path. The same pattern is used, for example, in
net/ipv6/addrconf.c, function addrconf_sysctl_forward().
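
As a rough illustration of the copy pattern, the following user-space
sketch uses a simplified, hypothetical struct model_ctl_table (the
real struct ctl_table has more fields) and a hypothetical
model_dointvec() in place of proc_dointvec(). It only shows that the
copy redirects the store to a local variable and leaves the registered
table untouched.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical model of struct ctl_table. */
struct model_ctl_table {
	const char *procname;
	void *data;
	void *extra1;
};

/* Hypothetical stand-in for proc_dointvec(): store an int via ctl->data. */
static int model_dointvec(struct model_ctl_table *ctl, const char *input)
{
	assert(ctl && ctl->data && input);
	*(int *)ctl->data = atoi(input);
	return 0;
}

/* Models the handler: parse into a local delay through a table copy. */
static int model_rtcache_flush(struct model_ctl_table *ctl,
			       const char *input, int *gc_delay)
{
	struct model_ctl_table lctl;
	int delay;
	int ret;

	lctl = *ctl;		/* shallow struct copy of the table entry */
	lctl.data = &delay;	/* redirect the store to the local variable */
	ret = model_dointvec(&lctl, input);
	if (ret)
		return ret;

	*gc_delay = delay;	/* value the flush would be run with */
	return 0;
}
```

The original entry can keep .data unset, since only the copy is ever
passed to the parsing helper.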

So overall, the proposed version looked marginally better to me than
just moving the read of net->ipv6.sysctl.flush_delay later in
ipv6_sysctl_rtcache_flush().

Thanks,
Petr

> 
>>
>> Fix the problem by removing net->ipv6.sysctl.flush_delay because the
>> value is never actually used after the flush operation and instead use
>> a temporary ctl_table in ipv6_sysctl_rtcache_flush() pointing directly
>> to the local delay variable.
>>
>> Fixes: 4990509f19e8 ("[NETNS][IPV6]: Make sysctls route per namespace.")
>> Signed-off-by: Petr Pavlu <petr.pavlu@...e.com>
>> ---
>>
>> Note that when testing this fix, I noticed that an aging exception route
>> (created via ICMP redirect) was not getting removed when triggering the
>> flush operation unless the associated fib6_info was an expiring route.
>> It looks like the logic introduced in 5eb902b8e719 ("net/ipv6: Remove expired
>> routes with a separated list of routes.") otherwise missed registering
>> the fib6_info with the GC. That is potentially a separate issue, just
>> adding it here in case someone decides to test this patch and possibly
>> run into this problem too.
>>
>>  include/net/netns/ipv6.h |  1 -
>>  net/ipv6/route.c         | 13 ++++++-------
>>  2 files changed, 6 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
>> index 5f2cfd84570a..2ed7659013a4 100644
>> --- a/include/net/netns/ipv6.h
>> +++ b/include/net/netns/ipv6.h
>> @@ -20,7 +20,6 @@ struct netns_sysctl_ipv6 {
>>         struct ctl_table_header *frags_hdr;
>>         struct ctl_table_header *xfrm6_hdr;
>>  #endif
>> -       int flush_delay;
>>         int ip6_rt_max_size;
>>         int ip6_rt_gc_min_interval;
>>         int ip6_rt_gc_timeout;
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index bbc2a0dd9314..f07f050003c3 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -6335,15 +6335,17 @@ static int rt6_stats_seq_show(struct seq_file *seq, void *v)
>>  static int ipv6_sysctl_rtcache_flush(struct ctl_table *ctl, int write,
>>                               void *buffer, size_t *lenp, loff_t *ppos)
>>  {
>> -       struct net *net;
>> +       struct net *net = ctl->extra1;
>> +       struct ctl_table lctl;
>>         int delay;
>>         int ret;
>> +
>>         if (!write)
>>                 return -EINVAL;
>>
>> -       net = (struct net *)ctl->extra1;
>> -       delay = net->ipv6.sysctl.flush_delay;
>> -       ret = proc_dointvec(ctl, write, buffer, lenp, ppos);
>> +       lctl = *ctl;
>> +       lctl.data = &delay;
>> +       ret = proc_dointvec(&lctl, write, buffer, lenp, ppos);
>>         if (ret)
>>                 return ret;
>>
>> @@ -6368,7 +6370,6 @@ static struct ctl_table ipv6_route_table_template[] = {
>>         },
>>         {
>>                 .procname       =       "flush",
>> -               .data           =       &init_net.ipv6.sysctl.flush_delay,
>>                 .maxlen         =       sizeof(int),
>>                 .mode           =       0200,
>>                 .proc_handler   =       ipv6_sysctl_rtcache_flush
>> @@ -6444,7 +6445,6 @@ struct ctl_table * __net_init ipv6_route_sysctl_init(struct net *net)
>>         if (table) {
>>                 table[0].data = &net->ipv6.sysctl.ip6_rt_max_size;
>>                 table[1].data = &net->ipv6.ip6_dst_ops.gc_thresh;
>> -               table[2].data = &net->ipv6.sysctl.flush_delay;
>>                 table[2].extra1 = net;
>>                 table[3].data = &net->ipv6.sysctl.ip6_rt_gc_min_interval;
>>                 table[4].data = &net->ipv6.sysctl.ip6_rt_gc_timeout;
>> @@ -6521,7 +6521,6 @@ static int __net_init ip6_route_net_init(struct net *net)
>>  #endif
>>  #endif
>>
>> -       net->ipv6.sysctl.flush_delay = 0;
>>         net->ipv6.sysctl.ip6_rt_max_size = INT_MAX;
>>         net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2;
>>         net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ;
>>
>> base-commit: 2bfcfd584ff5ccc8bb7acde19b42570414bf880b
>> --
>> 2.35.3
>>
>>
