lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAAVpQUDfftGhVvsiHy999avT=Tz8Gyw-Dwvrd3=rPvU8GyzP2w@mail.gmail.com>
Date: Wed, 2 Jul 2025 00:44:50 -0700
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Cong Wang <xiyou.wangcong@...il.com>, "David S . Miller" <davem@...emloft.net>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Jamal Hadi Salim <jhs@...atatu.com>, Jiri Pirko <jiri@...nulli.us>, netdev@...r.kernel.org, 
	eric.dumazet@...il.com, Vlad Buslov <vladbu@...dia.com>, 
	Marcelo Ricardo Leitner <marcelo.leitner@...il.com>
Subject: Re: [PATCH net-next] net/sched: acp_api: no longer acquire RTNL in tc_action_net_exit()

On Wed, Jul 2, 2025 at 12:08 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Jul 1, 2025 at 7:35 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> >
> > On Tue, Jul 01, 2025 at 01:30:06PM +0000, Eric Dumazet wrote:
> > > tc_action_net_exit() got an rtnl exclusion in commit
> > > a159d3c4b829 ("net_sched: acquire RTNL in tc_action_net_exit()")
> > >
> > > Since then, commit 16af6067392c ("net: sched: implement reference
> > > counted action release") made this RTNL exclusion obsolete.
> >
> > I am not sure removing RTNL is safe even we have action refcnt.
> >
> > For example, are you sure tcf_action_offload_del() is safe to call
> > without RTNL?
>
> My thinking was that at the time of these calls, devices were already
> gone from the dismantling netns,

I thought the same because tcf_register_action() uses
register_pernet_subsys(), and subsys_ops->exit() is always executed
after other device_ops including default_device_exit_batch() that should
wait for all dev to go away.


> but this might be wrong.
>
> We can conditionally acquire rtnl from tcf_idrinfo_destroy() when
> there is at least one offloaded action in the idr.
>
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index 057e20cef3754f33357c4c1e30034f6b9b872d91..9e468e46346710c85c3a85b905d27dfe3972916a
> 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -933,18 +933,25 @@ void tcf_idrinfo_destroy(const struct tc_action_ops *ops,
>                          struct tcf_idrinfo *idrinfo)
>  {
>         struct idr *idr = &idrinfo->action_idr;
> +       bool mutex_taken = false;
>         struct tc_action *p;
> -       int ret;
>         unsigned long id = 1;
>         unsigned long tmp;
> +       int ret;
>
>         idr_for_each_entry_ul(idr, p, tmp, id) {
> +               if (tc_act_in_hw(p) && !mutex_taken) {
> +                       rtnl_lock();
> +                       mutex_taken = true;
> +               }
>                 ret = __tcf_idr_release(p, false, true);
>                 if (ret == ACT_P_DELETED)
>                         module_put(ops->owner);
>                 else if (ret < 0)
>                         return;
>         }
> +       if (mutex_taken)
> +               rtnl_unlock();
>         idr_destroy(&idrinfo->action_idr);
>  }
>  EXPORT_SYMBOL(tcf_idrinfo_destroy);
>
>
> >
> > What are you trying to improve here?
>
> Yeah, some of us are spending months of work to improve the RTNL
> situation, and we do not copy/paste why on every single patch :)
>
> I will capture the following in V2, thanks !
>
> Most netns do not have actions, yet deleting them is adding a lot of
> pressure on RTNL, which is for us the most contended mutex in the
> kernel.
>
> We are moving to a per-netns 'rtnl', so tc_action_net_exit() will not
> be able to grab 'rtnl' a single time for a batch of netns.
>
>
> Before the patch:
>
> perf probe -a rtnl_lock  # Note: This does not capture all calls, some
> of them might be inlined in net/core/rtnetlink.c
>
> perf record -e probe:rtnl_lock -a /bin/bash -c 'unshare -n "/bin/true"; sleep 1'
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.305 MB perf.data (25 samples) ]
>
> After the patch:
>
> perf record -e probe:rtnl_lock -a /bin/bash -c 'unshare -n "/bin/true"; sleep 1'
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.304 MB perf.data (9 samples) ]

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ