[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAAVpQUDbYnzyBuN5SS9bSTfJDvVH18QNXrXGTYzZRb3Cgk=mQg@mail.gmail.com>
Date: Wed, 28 Jan 2026 09:13:59 -0800
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: Qiliang Yuan <realwujing@...il.com>
Cc: brauner@...nel.org, davem@...emloft.net, edumazet@...gle.com,
horms@...nel.org, jlayton@...nel.org, kuba@...nel.org,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org, pabeni@...hat.com,
sd@...asysnail.net, yuanql9@...natelecom.cn
Subject: Re: [PATCH net-next v3] netns: optimize netns cleaning by batching
unhash_nsid calls
On Wed, Jan 28, 2026 at 4:19 AM Qiliang Yuan <realwujing@...il.com> wrote:
>
> Hi Kuniyuki,
>
> On Tue, Jan 27, 2026 at 6:05 PM Kuniyuki Iwashima <kuniyu@...gle.com> wrote:
> >
> > On Tue, Jan 27, 2026 at 5:22 PM Qiliang Yuan <realwujing@...il.com> wrote:
> > >
> > > Currently, unhash_nsid() scans the entire net_namespace_list for each
> > > netns in a destruction batch during cleanup_net(). This leads to
> > > O(M_batch * N_system * M_nsids) complexity, where M_batch is the
> > > destruction batch size, N_system is the total number of namespaces,
> > > and M_nsids is the number of IDs in each IDR.
> > >
> > > Reduce the complexity to O(N_system * M_nsids) by introducing an
> > > 'is_dying' flag to mark namespaces being destroyed. This allows
> > > unhash_nsid() to perform a single-pass traversal over the system's
> > > namespaces. In this pass, for each survivor namespace, iterate
> > > through its netns_ids and remove any mappings that point to a marked
> > > namespace, effectively eliminating the M_batch multiplier.
> > >
> > > Signed-off-by: Qiliang Yuan <realwujing@...il.com>
> > > Signed-off-by: Qiliang Yuan <yuanql9@...natelecom.cn>
> >
> > Why two SOBs with the same person ?
>
> - Signed-off-by: Qiliang Yuan <realwujing@...il.com> (Personal email)
> - Signed-off-by: Qiliang Yuan <yuanql9@...natelecom.cn> (Work email)
>
> My work email often has trouble receiving external mailing list replies,
> so I've included both to ensure I don't miss any feedback and to
> properly attribute the work. The v8 version should have everything
> matching correctly now.
You can just CC two of them and keep one SOB.
>
> > > diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> > > index a6e6a964a287..50fdd4f9bb3b 100644
> > > --- a/net/core/net_namespace.c
> > > +++ b/net/core/net_namespace.c
> > > @@ -624,9 +624,10 @@ void net_ns_get_ownership(const struct net *net, kuid_t *uid, kgid_t *gid)
> > > }
> > > EXPORT_SYMBOL_GPL(net_ns_get_ownership);
> > >
> > > -static void unhash_nsid(struct net *net, struct net *last)
> > > +static void unhash_nsid(struct net *last)
> > > {
> > > struct net *tmp;
> > > +
> > > /* This function is only called from cleanup_net() work,
> > > * and this work is the only process, that may delete
> > > * a net from net_namespace_list. So, when the below
> > > @@ -636,20 +637,34 @@ static void unhash_nsid(struct net *net, struct net *last)
> > > for_each_net(tmp) {
> > > int id;
> > >
> > > - spin_lock(&tmp->nsid_lock);
> > > - id = __peernet2id(tmp, net);
> > > - if (id >= 0)
> > > - idr_remove(&tmp->netns_ids, id);
> > > - spin_unlock(&tmp->nsid_lock);
> > > - if (id >= 0)
> > > - rtnl_net_notifyid(tmp, RTM_DELNSID, id, 0, NULL,
> > > - GFP_KERNEL);
> > > + for (id = 0; ; id++) {
> >
> > Doesn't this rather slow down in a common case where
> > init_net has ids for other netns since it is never dismantled ?
>
> Yes, you're right. In the original code, we only scanned 'tmp' for specific 'net'
> which was being killed. Now we are scanning all IDs in 'tmp' to find any dying
> peers.
>
> If 'tmp' (like init_net) has many long-lived netns IDs, we end up iterating through
> them even if none of them are dying.
>
> To address this and avoid the overhead, I can use idr_for_each() with a callback
> to find and collect dying IDs,
idr_for_each() sounds better to me.
If we replace list_del_rcu(&net->list); with
list_del_init_rcu(&net->list);, we can check net->list.pprev
instead of adding dying_net, which is a bit racy since
idr_for_each() could return a net which would have been
processed in the next cleanup_net() invocation.
> or keep the O(M_batch) outer loop but optimize the
> inner part if it's truly problematic.
>
> However, given that this is the cleanup path, I thought the batching benefit
> (N_system vs M_batch * N_system) would outweigh the per-netns IDR scan.
>
> I'll revert to a more efficient iteration or use idr_for_each() to handle this
> gracefully in v4.
>
> Thanks,
> Qiliang
Powered by blists - more mailing lists