Message-ID: <CADvbK_e_Etot3nzMC=FEt-cqoWfnER4SVOC5dOm6aH43iME1iA@mail.gmail.com>
Date: Sun, 6 Oct 2024 14:25:25 -0400
From: Xin Long <lucien.xin@...il.com>
To: Jiri Wiesner <jwiesner@...e.de>
Cc: Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org,
David Ahern <dsahern@...nel.org>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
"David S. Miller" <davem@...emloft.net>
Subject: Re: [RFC PATCH] ipv6: route: release reference of dsts cached in sockets
On Thu, Oct 3, 2024 at 1:01 PM Jiri Wiesner <jwiesner@...e.de> wrote:
>
> On Wed, Oct 02, 2024 at 04:27:55PM -0400, Xin Long wrote:
> > On Tue, Oct 1, 2024 at 11:26 AM Jiri Wiesner <jwiesner@...e.de> wrote:
> > > I am afraid this patch is misguided. I would still like to find the source of the dst leak but I am also running out of time which the customer is willing to invest into investigating this issue.
> > Is your kernel including this commit?
> >
> > commit 28044fc1d4953b07acec0da4d2fc4784c57ea6fb
> > Author: Joanne Koong <joannelkoong@...il.com>
> > Date: Mon Aug 22 11:10:21 2022 -0700
> >
> > net: Add a bhash2 table hashed by port and address
> >
> > After this commit, it seems that in tcp_v6_connect(), the 'goto failure'
> > may cause a dst leak:
> >
> >     dst = ip6_dst_lookup_flow(net, sk, &fl6, final_p);
> >     ...
> >     if (!saddr) {
> >         saddr = &fl6.saddr;
> >
> >         err = inet_bhash2_update_saddr(sk, saddr, AF_INET6);
> >         if (err)
> >             goto failure; <---
> >     }
> >     ...
> >     ip6_dst_store(sk, dst, NULL, NULL);
>
> Thanks for pointing this out. 28044fc1d495 seems to be an interesting commit as far as the number of Fixes is concerned. The commit was not backported to the 5.14-based SLES kernels, for which the unbalanced refcount bug was reported. The commit is part of the 6.4-based SLES kernels so I will have to see if all the patches with Fixes tags have been backported.
> J.
Hi, Jiri,
We also recently encountered this
'unregister_netdevice: waiting for lo to become free. Usage count = X'
problem on our customer env after backporting
commit 92f1655aa2b22 ("net: fix __dst_negative_advice() race"). [1]
The commit looks correct to me, so I guess it may have uncovered some
existing issues.
The problem took a very long time to reproduce on our customer env, which
made it impossible to debug. The issue also persisted even after
disabling IPv6.
It seems much easier to reproduce on your customer env, so I'm wondering:
- Was the testing on your customer env related to IPv6?
- Does the issue still exist after reverting commit [1]?
Thanks.
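
For reference, the error-path pattern suspected in the quoted tcp_v6_connect()
snippet (a reference obtained by a lookup, then an early 'goto failure' before
the reference is stored or released) can be sketched in userspace C. This is
only a rough illustration under stated assumptions: fake_dst, fake_sock,
fake_lookup(), fake_release(), connect_leaky() and connect_balanced() are
hypothetical names, not kernel APIs, and the sketch does not claim to show how
the kernel code is or should be fixed.

```c
#include <stddef.h>

/* Hypothetical stand-in for a refcounted route entry (not struct dst_entry). */
struct fake_dst {
    int refcnt;
};

/* Hypothetical stand-in for a socket that can cache one route entry. */
struct fake_sock {
    struct fake_dst *cached;
};

/* Hypothetical lookup: hands the caller one extra reference. */
static struct fake_dst *fake_lookup(struct fake_dst *d)
{
    d->refcnt++;
    return d;
}

/* Drop one reference previously taken by fake_lookup(). */
static void fake_release(struct fake_dst *d)
{
    if (d)
        d->refcnt--;
}

/* Leaky variant: the early exit forgets the reference taken by the lookup. */
static int connect_leaky(struct fake_sock *sk, struct fake_dst *route,
                         int update_err)
{
    struct fake_dst *dst = fake_lookup(route);

    if (update_err)
        goto failure;       /* dst is neither stored nor released */

    sk->cached = dst;       /* the socket now owns the reference */
    return 0;

failure:
    return update_err;
}

/* Balanced variant: the error path releases what the lookup took. */
static int connect_balanced(struct fake_sock *sk, struct fake_dst *route,
                            int update_err)
{
    struct fake_dst *dst = fake_lookup(route);

    if (update_err)
        goto release;

    sk->cached = dst;       /* the socket now owns the reference */
    return 0;

release:
    fake_release(dst);
    return update_err;
}
```

In this sketch each failed call to connect_leaky() leaves the counter one
higher than before, which is the kind of imbalance that would keep a device
usage count from ever reaching zero.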