[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <daa3c21fe3580fc1905734143e47ac3323ee90a4.camel@siemens.com>
Date: Wed, 12 Nov 2025 08:02:14 +0000
From: "Sverdlin, Alexander" <alexander.sverdlin@...mens.com>
To: "kuniyu@...gle.com" <kuniyu@...gle.com>
CC: "davem@...emloft.net" <davem@...emloft.net>, "dsahern@...nel.org"
<dsahern@...nel.org>, "kuni1840@...il.com" <kuni1840@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "pabeni@...hat.com"
<pabeni@...hat.com>, "horms@...nel.org" <horms@...nel.org>,
"edumazet@...gle.com" <edumazet@...gle.com>, "kuba@...nel.org"
<kuba@...nel.org>
Subject: Re: [PATCH v3 net-next 07/15] ipv6: mcast: Don't hold RTNL for
IPV6_DROP_MEMBERSHIP and MCAST_LEAVE_GROUP.
Hi Kuniyuki,
On Tue, 2025-11-11 at 19:02 -0800, Kuniyuki Iwashima wrote:
> > > From: Kuniyuki Iwashima <kuniyu@...gle.com>
> > >
> > > In __ipv6_sock_mc_drop(), per-socket mld data is protected by lock_sock(),
> > > and only __dev_get_by_index() and __in6_dev_get() require RTNL.
> > >
> > > Let's use dev_get_by_index() and in6_dev_get() and drop RTNL for
> > > IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP.
> > >
> > > Note that __ipv6_sock_mc_drop() is factorised to reuse in the next patch.
> > >
> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@...gle.com>
> > > Reviewed-by: Eric Dumazet <edumazet@...gle.com>
> > > ---
> > > net/ipv6/ipv6_sockglue.c | 2 --
> > > net/ipv6/mcast.c | 47 +++++++++++++++++++++++-----------------
> > > 2 files changed, 27 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> > > index cb0dc885cbe4..c8892d54821f 100644
> > > --- a/net/ipv6/ipv6_sockglue.c
> > > +++ b/net/ipv6/ipv6_sockglue.c
> > > @@ -121,10 +121,8 @@ static bool setsockopt_needs_rtnl(int optname)
> > > {
> > > switch (optname) {
> > > case IPV6_ADDRFORM:
> > > - case IPV6_DROP_MEMBERSHIP:
> > > case IPV6_JOIN_ANYCAST:
> > > case IPV6_LEAVE_ANYCAST:
> > > - case MCAST_LEAVE_GROUP:
> > > case MCAST_JOIN_SOURCE_GROUP:
> > > case MCAST_LEAVE_SOURCE_GROUP:
> > > case MCAST_BLOCK_SOURCE:
> > > diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
> > > index d55c1cb4189a..ed40f5b132ae 100644
> > > --- a/net/ipv6/mcast.c
> > > +++ b/net/ipv6/mcast.c
> > > @@ -253,14 +253,36 @@ int ipv6_sock_mc_join_ssm(struct sock *sk, int ifindex,
> > > /*
> > > * socket leave on multicast group
> > > */
> > > +static void __ipv6_sock_mc_drop(struct sock *sk, struct ipv6_mc_socklist *mc_lst)
> > > +{
> > > + struct net *net = sock_net(sk);
> > > + struct net_device *dev;
> > > +
> > > + dev = dev_get_by_index(net, mc_lst->ifindex);
> > > + if (dev) {
> > > + struct inet6_dev *idev = in6_dev_get(dev);
> > > +
> > > + ip6_mc_leave_src(sk, mc_lst, idev);
> > > +
> > > + if (idev) {
> > > + __ipv6_dev_mc_dec(idev, &mc_lst->addr);
> > > + in6_dev_put(idev);
> > > + }
> > > +
> > > + dev_put(dev);
> > > + } else {
> > > + ip6_mc_leave_src(sk, mc_lst, NULL);
> > > + }
> > > +
> > > + atomic_sub(sizeof(*mc_lst), &sk->sk_omem_alloc);
> > > + kfree_rcu(mc_lst, rcu);
> > > +}
> > > +
> > > int ipv6_sock_mc_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
> > > {
> > > struct ipv6_pinfo *np = inet6_sk(sk);
> > > - struct ipv6_mc_socklist *mc_lst;
> > > struct ipv6_mc_socklist __rcu **lnk;
> > > - struct net *net = sock_net(sk);
> > > -
> > > - ASSERT_RTNL();
> > > + struct ipv6_mc_socklist *mc_lst;
> > >
> > > if (!ipv6_addr_is_multicast(addr))
> > > return -EINVAL;
> > > @@ -270,23 +292,8 @@ int ipv6_sock_mc_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
> > > lnk = &mc_lst->next) {
> > > if ((ifindex == 0 || mc_lst->ifindex == ifindex) &&
> > > ipv6_addr_equal(&mc_lst->addr, addr)) {
> > > - struct net_device *dev;
> > > -
> > > *lnk = mc_lst->next;
> > > -
> > > - dev = __dev_get_by_index(net, mc_lst->ifindex);
> > > - if (dev) {
> > > - struct inet6_dev *idev = __in6_dev_get(dev);
> > > -
> > > - ip6_mc_leave_src(sk, mc_lst, idev);
> > > - if (idev)
> > > - __ipv6_dev_mc_dec(idev, &mc_lst->addr);
> > > - } else {
> > > - ip6_mc_leave_src(sk, mc_lst, NULL);
> > > - }
> > > -
> > > - atomic_sub(sizeof(*mc_lst), &sk->sk_omem_alloc);
> > > - kfree_rcu(mc_lst, rcu);
> > > + __ipv6_sock_mc_drop(sk, mc_lst);
> > > return 0;
> > > }
> > > }
> >
> > I'm getting the below stack, though unreliably, during
> > kernel-selftest/drivers/net/dsa/local_termination.sh runs with different new-next
> > revisions based on v6.18-rcX:
> >
> > RTNL: assertion failed at git/net/core/dev.c (9477)
> > WARNING: CPU: 1 PID: 527 at git/net/core/dev.c:9477 __dev_set_promiscuity+0x1d0/0x1e0
> > pc : __dev_set_promiscuity+0x1d0/0x1e0
> > Call trace:
> > __dev_set_promiscuity+0x1d0/0x1e0 (P)
> > __dev_set_rx_mode+0xf8/0x118
> > igmp6_group_dropped+0x1e8/0x618
> > __ipv6_dev_mc_dec+0x164/0x1d0
> > ipv6_sock_mc_drop+0x1ac/0x1e0
> > do_ipv6_setsockopt+0x1990/0x1e58
> > ipv6_setsockopt+0x74/0x100
> > udpv6_setsockopt+0x28/0x58
> > sock_common_setsockopt+0x7c/0xa0
> > do_sock_setsockopt+0xf8/0x250
> > __sys_setsockopt+0xa8/0x130
> > __arm64_sys_setsockopt+0x70/0x98
> > invoke_syscall+0x68/0x190
> > el0_svc_common.constprop.0+0x11c/0x150
> > do_el0_svc+0x38/0x50
> > el0_svc+0x4c/0x1e8
> > el0t_64_sync_handler+0xa0/0xe8
> > el0t_64_sync+0x198/0x1a0
> > irq event stamp: 138641
> > hardirqs last enabled at (138640): [<ffff80008013da14>] __up_console_sem+0x74/0x90
> > hardirqs last disabled at (138641): [<ffff800081823ee8>] el1_brk64+0x20/0x58
> > softirqs last enabled at (138610): [<ffff800081144814>] lock_sock_nested+0x8c/0xb8
> >
>
> Thanks for the report!
>
> > Do you have an idea what could be forgotten in the Subject patch?
> >
> > Do we need to drop ASSERT_RTNL() from __dev_set_promiscuity() now, or am I
> > too naive?
>
> hmm.. ASSERT_RTNL() is still needed there given not all callers
> hold netdev_lock() and ndo_change_rx_flags() could nest the call.
>
> But let me think if we can do something better than reverting it.
I'll be happy to test a patch, you definitely have more insights into the different locks
in this area... I was looking into it yesterday, but the netdev_lock_ops() in
dev_set_promiscuity() just doesn't look right as a replacement for __dev_set_promiscuity()
because it's optional (for some reason) and it seems we need some dumb per-net_device lock
here?
--
Alexander Sverdlin
Siemens AG
www.siemens.com
Powered by blists - more mailing lists