[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAVpQUDxNL7uQWmJLyy3FLJoTa53N3zam2CqxZc-5CGkVhxVbg@mail.gmail.com>
Date: Tue, 11 Nov 2025 19:02:47 -0800
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: "Sverdlin, Alexander" <alexander.sverdlin@...mens.com>
Cc: "edumazet@...gle.com" <edumazet@...gle.com>, "kuni1840@...il.com" <kuni1840@...il.com>,
"davem@...emloft.net" <davem@...emloft.net>, "kuba@...nel.org" <kuba@...nel.org>,
"dsahern@...nel.org" <dsahern@...nel.org>, "pabeni@...hat.com" <pabeni@...hat.com>,
"horms@...nel.org" <horms@...nel.org>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH v3 net-next 07/15] ipv6: mcast: Don't hold RTNL for
IPV6_DROP_MEMBERSHIP and MCAST_LEAVE_GROUP.
On Tue, Nov 11, 2025 at 4:44 AM Sverdlin, Alexander
<alexander.sverdlin@...mens.com> wrote:
>
> Hello Kuniyuki,
>
> On Wed, 2025-07-02 at 16:01 -0700, Kuniyuki Iwashima wrote:
> > From: Kuniyuki Iwashima <kuniyu@...gle.com>
> >
> > In __ipv6_sock_mc_drop(), per-socket mld data is protected by lock_sock(),
> > and only __dev_get_by_index() and __in6_dev_get() require RTNL.
> >
> > Let's use dev_get_by_index() and in6_dev_get() and drop RTNL for
> > IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP.
> >
> > Note that __ipv6_sock_mc_drop() is factorised to reuse in the next patch.
> >
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@...gle.com>
> > Reviewed-by: Eric Dumazet <edumazet@...gle.com>
> > ---
> > net/ipv6/ipv6_sockglue.c | 2 --
> > net/ipv6/mcast.c | 47 +++++++++++++++++++++++-----------------
> > 2 files changed, 27 insertions(+), 22 deletions(-)
> >
> > diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> > index cb0dc885cbe4..c8892d54821f 100644
> > --- a/net/ipv6/ipv6_sockglue.c
> > +++ b/net/ipv6/ipv6_sockglue.c
> > @@ -121,10 +121,8 @@ static bool setsockopt_needs_rtnl(int optname)
> > {
> > switch (optname) {
> > case IPV6_ADDRFORM:
> > - case IPV6_DROP_MEMBERSHIP:
> > case IPV6_JOIN_ANYCAST:
> > case IPV6_LEAVE_ANYCAST:
> > - case MCAST_LEAVE_GROUP:
> > case MCAST_JOIN_SOURCE_GROUP:
> > case MCAST_LEAVE_SOURCE_GROUP:
> > case MCAST_BLOCK_SOURCE:
> > diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
> > index d55c1cb4189a..ed40f5b132ae 100644
> > --- a/net/ipv6/mcast.c
> > +++ b/net/ipv6/mcast.c
> > @@ -253,14 +253,36 @@ int ipv6_sock_mc_join_ssm(struct sock *sk, int ifindex,
> > /*
> > * socket leave on multicast group
> > */
> > +static void __ipv6_sock_mc_drop(struct sock *sk, struct ipv6_mc_socklist *mc_lst)
> > +{
> > + struct net *net = sock_net(sk);
> > + struct net_device *dev;
> > +
> > + dev = dev_get_by_index(net, mc_lst->ifindex);
> > + if (dev) {
> > + struct inet6_dev *idev = in6_dev_get(dev);
> > +
> > + ip6_mc_leave_src(sk, mc_lst, idev);
> > +
> > + if (idev) {
> > + __ipv6_dev_mc_dec(idev, &mc_lst->addr);
> > + in6_dev_put(idev);
> > + }
> > +
> > + dev_put(dev);
> > + } else {
> > + ip6_mc_leave_src(sk, mc_lst, NULL);
> > + }
> > +
> > + atomic_sub(sizeof(*mc_lst), &sk->sk_omem_alloc);
> > + kfree_rcu(mc_lst, rcu);
> > +}
> > +
> > int ipv6_sock_mc_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
> > {
> > struct ipv6_pinfo *np = inet6_sk(sk);
> > - struct ipv6_mc_socklist *mc_lst;
> > struct ipv6_mc_socklist __rcu **lnk;
> > - struct net *net = sock_net(sk);
> > -
> > - ASSERT_RTNL();
> > + struct ipv6_mc_socklist *mc_lst;
> >
> > if (!ipv6_addr_is_multicast(addr))
> > return -EINVAL;
> > @@ -270,23 +292,8 @@ int ipv6_sock_mc_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
> > lnk = &mc_lst->next) {
> > if ((ifindex == 0 || mc_lst->ifindex == ifindex) &&
> > ipv6_addr_equal(&mc_lst->addr, addr)) {
> > - struct net_device *dev;
> > -
> > *lnk = mc_lst->next;
> > -
> > - dev = __dev_get_by_index(net, mc_lst->ifindex);
> > - if (dev) {
> > - struct inet6_dev *idev = __in6_dev_get(dev);
> > -
> > - ip6_mc_leave_src(sk, mc_lst, idev);
> > - if (idev)
> > - __ipv6_dev_mc_dec(idev, &mc_lst->addr);
> > - } else {
> > - ip6_mc_leave_src(sk, mc_lst, NULL);
> > - }
> > -
> > - atomic_sub(sizeof(*mc_lst), &sk->sk_omem_alloc);
> > - kfree_rcu(mc_lst, rcu);
> > + __ipv6_sock_mc_drop(sk, mc_lst);
> > return 0;
> > }
> > }
>
> I'm getting the below stack, though unreliably, during
> kernel-selftest/drivers/net/dsa/local_termination.sh runs with different new-next
> revisions based on v6.18-rcX:
>
> RTNL: assertion failed at git/net/core/dev.c (9477)
> WARNING: CPU: 1 PID: 527 at git/net/core/dev.c:9477 __dev_set_promiscuity+0x1d0/0x1e0
> pc : __dev_set_promiscuity+0x1d0/0x1e0
> Call trace:
> __dev_set_promiscuity+0x1d0/0x1e0 (P)
> __dev_set_rx_mode+0xf8/0x118
> igmp6_group_dropped+0x1e8/0x618
> __ipv6_dev_mc_dec+0x164/0x1d0
> ipv6_sock_mc_drop+0x1ac/0x1e0
> do_ipv6_setsockopt+0x1990/0x1e58
> ipv6_setsockopt+0x74/0x100
> udpv6_setsockopt+0x28/0x58
> sock_common_setsockopt+0x7c/0xa0
> do_sock_setsockopt+0xf8/0x250
> __sys_setsockopt+0xa8/0x130
> __arm64_sys_setsockopt+0x70/0x98
> invoke_syscall+0x68/0x190
> el0_svc_common.constprop.0+0x11c/0x150
> do_el0_svc+0x38/0x50
> el0_svc+0x4c/0x1e8
> el0t_64_sync_handler+0xa0/0xe8
> el0t_64_sync+0x198/0x1a0
> irq event stamp: 138641
> hardirqs last enabled at (138640): [<ffff80008013da14>] __up_console_sem+0x74/0x90
> hardirqs last disabled at (138641): [<ffff800081823ee8>] el1_brk64+0x20/0x58
> softirqs last enabled at (138610): [<ffff800081144814>] lock_sock_nested+0x8c/0xb8
>
Thanks for the report!
> Do you have an idea what could be forgotten in the Subject patch?
>
> Do we need to drop ASSERT_RTNL() from __dev_set_promiscuity() now, or am I
> too naive?
hmm.. ASSERT_RTNL() is still needed there given not all callers
hold netdev_lock() and ndo_change_rx_flags() could nest the call.
But let me think if we can do something better than reverting it.
Powered by blists - more mailing lists