lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <daa3c21fe3580fc1905734143e47ac3323ee90a4.camel@siemens.com>
Date: Wed, 12 Nov 2025 08:02:14 +0000
From: "Sverdlin, Alexander" <alexander.sverdlin@...mens.com>
To: "kuniyu@...gle.com" <kuniyu@...gle.com>
CC: "davem@...emloft.net" <davem@...emloft.net>, "dsahern@...nel.org"
	<dsahern@...nel.org>, "kuni1840@...il.com" <kuni1840@...il.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "pabeni@...hat.com"
	<pabeni@...hat.com>, "horms@...nel.org" <horms@...nel.org>,
	"edumazet@...gle.com" <edumazet@...gle.com>, "kuba@...nel.org"
	<kuba@...nel.org>
Subject: Re: [PATCH v3 net-next 07/15] ipv6: mcast: Don't hold RTNL for
 IPV6_DROP_MEMBERSHIP and MCAST_LEAVE_GROUP.

Hi Kuniyuki,

On Tue, 2025-11-11 at 19:02 -0800, Kuniyuki Iwashima wrote:
> > > From: Kuniyuki Iwashima <kuniyu@...gle.com>
> > > 
> > > In __ipv6_sock_mc_drop(), per-socket mld data is protected by lock_sock(),
> > > and only __dev_get_by_index() and __in6_dev_get() require RTNL.
> > > 
> > > Let's use dev_get_by_index() and in6_dev_get() and drop RTNL for
> > > IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP.
> > > 
> > > Note that __ipv6_sock_mc_drop() is factorised to reuse in the next patch.
> > > 
> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@...gle.com>
> > > Reviewed-by: Eric Dumazet <edumazet@...gle.com>
> > > ---
> > >   net/ipv6/ipv6_sockglue.c |  2 --
> > >   net/ipv6/mcast.c         | 47 +++++++++++++++++++++++-----------------
> > >   2 files changed, 27 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> > > index cb0dc885cbe4..c8892d54821f 100644
> > > --- a/net/ipv6/ipv6_sockglue.c
> > > +++ b/net/ipv6/ipv6_sockglue.c
> > > @@ -121,10 +121,8 @@ static bool setsockopt_needs_rtnl(int optname)
> > >   {
> > >        switch (optname) {
> > >        case IPV6_ADDRFORM:
> > > -     case IPV6_DROP_MEMBERSHIP:
> > >        case IPV6_JOIN_ANYCAST:
> > >        case IPV6_LEAVE_ANYCAST:
> > > -     case MCAST_LEAVE_GROUP:
> > >        case MCAST_JOIN_SOURCE_GROUP:
> > >        case MCAST_LEAVE_SOURCE_GROUP:
> > >        case MCAST_BLOCK_SOURCE:
> > > diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
> > > index d55c1cb4189a..ed40f5b132ae 100644
> > > --- a/net/ipv6/mcast.c
> > > +++ b/net/ipv6/mcast.c
> > > @@ -253,14 +253,36 @@ int ipv6_sock_mc_join_ssm(struct sock *sk, int ifindex,
> > >   /*
> > >    *   socket leave on multicast group
> > >    */
> > > +static void __ipv6_sock_mc_drop(struct sock *sk, struct ipv6_mc_socklist *mc_lst)
> > > +{
> > > +     struct net *net = sock_net(sk);
> > > +     struct net_device *dev;
> > > +
> > > +     dev = dev_get_by_index(net, mc_lst->ifindex);
> > > +     if (dev) {
> > > +             struct inet6_dev *idev = in6_dev_get(dev);
> > > +
> > > +             ip6_mc_leave_src(sk, mc_lst, idev);
> > > +
> > > +             if (idev) {
> > > +                     __ipv6_dev_mc_dec(idev, &mc_lst->addr);
> > > +                     in6_dev_put(idev);
> > > +             }
> > > +
> > > +             dev_put(dev);
> > > +     } else {
> > > +             ip6_mc_leave_src(sk, mc_lst, NULL);
> > > +     }
> > > +
> > > +     atomic_sub(sizeof(*mc_lst), &sk->sk_omem_alloc);
> > > +     kfree_rcu(mc_lst, rcu);
> > > +}
> > > +
> > >   int ipv6_sock_mc_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
> > >   {
> > >        struct ipv6_pinfo *np = inet6_sk(sk);
> > > -     struct ipv6_mc_socklist *mc_lst;
> > >        struct ipv6_mc_socklist __rcu **lnk;
> > > -     struct net *net = sock_net(sk);
> > > -
> > > -     ASSERT_RTNL();
> > > +     struct ipv6_mc_socklist *mc_lst;
> > > 
> > >        if (!ipv6_addr_is_multicast(addr))
> > >                return -EINVAL;
> > > @@ -270,23 +292,8 @@ int ipv6_sock_mc_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
> > >              lnk = &mc_lst->next) {
> > >                if ((ifindex == 0 || mc_lst->ifindex == ifindex) &&
> > >                    ipv6_addr_equal(&mc_lst->addr, addr)) {
> > > -                     struct net_device *dev;
> > > -
> > >                        *lnk = mc_lst->next;
> > > -
> > > -                     dev = __dev_get_by_index(net, mc_lst->ifindex);
> > > -                     if (dev) {
> > > -                             struct inet6_dev *idev = __in6_dev_get(dev);
> > > -
> > > -                             ip6_mc_leave_src(sk, mc_lst, idev);
> > > -                             if (idev)
> > > -                                     __ipv6_dev_mc_dec(idev, &mc_lst->addr);
> > > -                     } else {
> > > -                             ip6_mc_leave_src(sk, mc_lst, NULL);
> > > -                     }
> > > -
> > > -                     atomic_sub(sizeof(*mc_lst), &sk->sk_omem_alloc);
> > > -                     kfree_rcu(mc_lst, rcu);
> > > +                     __ipv6_sock_mc_drop(sk, mc_lst);
> > >                        return 0;
> > >                }
> > >        }
> > 
> > I'm getting the below stack, though unreliably, during
> > kernel-selftest/drivers/net/dsa/local_termination.sh runs with different new-next
> > revisions based on v6.18-rcX:
> > 
> > RTNL: assertion failed at git/net/core/dev.c (9477)
> > WARNING: CPU: 1 PID: 527 at git/net/core/dev.c:9477 __dev_set_promiscuity+0x1d0/0x1e0
> > pc : __dev_set_promiscuity+0x1d0/0x1e0
> > Call trace:
> >   __dev_set_promiscuity+0x1d0/0x1e0 (P)
> >   __dev_set_rx_mode+0xf8/0x118
> >   igmp6_group_dropped+0x1e8/0x618
> >   __ipv6_dev_mc_dec+0x164/0x1d0
> >   ipv6_sock_mc_drop+0x1ac/0x1e0
> >   do_ipv6_setsockopt+0x1990/0x1e58
> >   ipv6_setsockopt+0x74/0x100
> >   udpv6_setsockopt+0x28/0x58
> >   sock_common_setsockopt+0x7c/0xa0
> >   do_sock_setsockopt+0xf8/0x250
> >   __sys_setsockopt+0xa8/0x130
> >   __arm64_sys_setsockopt+0x70/0x98
> >   invoke_syscall+0x68/0x190
> >   el0_svc_common.constprop.0+0x11c/0x150
> >   do_el0_svc+0x38/0x50
> >   el0_svc+0x4c/0x1e8
> >   el0t_64_sync_handler+0xa0/0xe8
> >   el0t_64_sync+0x198/0x1a0
> > irq event stamp: 138641
> > hardirqs last  enabled at (138640): [<ffff80008013da14>] __up_console_sem+0x74/0x90
> > hardirqs last disabled at (138641): [<ffff800081823ee8>] el1_brk64+0x20/0x58
> > softirqs last  enabled at (138610): [<ffff800081144814>] lock_sock_nested+0x8c/0xb8
> > 
> 
> Thanks for the report!
> 
> > Do you have an idea what could be forgotten in the Subject patch?
> > 
> > Do we need to drop ASSERT_RTNL() from __dev_set_promiscuity() now, or am I
> > too naive?
> 
> hmm.. ASSERT_RTNL() is still needed there given not all callers
> hold netdev_lock() and ndo_change_rx_flags() could nest the call.
> 
> But let me think if we can do something better than reverting it.

I'll be happy to test a patch, you definitely have more insights into the different locks
in this area... I was looking into it yesterday, but the netdev_lock_ops() in
dev_set_promiscuity() just doesn't look right as a replacement for __dev_set_promiscuity()
because it's optional (for some reason) and it seems we need some dumb per-net_device lock
here?

-- 
Alexander Sverdlin
Siemens AG
www.siemens.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ