lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250316190138.14440-1-kuniyu@amazon.com>
Date: Sun, 16 Mar 2025 12:01:26 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <idosch@...sch.org>, <stfomichev@...il.com>
CC: <andrew+netdev@...n.ch>, <bridge@...ts.linux.dev>, <davem@...emloft.net>,
	<edumazet@...gle.com>, <horms@...nel.org>, <kangyan91@...look.com>,
	<kuba@...nel.org>, <kuni1840@...il.com>, <kuniyu@...zon.com>,
	<netdev@...r.kernel.org>, <pabeni@...hat.com>, <razor@...ckwall.org>,
	<roopa@...dia.com>, <samsun1006219@...il.com>, <syzkaller@...glegroups.com>,
	<willemb@...gle.com>
Subject: Re: [PATCH v1 net] net: Remove RTNL dance for SIOCBRADDIF and SIOCBRDELIF.

From: Ido Schimmel <idosch@...sch.org>
Date: Sun, 16 Mar 2025 18:16:00 +0200
> On Thu, Mar 13, 2025 at 05:59:55PM -0700, Kuniyuki Iwashima wrote:
> > SIOCBRDELIF is passed to dev_ioctl() first and later forwarded to
> > br_ioctl_call(), which causes unnecessary RTNL dance and the splat
> > below [0] under RTNL pressure.
> > 
> > Let's say Thread A is trying to detach a device from a bridge and
> > Thread B is trying to remove the bridge.
> > 
> > In dev_ioctl(), Thread A bumps the bridge device's refcnt by
> > netdev_hold() and releases RTNL because the following br_ioctl_call()
> > also re-acquires RTNL.
> > 
> > In the race window, Thread B could acquire RTNL and try to remove
> > the bridge device.  Then, rtnl_unlock() by Thread B will release RTNL
> > and wait for netdev_put() by Thread A.
> > 
> > Thread A, however, must hold RTNL twice after the unlock in dev_ifsioc(),
> > which may take long under RTNL pressure, resulting in the splat by
> > Thread B.
> > 
> >   Thread A (SIOCBRDELIF)           Thread B (SIOCBRDELBR)
> >   ----------------------           ----------------------
> >   sock_ioctl                       sock_ioctl
> >   `- sock_do_ioctl                 `- br_ioctl_call
> >      `- dev_ioctl                     `- br_ioctl_stub
> >         |- rtnl_lock                     |
> >         |- dev_ifsioc                    '
> >         '  |- dev = __dev_get_by_name(...)
> >            |- netdev_hold(dev, ...)      .
> >        /   |- rtnl_unlock  ------.       |
> >        |   |- br_ioctl_call       `--->  |- rtnl_lock
> >   Race |   |  `- br_ioctl_stub           |- br_del_bridge
> >   Window   |     |                       |  |- dev = __dev_get_by_name(...)
> >        |   |     |  May take long        |  `- br_dev_delete(dev, ...)
> >        |   |     |  under RTNL pressure  |     `- unregister_netdevice_queue(dev, ...)
> >        |   |     |                |      `- rtnl_unlock
> >        |   |     |- rtnl_lock  <--|         `- netdev_run_todo
> >        |   |     |- ...           |            `- netdev_run_todo
> >        |   |     `- rtnl_unlock   |               |- __rtnl_unlock
> >        |   |                      |               |- netdev_wait_allrefs_any
> >        \   |- rtnl_lock  <--------'                  |
> >            |- netdev_put(dev, ...)  <----------------'  Wait refcnt decrement
> >                                                         and log splat below
> 
> Isn't the race window a bit smaller? dev_ifsioc() does netdev_put()
> before rtnl_lock().

Ah right, looks like I'm lost while writing.


> 
> > 
> > To avoid blocking SIOCBRDELBR unnecessarily, let's not call
> > dev_ioctl() for SIOCBRADDIF and SIOCBRDELIF.
> > 
> > In the dev_ioctl() path, we do the following:
> > 
> >   1. Copy struct ifreq by get_user_ifreq in sock_do_ioctl()
> >   2. Check CAP_NET_ADMIN in dev_ioctl()
> >   3. Call dev_load() in dev_ioctl()
> >   4. Fetch the master dev from ifr.ifr_name in dev_ifsioc()
> > 
> > 3. can be done by request_module() in br_ioctl_call(), so we move
> > 1., 2., and 4. to br_ioctl_stub().
> > 
> > Note that 2. is also checked later in add_del_if(), but it's better
> > performed before RTNL.
> > 
> > SIOCBRADDIF and SIOCBRDELIF have been processed in dev_ioctl() since
> > the pre-git era, and there seems to be no specific reason to process
> > them there.
> 
> I couldn't find an explanation as well.
> 
> Doesn't seem like we have any tests for the IOCTL path, but FWIW I
> verified that basic operations using brctl still work after this patch.

Thanks :)

> 
> > 
> > [0]:
> > unregister_netdevice: waiting for wpan3 to become free. Usage count = 2
> > ref_tracker: wpan3@...f8880662d8608 has 1/1 users at
> >      __netdev_tracker_alloc include/linux/netdevice.h:4282 [inline]
> >      netdev_hold include/linux/netdevice.h:4311 [inline]
> >      dev_ifsioc+0xc6a/0x1160 net/core/dev_ioctl.c:624
> >      dev_ioctl+0x255/0x10c0 net/core/dev_ioctl.c:826
> >      sock_do_ioctl+0x1ca/0x260 net/socket.c:1213
> >      sock_ioctl+0x23a/0x6c0 net/socket.c:1318
> >      vfs_ioctl fs/ioctl.c:51 [inline]
> >      __do_sys_ioctl fs/ioctl.c:906 [inline]
> >      __se_sys_ioctl fs/ioctl.c:892 [inline]
> >      __x64_sys_ioctl+0x1a4/0x210 fs/ioctl.c:892
> >      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >      do_syscall_64+0xcb/0x250 arch/x86/entry/common.c:83
> >      entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > 
> > Fixes: 893b19587534 ("net: bridge: fix ioctl locking")
> > Reported-by: syzkaller <syzkaller@...glegroups.com>
> > Reported-by: yan kang <kangyan91@...look.com>
> > Reported-by: yue sun <samsun1006219@...il.com>
> > Closes: https://lore.kernel.org/netdev/SY8P300MB0421225D54EB92762AE8F0F2A1D32@SY8P300MB0421.AUSP300.PROD.OUTLOOK.COM/
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.com>
> 
> Thanks for the fix and the detailed commit message. One nit below.
> 
> > ---
> >  include/linux/if_bridge.h |  6 ++----
> >  net/bridge/br_ioctl.c     | 39 ++++++++++++++++++++++++++++++++++++---
> >  net/bridge/br_private.h   |  3 +--
> >  net/core/dev_ioctl.c      | 19 -------------------
> >  net/socket.c              | 19 +++++++++----------
> >  5 files changed, 48 insertions(+), 38 deletions(-)
> > 
> > diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> > index 3ff96ae31bf6..c5fe3b2a53e8 100644
> > --- a/include/linux/if_bridge.h
> > +++ b/include/linux/if_bridge.h
> > @@ -65,11 +65,9 @@ struct br_ip_list {
> >  #define BR_DEFAULT_AGEING_TIME	(300 * HZ)
> >  
> >  struct net_bridge;
> > -void brioctl_set(int (*hook)(struct net *net, struct net_bridge *br,
> > -			     unsigned int cmd, struct ifreq *ifr,
> > +void brioctl_set(int (*hook)(struct net *net, unsigned int cmd,
> >  			     void __user *uarg));
> > -int br_ioctl_call(struct net *net, struct net_bridge *br, unsigned int cmd,
> > -		  struct ifreq *ifr, void __user *uarg);
> > +int br_ioctl_call(struct net *net, unsigned int cmd, void __user *uarg);
> >  
> >  #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_IGMP_SNOOPING)
> >  int br_multicast_list_adjacent(struct net_device *dev,
> > diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
> > index f213ed108361..b5a607f6da4e 100644
> > --- a/net/bridge/br_ioctl.c
> > +++ b/net/bridge/br_ioctl.c
> > @@ -394,10 +394,29 @@ static int old_deviceless(struct net *net, void __user *data)
> >  	return -EOPNOTSUPP;
> >  }
> >  
> > -int br_ioctl_stub(struct net *net, struct net_bridge *br, unsigned int cmd,
> > -		  struct ifreq *ifr, void __user *uarg)
> > +int br_ioctl_stub(struct net *net, unsigned int cmd, void __user *uarg)
> >  {
> >  	int ret = -EOPNOTSUPP;
> > +	struct ifreq ifr;
> > +
> > +	switch (cmd) {
> > +	case SIOCBRADDIF:
> > +	case SIOCBRDELIF: {
> 
> Why not a simple if statement? Unlikely that we will add more commands
> to this switch statement.

Exactly, will use if in v2.
Then the funky }} will look cleaner too.

Thank you both !

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ