linux-kernel - Re: [PATCH net] team: Fix ABBA deadlock caused by race in team_del

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20240703160210.83667-1-aha310510@gmail.com>
Date: Thu,  4 Jul 2024 01:02:10 +0900
From: Jeongjun Park <aha310510@...il.com>
To: michal.kubiak@...el.com
Cc: aha310510@...il.com,
	davem@...emloft.net,
	edumazet@...gle.com,
	jiri@...nulli.us,
	kuba@...nel.org,
	linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org,
	pabeni@...hat.com,
	syzbot+705c61d60b091ef42c04@...kaller.appspotmail.com,
	syzkaller-bugs@...glegroups.com
Subject: Re: [PATCH net] team: Fix ABBA deadlock caused by race in team_del_slave

>
> On Wed, Jul 03, 2024 at 11:51:59PM +0900, Jeongjun Park wrote:
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&rdev->wiphy.mtx);
> >                                lock(team->team_lock_key#4);
> >                                lock(&rdev->wiphy.mtx);
> >   lock(team->team_lock_key#4);
> >
> > Deadlock occurs due to the above scenario. Therefore,
> > modify the code as shown in the patch below to prevent deadlock.
> >
> > Regards,
> > Jeongjun Park.
>
> The commit message should contain the patch description only (without
> salutations, etc.).
>
> >
> > Reported-and-tested-by: syzbot+705c61d60b091ef42c04@...kaller.appspotmail.com
> > Fixes: 61dc3461b954 ("team: convert overall spinlock to mutex")
> > Signed-off-by: Jeongjun Park <aha310510@...il.com>
> > ---
> >  drivers/net/team/team_core.c | 14 ++++++++------
> >  1 file changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
> > index ab1935a4aa2c..3ac82df876b0 100644
> > --- a/drivers/net/team/team_core.c
> > +++ b/drivers/net/team/team_core.c
> > @@ -1970,11 +1970,12 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
> >                           struct netlink_ext_ack *extack)
> >  {
> >         struct team *team = netdev_priv(dev);
> > -       int err;
> > +       int err, locked;
> > 
> > -       mutex_lock(&team->lock);
> > +       locked = mutex_trylock(&team->lock);
> >         err = team_port_add(team, port_dev, extack);
> > -       mutex_unlock(&team->lock);
> > +       if (locked)
> > +               mutex_unlock(&team->lock);
>
> This is not correct usage of 'mutex_trylock()' API. In such a case you
> could as well remove the lock completely from that part of code.
> If "mutex_trylock()" returns false it means the mutex cannot be taken
> (because it was already taken by other thread), so you should not modify
> the resources that were expected to be protected by the mutex.
> In other words, there is a risk of modifying resources using
> "team_port_add()" by several threads at a time.
>
> > 
> >         if (!err)
> >                 netdev_change_features(dev);
> > @@ -1985,11 +1986,12 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
> >  static int team_del_slave(struct net_device *dev, struct net_device *port_dev)
> >  {
> >         struct team *team = netdev_priv(dev);
> > -       int err;
> > +       int err, locked;
> > 
> > -       mutex_lock(&team->lock);
> > +       locked = mutex_trylock(&team->lock);
> >         err = team_port_del(team, port_dev);
> > -       mutex_unlock(&team->lock);
> > +       if (locked)
> > +               mutex_unlock(&team->lock);
>
> The same story as in case of "team_add_slave()".
>
> > 
> >         if (err)
> >                 return err;
> > --
> >
>
> The patch does not seem to be a correct solution to remove a deadlock.
> Most probably a synchronization design needs an inspection.
> If you really want to use "mutex_trylock()" API, please consider several
> attempts of taking the mutex, but never modify the protected resources when
> the mutex is not taken successfully.
>

Thanks for your comment. I rewrote the patch based on those comments. 
This time, we modified it to return an error so that resources are not 
modified when a race situation occurs. We would appreciate your 
feedback on what this patch would be like.

> Thanks,
> Michal
>
>

Regards,
Jeongjun Park

---
 drivers/net/team/team_core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
index ab1935a4aa2c..43d7c73b25aa 100644
--- a/drivers/net/team/team_core.c
+++ b/drivers/net/team/team_core.c
@@ -1972,7 +1972,8 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
        struct team *team = netdev_priv(dev);
        int err;
 
-       mutex_lock(&team->lock);
+       if (!mutex_trylock(&team->lock))
+               return -EBUSY;
        err = team_port_add(team, port_dev, extack);
        mutex_unlock(&team->lock);
 
@@ -1987,7 +1988,8 @@ static int team_del_slave(struct net_device *dev, struct net_device *port_dev)
        struct team *team = netdev_priv(dev);
        int err;
 
-       mutex_lock(&team->lock);
+       if (!mutex_trylock(&team->lock))
+               return -EBUSY;
        err = team_port_del(team, port_dev);
        mutex_unlock(&team->lock);
 
--