netdev - Re: [PATCH net] team: Fix ABBA deadlock caused by race in team_del

lists.openwall.net
Open Source and information security mailing list archives
Message-ID: <CANn89iKbP4r+uAkHiz8_pdMB9XWoyRWR0NJ7ZuNCOr+LiFr9zg@mail.gmail.com>
Date: Wed, 3 Jul 2024 18:30:05 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Jeongjun Park <aha310510@...il.com>
Cc: michal.kubiak@...el.com, davem@...emloft.net, jiri@...nulli.us, 
	kuba@...nel.org, linux-kernel@...r.kernel.org, netdev@...r.kernel.org, 
	pabeni@...hat.com, syzbot+705c61d60b091ef42c04@...kaller.appspotmail.com, 
	syzkaller-bugs@...glegroups.com
Subject: Re: [PATCH net] team: Fix ABBA deadlock caused by race in team_del_slave

On Wed, Jul 3, 2024 at 6:02 PM Jeongjun Park <aha310510@...il.com> wrote:
>
> >
> > On Wed, Jul 03, 2024 at 11:51:59PM +0900, Jeongjun Park wrote:
> > >        CPU0                    CPU1
> > >        ----                    ----
> > >   lock(&rdev->wiphy.mtx);
> > >                                lock(team->team_lock_key#4);
> > >                                lock(&rdev->wiphy.mtx);
> > >   lock(team->team_lock_key#4);
> > >
> > > A deadlock occurs in the scenario above. Therefore, modify the
> > > code as shown in the patch below to prevent it.
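The lock-order inversion in the diagram above can be reproduced in userspace without actually hanging. This is a minimal sketch using POSIX threads, not kernel code: `lock_a` and `lock_b` are hypothetical stand-ins for `&rdev->wiphy.mtx` and `team->lock`, and two barriers force the interleaving shown. Each thread probes its second lock with `pthread_mutex_trylock()`, which returns 0 on success and `EBUSY` when the mutex is held (unlike the kernel's `mutex_trylock()`, which returns 1 on success and 0 on contention).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins: A = &rdev->wiphy.mtx, B = team->lock. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t held_first; /* both threads hold their first lock */
static pthread_barrier_t probed;     /* both threads probed their second lock */
static int cpu0_rc, cpu1_rc;         /* trylock results for the second lock */

/* CPU0 order: A then B. */
static void *cpu0(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&lock_a);
	pthread_barrier_wait(&held_first);
	/* A blocking pthread_mutex_lock() here would never return. */
	cpu0_rc = pthread_mutex_trylock(&lock_b);
	pthread_barrier_wait(&probed);
	if (cpu0_rc == 0)
		pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

/* CPU1 order: B then A (the inverted order). */
static void *cpu1(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&lock_b);
	pthread_barrier_wait(&held_first);
	cpu1_rc = pthread_mutex_trylock(&lock_a);
	pthread_barrier_wait(&probed);
	if (cpu1_rc == 0)
		pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return NULL;
}

/* Returns 1 if each thread found its second lock held by the other. */
int abba_cycle_observed(void)
{
	pthread_t t0, t1;
	int rc;

	pthread_barrier_init(&held_first, NULL, 2);
	pthread_barrier_init(&probed, NULL, 2);
	rc = pthread_create(&t0, NULL, cpu0, NULL);
	assert(rc == 0);
	rc = pthread_create(&t1, NULL, cpu1, NULL);
	assert(rc == 0);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	pthread_barrier_destroy(&held_first);
	pthread_barrier_destroy(&probed);
	return cpu0_rc == EBUSY && cpu1_rc == EBUSY;
}
```

Replacing either trylock with `pthread_mutex_lock()` turns the probe into the hang the diagram describes: each thread waits for a lock that the other releases only after obtaining the lock it is itself waiting for.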
> > >
> > > Regards,
> > > Jeongjun Park.
> >
> > The commit message should contain the patch description only (without
> > salutations, etc.).
> >
> > >
> > > Reported-and-tested-by: syzbot+705c61d60b091ef42c04@...kaller.appspotmail.com
> > > Fixes: 61dc3461b954 ("team: convert overall spinlock to mutex")
> > > Signed-off-by: Jeongjun Park <aha310510@...il.com>
> > > ---
> > >  drivers/net/team/team_core.c | 14 ++++++++------
> > >  1 file changed, 8 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
> > > index ab1935a4aa2c..3ac82df876b0 100644
> > > --- a/drivers/net/team/team_core.c
> > > +++ b/drivers/net/team/team_core.c
> > > @@ -1970,11 +1970,12 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
> > >                           struct netlink_ext_ack *extack)
> > >  {
> > >         struct team *team = netdev_priv(dev);
> > > -       int err;
> > > +       int err, locked;
> > >
> > > -       mutex_lock(&team->lock);
> > > +       locked = mutex_trylock(&team->lock);
> > >         err = team_port_add(team, port_dev, extack);
> > > -       mutex_unlock(&team->lock);
> > > +       if (locked)
> > > +               mutex_unlock(&team->lock);
> >
> > This is not correct usage of the mutex_trylock() API. Used this way, you
> > could just as well remove the lock from that part of the code entirely.
> > If mutex_trylock() returns 0, the mutex could not be taken (because
> > another thread already holds it), so you must not modify the resources
> > that the mutex is expected to protect.
> > In other words, several threads risk modifying the same resources via
> > team_port_add() at the same time.
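The review point about `mutex_trylock()` can be modeled in userspace as well. In this sketch, `struct team_model`, `add_port_broken()` and `add_port_checked()` are hypothetical stand-ins for `struct team` and `team_add_slave()`: the first mirrors the original patch (update the state whether or not the lock was taken), the second only touches the state after a successful trylock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for struct team. */
struct team_model {
	pthread_mutex_t lock;
	int port_count; /* state the lock is supposed to protect */
};

/* Mirrors the first patch: proceeds whether or not the lock was taken. */
int add_port_broken(struct team_model *t)
{
	int locked = pthread_mutex_trylock(&t->lock) == 0;

	t->port_count++; /* may run without the lock: a data race */
	if (locked)
		pthread_mutex_unlock(&t->lock);
	return 0;
}

/* Correct trylock usage: never touch protected state without the lock. */
int add_port_checked(struct team_model *t)
{
	if (pthread_mutex_trylock(&t->lock) != 0)
		return -EBUSY;
	t->port_count++;
	pthread_mutex_unlock(&t->lock);
	return 0;
}
```

For a quick single-threaded check, the calling thread can itself hold the mutex to play the role of the other thread, because on Linux/glibc a trylock on a default (non-recursive) mutex returns `EBUSY` even for the owner. `add_port_broken()` then still increments `port_count`, while `add_port_checked()` returns `-EBUSY` and leaves it unchanged.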
> >
> > >
> > >         if (!err)
> > >                 netdev_change_features(dev);
> > > @@ -1985,11 +1986,12 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
> > >  static int team_del_slave(struct net_device *dev, struct net_device *port_dev)
> > >  {
> > >         struct team *team = netdev_priv(dev);
> > > -       int err;
> > > +       int err, locked;
> > >
> > > -       mutex_lock(&team->lock);
> > > +       locked = mutex_trylock(&team->lock);
> > >         err = team_port_del(team, port_dev);
> > > -       mutex_unlock(&team->lock);
> > > +       if (locked)
> > > +               mutex_unlock(&team->lock);
> >
> > Same problem as in team_add_slave().
> >
> > >
> > >         if (err)
> > >                 return err;
> > > --
> > >
> >
> > The patch does not seem to be a correct way to remove the deadlock.
> > Most probably, the synchronization design needs to be inspected.
> > If you really want to use the mutex_trylock() API, consider making several
> > attempts to take the mutex, but never modify the protected resources when
> > the mutex was not taken successfully.
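The "several attempts" suggestion could look like the following sketch. `lock_with_retries()` and `del_port_with_retries()` are hypothetical helpers, and the attempt count and backoff are arbitrary choices; the property that matters is that a failed acquisition returns `-EBUSY` with the mutex not held, so the caller returns before touching the protected state.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/*
 * Bounded-retry acquisition. Returns 0 with the mutex held, or -EBUSY
 * with the mutex NOT held (the caller must not touch protected state).
 */
int lock_with_retries(pthread_mutex_t *m, int max_attempts, long backoff_ns)
{
	assert(backoff_ns >= 0 && backoff_ns < 1000000000L); /* valid tv_nsec */
	struct timespec backoff = { .tv_sec = 0, .tv_nsec = backoff_ns };

	for (int i = 0; i < max_attempts; i++) {
		if (pthread_mutex_trylock(m) == 0)
			return 0;
		if (i + 1 < max_attempts)
			nanosleep(&backoff, NULL); /* back off before retrying */
	}
	return -EBUSY;
}

/* Caller pattern: bail out before touching protected state on failure. */
int del_port_with_retries(pthread_mutex_t *m, int *port_count)
{
	int err = lock_with_retries(m, 3, 1000000L); /* 3 tries, ~1 ms apart */

	if (err)
		return err;
	(*port_count)--;
	pthread_mutex_unlock(m);
	return 0;
}
```

Retrying only narrows the window. If the current holder is itself waiting for a lock that the caller already holds, as in the ABBA scenario, every attempt fails and the caller still has to handle `-EBUSY`.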
> >
>
> Thanks for your comments. I rewrote the patch based on them.
> This version returns an error instead, so that no resources are
> modified when the race occurs. I would appreciate your feedback
> on this approach.
>
> > Thanks,
> > Michal
> >
> >
>
> Regards,
> Jeongjun Park
>
> ---
>  drivers/net/team/team_core.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
> index ab1935a4aa2c..43d7c73b25aa 100644
> --- a/drivers/net/team/team_core.c
> +++ b/drivers/net/team/team_core.c
> @@ -1972,7 +1972,8 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
>         struct team *team = netdev_priv(dev);
>         int err;
>
> -       mutex_lock(&team->lock);
> +       if (!mutex_trylock(&team->lock))
> +               return -EBUSY;
>         err = team_port_add(team, port_dev, extack);
>         mutex_unlock(&team->lock);
>
> @@ -1987,7 +1988,8 @@ static int team_del_slave(struct net_device *dev, struct net_device *port_dev)
>         struct team *team = netdev_priv(dev);
>         int err;
>
> -       mutex_lock(&team->lock);
> +       if (!mutex_trylock(&team->lock))
> +               return -EBUSY;
>         err = team_port_del(team, port_dev);
>         mutex_unlock(&team->lock);
>
> --

Failing team_del_slave() is not an option. It would introduce various issues.
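One way to see why a failing delete is a problem: some callers of a delete operation sit on teardown paths that have no way to report an error, so an `-EBUSY` return is dropped and the port stays linked. The model below is hypothetical (`struct team_link_model`, `del_slave_trylock()` and `device_going_away()` are invented for illustration and are not the team driver's code); it only shows the stale state that such a dropped error leaves behind.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a team with one enslaved port device. */
struct team_link_model {
	pthread_mutex_t lock;
	bool port_linked; /* port still on the team's port list */
};

/* Models the second patch: gives up with -EBUSY under contention. */
int del_slave_trylock(struct team_link_model *t)
{
	if (pthread_mutex_trylock(&t->lock) != 0)
		return -EBUSY;
	t->port_linked = false;
	pthread_mutex_unlock(&t->lock);
	return 0;
}

/*
 * Hypothetical teardown hook that cannot fail, e.g. a reaction to the
 * port device going away: the error has nowhere to go, so it is dropped.
 */
void device_going_away(struct team_link_model *t)
{
	(void)del_slave_trylock(t);
	/* From here on, the caller assumes the port is no longer linked. */
}
```

If another path holds the lock at that moment, `port_linked` stays true after the teardown hook returns, which is exactly the inconsistency a non-failing delete avoids.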