Message-ID: <ZoVrzGBouwEQU3Bu@localhost.localdomain>
Date: Wed, 3 Jul 2024 17:18:36 +0200
From: Michal Kubiak <michal.kubiak@...el.com>
To: Jeongjun Park <aha310510@...il.com>
CC: <jiri@...nulli.us>,
<syzbot+705c61d60b091ef42c04@...kaller.appspotmail.com>,
<davem@...emloft.net>, <edumazet@...gle.com>, <kuba@...nel.org>,
<linux-kernel@...r.kernel.org>, <netdev@...r.kernel.org>,
<pabeni@...hat.com>, <syzkaller-bugs@...glegroups.com>
Subject: Re: [PATCH net] team: Fix ABBA deadlock caused by race in team_del_slave

On Wed, Jul 03, 2024 at 11:51:59PM +0900, Jeongjun Park wrote:
>        CPU0                    CPU1
>        ----                    ----
>   lock(&rdev->wiphy.mtx);
>                                lock(team->team_lock_key#4);
>                                lock(&rdev->wiphy.mtx);
>   lock(team->team_lock_key#4);
>
> Deadlock occurs due to the above scenario. Therefore,
> modify the code as shown in the patch below to prevent deadlock.
>
> Regards,
> Jeongjun Park.
The commit message should contain the patch description only (without
salutations, etc.).
>
> Reported-and-tested-by: syzbot+705c61d60b091ef42c04@...kaller.appspotmail.com
> Fixes: 61dc3461b954 ("team: convert overall spinlock to mutex")
> Signed-off-by: Jeongjun Park <aha310510@...il.com>
> ---
> drivers/net/team/team_core.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
> index ab1935a4aa2c..3ac82df876b0 100644
> --- a/drivers/net/team/team_core.c
> +++ b/drivers/net/team/team_core.c
> @@ -1970,11 +1970,12 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
> struct netlink_ext_ack *extack)
> {
> struct team *team = netdev_priv(dev);
> - int err;
> + int err, locked;
>
> - mutex_lock(&team->lock);
> + locked = mutex_trylock(&team->lock);
> err = team_port_add(team, port_dev, extack);
> - mutex_unlock(&team->lock);
> + if (locked)
> + mutex_unlock(&team->lock);
This is not a correct use of the "mutex_trylock()" API. With this change
you could just as well remove the lock from this code path entirely.
If "mutex_trylock()" returns false, the mutex could not be taken
(because another thread already holds it), so you must not modify the
resources that the mutex is supposed to protect.
In other words, several threads could end up modifying resources through
"team_port_add()" at the same time.
>
> if (!err)
> netdev_change_features(dev);
> @@ -1985,11 +1986,12 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
> static int team_del_slave(struct net_device *dev, struct net_device *port_dev)
> {
> struct team *team = netdev_priv(dev);
> - int err;
> + int err, locked;
>
> - mutex_lock(&team->lock);
> + locked = mutex_trylock(&team->lock);
> err = team_port_del(team, port_dev);
> - mutex_unlock(&team->lock);
> + if (locked)
> + mutex_unlock(&team->lock);
The same problem as in "team_add_slave()".
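
To illustrate the point made for both functions, here is a minimal
userspace sketch (not kernel code, and not a proposed fix for the team
driver). It uses pthread_mutex_trylock() as a stand-in for the kernel's
mutex_trylock(); note the return conventions differ (pthread returns 0 on
success, the kernel API returns 1). The names demo_team and demo_port_add
are hypothetical. The protected state is touched only when the lock was
actually acquired; otherwise the caller gets -EBUSY.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the state that team->lock protects. */
struct demo_team {
	pthread_mutex_t lock;
	int port_count;
};

/*
 * Add a port only if the lock was actually acquired.
 * Returns 0 on success, -EBUSY if the lock is currently held.
 */
int demo_port_add(struct demo_team *team)
{
	/*
	 * pthread_mutex_trylock() returns 0 on success; the kernel's
	 * mutex_trylock() returns 1 on success instead.
	 */
	if (pthread_mutex_trylock(&team->lock) != 0)
		return -EBUSY;	/* lock not held: do not touch port_count */

	team->port_count++;
	pthread_mutex_unlock(&team->lock);
	return 0;
}
```

The unlock is unconditional here because the function only reaches it
after a successful trylock, so no "locked" flag is needed.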
>
> if (err)
> return err;
> --
>
The patch does not look like a correct way to remove the deadlock.
Most likely the synchronization design needs to be reviewed.
If you really want to use the "mutex_trylock()" API, please consider
making several attempts to take the mutex, but never modify the protected
resources when the mutex has not been taken successfully.
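
A bounded-retry variant of that idea could look roughly like the sketch
below. Again this is userspace C with pthreads standing in for the kernel
mutex API; demo_dev, demo_port_del and DEMO_MAX_TRIES are hypothetical
names, and the retry budget is arbitrary. The function gives up with
-EBUSY once the budget is exhausted and never modifies the protected
state without holding the lock.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

#define DEMO_MAX_TRIES 5	/* arbitrary retry budget for this sketch */

/* Hypothetical stand-in for the state protected by the lock. */
struct demo_dev {
	pthread_mutex_t lock;
	int port_count;
};

/*
 * Try a few times to take the lock, then remove one port.
 * Returns 0 on success, -ENOENT if there is no port to remove,
 * -EBUSY if the lock could not be taken within the retry budget.
 */
int demo_port_del(struct demo_dev *dev)
{
	int tries;
	int err;

	for (tries = 0; tries < DEMO_MAX_TRIES; tries++) {
		if (pthread_mutex_trylock(&dev->lock) == 0) {
			/* Lock held: safe to modify protected state. */
			if (dev->port_count > 0) {
				dev->port_count--;
				err = 0;
			} else {
				err = -ENOENT;
			}
			pthread_mutex_unlock(&dev->lock);
			return err;
		}
		sched_yield();	/* let the current holder make progress */
	}

	/* Never acquired the lock: leave port_count untouched. */
	return -EBUSY;
}
```

Retrying only avoids blocking; the caller still has to handle -EBUSY, and
an ABBA problem is normally resolved by making both paths take the two
locks in the same order.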
Thanks,
Michal