Message-ID: <20230426190206.ni2au5mpjc5oty67@fpc>
Date: Wed, 26 Apr 2023 22:02:06 +0300
From: Fedor Pchelkin <pchelkin@...ras.ru>
To: Hillf Danton <hdanton@...a.com>
Cc: Toke Høiland-Jørgensen <toke@...e.dk>,
Kalle Vallo <kvalo@...nel.org>,
syzbot+f2cb6e0ffdb961921e4d@...kaller.appspotmail.com,
linux-wireless@...r.kernel.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org,
Alexey Khoroshilov <khoroshilov@...ras.ru>,
lvc-project@...uxtesting.org
Subject: Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and
ath9k_wmi_ctrl_rx
On Wed, Apr 26, 2023 at 07:07:08AM +0800, Hillf Danton wrote:
> Given similar wait timeout[1], just taking lock on the waiter side is not
> enough wrt fixing the race, because in case job done on the waker side,
> waiter needs to wait again after timeout.
>
If I understand you correctly, you mean the case when a timeout occurs
while the ath9k_wmi_ctrl_rx() callback is executing. I think that if a
timeout has occurred on the waiter's side, the waiter should return
immediately and need not care what state the callback was in at that
moment.
AFAICS, this is handled properly by taking wmi_lock on both the waiter
and waker sides, so there is no data corruption.
If the callback has not managed to do all of its work (including
performing the completion and thus waking the waiting thread), then,
well, in my opinion it is simply a timeout.
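
A minimal userspace sketch of the handshake described above, assuming a
pthread mutex in place of wmi_lock and a plain counter in place of the
completion (this is not driver code). struct wmi_model, WMI_MODEL_INIT
and the model_*() helpers are hypothetical names for illustration, and
sequence IDs are assumed to be nonzero. It replays interleavings
deterministically: once the waiter has cleared last_seq_id under the
lock, a late response no longer matches and is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model, not driver code: a pthread mutex stands
 * in for wmi->wmi_lock and a counter stands in for the completion.
 */
struct wmi_model {
	pthread_mutex_t lock;	/* models wmi->wmi_lock */
	uint16_t last_seq_id;	/* 0: no command outstanding (assumed) */
	int delivered;		/* responses matched and "completed" */
};

#define WMI_MODEL_INIT \
	{ .lock = PTHREAD_MUTEX_INITIALIZER, .last_seq_id = 0, .delivered = 0 }

/* Waiter: remember the sequence number of the command just sent. */
static void model_issue(struct wmi_model *w, uint16_t seq)
{
	pthread_mutex_lock(&w->lock);
	w->last_seq_id = seq;
	pthread_mutex_unlock(&w->lock);
}

/* Waiter, on timeout: forget the outstanding command under the lock. */
static void model_timeout(struct wmi_model *w)
{
	pthread_mutex_lock(&w->lock);
	w->last_seq_id = 0;
	pthread_mutex_unlock(&w->lock);
}

/*
 * Waker (RX callback): in this model the sequence check and the
 * consumption of the response happen under the same lock, so they
 * cannot interleave with model_timeout(). Returns true if consumed.
 */
static bool model_ctrl_rx(struct wmi_model *w, uint16_t seq)
{
	bool matched;

	pthread_mutex_lock(&w->lock);
	matched = (seq != 0 && seq == w->last_seq_id);
	if (matched)
		w->delivered++;	/* models copying the response + complete() */
	pthread_mutex_unlock(&w->lock);
	return matched;
}
```

For example, after model_issue(&w, 5) and model_timeout(&w), a late
model_ctrl_rx(&w, 5) returns false and w.delivered stays 0, while a
response matching a still-outstanding command is consumed.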
Your suggestion makes the ath9k_wmi_cmd() call give a belated callback a
little more chance to complete (although the timeout has actually
expired). That is probably good, but increasing the timeout value would
do the same job. I don't think it matters on real hardware.
Or do you mean there is some data corruption that your patch properly
fixes?
That said, I agree there can be a situation where the callback does all
the logical work it should and simply does not get enough time to
perform the completion before the timeout on the waiter's side occurs.
This behaviour could be called "racy". But, technically, it still looks
like a valid timeout.
> [1] https://lore.kernel.org/lkml/9d9b9652-c1ac-58e9-2eab-9256c17b1da2@I-love.SAKURA.ne.jp/
>
I don't think it's a similar case because wait_for_completion_state() is
interruptible while wait_for_completion_timeout() is not.
> A correct fix looks like after putting pieces together
>
> +++ b/drivers/net/wireless/ath/ath9k/wmi.c
> @@ -238,6 +238,7 @@ static void ath9k_wmi_ctrl_rx(void *priv
> spin_unlock_irqrestore(&wmi->wmi_lock, flags);
> goto free_skb;
> }
> + wmi->last_seq_id = 0;
> spin_unlock_irqrestore(&wmi->wmi_lock, flags);
>
> /* WMI command response */
> @@ -339,9 +340,20 @@ int ath9k_wmi_cmd(struct wmi *wmi, enum
>
> time_left = wait_for_completion_timeout(&wmi->cmd_wait, timeout);
> if (!time_left) {
> + unsigned long flags;
> + int wait = 0;
> +
> ath_dbg(common, WMI, "Timeout waiting for WMI command: %s\n",
> wmi_cmd_to_name(cmd_id));
> - wmi->last_seq_id = 0;
> +
> + spin_lock_irqsave(&wmi->wmi_lock, flags);
> + if (wmi->last_seq_id == 0) /* job done on the waker side? */
> + wait = 1;
> + else
> + wmi->last_seq_id = 0;
> + spin_unlock_irqrestore(&wmi->wmi_lock, flags);
> + if (wait)
> + wait_for_completion(&wmi->cmd_wait);
> mutex_unlock(&wmi->op_mutex);
> return -ETIMEDOUT;
> }
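
The decision that the quoted diff adds on the timeout path can be
modelled the same way. In this hypothetical userspace sketch, a pthread
mutex stands in for wmi_lock; struct wmi_model2, WMI_MODEL2_INIT,
claim_response() and must_wait_after_timeout() are illustrative names,
and sequence IDs are assumed to be nonzero. The waker claims a response
by zeroing last_seq_id under the lock, and the timed-out waiter checks
whether that has already happened. Note that in the diff the waiter
still returns -ETIMEDOUT; the extra wait_for_completion() only makes it
wait for the waker to finish first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the timeout policy in the quoted diff. */
struct wmi_model2 {
	pthread_mutex_t lock;	/* models wmi->wmi_lock */
	uint16_t last_seq_id;	/* 0: nothing outstanding or already claimed */
};

#define WMI_MODEL2_INIT(seq) \
	{ .lock = PTHREAD_MUTEX_INITIALIZER, .last_seq_id = (seq) }

/* Waker: claim a matching response by zeroing last_seq_id under the lock. */
static bool claim_response(struct wmi_model2 *w, uint16_t seq)
{
	bool claimed;

	pthread_mutex_lock(&w->lock);
	claimed = (seq != 0 && seq == w->last_seq_id);
	if (claimed)
		w->last_seq_id = 0;
	pthread_mutex_unlock(&w->lock);
	return claimed;
}

/*
 * Waiter, after the timed wait expired: returns true if the waker has
 * already claimed the response (the diff then calls
 * wait_for_completion()); otherwise clears last_seq_id so a late
 * response is rejected, and returns false.
 */
static bool must_wait_after_timeout(struct wmi_model2 *w)
{
	bool wait;

	pthread_mutex_lock(&w->lock);
	if (w->last_seq_id == 0) {
		wait = true;
	} else {
		w->last_seq_id = 0;
		wait = false;
	}
	pthread_mutex_unlock(&w->lock);
	return wait;
}
```

For example, if the waiter times out before the response arrives,
must_wait_after_timeout() returns false and a later claim_response() for
that sequence number fails; if the waker claimed the response first, it
returns true.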