lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <20230518154424.62urbguy4rxetkty@fpc> Date: Thu, 18 May 2023 18:44:24 +0300 From: Fedor Pchelkin <pchelkin@...ras.ru> To: Hillf Danton <hdanton@...a.com> Cc: Toke Høiland-Jørgensen <toke@...e.dk>, Kalle Vallo <kvalo@...nel.org>, syzbot+f2cb6e0ffdb961921e4d@...kaller.appspotmail.com, linux-wireless@...r.kernel.org, netdev@...r.kernel.org, linux-kernel@...r.kernel.org, Alexey Khoroshilov <khoroshilov@...ras.ru>, lvc-project@...uxtesting.org Subject: Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx On Thu, May 18, 2023 at 06:24:37PM +0800, Hillf Danton wrote: > Fedor Pchelkin <pchelkin@...ras.ru> writes: > > > On Wed, Apr 26, 2023 at 07:07:08AM +0800, Hillf Danton wrote: > >> Given similar wait timeout[1], just taking lock on the waiter side is not > >> enough wrt fixing the race, because in case job done on the waker side, > >> waiter needs to wait again after timeout. > >> > > > > As I understand you correctly, you mean the case when a timeout occurs > > during ath9k_wmi_ctrl_rx() callback execution. I suppose if a timeout has > > occurred on a waiter's side, it should return immediately and doesn't have > > to care in which state the callback has been at that moment. > > > > AFAICS, this is controlled properly with taking a wmi_lock on waiter and > > waker sides, and there is no data corruption. > > > > If a callback has not managed to do its work entirely (performing a > > completion and subsequently waking waiting thread is included here), then, > > well, it is considered a timeout, in my opinion. > > > > Your suggestion makes a wmi_cmd call to give a little more chance for the > > belated callback to complete (although timeout has actually expired). That > > is probably good, but increasing a timeout value makes that job, too. I > > don't think it makes any sense on real hardware. > > > > Or do you mean there is data corruption that is properly fixed with your patch? > > Given complete() not paired with wait_for_completion(), what is the > difference after this patch? The main thing in the patch is making ath9k_wmi_ctrl_rx() release wmi_lock after calling ath9k_wmi_rsp_callback() which does copying data into the shared wmi->cmd_rsp_buf buffer. Otherwise there can occur a data corrupting scenario outlined in the patch description (added it here, too). On Tue, 25 Apr 2023 22:26:06 +0300, Fedor Pchelkin wrote: > CPU0 CPU1 > > ath9k_wmi_cmd(...) > mutex_lock(&wmi->op_mutex) > ath9k_wmi_cmd_issue(...) > wait_for_completion_timeout(...) > --- > timeout > --- > /* the callback is being processed > * before last_seq_id became zero > */ > ath9k_wmi_ctrl_rx(...) > spin_lock_irqsave(...) > /* wmi->last_seq_id check here > * doesn't detect timeout yet > */ > spin_unlock_irqrestore(...) > /* last_seq_id is zeroed to > * indicate there was a timeout > */ > wmi->last_seq_id = 0 > mutex_unlock(&wmi->op_mutex) > return -ETIMEDOUT > > ath9k_wmi_cmd(...) > mutex_lock(&wmi->op_mutex) > /* the buffer is replaced with > * another one > */ > wmi->cmd_rsp_buf = rsp_buf > wmi->cmd_rsp_len = rsp_len > ath9k_wmi_cmd_issue(...) > spin_lock_irqsave(...) > spin_unlock_irqrestore(...) > wait_for_completion_timeout(...) > /* the continuation of the > * callback left after the first > * ath9k_wmi_cmd call > */ > ath9k_wmi_rsp_callback(...) > /* copying data designated > * to already timeouted > * WMI command into an > * inappropriate wmi_cmd_buf > */ > memcpy(...) > complete(&wmi->cmd_wait) > /* awakened by the bogus callback > * => invalid return result > */ > mutex_unlock(&wmi->op_mutex) > return 0 So before the patch the wmi->last_seq_id check in ath9k_wmi_ctrl_rx() wasn't helpful in case wmi->last_seq_id value was changed during ath9k_wmi_rsp_callback() execution because of the next ath9k_wmi_cmd() call. With the proposed patch the wmi->last_seq_id check in ath9k_wmi_ctrl_rx() accomplishes its job as: - the next ath9k_wmi_cmd call changes last_seq_id value under lock so it either waits for a belated ath9k_wmi_ctrl_rx() to finish or updates last_seq_id value so that the timeout check in ath9k_wmi_ctrl_rx() indicates that the waiter side has timeouted and we shouldn't further process the callback. - memcopying in ath9k_wmi_rsp_callback() is made to a valid place if the last_seq_id check was successful under the lock.
Powered by blists - more mailing lists