linux-kernel - Re: [PATCH v2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <20230425075426.ubfnohsqe3c2cjdq@fpc>
Date:   Tue, 25 Apr 2023 10:54:26 +0300
From:   Fedor Pchelkin <pchelkin@...ras.ru>
To:     Hillf Danton <hdanton@...a.com>
Cc:     Toke Høiland-Jørgensen <toke@...e.dk>,
        Kalle Valo <kvalo@...nel.org>, linux-kernel@...r.kernel.org,
        syzbot+f2cb6e0ffdb961921e4d@...kaller.appspotmail.com,
        syzbot+df61b36319e045c00a08@...kaller.appspotmail.com,
        linux-wireless@...r.kernel.org, netdev@...r.kernel.org,
        Alexey Khoroshilov <khoroshilov@...ras.ru>,
        lvc-project@...uxtesting.org
Subject: Re: [PATCH v2] wifi: ath9k: fix races between ath9k_wmi_cmd and
 ath9k_wmi_ctrl_rx

On Tue, Apr 25, 2023 at 11:38:32AM +0800, Hillf Danton wrote:
> On 24 Apr 2023 22:18:26 +0300 Fedor Pchelkin <pchelkin@...ras.ru>
> > Currently, the synchronization between ath9k_wmi_cmd() and
> > ath9k_wmi_ctrl_rx() is exposed to a race condition which, although
> > rather unlikely, can lead to invalid behaviour of ath9k_wmi_cmd().
> > 
> > Consider the following scenario:
> > 
> > CPU0					CPU1
> > 
> > ath9k_wmi_cmd(...)
> >   mutex_lock(&wmi->op_mutex)
> >   ath9k_wmi_cmd_issue(...)
> >   wait_for_completion_timeout(...)
> >   ---
> >   timeout
> >   ---
> > 					/* the callback is being processed
> > 					 * before last_seq_id became zero
> > 					 */
> > 					ath9k_wmi_ctrl_rx(...)
> > 					  spin_lock_irqsave(...)
> > 					  /* wmi->last_seq_id check here
> > 					   * doesn't detect timeout yet
> > 					   */
> > 					  spin_unlock_irqrestore(...)
> >   /* last_seq_id is zeroed to
> >    * indicate there was a timeout
> >    */
> >   wmi->last_seq_id = 0
> 
> Without wmi->wmi_lock held, updating last_seq_id on the waiter side
> means it is random on the waker side, so the fix below is incorrect.
> 

Thank you for noticing! Of course, last_seq_id should be zeroed under
wmi->wmi_lock there as well.
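A minimal user-space sketch of the waiter-side change Hillf asks for. It is an illustration only, not the driver code: a C11 atomic_flag spinlock stands in for wmi->wmi_lock, and struct wmi_model, model_cmd_issue(), model_cmd_timeout() and model_rx_matches() are hypothetical names. The timeout path clears last_seq_id inside the same lock the receive path takes for its sequence check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; not the real struct wmi. */
struct wmi_model {
	atomic_flag wmi_lock;	/* stand-in for the kernel spinlock */
	uint16_t last_seq_id;	/* 0 = no command is waiting (timed out) */
};

static void model_lock(struct wmi_model *w)
{
	while (atomic_flag_test_and_set_explicit(&w->wmi_lock,
						 memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void model_unlock(struct wmi_model *w)
{
	atomic_flag_clear_explicit(&w->wmi_lock, memory_order_release);
}

static void model_init(struct wmi_model *w)
{
	atomic_flag_clear(&w->wmi_lock);
	w->last_seq_id = 0;
}

/* Issue path: publish the sequence id the waiter expects. */
static void model_cmd_issue(struct wmi_model *w, uint16_t seq)
{
	assert(seq != 0); /* 0 is reserved to mean "timed out" */
	model_lock(w);
	w->last_seq_id = seq;
	model_unlock(w);
}

/* Waiter path on timeout: reset last_seq_id under the same lock the
 * receive path takes, so every receive-side check runs either entirely
 * before or entirely after the reset. */
static void model_cmd_timeout(struct wmi_model *w)
{
	model_lock(w);
	w->last_seq_id = 0;
	model_unlock(w);
}

/* Receive path: accept a response only if it carries the id that is
 * still being waited for. */
static bool model_rx_matches(struct wmi_model *w, uint16_t seq)
{
	bool ok;

	model_lock(w);
	ok = seq != 0 && seq == w->last_seq_id;
	model_unlock(w);
	return ok;
}
```

This only orders the reset against the check itself. If the receive path drops the lock between the check and the response callback, the race in the rest of the scenario remains, which is why the callback also has to move under the lock.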

> >   mutex_unlock(&wmi->op_mutex)
> >   return -ETIMEDOUT
> > 
> > ath9k_wmi_cmd(...)
> >   mutex_lock(&wmi->op_mutex)
> >   /* the buffer is replaced with
> >    * another one
> >    */
> >   wmi->cmd_rsp_buf = rsp_buf
> >   wmi->cmd_rsp_len = rsp_len
> >   ath9k_wmi_cmd_issue(...)
> >     spin_lock_irqsave(...)
> >     spin_unlock_irqrestore(...)
> >   wait_for_completion_timeout(...)
> > 					/* the continuation of the
> > 					 * callback left after the first
> > 					 * ath9k_wmi_cmd call
> > 					 */
> > 					  ath9k_wmi_rsp_callback(...)
> > 					    /* copying data designated
> > 					     * to the already timed-out
> > 					     * WMI command into an
> > 					     * inappropriate wmi_cmd_buf
> > 					     */
> > 					    memcpy(...)
> > 					    complete(&wmi->cmd_wait)
> >   /* awakened by the bogus callback
> >    * => invalid return result
> >    */
> >   mutex_unlock(&wmi->op_mutex)
> >   return 0
> > 
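The interleaving above can be replayed deterministically, in a single thread, with a small user-space model. Everything in it is hypothetical: struct wmi_model and the model_*() functions are not the driver's API, a C11 atomic_flag spinlock stands in for wmi->wmi_lock, and a completed flag stands in for complete(&wmi->cmd_wait). The receive path is split into a locked sequence check and an unlocked callback, which models the pre-patch ordering described in the scenario.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the racy (pre-patch) ordering. */
struct wmi_model {
	atomic_flag wmi_lock;	/* stand-in for wmi->wmi_lock */
	uint16_t last_seq_id;	/* 0 = timed out */
	uint8_t *cmd_rsp_buf;	/* caller's response buffer */
	size_t cmd_rsp_len;
	bool completed;		/* stand-in for complete(&wmi->cmd_wait) */
};

static void model_lock(struct wmi_model *w)
{
	while (atomic_flag_test_and_set_explicit(&w->wmi_lock,
						 memory_order_acquire))
		; /* spin */
}

static void model_unlock(struct wmi_model *w)
{
	atomic_flag_clear_explicit(&w->wmi_lock, memory_order_release);
}

static void model_init(struct wmi_model *w)
{
	atomic_flag_clear(&w->wmi_lock);
	w->last_seq_id = 0;
	w->cmd_rsp_buf = NULL;
	w->cmd_rsp_len = 0;
	w->completed = false;
}

/* CPU0: buffer recorded outside wmi_lock, seq id published under it. */
static void model_cmd_start(struct wmi_model *w, uint16_t seq,
			    uint8_t *buf, size_t len)
{
	assert(seq != 0);
	w->cmd_rsp_buf = buf;
	w->cmd_rsp_len = len;
	w->completed = false;
	model_lock(w);
	w->last_seq_id = seq;
	model_unlock(w);
}

/* CPU0: the wait timed out; pre-patch, the reset takes no lock. */
static void model_cmd_timeout(struct wmi_model *w)
{
	w->last_seq_id = 0;
}

/* CPU1, phase 1: the sequence check, done under the lock. */
static bool model_rx_check(struct wmi_model *w, uint16_t seq)
{
	bool ok;

	model_lock(w);
	ok = seq != 0 && seq == w->last_seq_id;
	model_unlock(w);
	return ok;
}

/* CPU1, phase 2: the callback runs after the lock was dropped, so it
 * copies into whatever buffer is current at that moment. */
static void model_rx_callback(struct wmi_model *w, const uint8_t *data,
			      size_t len)
{
	size_t n = len < w->cmd_rsp_len ? len : w->cmd_rsp_len;

	if (w->cmd_rsp_buf != NULL && n != 0)
		memcpy(w->cmd_rsp_buf, data, n);
	w->completed = true;
}
```

Calling the steps in the scenario's order (check, timeout, second command, late callback) leaves the stale bytes in the second command's buffer and marks that command completed, which is the bogus wakeup that makes ath9k_wmi_cmd() return 0.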
> > To fix this, move ath9k_wmi_rsp_callback() under wmi_lock inside
> > ath9k_wmi_ctrl_rx() so that wmi->cmd_wait can be completed only for the
> > initially designated ath9k_wmi_cmd() call; otherwise, the path is
> > rejected by the last_seq_id check.
> > 
> > Also move the recording of the rsp buffer and length into
> > ath9k_wmi_cmd_issue(), under the same wmi_lock as the last_seq_id
> > update, to avoid racy changes to them.
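A minimal user-space sketch of the proposed ordering. The names are hypothetical (struct wmi_model and the model_*() functions are illustrative, an atomic_flag spinlock stands in for wmi->wmi_lock and a completed flag for wmi->cmd_wait): the issue path records the buffer, length and expected sequence id in one critical section, the timeout path clears the id under the lock, and the receive path checks, copies and completes without releasing the lock in between.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space model; not the real struct wmi. */
struct wmi_model {
	atomic_flag wmi_lock;	/* stand-in for wmi->wmi_lock */
	uint16_t last_seq_id;	/* 0 = timed out */
	uint8_t *cmd_rsp_buf;
	size_t cmd_rsp_len;
	bool completed;		/* stand-in for complete(&wmi->cmd_wait) */
};

static void model_lock(struct wmi_model *w)
{
	while (atomic_flag_test_and_set_explicit(&w->wmi_lock,
						 memory_order_acquire))
		; /* spin */
}

static void model_unlock(struct wmi_model *w)
{
	atomic_flag_clear_explicit(&w->wmi_lock, memory_order_release);
}

static void model_init(struct wmi_model *w)
{
	atomic_flag_clear(&w->wmi_lock);
	w->last_seq_id = 0;
	w->cmd_rsp_buf = NULL;
	w->cmd_rsp_len = 0;
	w->completed = false;
}

/* Issue path: buffer, length and expected id change together. */
static void model_cmd_issue(struct wmi_model *w, uint16_t seq,
			    uint8_t *buf, size_t len)
{
	assert(seq != 0); /* 0 is reserved to mean "timed out" */
	model_lock(w);
	w->cmd_rsp_buf = buf;
	w->cmd_rsp_len = len;
	w->last_seq_id = seq;
	w->completed = false; /* per-command reset: a modeling choice */
	model_unlock(w);
}

/* Timeout path: clear the expected id under the lock as well. */
static void model_cmd_timeout(struct wmi_model *w)
{
	model_lock(w);
	w->last_seq_id = 0;
	model_unlock(w);
}

/* Receive path: check, copy and complete in one critical section, so
 * a response can only complete the command it was issued for. */
static bool model_ctrl_rx(struct wmi_model *w, uint16_t seq,
			  const uint8_t *data, size_t len)
{
	bool ok;

	model_lock(w);
	ok = seq != 0 && seq == w->last_seq_id;
	if (ok) {
		size_t n = len < w->cmd_rsp_len ? len : w->cmd_rsp_len;

		if (w->cmd_rsp_buf != NULL && n != 0)
			memcpy(w->cmd_rsp_buf, data, n);
		w->completed = true;
	}
	model_unlock(w);
	return ok;
}
```

With this ordering a late response carrying the old sequence id is rejected before it can touch the second command's buffer, while the matching response still completes. Resetting completed on issue is a modeling choice here and does not claim to mirror how the driver handles cmd_wait.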
> 
> Better in a separate one.

Well, they are parts of the same problem, but it now seems more
appropriate to split the patch in two: the first one fixing the incorrect
last_seq_id synchronization and the second one recording the rsp buffer
under the lock. Thanks!