lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230103212846.2zwooig4c5sp4rl7@halaney-x13s>
Date:   Tue, 3 Jan 2023 15:28:46 -0600
From:   Andrew Halaney <ahalaney@...hat.com>
To:     Johan Hovold <johan+linaro@...nel.org>
Cc:     "James E.J. Bottomley" <jejb@...ux.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Alim Akhtar <alim.akhtar@...sung.com>,
        Avri Altman <avri.altman@....com>,
        Bart Van Assche <bvanassche@....org>,
        linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org,
        stable@...r.kernel.org, Can Guo <quic_cang@...cinc.com>
Subject: Re: [PATCH] scsi: ufs: core: fix devfreq deadlocks

On Thu, Dec 22, 2022 at 11:21:21AM +0100, Johan Hovold wrote:
> There is a lock inversion and rwsem read-lock recursion in the devfreq
> target callback which can lead to deadlocks.
> 
> Specifically, ufshcd_devfreq_scale() already holds a clk_scaling_lock
> read lock when toggling the write booster, which involves taking the
> dev_cmd mutex before taking another clk_scaling_lock read lock.
> 
> This can lead to a deadlock if another thread:
> 
>   1) tries to acquire the dev_cmd and clk_scaling locks in the correct
>      order, or
> 
>   2) takes a clk_scaling write lock before the attempt to take the
>      clk_scaling read lock a second time.
> 
> Fix this by dropping the clk_scaling_lock before toggling the write
> booster as was done before commit 0e9d4ca43ba8 ("scsi: ufs: Protect some
> contexts from unexpected clock scaling").
> 
> While the devfreq callbacks are already serialised, add a second
> serialising mutex to handle the unlikely case where a callback triggered
> through the devfreq sysfs interface is racing with a request to disable
> clock scaling through the UFS controller 'clkscale_enable' sysfs
> attribute. This could otherwise lead to the write booster being left
> disabled after having disabled clock scaling.
> 
> Also take the new mutex in ufshcd_clk_scaling_allow() to make sure that
> any pending write booster update has completed on return.
> 
> Note that this currently only affects Qualcomm platforms since commit
> 87bd05016a64 ("scsi: ufs: core: Allow host driver to disable wb toggling
> during clock scaling").
> 
> The lock inversion (i.e. 1 above) was reported by lockdep as:
> 
>  ======================================================
>  WARNING: possible circular locking dependency detected
>  6.1.0-next-20221216 #211 Not tainted
>  ------------------------------------------------------
>  kworker/u16:2/71 is trying to acquire lock:
>  ffff076280ba98a0 (&hba->dev_cmd.lock){+.+.}-{3:3}, at: ufshcd_query_flag+0x50/0x1c0
> 
>  but task is already holding lock:
>  ffff076280ba9cf0 (&hba->clk_scaling_lock){++++}-{3:3}, at: ufshcd_devfreq_scale+0x2b8/0x380
> 
>  which lock already depends on the new lock.
> [  +0.011606]
>  the existing dependency chain (in reverse order) is:
> 
>  -> #1 (&hba->clk_scaling_lock){++++}-{3:3}:
>         lock_acquire+0x68/0x90
>         down_read+0x58/0x80
>         ufshcd_exec_dev_cmd+0x70/0x2c0
>         ufshcd_verify_dev_init+0x68/0x170
>         ufshcd_probe_hba+0x398/0x1180
>         ufshcd_async_scan+0x30/0x320
>         async_run_entry_fn+0x34/0x150
>         process_one_work+0x288/0x6c0
>         worker_thread+0x74/0x450
>         kthread+0x118/0x120
>         ret_from_fork+0x10/0x20
> 
>  -> #0 (&hba->dev_cmd.lock){+.+.}-{3:3}:
>         __lock_acquire+0x12a0/0x2240
>         lock_acquire.part.0+0xcc/0x220
>         lock_acquire+0x68/0x90
>         __mutex_lock+0x98/0x430
>         mutex_lock_nested+0x2c/0x40
>         ufshcd_query_flag+0x50/0x1c0
>         ufshcd_query_flag_retry+0x64/0x100
>         ufshcd_wb_toggle+0x5c/0x120
>         ufshcd_devfreq_scale+0x2c4/0x380
>         ufshcd_devfreq_target+0xf4/0x230
>         devfreq_set_target+0x84/0x2f0
>         devfreq_update_target+0xc4/0xf0
>         devfreq_monitor+0x38/0x1f0
>         process_one_work+0x288/0x6c0
>         worker_thread+0x74/0x450
>         kthread+0x118/0x120
>         ret_from_fork+0x10/0x20
> 
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
>         CPU0                    CPU1
>         ----                    ----
>    lock(&hba->clk_scaling_lock);
>                                 lock(&hba->dev_cmd.lock);
>                                 lock(&hba->clk_scaling_lock);
>    lock(&hba->dev_cmd.lock);
> 
>   *** DEADLOCK ***
> 
> Fixes: 0e9d4ca43ba8 ("scsi: ufs: Protect some contexts from unexpected clock scaling")
> Cc: stable@...r.kernel.org      # 5.12
> Cc: Can Guo <quic_cang@...cinc.com>
> Signed-off-by: Johan Hovold <johan+linaro@...nel.org>

Tested-by: Andrew Halaney <ahalaney@...hat.com>

Thanks for taking the time to fix this, I can't get any UFS related
splats to show after a brief round of testing now.

For what it is worth, the change looks good to me as well but I'll leave
it to someone familiar with the UFS core to add a proper Reviewed-by tag.

Thanks,
Andrew

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ