Message-ID: <e3d2ea6c-3078-dbdc-6000-f8ea1446381e@quicinc.com>
Date: Wed, 12 Apr 2023 23:08:31 -0700
From: "Bao D. Nguyen" <quic_nguyenb@...cinc.com>
To: Powen Kao (高伯文) <Powen.Kao@...iatek.com>,
"beanhuo@...ron.com" <beanhuo@...ron.com>,
"avri.altman@....com" <avri.altman@....com>,
"bvanassche@....org" <bvanassche@....org>,
"quic_asutoshd@...cinc.com" <quic_asutoshd@...cinc.com>,
"martin.petersen@...cle.com" <martin.petersen@...cle.com>,
"mani@...nel.org" <mani@...nel.org>,
"quic_cang@...cinc.com" <quic_cang@...cinc.com>,
"adrian.hunter@...el.com" <adrian.hunter@...el.com>
CC: "linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
"alim.akhtar@...sung.com" <alim.akhtar@...sung.com>,
"jejb@...ux.ibm.com" <jejb@...ux.ibm.com>,
Stanley Chu (朱原陞) <stanley.chu@...iatek.com>,
"Arthur.Simchaev@....com" <Arthur.Simchaev@....com>,
"ebiggers@...gle.com" <ebiggers@...gle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1 1/5] ufs: mcq: Add supporting functions for mcq abort
On 4/11/2023 6:14 AM, Powen Kao (高伯文) wrote:
> Hi Bao D.,
>
> We have done some tests based on your RFC v3 patches, and an issue
> was found.
Hi Powen,
Thank you very much for catching this issue. It seems I cannot use
read_poll_timeout() here because it sleeps, and sleeping is not
allowed while holding the spin_lock(). In the next revision I will
change this to poll the registers in a tight loop with a udelay(20)
polling interval.
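
For example, something like this (an untested sketch):
read_poll_timeout_atomic() from <linux/iopoll.h> takes the same
arguments as read_poll_timeout() but busy-waits with udelay() instead
of calling usleep_range(), so it should be safe while sq_lock is held:

	/* Poll SQRTSy.CUS = 1; busy-wait instead of sleeping because
	 * sq_lock is held and preemption is disabled here.
	 */
	reg = opr_sqd_base + REG_SQRTS;
	err = read_poll_timeout_atomic(readl, val, val & SQ_CUS, 20,
				       MCQ_POLL_US, false, reg);
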
Thanks,
Bao
>
> kworker/u16:4: BUG: scheduling while atomic: kworker/u16:4/5736/0x00000002
> kworker/u16:4: [name:core&]Preemption disabled at:
> kworker/u16:4: [<ffffffef97e33024>] ufshcd_mcq_sq_cleanup+0x9c/0x27c
> kworker/u16:4: CPU: 2 PID: 5736 Comm: kworker/u16:4 Tainted: G S W OE
> kworker/u16:4: Workqueue: ufs_eh_wq_0 ufshcd_err_handler
> kworker/u16:4: Call trace:
> kworker/u16:4: dump_backtrace+0x108/0x15c
> kworker/u16:4: show_stack+0x20/0x30
> kworker/u16:4: dump_stack_lvl+0x6c/0x8c
> kworker/u16:4: dump_stack+0x20/0x44
> kworker/u16:4: __schedule_bug+0xd4/0x100
> kworker/u16:4: __schedule+0x660/0xa5c
> kworker/u16:4: schedule+0x80/0xec
> kworker/u16:4: schedule_hrtimeout_range_clock+0xa0/0x140
> kworker/u16:4: schedule_hrtimeout_range+0x1c/0x30
> kworker/u16:4: usleep_range_state+0x88/0xd8
> kworker/u16:4: ufshcd_mcq_sq_cleanup+0x170/0x27c
> kworker/u16:4: ufshcd_clear_cmds+0x78/0x184
> kworker/u16:4: ufshcd_wait_for_dev_cmd+0x234/0x348
> kworker/u16:4: ufshcd_exec_dev_cmd+0x220/0x298
> kworker/u16:4: ufshcd_verify_dev_init+0x68/0x124
> kworker/u16:4: ufshcd_probe_hba+0x390/0x9bc
> kworker/u16:4: ufshcd_host_reset_and_restore+0x74/0x158
> kworker/u16:4: ufshcd_reset_and_restore+0x70/0x31c
> kworker/u16:4: ufshcd_err_handler+0xad4/0xe58
> kworker/u16:4: process_one_work+0x214/0x5b8
> kworker/u16:4: worker_thread+0x2d4/0x448
> kworker/u16:4: kthread+0x110/0x1e0
> kworker/u16:4: ret_from_fork+0x10/0x20
> kworker/u16:4: ------------[ cut here ]------------
>
>
> On Wed, 2023-03-29 at 03:01 -0700, Bao D. Nguyen wrote:
>
>> +/**
>> + * ufshcd_mcq_sq_cleanup - Clean up Submission Queue resources
>> + * associated with the pending command.
>> + * @hba - per adapter instance.
>> + * @task_tag - The command's task tag.
>> + * @result - Result of the Clean up operation.
>> + *
>> + * Returns 0 and result on completion. Returns error code if
>> + * the operation fails.
>> + */
>> +int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag, int *result)
>> +{
>> + struct ufshcd_lrb *lrbp = &hba->lrb[task_tag];
>> + struct scsi_cmnd *cmd = lrbp->cmd;
>> + struct ufs_hw_queue *hwq;
>> + void __iomem *reg, *opr_sqd_base;
>> + u32 nexus, i, val;
>> + int err;
>> +
>> + if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) {
>> + if (!cmd)
>> + return FAILED;
>> + hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
>> + } else {
>> + hwq = hba->dev_cmd_queue;
>> + }
>> +
>> + i = hwq->id;
>> +
>> + spin_lock(&hwq->sq_lock);
> As spin_lock() disables preemption
>
>> +
>> + /* stop the SQ fetching before working on it */
>> + err = ufshcd_mcq_sq_stop(hba, hwq);
>> + if (err)
>> + goto unlock;
>> +
>> + /* SQCTI = EXT_IID, IID, LUN, Task Tag */
>> + nexus = lrbp->lun << 8 | task_tag;
>> + opr_sqd_base = mcq_opr_base(hba, OPR_SQD, i);
>> + writel(nexus, opr_sqd_base + REG_SQCTI);
>> +
>> + /* SQRTCy.ICU = 1 */
>> + writel(SQ_ICU, opr_sqd_base + REG_SQRTC);
>> +
>> + /* Poll SQRTSy.CUS = 1. Return result from SQRTSy.RTC */
>> + reg = opr_sqd_base + REG_SQRTS;
>> + err = read_poll_timeout(readl, val, val & SQ_CUS, 20,
>> + MCQ_POLL_US, false, reg);
> read_poll_timeout() was ufshcd_mcq_poll_register() in the last patch
> series, right? ufshcd_mcq_poll_register() calls usleep_range(),
> causing the kernel exception (KE) reported above. The same issue
> seems to still exist, as read_poll_timeout() also sleeps.
>
> Skipping ufshcd_mcq_sq_cleanup() by returning FAILED directly, which
> triggers a reset in the ufshcd error handler, successfully recovers
> the host.
>
> BTW, is there a change list between RFC v3 and this v1 patch series? :)
> Thanks
>
> Po-Wen
>
>