Message-ID: <0bb28ed3-8b4f-77f3-5648-adb42604f37e@acm.org>
Date: Thu, 4 May 2023 11:04:58 -0700
From: Bart Van Assche <bvanassche@....org>
To: "Bao D. Nguyen" <quic_nguyenb@...cinc.com>,
quic_asutoshd@...cinc.com, quic_cang@...cinc.com, mani@...nel.org,
Powen.Kao@...iatek.com, stanley.chu@...iatek.com,
adrian.hunter@...el.com, beanhuo@...ron.com, avri.altman@....com,
martin.petersen@...cle.com
Cc: linux-scsi@...r.kernel.org, Alim Akhtar <alim.akhtar@...sung.com>,
"James E.J. Bottomley" <jejb@...ux.ibm.com>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 5/5] ufs: core: Add error handling for MCQ mode
On 5/3/23 21:18, Bao D. Nguyen wrote:
> On 4/25/2023 5:21 PM, Bart Van Assche wrote:
>> On 4/17/23 14:05, Bao D. Nguyen wrote:
>>> + /* MCQ mode */
>>> + if (is_mcq_enabled(hba))
>>> + return ufshcd_clear_cmds(hba, 1UL << lrbp->task_tag);
>>
>> The above code will trigger an overflow if lrbp->task_tag >= 8 * sizeof(unsigned long). That's not acceptable.
> This ufshcd_clear_cmds() uses a bit map. There are multiple places in the UFS code that have this limitation if the queue depth grows beyond 64. I am thinking:
> 1. Current UFS controllers in the market probably support a queue depth of 64 or less, so it may not be a problem today if the host controller cap is set to a queue depth of 64, but it can become a problem in multiple places in the code later.
> 2. In MCQ mode, we can pass a tag number into this API, ufshcd_clear_cmds(), while in SDB mode we pass the tag's bit mask as before.
> 3. Use sbitmask() to support a large queue depth? Thanks for any suggestions.
The UFS driver is the only block driver I know of that tracks which commands
are pending in a bitmap. Please pass the lrbp pointer or the task_tag directly
to ufshcd_clear_cmds() instead of passing a bitmap to that function. Please
also introduce a loop around the ufshcd_clear_cmds() call in
ufshcd_eh_device_reset_handler() instead of building a bitmap there.
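
A rough, untested sketch of what I have in mind (the single-tag helper
ufshcd_clear_cmd() and the loop details are my assumptions, not code from
this series):

	/* Hypothetical single-tag variant of ufshcd_clear_cmds(). */
	static int ufshcd_clear_cmd(struct ufs_hba *hba, u32 task_tag);

	/* Abort path: pass the tag itself, no 1UL << task_tag shift. */
	err = ufshcd_clear_cmd(hba, lrbp->task_tag);

	/* ufshcd_eh_device_reset_handler(): loop instead of a bitmap. */
	for (pos = 0; pos < hba->nutrs; pos++) {
		if (hba->lrb[pos].lun != lrbp->lun)
			continue;
		err = ufshcd_clear_cmd(hba, pos);
		if (err)
			break;
	}

That keeps the bitmap (and its 64-tag limit) out of the interface entirely.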
>>> static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
>>> {
>>> + struct ufshcd_lrb *lrbp;
>>> + u32 hwq_num, utag;
>>> + int tag;
>>> +
>>> /* Resetting interrupt aggregation counters first and reading the
>>> * DOOR_BELL afterward allows us to handle all the completed requests.
>>> * In order to prevent other interrupts starvation the DB is read once
>>> @@ -5580,7 +5590,22 @@ static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
>>> * Ignore the ufshcd_poll() return value and return IRQ_HANDLED since we
>>> * do not want polling to trigger spurious interrupt complaints.
>>> */
>>> - ufshcd_poll(hba->host, UFSHCD_POLL_FROM_INTERRUPT_CONTEXT);
>>> + if (!is_mcq_enabled(hba)) {
>>> + ufshcd_poll(hba->host, UFSHCD_POLL_FROM_INTERRUPT_CONTEXT);
>>> + goto out;
>>> + }
>>> +
>>> + /* MCQ mode */
>>> + for (tag = 0; tag < hba->nutrs; tag++) {
>>> + lrbp = &hba->lrb[tag];
>>> + if (lrbp->cmd) {
>>> + utag = blk_mq_unique_tag(scsi_cmd_to_rq(lrbp->cmd));
>>> + hwq_num = blk_mq_unique_tag_to_hwq(utag);
>>> + ufshcd_poll(hba->host, hwq_num);
>>> + }
>>> + }
>>
>> Is my understanding correct that ufshcd_transfer_req_compl() is only called from single doorbell code paths and hence that the above change is not necessary?
> ufshcd_transfer_req_compl() can be invoked in MCQ mode, for example from ufshcd_err_handler(), as shown below:
> ufshcd_err_handler()-->ufshcd_complete_requests()-->ufshcd_transfer_req_compl()
Since there are multiple statements in ufshcd_transfer_req_compl() that assume
SDB mode (resetting SDB interrupt aggregation and calling ufshcd_poll()), please
move the is_mcq_enabled() test from ufshcd_transfer_req_compl() into the
callers of that function.
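
For example (again an untested sketch; ufshcd_mcq_poll_cqe_lock(), hba->uhq
and hba->nr_hw_queues are taken from my reading of the MCQ code and may not
match this series exactly):

	/* In ufshcd_complete_requests() or another caller: */
	if (is_mcq_enabled(hba)) {
		unsigned int i;

		/* Complete per hardware queue instead of walking hba->lrb[]. */
		for (i = 0; i < hba->nr_hw_queues; i++)
			ufshcd_mcq_poll_cqe_lock(hba, &hba->uhq[i]);
	} else {
		ufshcd_transfer_req_compl(hba);
	}

That way ufshcd_transfer_req_compl() remains SDB-only and does not need any
MCQ-specific code.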
Thanks,
Bart.