[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d72d0a12-f3a4-4b4d-8b3b-5e59937a21d3@linux.intel.com>
Date: Wed, 17 Jan 2024 17:00:32 +0800
From: Ethan Zhao <haifeng.zhao@...ux.intel.com>
To: Baolu Lu <baolu.lu@...ux.intel.com>, kevin.tian@...el.com,
bhelgaas@...gle.com, dwmw2@...radead.org, will@...nel.org,
robin.murphy@....com, lukas@...ner.de
Cc: linux-pci@...r.kernel.org, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v10 0/5] fix vt-d hard lockup when hotplug ATS capable
device
On 1/17/2024 1:38 PM, Ethan Zhao wrote:
>
> On 1/17/2024 1:26 PM, Ethan Zhao wrote:
>>
>> On 1/17/2024 11:24 AM, Baolu Lu wrote:
>>> On 2024/1/15 15:58, Ethan Zhao wrote:
>>>> -static int qi_check_fault(struct intel_iommu *iommu, int index,
>>>> int wait_index)
>>>> +static int qi_check_fault(struct intel_iommu *iommu, int index,
>>>> int wait_index,
>>>> + pci_dev *target_pdev)
>>>> {
>>>> u32 fault;
>>>> int head, tail;
>>>> + u64 iqe_err, ice_sid;
>>>> struct q_inval *qi = iommu->qi;
>>>> int shift = qi_shift(iommu);
>>>>
>>>> if (qi->desc_status[wait_index] == QI_ABORT)
>>>> return -EAGAIN;
>>>>
>>>> + /*
>>>> + * If the ATS invalidation target device is gone this
>>>> moment (surprise
>>>> + * removed, died, no response) don't try this request
>>>> again. this
>>>> + * request will not get valid result anymore. but the
>>>> request was
>>>> + * already submitted to hardware and we predict to get a
>>>> ITE in
>>>> + * followed batch of request, if so, it will get handled then.
>>>> + */
>>>
>>> We can't leave the ITE triggered by this request for the next one,
>>> which
>>> has no context about why this happened. Perhaps move below code down to
>>> the segment that handles ITEs.
>>
>> Here, the invalidation request has been issued to hardware but target
>> device
>>
>> gone, we can't loop and wait for the ITE for this request to happen,
>> and we
>>
>> bail out here because we hold lock_irqsave lock , the ITE still could
>> happen
>>
>> with later batch request in the future, though it is not triggered
>> by that request,
>>
>> but it could still be cleaned/handled. move it to the fault() segment
>> ?,there means
>
> That moment, the ITE was triggered by previous requests, they are not
> in current
>
> context, also shouldn't be retried, they have response time over
> expected.
>
> but triggered ITE fault blocks this patch request, we should retry
> this batch request.
>
> we just clean the fault and retry it. nothing missed.
>
>
> Thanks,
>
> Ethan
>
>>
>> ITE already happened, no need to check target presence anymore.
>>
>> did I miss something about the context lost ?
>>
>>>
>>> Another concern is about qi_dump_fault(), which pr_err's the fault
>>> message as long as the register is set. Some faults are predictable,
>>> such as cache invalidation for surprise-removed devices.
>>> Unconditionally
>>> reporting errors with pr_err() may lead the user to believe that a more
>>> serious hardware error has occurred. Probably we can refine this
>>> part of
>>> the code as well.
>>
>> Agree, may refine them in seperated series ?
>>
>> loop and always retry IQE, ICE don't make sense per my
>> understanding. if
>>
>> IQE happened retry it will always reproduce the fault, because
>> request is the same.
>>
>> we could fix them together in other patches.
>>
>>
>> Thanks,
>>
>> Ethan
>>
>>>
>>> Others look sane to me.
>>>
>>>> + if (target_pdev && !pci_device_is_present(target_pdev))
>>>> + return -EINVAL;
>>>> +
>>>> fault = readl(iommu->reg + DMAR_FSTS_REG);
>>>> if (fault & (DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE))
>>>> qi_dump_fault(iommu, fault);
>>>> @@ -1315,6 +1327,13 @@ static int qi_check_fault(struct intel_iommu
>>>> *iommu, int index, int wait_index)
>>>> tail = readl(iommu->reg + DMAR_IQT_REG);
>>>> tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
>>>>
>>>> + /*
>>>> + * SID field is valid only when the ITE field is
>>>> Set in FSTS_REG
>>>> + * see Intel VT-d spec r4.1, section 11.4.9.9
>>>> + */
>>>> + iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
>>>> + ice_sid = DMAR_IQER_REG_ITESID(iqe_err);
>>>> +
>>>> writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
>>>> pr_info("Invalidation Time-out Error (ITE)
>>>> cleared\n");
>>>>
>>>> @@ -1324,6 +1343,16 @@ static int qi_check_fault(struct intel_iommu
>>>> *iommu, int index, int wait_index)
>>>> head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>>>> } while (head != tail);
>>>>
>>>> + /*
>>>> + * If got ITE, we need to check if the sid of ITE
>>>> is the same as
>>>> + * current ATS invalidation target device, if yes,
>>>> don't try this
>>>> + * request anymore, the target device has a
>>>> response time beyound
>>>> + * expected. 0 value of ice_sid means old device,
>>>> no ice_sid value.
>>>> + */
>>>> + if (target_pdev && ice_sid && ice_sid ==
>>>> + pci_dev_id(pci_physfn(target_pdev))
>>>> + return -ETIMEDOUT;
>>>> +
>>>> if (qi->desc_status[wait_index] == QI_ABORT)
>>>> return -EAGAIN;
>>>> }
>>>
>>> Best regards,
>>> baolu
>>
>
new proposed qi_check_fault() change as following:
- remove the lines of
if (qi->desc_status[wait_index] == QI_ABORT)
return -EAGAIN;
reason see the comment in the code.
- others, the same as last proposed code.
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1267,15 +1267,33 @@ static void qi_dump_fault(struct intel_iommu
*iommu, u32 fault)
(unsigned long long)desc->qw1);
}
-static int qi_check_fault(struct intel_iommu *iommu, int index, int
wait_index)
+static int qi_check_fault(struct intel_iommu *iommu, int index, int
wait_index,
+ pci_dev *target_pdev)
{
u32 fault;
int head, tail;
+ u64 iqe_err, ice_sid;
struct q_inval *qi = iommu->qi;
int shift = qi_shift(iommu);
+ /*
+ * If the inv_wait_dsc got QI_ABORT here, means an ITE happens,
retry
+ * this batch of invalidation request without clearing the ITE fault
+ * will be trapped into dead loop. remov this line.
+ * see Intel VT-d spec r4.1 sec 6.5.2.10, 6.5.2.8
+ *
+ * if (qi->desc_status[wait_index] == QI_ABORT)
+ * return -EAGAIN;
+ */
- if (qi->desc_status[wait_index] == QI_ABORT)
- return -EAGAIN;
+ /*
+ * If the ATS invalidation target device is gone this moment
(surprise
+ * removed, died, no response) don't try this request again. this
+ * request will not get valid result anymore. but the request was
+ * already submitted to hardware and we predict to get a ITE in
+ * followed batch of request, if so, it will get handled then.
+ */
+ if (target_pdev && !pci_device_is_present(target_pdev))
+ return -EINVAL;
fault = readl(iommu->reg + DMAR_FSTS_REG);
if (fault & (DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE))
@@ -1315,6 +1333,13 @@ static int qi_check_fault(struct intel_iommu
*iommu, int index, int wait_index)
tail = readl(iommu->reg + DMAR_IQT_REG);
tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
+ /*
+ * SID field is valid only when the ITE field is Set in
FSTS_REG
+ * see Intel VT-d spec r4.1, section 11.4.9.9
+ */
+ iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
+ ice_sid = DMAR_IQER_REG_ITESID(iqe_err);
+
writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
pr_info("Invalidation Time-out Error (ITE) cleared\n");
@@ -1324,6 +1349,16 @@ static int qi_check_fault(struct intel_iommu
*iommu, int index, int wait_index)
head = (head - 2 + QI_LENGTH) % QI_LENGTH;
} while (head != tail);
+ /*
+ * If got ITE, we need to check if the sid of ITE is the
same as
+ * current ATS invalidation target device, if yes, don't
try this
+ * request anymore, the target device has a response
time beyound
+ * expected. 0 value of ice_sid means old device, no
ice_sid value.
+ */
+ if (target_pdev && ice_sid && ice_sid ==
+ pci_dev_id(pci_physfn(target_pdev))
+ return -ETIMEDOUT;
+
if (qi->desc_status[wait_index] == QI_ABORT)
return -EAGAIN;
}
@@ -1344,7 +1379,7 @@ static int qi_check_fault(struct intel_iommu
*iommu, int index, int wait_index)
* can be part of the submission but it will not be polled for completion.
*/
int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
- unsigned int count, unsigned long options)
+ unsigned int count, unsigned long options, pci_dev
*target_pdev)
{
struct q_inval *qi = iommu->qi;
s64 devtlb_start_ktime = 0;
@@ -1430,7 +1465,7 @@ int qi_submit_sync(struct intel_iommu *iommu,
struct qi_desc *desc,
* a deadlock where the interrupt context can wait
indefinitely
* for free slots in the queue.
*/
- rc = qi_check_fault(iommu, index, wait_index);
+ rc = qi_check_fault(iommu, index, wait_index, target_pdev);
if (rc)
break;
Thanks,
Ethan
Powered by blists - more mailing lists