linux-kernel - Re: [RFC PATCH v10 0/5] fix vt-d hard lockup when hotplug ATS capable device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dcd27bc8-5eca-41ae-bb86-fd8e657ccfed@linux.intel.com>
Date: Wed, 17 Jan 2024 13:38:03 +0800
From: Ethan Zhao <haifeng.zhao@...ux.intel.com>
To: Baolu Lu <baolu.lu@...ux.intel.com>, kevin.tian@...el.com,
 bhelgaas@...gle.com, dwmw2@...radead.org, will@...nel.org,
 robin.murphy@....com, lukas@...ner.de
Cc: linux-pci@...r.kernel.org, iommu@...ts.linux.dev,
 linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v10 0/5] fix vt-d hard lockup when hotplug ATS capable
 device


On 1/17/2024 1:26 PM, Ethan Zhao wrote:
>
> On 1/17/2024 11:24 AM, Baolu Lu wrote:
>> On 2024/1/15 15:58, Ethan Zhao wrote:
>>> -static int qi_check_fault(struct intel_iommu *iommu, int index, int 
>>> wait_index)
>>> +static int qi_check_fault(struct intel_iommu *iommu, int index, int 
>>> wait_index,
>>> +                  pci_dev *target_pdev)
>>>   {
>>>          u32 fault;
>>>          int head, tail;
>>> +       u64 iqe_err, ice_sid;
>>>          struct q_inval *qi = iommu->qi;
>>>          int shift = qi_shift(iommu);
>>>
>>>          if (qi->desc_status[wait_index] == QI_ABORT)
>>>                  return -EAGAIN;
>>>
>>> +       /*
>>> +        * If the ATS invalidation target device is gone this moment 
>>> (surprise
>>> +        * removed, died, no response) don't try this request again. 
>>> this
>>> +        * request will not get valid result anymore. but the 
>>> request was
>>> +        * already submitted to hardware and we predict to get a ITE in
>>> +        * followed batch of request, if so, it will get handled then.
>>> +        */
>>
>> We can't leave the ITE triggered by this request for the next one, which
>> has no context about why this happened. Perhaps move below code down to
>> the segment that handles ITEs.
>
> Here, the invalidation request has been issued to hardware but target 
> device
>
> gone, we can't loop and wait for the ITE for this request to happen, 
> and we
>
> bail out here because we hold lock_irqsave lock , the ITE still could 
> happen
>
> with later batch request in the future,  though it is not triggered by 
> that request,
>
> but it could still be cleaned/handled. move it to the fault() segment 
> ?,there means

That moment, the ITE was triggered by previous requests, they are not in 
current

context, also shouldn't be retried, they have response time over expected.

but triggered ITE fault blocks this patch request, we should retry this 
batch request.

we just clean the fault and retry it.  nothing missed.


Thanks,

Ethan

>
> ITE already happened, no need to check target presence anymore.
>
> did I miss something about the context lost ?
>
>>
>> Another concern is about qi_dump_fault(), which pr_err's the fault
>> message as long as the register is set. Some faults are predictable,
>> such as cache invalidation for surprise-removed devices. Unconditionally
>> reporting errors with pr_err() may lead the user to believe that a more
>> serious hardware error has occurred. Probably we can refine this part of
>> the code as well.
>
> Agree, may refine them in seperated series ?
>
> loop and always retry IQE, ICE don't make sense per my understanding.  if
>
> IQE happened retry it will always reproduce the fault, because request 
> is the same.
>
> we could fix them together in other patches.
>
>
> Thanks,
>
> Ethan
>
>>
>> Others look sane to me.
>>
>>> +       if (target_pdev && !pci_device_is_present(target_pdev))
>>> +               return -EINVAL;
>>> +
>>>          fault = readl(iommu->reg + DMAR_FSTS_REG);
>>>          if (fault & (DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE))
>>>                  qi_dump_fault(iommu, fault);
>>> @@ -1315,6 +1327,13 @@ static int qi_check_fault(struct intel_iommu 
>>> *iommu, int index, int wait_index)
>>>                  tail = readl(iommu->reg + DMAR_IQT_REG);
>>>                  tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
>>>
>>> +               /*
>>> +                * SID field is valid only when the ITE field is Set 
>>> in FSTS_REG
>>> +                * see Intel VT-d spec r4.1, section 11.4.9.9
>>> +                */
>>> +               iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
>>> +               ice_sid = DMAR_IQER_REG_ITESID(iqe_err);
>>> +
>>>                  writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
>>>                  pr_info("Invalidation Time-out Error (ITE) 
>>> cleared\n");
>>>
>>> @@ -1324,6 +1343,16 @@ static int qi_check_fault(struct intel_iommu 
>>> *iommu, int index, int wait_index)
>>>                          head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>>>                  } while (head != tail);
>>>
>>> +               /*
>>> +                * If got ITE, we need to check if the sid of ITE is 
>>> the same as
>>> +                * current ATS invalidation target device, if yes, 
>>> don't try this
>>> +                * request anymore, the target device has a response 
>>> time beyound
>>> +                * expected. 0 value of ice_sid means old device, no 
>>> ice_sid value.
>>> +                */
>>> +               if (target_pdev && ice_sid && ice_sid ==
>>> +                   pci_dev_id(pci_physfn(target_pdev))
>>> +                               return -ETIMEDOUT;
>>> +
>>>                  if (qi->desc_status[wait_index] == QI_ABORT)
>>>                          return -EAGAIN;
>>>          }
>>
>> Best regards,
>> baolu
>