Date: Thu, 11 Jan 2024 10:31:25 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: Ethan Zhao <haifeng.zhao@...ux.intel.com>, kevin.tian@...el.com,
 bhelgaas@...gle.com, dwmw2@...radead.org, will@...nel.org,
 robin.murphy@....com, lukas@...ner.de
Cc: baolu.lu@...ux.intel.com, linux-pci@...r.kernel.org,
 iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v10 5/5] iommu/vt-d: don't loop for timeout ATS
 Invalidation request forever

On 1/10/24 4:40 PM, Ethan Zhao wrote:
> 
> On 1/10/2024 1:28 PM, Baolu Lu wrote:
>> On 12/29/23 1:05 AM, Ethan Zhao wrote:
>>> When an ATS Invalidation request times out, qi_submit_sync() restarts
>>> and retries the request forever until it completes. This blocks other
>>> invalidation threads, such as the fq_timer, from issuing invalidation
>>> requests and causes a system lockup, as follows:
>>>
>>> [exception RIP: native_queued_spin_lock_slowpath+92]
>>> RIP: ffffffffa9d1025c RSP: ffffb202f268cdc8 RFLAGS: 00000002
>>> RAX: 0000000000000101 RBX: ffffffffab36c2a0 RCX: 0000000000000000
>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffab36c2a0
>>> RBP: ffffffffab36c2a0 R8: 0000000000000001 R9: 0000000000000000
>>> R10: 0000000000000010 R11: 0000000000000018 R12: 0000000000000000
>>> R13: 0000000000000004 R14: ffff9e10d71b1c88 R15: ffff9e10d71b1980
>>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>>>
>>> (for the rest of the exception trace, see the hotplug case of an
>>> ATS-capable device)
>>>
>>> If an endpoint device simply does not respond to the ATS Invalidation
>>> request but is not gone, it will bring down the whole system. To avoid
>>> this, don't retry a timed-out ATS Invalidation request forever.
>>>
>>> Signed-off-by: Ethan Zhao <haifeng.zhao@...ux.intel.com>
>>> ---
>>>   drivers/iommu/intel/dmar.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>>> index 0a8d628a42ee..9edb4b44afca 100644
>>> --- a/drivers/iommu/intel/dmar.c
>>> +++ b/drivers/iommu/intel/dmar.c
>>> @@ -1453,7 +1453,7 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
>>>       reclaim_free_desc(qi);
>>>       raw_spin_unlock_irqrestore(&qi->q_lock, flags);
>>> -    if (rc == -EAGAIN)
>>> +    if (rc == -EAGAIN && type !=QI_DIOTLB_TYPE && type != QI_DEIOTLB_TYPE)
>>>           goto restart;
>>>       if (iotlb_start_ktime)
>>
>> Above is also unnecessary if qi_check_fault() returns -ETIMEDOUT,
>> instead of -EAGAIN. Or did I miss anything?
> 
> Folding it into qi_check_fault() would be a plus; the downside is that
> we would have to add more parameters to qi_check_fault(). Is there no
> need to check for the QI_DIOTLB_TYPE and QI_DEIOTLB_TYPE invalidation
> types in qi_check_fault()?

No need to check the request type, as multiple requests might be batched
together in a single call. This is also why I asked you to add a flag
bit to this helper and make the intention explicit, say,

"This includes requests to interact with a PCI endpoint. The device may
  become unavailable at any time, so do not attempt to retry if ITE is
  detected and the device has gone away."
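
To make the suggestion concrete, here is a rough userspace sketch of the
control flow, not kernel code. Every name in it (QI_OPT_ENDPOINT_SKETCH,
struct qi_sim, qi_check_fault_sketch(), qi_submit_sync_sketch()) is
hypothetical, and the simulated state stands in for what the real helper
would read from the hardware and the PCI core. It assumes the caller sets
the flag for any batch that contains a request to an endpoint, so no
per-descriptor type check is needed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option bit: the batch includes requests that interact
 * with a PCI endpoint, which may become unavailable at any time. */
#define QI_OPT_ENDPOINT_SKETCH (1u << 0)

/* Simulated state; an assumption standing in for real hardware. */
struct qi_sim {
	int  ite_remaining; /* how many more submissions will hit ITE */
	bool dev_present;   /* is the endpoint still there? */
	bool ite;           /* ITE latched for the current submission */
	int  submissions;   /* how many times the batch was submitted */
};

/* Returns 0 if no fault, -EAGAIN if the caller should retry, or
 * -ETIMEDOUT if ITE was detected, the flag is set and the device
 * has gone away. */
int qi_check_fault_sketch(const struct qi_sim *s, unsigned int options)
{
	if (!s->ite)
		return 0;
	if ((options & QI_OPT_ENDPOINT_SKETCH) && !s->dev_present)
		return -ETIMEDOUT;
	return -EAGAIN;
}

/* The restart condition stays a plain -EAGAIN check; the decision
 * not to retry lives entirely in the fault helper. */
int qi_submit_sync_sketch(struct qi_sim *s, unsigned int options)
{
	int rc;

	assert(s != NULL);
restart:
	s->submissions++;
	s->ite = s->ite_remaining > 0;
	if (s->ite)
		s->ite_remaining--;

	rc = qi_check_fault_sketch(s, options);
	if (rc == -EAGAIN)
		goto restart;

	return rc;
}
```

With this shape, a batch submitted with the flag against a device that
has gone away returns -ETIMEDOUT after a single submission, while a
transient ITE on a device that is still present is retried as before.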

Best regards,
baolu
