Message-ID: <4d50c00a-9718-4ec5-bdef-ea14c7727ff4@linux.intel.com>
Date: Thu, 11 Jan 2024 15:14:12 +0800
From: Ethan Zhao <haifeng.zhao@...ux.intel.com>
To: Baolu Lu <baolu.lu@...ux.intel.com>, "Tian, Kevin"
<kevin.tian@...el.com>, "Liu, Yi L" <yi.l.liu@...el.com>,
"joro@...tes.org" <joro@...tes.org>,
"alex.williamson@...hat.com" <alex.williamson@...hat.com>,
"jgg@...dia.com" <jgg@...dia.com>,
"robin.murphy@....com" <robin.murphy@....com>
Cc: "cohuck@...hat.com" <cohuck@...hat.com>,
"eric.auger@...hat.com" <eric.auger@...hat.com>,
"nicolinc@...dia.com" <nicolinc@...dia.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"mjrosato@...ux.ibm.com" <mjrosato@...ux.ibm.com>,
"chao.p.peng@...ux.intel.com" <chao.p.peng@...ux.intel.com>,
"yi.y.sun@...ux.intel.com" <yi.y.sun@...ux.intel.com>,
"peterx@...hat.com" <peterx@...hat.com>,
"jasowang@...hat.com" <jasowang@...hat.com>,
"shameerali.kolothum.thodi@...wei.com"
<shameerali.kolothum.thodi@...wei.com>, "lulu@...hat.com" <lulu@...hat.com>,
"suravee.suthikulpanit@....com" <suravee.suthikulpanit@....com>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
"Duan, Zhenzhong" <zhenzhong.duan@...el.com>,
"joao.m.martins@...cle.com" <joao.m.martins@...cle.com>,
"Zeng, Xin" <xin.zeng@...el.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
"j.granados@...sung.com" <j.granados@...sung.com>
Subject: Re: [PATCH v8 07/10] iommu/vt-d: Allow qi_submit_sync() to return the
QI faults
On 1/1/2024 11:34 AM, Baolu Lu wrote:
> On 12/28/23 2:17 PM, Tian, Kevin wrote:
>>> raw_spin_lock_irqsave(&qi->q_lock, flags);
>>> /*
>>> @@ -1430,7 +1439,7 @@ int qi_submit_sync(struct intel_iommu *iommu,
>>> struct qi_desc *desc,
>>> * a deadlock where the interrupt context can wait
>>> indefinitely
>>> * for free slots in the queue.
>>> */
>>> - rc = qi_check_fault(iommu, index, wait_index);
>>> + rc = qi_check_fault(iommu, index, wait_index, fault);
>>> if (rc)
>>> break;
>> and as replied in another thread let's change qi_check_fault to return
>> -ETIMEDOUT to break the restart loop when fault pointer is valid.
>
> It's fine to break the retry loop when fault happens and the fault
> pointer is valid. Please don't forget to add an explanation comment
> around the code. Something like:
>
> /*
> * The caller is able to handle the fault by itself. The IOMMU driver
> * should not attempt to retry this request.
> */
If the caller can pass descriptors that mix iotlb and devtlb
invalidation requests, error handling becomes difficult for both the
caller and qi_submit_sync(). Imagine a case like this:
1. The caller invokes qi_submit_sync() with iotlb and devtlb
invalidations in one batch.
2. qi_submit_sync() detects that the target device is dead.
3. It has to break out of the loop, otherwise it blocks other
invalidation submitters or hangs.
4. It is hard for qi_submit_sync() to extract the iotlb invalidations
from the batch in order to retry them.
5. It is also difficult for the caller to retry the iotlb invalidations,
so the iotlb may be left out of sync. There is no sync at all once the
device is gone.
If only an ITE fault is hit, but the target device is still present and
its configuration space can be read, the ITE was probably left over from
a previous request for another device and was not triggered by this
batch. The problem is that we cannot tell whether the device that caused
the ITE is the same as the current target. If it is the same, breaking
out is reasonable. Otherwise we just leave the problem to the caller:
either something in this request batch is bad, or a request submitted
earlier was bad, and that earlier request did not come from the same
caller.
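To make the mixed-batch concern concrete, here is a minimal sketch of how
a caller could keep iotlb and devtlb invalidations apart before
submitting, so that a dead device only affects the devtlb part. The names
enum inv_kind, struct inv_req and split_batch() are hypothetical and made
up for illustration; they are not the VT-d driver's struct qi_desc or any
existing kernel helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request kinds; NOT the kernel's struct qi_desc encoding. */
enum inv_kind { INV_IOTLB, INV_DEVTLB };

struct inv_req {
	enum inv_kind kind;
	unsigned int id;	/* illustrative tag only */
};

/*
 * Split a mixed batch into iotlb and devtlb requests, preserving the
 * original order within each group. Both output arrays must have room
 * for n entries.
 */
static void split_batch(const struct inv_req *in, size_t n,
			struct inv_req *iotlb, size_t *n_iotlb,
			struct inv_req *devtlb, size_t *n_devtlb)
{
	size_t i;

	assert(in && iotlb && devtlb && n_iotlb && n_devtlb);

	*n_iotlb = 0;
	*n_devtlb = 0;
	for (i = 0; i < n; i++) {
		if (in[i].kind == INV_IOTLB)
			iotlb[(*n_iotlb)++] = in[i];
		else
			devtlb[(*n_devtlb)++] = in[i];
	}
}
```

With the two groups submitted separately, a failure in the devtlb group
would not force anyone to dig the iotlb requests out of a half-processed
batch. Whether that is acceptable for ordering and performance is a
separate question this sketch does not answer.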
Thanks,
Ethan
>
> Best regards,
> baolu
>