[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <446ffe79-3a44-5d41-459f-b698a1cc361b@arm.com>
Date:   Wed, 1 Jul 2020 20:14:10 +0100
From:   Robin Murphy <robin.murphy@....com>
To:     Krishna Reddy <vdumpa@...dia.com>,
        Jonathan Hunter <jonathanh@...dia.com>
Cc:     Sachin Nikam <Snikam@...dia.com>,
        "nicoleotsuka@...il.com" <nicoleotsuka@...il.com>,
        Mikko Perttunen <mperttunen@...dia.com>,
        Bryan Huntsman <bhuntsman@...dia.com>,
        "will@...nel.org" <will@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Pritesh Raithatha <praithatha@...dia.com>,
        Timo Alho <talho@...dia.com>,
        "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        Nicolin Chen <nicolinc@...dia.com>,
        "linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>,
        Yu-Huan Hsu <YHsu@...dia.com>,
        Thierry Reding <treding@...dia.com>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        Bitan Biswas <bbiswas@...dia.com>
Subject: Re: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault
 implementation hooks
On 2020-07-01 19:48, Krishna Reddy wrote:
>>>> +    for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
>>>> +            irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
>>>> +            if (irq_ret == IRQ_HANDLED)
>>>> +                    return irq_ret;
>>>
>>> Any chance there could be more than one SMMU faulting by the time we
>>> service the interrupt?
> 
>> It certainly seems plausible if the interconnect is automatically load-balancing requests across the SMMU instances - say a driver bug caused a buffer to be unmapped too early, there could be many in-flight accesses to parts of that buffer that aren't all taking the same path and thus could now fault in parallel.
>> [ And anyone inclined to nitpick global vs. context faults, s/unmap a buffer/tear down a domain/ ;) ]
>> Either way I think it would be easier to reason about if we just handled these like a typical shared interrupt and always checked all the instances.
> 
> It would be optimal to check at the same time across all instances.
> 
>>>> +            for (idx = 0; idx < smmu->num_context_banks; idx++) {
>>>> +                    irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
>>>> +                                                             idx,
>>>> + inst);
>>>> +
>>>> +                    if (irq_ret == IRQ_HANDLED)
>>>> +                            return irq_ret;
>>>
>>> Any reason why we don't check all banks?
> 
>> As above, we certainly shouldn't bail out without checking the bank for the offending domain across all of its instances, and I guess the way this works means that we would have to iterate all the banks to achieve that.
> 
> With shared irq line, the context fault identification is not optimal already.  Reading all the context banks all the time can be additional mmio read overhead. But, it may not hurt the real use cases as these happen only when there are bugs.
Right, I did ponder the idea of a whole programmatic 
"request_context_irq" hook that would allow registering the handler for 
both interrupts with the appropriate context bank and instance data, but 
since all interrupts are currently unexpected it seems somewhat hard to 
justify the extra complexity. Obviously we can revisit this in future if 
you want to start actually doing something with faults like the qcom GPU 
folks do.
Robin.
Powered by blists - more mailing lists
 
