linux-kernel - Re: [RFC] iommu: arm-smmu: stall support

Message-ID: <CAF6AEGuutkqjrWk4jagE=p-NwHgxdiPZjjsaFsfwtczK568j+A@mail.gmail.com>
Date:   Tue, 19 Sep 2017 10:23:43 -0400
From:   Rob Clark <robdclark@...il.com>
To:     Joerg Roedel <joro@...tes.org>
Cc:     "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        linux-arm-msm <linux-arm-msm@...r.kernel.org>,
        Jordan Crouse <jcrouse@...eaurora.org>,
        Will Deacon <will.deacon@....com>,
        Robin Murphy <robin.murphy@....com>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC] iommu: arm-smmu: stall support

On Tue, Sep 19, 2017 at 8:30 AM, Joerg Roedel <joro@...tes.org> wrote:
> Hi Rob,
>
> thanks for the RFC patch. I have some comments about the interface to
> the IOMMU-API below.
>
> On Thu, Sep 14, 2017 at 03:44:33PM -0400, Rob Clark wrote:
>> +/**
>> + * iommu_domain_resume - Resume translations for a domain after a fault.
>> + *
>> + * This can be called at some point after the fault handler is called,
>> + * allowing the user of the IOMMU to (for example) handle the fault
>> + * from a task context.  It is illegal to call this if
>> + * iommu_domain_set_attr(STALL) failed.
>> + *
>> + * @domain:    the domain to resume
>> + * @terminate: if true, the translation that triggered the fault should
>> + *    be terminated, else it should be retried.
>> + */
>> +void iommu_domain_resume(struct iommu_domain *domain, bool terminate)
>> +{
>> +     /* invalid to call if iommu_domain_set_attr(STALL) failed: */
>> +     if (WARN_ON(!domain->ops->domain_resume))
>> +             return;
>> +     domain->ops->domain_resume(domain, terminate);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_domain_resume);
>
> So this function is being called by the device driver owning the domain,
> right?

yes, this was my plan

> I don't think that the resume call-back you added needs to be exposed
> like this. It is better to do the page-fault handling completely in the
> iommu-code, including calling the resume call-back and just let the
> device-driver provide a per-domain call-back to let it handle the fault
> and map in the required pages.

I would like to decide in the IRQ handler whether or not to queue work,
because when we get a GPU fault, we tend to get thousands of GPU faults
all at once (and I really only need to handle the first one).  I
suppose that could also be achieved by having a special return value
from the fault handler to say "call me again from a wq".

Note that in the drm driver I already have a suitable wq to queue the
work, so it really doesn't buy me anything to have the iommu driver
toss things off to a wq for me.  Might be a different situation for
other drivers (but I guess mostly other drivers are using iommu API
indirectly via dma-mapping?)
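
To make the "only handle the first one" idea concrete, here is a rough
user-space sketch of the decision made in the IRQ path.  It is not kernel
code and not part of the patch: struct gpu_fault_state, the FAULT_* return
codes and both functions are hypothetical names, and C11 atomics stand in
for whatever locking the real driver would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical return codes for a fault handler; not part of the IOMMU API. */
enum fault_action {
	FAULT_IGNORE = 0,	/* duplicate fault in a burst, nothing to queue */
	FAULT_DEFER  = 1,	/* first fault of a burst: "call me again from a wq" */
};

/* Hypothetical per-GPU state; stands in for driver-private data. */
struct gpu_fault_state {
	atomic_bool work_pending;	/* true while deferred handling is outstanding */
	unsigned long first_iova;	/* address of the fault that will be handled */
};

static void gpu_fault_state_init(struct gpu_fault_state *st)
{
	atomic_init(&st->work_pending, false);
	st->first_iova = 0;
}

/* Called from (simulated) IRQ context for every reported fault. */
static enum fault_action gpu_fault_irq(struct gpu_fault_state *st,
				       unsigned long iova)
{
	/*
	 * atomic_exchange() returns the previous value, so only the first
	 * fault of a burst observes "false" and gets to queue work.
	 */
	if (atomic_exchange(&st->work_pending, true))
		return FAULT_IGNORE;

	st->first_iova = iova;
	return FAULT_DEFER;
}

/* Called from (simulated) task context once the deferred work has run. */
static void gpu_fault_work_done(struct gpu_fault_state *st)
{
	/* Completing work that was never queued would be a driver bug. */
	assert(atomic_load(&st->work_pending));
	atomic_store(&st->work_pending, false);
}
```

In this sketch, a burst of faults results in exactly one FAULT_DEFER until
gpu_fault_work_done() is called, after which the next fault is deferred
again.  Whether the real interface would express this as a return value or
as the driver queuing its own work is exactly the open question above.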

> The interface could look like this:
>
>         * New function iommu_domain_enable_stalls(domain) - When
>           this function returns the domain is in stall-handling mode. A
>           iommu_domain_disable_stalls() might make sense too, not sure
>           about that.

I don't particularly see a use-case for disabling stalls, fwiw

BR,
-R

>         * When stalls are enabled for a domain, report_iommu_fault()
>           queues the fault to a workqueue (so that its handler can
>           block) and in the workqueue you call ->resume() based on the
>           return value of the handler.
>
> As a side-note, as there has been discussion on this: For now it doesn't
> make sense to merge this with the SVM page-fault handling efforts, as
> this path is different enough (SVM will call handle_mm_fault() as the
> handler, for example).
>
>
> Regards,
>
>         Joerg
>
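
For reference, the workqueue-based flow proposed in the quoted bullet list
could be modelled roughly as below.  This is a plain user-space simulation
under several assumptions, not the kernel implementation: every sim_* name
is hypothetical, the "workqueue" is a single pending slot that the caller
drains by hand, and the convention that a non-zero handler return means
"terminate" is an assumption of this sketch rather than something stated
in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sim_domain;

/* Hypothetical handler type: returns 0 if the fault was fixed up. */
typedef int (*sim_fault_handler)(struct sim_domain *d, unsigned long iova);

/* Hypothetical stand-in for an IOMMU domain in stall-handling mode. */
struct sim_domain {
	bool stalls_enabled;
	bool fault_queued;		/* single-slot stand-in for a workqueue */
	unsigned long pending_iova;
	unsigned long mappable_limit;	/* demo handler can map below this */
	sim_fault_handler handler;
	int resume_calls;		/* how many times resume was invoked */
	bool last_terminate;		/* argument of the most recent resume */
};

/* Models the proposed iommu_domain_enable_stalls() plus handler setup. */
static void sim_enable_stalls(struct sim_domain *d, sim_fault_handler h)
{
	assert(h != NULL);
	d->handler = h;
	d->stalls_enabled = true;
}

/* Models the ->resume() call-back: retry or terminate the transaction. */
static void sim_resume(struct sim_domain *d, bool terminate)
{
	d->resume_calls++;
	d->last_terminate = terminate;
}

/* Models the fault-report path: only queues, never blocks. */
static bool sim_report_fault(struct sim_domain *d, unsigned long iova)
{
	if (!d->stalls_enabled || d->fault_queued)
		return false;
	d->pending_iova = iova;
	d->fault_queued = true;
	return true;
}

/* Models the work item: the handler may block, then resume is called. */
static void sim_run_fault_work(struct sim_domain *d)
{
	int ret;

	if (!d->fault_queued)
		return;
	ret = d->handler(d, d->pending_iova);
	d->fault_queued = false;
	/* Assumption: success retries the access, failure terminates it. */
	sim_resume(d, ret != 0);
}

/* Demo handler: pretends pages below mappable_limit can be mapped in. */
static int sim_demo_handler(struct sim_domain *d, unsigned long iova)
{
	return iova < d->mappable_limit ? 0 : -1;
}
```

With a zero-initialized struct sim_domain, calling sim_enable_stalls(),
then sim_report_fault() followed by sim_run_fault_work() results in one
sim_resume() call whose terminate argument depends on the handler's
return value.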