[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210108181830.GA5457@willie-the-truck>
Date: Fri, 8 Jan 2021 18:18:30 +0000
From: Will Deacon <will@...nel.org>
To: Sai Prakash Ranjan <saiprakash.ranjan@...eaurora.org>
Cc: isaacm@...eaurora.org, Rob Clark <robdclark@...il.com>,
Jordan Crouse <jcrouse@...eaurora.org>,
linux-arm-msm@...r.kernel.org, Joerg Roedel <joro@...tes.org>,
linux-kernel@...r.kernel.org,
Akhil P Oommen <akhilpo@...eaurora.org>,
iommu@...ts.linux-foundation.org,
Robin Murphy <robin.murphy@....com>,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use
system cache
On Fri, Jan 08, 2021 at 11:17:25AM +0530, Sai Prakash Ranjan wrote:
> On 2021-01-07 22:27, isaacm@...eaurora.org wrote:
> > On 2021-01-06 03:56, Will Deacon wrote:
> > > On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:
> > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY
> > > > flag")
> > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > the memory type setting required for the non-coherent masters to use
> > > > system cache. Now that system cache support for GPU is added, we will
> > > > need to mark the memory as normal sys-cached for GPU to use
> > > > system cache.
> > > > Without this, the system cache lines are not allocated for GPU.
> > > > We use
> > > > the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page
> > > > protection
> > > > flag as the flag cannot be exposed via DMA api because of no in-tree
> > > > users.
> > > >
> > > > Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@...eaurora.org>
> > > > ---
> > > > drivers/iommu/io-pgtable-arm.c | 3 +++
> > > > 1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/drivers/iommu/io-pgtable-arm.c
> > > > b/drivers/iommu/io-pgtable-arm.c
> > > > index 7c9ea9d7874a..3fb7de8304a2 100644
> > > > --- a/drivers/iommu/io-pgtable-arm.c
> > > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > > @@ -415,6 +415,9 @@ static arm_lpae_iopte
> > > > arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> > > > else if (prot & IOMMU_CACHE)
> > > > pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
> > > > << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > > + else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
> > > > + pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
> > > > + << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > > }
> > >
> > While this approach of enabling system cache globally for both page
> > tables and other buffers
> > works for the GPU usecase, this isn't ideal for other clients that use
> > system cache. For example,
> > video clients only want to cache a subset of their buffers in the
> > system cache, due to the sizing constraint
> > imposed by how much of the system cache they can use. So, it would be
> > ideal to have
> > a way of expressing the desire to use the system cache on a per-buffer
> > basis. Additionally,
> > our video clients use the DMA layer, and since the requirement is for
> > caching in the system cache
> > to be a per buffer attribute, it seems like we would have to have a
> > DMA attribute to express
> > this on a per-buffer basis.
> >
>
> I did bring this up initially [1], also where is this video client
> in upstream? AFAIK, only system cache user in upstream is GPU.
> We cannot add any DMA attribute unless there is any user upstream
> as per [2], so when the support for such a client is added, wouldn't
> ((data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) || PROT_FLAG)
> work?
Hmm, I think this is another case where we need to separate out the
page-table walker attributes from the access attributes. Currently,
IO_PGTABLE_QUIRK_ARM_OUTER_WBWA applies _only_ to the page-table walker
and I don't think it makes any sense for that to be per-buffer (how would
you even manage that?). However, if we want to extend this to data accesses
and we know that there are valid use-cases where this should be per-buffer,
then shoe-horning it in with the walker quirk does not feel like the best
thing to do.
As a starting point, we could:
1. Rename IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
2. Add a new prot flag IOMMU_LLC
3. Have the GPU pass the new prot for its buffer mappings
Does that work? One thing I'm not sure about is whether IOMMU_CACHE should
imply IOMMU_LLC, or whether there is a use-case for inner-cacheable, outer
non-cacheable mappings for a coherent device. Have you ever seen that sort
of thing before?
Will
Powered by blists - more mailing lists