Message-ID: <20210511091452.721e9a03@jacob-builder>
Date: Tue, 11 May 2021 09:14:52 -0700
From: Jacob Pan <jacob.jun.pan@...ux.intel.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
iommu@...ts.linux-foundation.org, Joerg Roedel <joro@...tes.org>,
Lu Baolu <baolu.lu@...ux.intel.com>,
Jean-Philippe Brucker <jean-philippe@...aro.com>,
Christoph Hellwig <hch@...radead.org>,
Yi Liu <yi.l.liu@...el.com>, Raj Ashok <ashok.raj@...el.com>,
"Tian, Kevin" <kevin.tian@...el.com>,
Dave Jiang <dave.jiang@...el.com>, wangzhou1@...ilicon.com,
zhangfei.gao@...aro.org, vkoul@...nel.org,
jacob.jun.pan@...ux.intel.com,
David Woodhouse <dwmw2@...radead.org>
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags
Hi Jason,
On Tue, 11 May 2021 08:48:48 -0300, Jason Gunthorpe <jgg@...dia.com> wrote:
> On Mon, May 10, 2021 at 08:31:45PM -0700, Jacob Pan wrote:
> > Hi Jason,
> >
> > On Mon, 10 May 2021 20:37:49 -0300, Jason Gunthorpe <jgg@...dia.com>
> > wrote:
> > > On Mon, May 10, 2021 at 06:25:07AM -0700, Jacob Pan wrote:
> > >
> > > > +/*
> > > > + * The IOMMU_SVA_BIND_SUPERVISOR flag requests a PASID which can be used only
> > > > + * for access to kernel addresses. No IOTLB flushes are automatically done
> > > > + * for kernel mappings; it is valid only for access to the kernel's static
> > > > + * 1:1 mapping of physical memory — not to vmalloc or even module mappings.
> > > > + * A future API addition may permit the use of such ranges, by means of an
> > > > + * explicit IOTLB flush call (akin to the DMA API's unmap method).
> > > > + *
> > > > + * It is unlikely that we will ever hook into flush_tlb_kernel_range() to
> > > > + * do such IOTLB flushes automatically.
> > > > + */
> > > > +#define IOMMU_SVA_BIND_SUPERVISOR BIT(0)
> > >
> > > Huh? That isn't really SVA, can you call it something saner please?
> > >
> > This is a shared kernel virtual address; I am following the SVA lib naming
> > since this is where the flag will be used. Why is this not SVA? A kernel
> > virtual address is still a virtual address. Is it due to the direct map?
>
> As the above explains, it doesn't actually synchronize the kernel's
> address space; it just shoves the direct map into the IOMMU.
>
There is no duplicate copy of the kernel direct map in the IOMMU.
> I suppose a different IOMMU implementation might point the PASID directly
> at the kernel's page table and avoid those limitations - but since
> that isn't portable it seems irrelevant.
>
This is what we are doing here: we allocate a supervisor PASID and program
the kernel page table (init_mm's pgd) into its PASID entry.
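To illustrate, a simplified userspace model (not VT-d driver code) of what programming such an entry amounts to. The struct, its fields, NR_PASIDS and setup_supervisor_pasid() are hypothetical; a real scalable-mode PASID entry has many more fields. Reserving PASID 0 follows the RID2PASID usage described later in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified PASID-table entry: only the fields
 * discussed in this thread are kept. */
struct pasid_entry {
    uint64_t first_level_ptr; /* root of the CPU page table the IOMMU walks */
    bool present;
    bool sre;                 /* Supervisor Request Enable */
};

#define RID2PASID 0u   /* PASID 0, used by the DMA API (assumed reserved here) */
#define NR_PASIDS 64u  /* hypothetical table size */

/* Point a supervisor PASID at the kernel's page table (e.g. the physical
 * address of init_mm's pgd), so no separate IOMMU page table is built. */
static int setup_supervisor_pasid(struct pasid_entry tbl[NR_PASIDS],
                                  uint32_t pasid, uint64_t kernel_pgd_phys)
{
    /* Reject the reserved PASID, out-of-range values, and entries in use. */
    if (pasid == RID2PASID || pasid >= NR_PASIDS || tbl[pasid].present)
        return -1;
    tbl[pasid].first_level_ptr = kernel_pgd_phys;
    tbl[pasid].sre = true;     /* allow requests with the PR bit set */
    tbl[pasid].present = true; /* mark valid only after filling the entry */
    return 0;
}
```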
> Since the only thing it really maps is the direct map I would just
> call it direct_map, or all physical or something.
>
Good idea. It makes it clear to callers that they must only use direct-map
memory for DMA.
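As an illustration of that constraint, a userspace model of a range check a caller could apply before handing a buffer to the device. In kernel code one would rather rely on virt_addr_valid() (per page) and reject is_vmalloc_addr() buffers; the constants here assume the default x86-64 4-level layout without KASLR (direct map at 0xffff888000000000, 64 TB), and dma_range_in_direct_map() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86-64 4-level layout, KASLR disabled: the direct map of all
 * physical memory starts here and spans 64 TB. */
#define DIRECT_MAP_START 0xffff888000000000ULL
#define DIRECT_MAP_SIZE  (64ULL << 40)

/* Hypothetical helper: true if [addr, addr + len) lies entirely inside the
 * direct map, i.e. is addressable through the supervisor PASID. vmalloc and
 * module addresses fall outside this window and are rejected. */
static bool dma_range_in_direct_map(uint64_t addr, size_t len)
{
    if (len == 0 || addr < DIRECT_MAP_START)
        return false;
    /* Work with offsets so that addr + len cannot overflow. */
    uint64_t off = addr - DIRECT_MAP_START;
    return off < DIRECT_MAP_SIZE && len <= DIRECT_MAP_SIZE - off;
}
```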
> How does this interact with the DMA APIs?
The DMA API uses RID2PASID (PASID 0), so the two are separated by PASID.
> How do you get CPU cache
> flushing/etc into PASID operations that don't trigger IOMMU updates?
>
Sorry, I am not following. This is used for the direct map only.
> Honestly, I'm not convinced we should have "kernel SVA" at all.. Why
> does IDXD use normal DMA on the RID for kernel controlled accesses?
>
Using SVA simplifies work submission: there is no need to map/unmap. Just
bind a PASID to init_mm, then submit work directly, either with ENQCMDS (the
supervisor version of ENQCMD) to a shared workqueue, or by putting the
supervisor PASID in the descriptor for a dedicated workqueue.
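Roughly, the difference from the DMA API is that the descriptor carries kernel virtual addresses directly, with the supervisor PASID and privilege bit in its header, and no dma_map_single()/dma_unmap_single() around the operation. A simplified model follows; the descriptor layout, OP_MEMMOVE and prep_kernel_memmove() are hypothetical (only loosely modeled on a DSA-style descriptor), and the actual ENQCMDS/MOVDIR64B submission is left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte work descriptor; field names and widths are
 * assumptions for this sketch, not a real hardware layout. */
struct work_desc {
    uint32_t pasid : 20;
    uint32_t rsvd  : 11;
    uint32_t priv  : 1;   /* 1 = supervisor (privileged) request */
    uint32_t opcode;
    uint64_t src;         /* kernel virtual address, no dma_map_*() needed */
    uint64_t dst;
    uint32_t size;
    uint8_t  pad[36];     /* pad the descriptor out to 64 bytes */
};
_Static_assert(sizeof(struct work_desc) == 64, "descriptor must be 64 bytes");

#define OP_MEMMOVE 0x3u   /* hypothetical opcode value */

/* Fill a copy descriptor that the device translates via the supervisor
 * PASID bound to init_mm, instead of via IOVAs from the DMA API. */
static void prep_kernel_memmove(struct work_desc *d, uint32_t supervisor_pasid,
                                const void *src, void *dst, uint32_t len)
{
    memset(d, 0, sizeof(*d));
    d->pasid = supervisor_pasid & 0xFFFFFu; /* PASIDs are 20 bits wide */
    d->priv = 1;
    d->opcode = OP_MEMMOVE;
    d->src = (uint64_t)(uintptr_t)src;
    d->dst = (uint64_t)(uintptr_t)dst;
    d->size = len;
}
```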
> > > Is it really a PASID that always has all of physical memory mapped
> > > into it? Sounds dangerous. What is it for?
> >
> > Yes. It binds DMA requests with PASID to init_mm/init_top_pgt. Per the
> > PCIe spec, the PASID TLP prefix has a "Privileged Mode Requested" bit. VT-d
> > supports this with the "Privileged-mode-Requested (PR) flag (to distinguish
> > user versus supervisor access)". Each PASID entry has an SRE (Supervisor
> > Request Enable) bit.
>
> The PR flag is only needed if the underlying IOMMU is directly
> processing the CPU page tables. For cases where the IOMMU is using its
> own page table format and has its own copies the PR flag shouldn't be
> used.
>
We are in the former case. There are no IOMMU page tables for the direct
map.
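To make the PR/SRE interaction concrete: since the IOMMU walks the CPU page tables directly for these PASIDs, the supervisor check is made against the PASID entry rather than a separate IOMMU page table. A simplified model of the routing, assuming the behavior described in this thread (untagged DMA goes through RID2PASID for the DMA API; a request with PR set is rejected unless the entry has SRE set). The enum, struct and route_request() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes of translating one DMA request. */
enum xlate_result {
    XLATE_FAULT,          /* request blocked */
    XLATE_DMA_API_DOMAIN, /* translated by the DMA API's IOMMU page table */
    XLATE_CPU_PGTABLE,    /* IOMMU walks the CPU page table in the entry */
};

struct pasid_entry {
    bool present;
    bool sre;             /* Supervisor Request Enable */
};

#define RID2PASID 0u      /* DMA API traffic, per the thread */
#define NR_PASIDS 16u     /* hypothetical table size */

/* has_pasid: the request carries a PASID TLP prefix.
 * pr: the prefix's Privileged Mode Requested bit. */
static enum xlate_result route_request(const struct pasid_entry *tbl,
                                       bool has_pasid, uint32_t pasid, bool pr)
{
    if (!has_pasid || pasid == RID2PASID)
        return tbl[RID2PASID].present ? XLATE_DMA_API_DOMAIN : XLATE_FAULT;
    if (pasid >= NR_PASIDS || !tbl[pasid].present)
        return XLATE_FAULT;
    if (pr && !tbl[pasid].sre)   /* supervisor access not enabled */
        return XLATE_FAULT;
    /* Assumed in this sketch: every other entry points at a CPU page
     * table (init_mm's for the supervisor PASID, a process's otherwise). */
    return XLATE_CPU_PGTABLE;
}
```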
> Jason
Thanks,
Jacob