linux-kernel - Re: [PATCH v2 3/8] iommu/vt-d: Implement device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220317112304.61cf8953@jacob-builder>
Date:   Thu, 17 Mar 2022 11:23:04 -0700
From:   Jacob Pan <jacob.jun.pan@...ux.intel.com>
To:     Jason Gunthorpe <jgg@...dia.com>
Cc:     iommu@...ts.linux-foundation.org,
        LKML <linux-kernel@...r.kernel.org>,
        Joerg Roedel <joro@...tes.org>,
        Christoph Hellwig <hch@...radead.org>,
        Lu Baolu <baolu.lu@...ux.intel.com>,
        Jean-Philippe Brucker <jean-philippe@...aro.com>,
        Jacob Pan <jacob.jun.pan@...el.com>,
        Raj Ashok <ashok.raj@...el.com>,
        "Kumar, Sanjay K" <sanjay.k.kumar@...el.com>,
        Dave Jiang <dave.jiang@...el.com>,
        Tony Luck <tony.luck@...el.com>,
        "Zanussi, Tom" <tom.zanussi@...el.com>,
        Dan Williams <dan.j.williams@...el.com>,
        "Tian, Kevin" <kevin.tian@...el.com>, Yi Liu <yi.l.liu@...el.com>,
        jacob.jun.pan@...ux.intel.com
Subject: Re: [PATCH v2 3/8] iommu/vt-d: Implement device_pasid domain attach
 ops

Hi Jason,

On Thu, 17 Mar 2022 10:23:08 -0300, Jason Gunthorpe <jgg@...dia.com> wrote:

> On Wed, Mar 16, 2022 at 05:49:59PM -0700, Jacob Pan wrote:
> 
> > > I would expect real applications will try to use the same PASID for
> > > the same IOVA map to optimize IOTLB caching.
> > > 
> > > Is there a use case for that I'm missing?
> > >   
> > Yes. it would be more efficient for PASID selective domain TLB flush.
> > But on VT-d IOTLB is also tagged by domain ID, domain flush can use DID
> > if there are many PASIDs. Not sure about other archs. Agree that
> > optimizing PASIDs for TLB flush should be a common goal.  
> 
> If you sort the list of (device, pasid) tuples can something like VT-d
> collapse all the same devices and just issue one DID invalidation:
> 
>  list_for_each()
>     if (itm->device == last_invalidated_device)
>           continue;
>     invalidate(itm->device);
>     last_invalidated_device = itm->device;
> 
I assume this is for devTLB since IOMMU's IOTLB flush doesn't care about
device. I think it works for device-wide invalidation.

> While something that was per-pasid could issue per-pasid invalidations
> from the same data structure?
> 
yes. we can use the same data structure for PASID selective devTLB but 
 list_for_each()
     if (itm->pasid == pasid_to_be_invalidated;
	     invalidate(itm->device, pasid);

For IOMMU's IOTLB, we also have two granularities
1. domain-wide
2. pasid-wide
For #1, we just use DID to invalidate w/o traverse the list.
For #2, we just need to sanity check the pasid is indeed attached by going
through the list.

Seems to work!

> > > Otherwise your explanation is what I was imagining as well.
> > > 
> > > I would also think about expanding your struct so that the device
> > > driver can track per-device per-domain data as well, that seems
> > > useful IIRC?
> > >   
> > yes, at least both VT-d and FSL drivers have struct device_domain_info.
> >   
> > > ie put a 'sizeof_iommu_dev_pasid_data' in the domain->ops and
> > > allocate that much memory so the driver can use the trailer space for
> > > its own purpose.
> > >   
> > That sounds great to have but not sure i understood correctly how to do
> > it.
> > 
> > Do you mean for each vendor driver's struct device_domain_info (or
> > equivalent), we carve out sizeof_iommu_dev_pasid_data as common data,
> > then the rest of the space is vendor specific? I don't feel I get your
> > point, could you elaborate?  
> 
> I've seen it done two ways..
> 
> With a flex array:
> 
>  struct iommu_device_data {
>      struct list_head list
>      ioasid_t pasid;
>      struct device *dev;
>      [..]
>      u64 device_data[];
>  }
> 
>  struct intel_device_data {
>       [..]
>  }
>  struct iommu_device_data *dev_data;
>  struct intel_device_data *intel_data = (void *)&dev_data->device_data;
> 
> Or with container of:
> 
>  struct iommu_device_data {
>      struct list_head list
>      ioasid_t pasid;
>      struct device *dev;
>      [..]
>  }
> 
>  struct intel_device_data {
>      struct iommu_device_data iommu; // must be first
>      [...]
>  }
>  struct iommu_device_data *dev_data;
>  struct intel_device_data *intel_data = container_of(dev_data, struct
> intel_device_data, iommu);
> 
> In either case you'd add a size_t to the domain->ops specifying how
> much extra memory for the core code to allocate when it manages the
> datastructure. The first case allocates based on struct_size, the
> second case allocates what is specified.
> 
> Look at INIT_RDMA_OBJ_SIZE() for some more complicated example how the
> latter can work. I like it because it has the nice container_of
> pattern in drivers, the downside is it requires a BUILD_BUG_ON to
> check that the driver ordered its struct properly.
> 
> The point is to consolidate all the code for allocating and walking
> the data structure without having to force two allocations and extra
> pointer indirections on performance paths.
Make sense, very neat. Vendor driver would not need to do allocations. Let
me give that a try. Seems #2 has better type safety.

Thank you so much for the thorough explanation!

Jacob