Message-ID: <20240523151605.GP20229@nvidia.com>
Date: Thu, 23 May 2024 12:16:05 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: Vidya Sagar <vidyas@...dia.com>, corbet@....net, bhelgaas@...gle.com,
galshalom@...dia.com, leonro@...dia.com, treding@...dia.com,
jonathanh@...dia.com, mmoshrefjava@...dia.com, shahafs@...dia.com,
vsethi@...dia.com, sdonthineni@...dia.com, jan@...dia.com,
tdave@...dia.com, linux-doc@...r.kernel.org,
linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
kthota@...dia.com, mmaddireddy@...dia.com, sagar.tv@...il.com,
Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
Robin Murphy <robin.murphy@....com>, iommu@...ts.linux.dev
Subject: Re: [PATCH V3] PCI: Extend ACS configurability
On Thu, May 23, 2024 at 09:59:36AM -0500, Bjorn Helgaas wrote:
> [+cc iommu folks]
>
> On Thu, May 23, 2024 at 12:05:28PM +0530, Vidya Sagar wrote:
> > For iommu_groups to form correctly, the ACS settings in the PCIe fabric
> > need to be setup early in the boot process, either via the BIOS or via
> > the kernel disable_acs_redir parameter.
>
> Can you point to the iommu code that is involved here? It sounds like
> the iommu_groups are built at boot time and are immutable after that?
They are created when the struct device is plugged in;
pci_device_group() does the logic.
Notably, groups can't and don't change if details like ACS change
after the groups are set up.
There are a lot of instructions out there telling people to boot their
servers and then manually change the ACS flags with setpci or
something, and these are not good instructions since they defeat the
VFIO group-based security mechanisms.
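For contrast, the boot-time route the patch description mentions is a
kernel command-line option that takes effect before any groups are
formed, e.g. (the bus addresses here are only placeholders):

```
pci=disable_acs_redir=0000:00:01.0;0000:01:00.0
```

Because this runs during enumeration, the resulting iommu_groups
reflect the weakened isolation, instead of silently invalidating
groups that were already handed to VFIO.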
> If we need per-device ACS config that depends on the workload, it
> seems kind of problematic to only be able to specify this at boot
> time. I guess we would need to reboot if we want to run a workload
> that needs a different config?
Basically. The main difference I'd see is whether the server is a VM
host or running bare-metal apps. You can get more efficiency if you
change things for the bare-metal case, and often bare metal will want
to turn the iommu off while a VM host often wants more of it turned on.
> Is this the iommu usage model we want in the long term?
There is some path to more dynamic behavior here, but it would require
separating groups into two components: devices that are together
because they physically share translation (aliases and the like), and
devices that are together because they share a security boundary
(ACS).
It is more believable that we could dynamically change security group
assignments for VFIO than translation group assignments. I don't know
of anyone interested in this right now - Alex and I have only talked
about it as a possibility a while back.
FWIW, I don't view this patch as excluding more dynamism in the
future, but it is the best way to work with the current state of
affairs, and definitely better than setpci instructions.
Thanks,
Jason