Message-ID: <ZoY01iLmzoV4fIPG@Asurada-Nvidia>
Date: Wed, 3 Jul 2024 22:36:22 -0700
From: Nicolin Chen <nicolinc@...dia.com>
To: Baolu Lu <baolu.lu@...ux.intel.com>
CC: Jason Gunthorpe <jgg@...pe.ca>, Kevin Tian <kevin.tian@...el.com>,
	Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
	Robin Murphy <robin.murphy@....com>,
	Jean-Philippe Brucker <jean-philippe@...aro.org>,
	Yi Liu <yi.l.liu@...el.com>, Jacob Pan <jacob.jun.pan@...ux.intel.com>,
	Joel Granados <j.granados@...sung.com>, <iommu@...ts.linux.dev>,
	<virtualization@...ts.linux-foundation.org>, <linux-kernel@...r.kernel.org>,
	Jason Gunthorpe <jgg@...dia.com>
Subject: Re: [PATCH v8 06/10] iommufd: Add iommufd fault object
On Thu, Jul 04, 2024 at 10:59:45AM +0800, Baolu Lu wrote:
> > On Tue, Jul 02, 2024 at 02:34:40PM +0800, Lu Baolu wrote:
> >
> > > An iommufd fault object provides an interface for delivering I/O page
> > > faults to user space. These objects are created and destroyed by user
> > > space, and they can be associated with or dissociated from hardware page
> > > table objects during page table allocation or destruction.
> > >
> > > User space interacts with the fault object through a file interface. This
> > > interface offers a straightforward and efficient way for user space to
> > > handle page faults. It allows user space to read fault messages
> > > sequentially and respond to them by writing to the same file. The file
> > > interface supports reading messages in poll mode, so it's recommended that
> > > user space applications use io_uring to enhance read and write efficiency.
> > >
> > > A fault object can be associated with any iopf-capable iommufd_hw_pgtable
> > > during the pgtable's allocation. All I/O page faults triggered by devices
> > > when accessing the I/O addresses of an iommufd_hw_pgtable are routed
> > > through the fault object to user space. Similarly, user space's responses
> > > to these page faults are routed back to the iommu device driver through
> > > the same fault object.
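
[Just to illustrate the file interface described above for anyone
following along: user space would roughly do something like the
sketch below. The struct names are placeholders, not the actual
message/response formats defined by this series.]

	struct iommufd_fault_msg msg;		/* placeholder name */
	struct iommufd_fault_response rsp;	/* placeholder name */

	/* Read fault messages sequentially; the FD also supports poll() */
	while (read(fault_fd, &msg, sizeof(msg)) == sizeof(msg)) {
		/* ...resolve the fault... */
		rsp.code = 0;			/* e.g. success */
		/* Respond by writing back to the same FD */
		write(fault_fd, &rsp, sizeof(rsp));
	}
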
> > There is a need for the VIOMMU object to report HW faults to the VMM.
> > For example, a HW-accelerated VCMDQ may encounter HW errors. HW will
> > raise an IRQ to the host kernel, and the host kernel will forward it
> > to the guest. I think we can have a viommu->fault, similar to the
> > hwpt->fault introduced by this series. This viommu->fault could also
> > benefit nested IOMMU setups by reporting translation errors.
> >
> > I learned that this hwpt->fault is exclusively for IOPF/PRI, and
> > Jason suggested that I add a different one for VIOMMU. Yet, after
> > taking a closer look, I found that the fault object in this series
> > is quite generic at the uAPI level: its naming/structure, and the
> > way it's allocated and passed to the hwpt, despite being highly
> > correlated with IOPF in its fops code. So, I feel that we might
> > have a chance of reusing it for different fault types:
> >
> > +enum iommu_fault_type {
> > +	IOMMU_FAULT_TYPE_HWPT_IOPF,
> > +	IOMMU_FAULT_TYPE_VIOMMU_IRQ,
> > +};
> >
> >  struct iommu_fault_alloc {
> >  	__u32 size;
> >  	__u32 flags;
> > +	__u32 type; /* enum iommu_fault_type */
> >  	__u32 out_fault_id;
> >  	__u32 out_fault_fd;
> >  };
> >
> > I understand that this is already v8. So, maybe we can, for now,
> > apply the small diff above with an IOMMU_FAULT_TYPE_HWPT_IOPF type
> > check in the ioctl handler, and the decoupling of the iopf fops in
> > the ioctl handler can come later in the viommu series:
> >
> > 	switch (type) {
> > 	case IOMMU_FAULT_TYPE_HWPT_IOPF:
> > 		filep = anon_inode_getfile("[iommufd-pgfault]",
> > 					   &iommufd_fault_fops_iopf);
> > 		break;
> > 	case IOMMU_FAULT_TYPE_VIOMMU_IRQ:
> > 		filep = anon_inode_getfile("[iommufd-viommu-irq]",
> > 					   &iommufd_fault_fops_viommu);
> > 		break;
> > 	default:
> > 		return -EOPNOTSUPP;
> > 	}
> >
> > Since you are the designer here, I think you have a better 10,000-foot
> > view -- maybe I am missing something that implies the fault object
> > can't really be reused by the viommu.
> >
> > Would you mind sharing some thoughts here?
>
> I think this is a choice between "two different objects" vs. "the same
> object with different FD interfaces". If I understand it correctly, your
> proposal for unrecoverable fault delivery is not limited to vcmdq, but
> generic to all unrecoverable events that userspace should be aware of
> when the passed-through device is affected.

It's basically IRQ forwarding, not confined to unrecoverable
faults. For example, a VCMDQ used by the guest kernel would
raise a HW IRQ if the guest kernel issued an illegal command
to the HW queue assigned to it. The host kernel receives that
IRQ, so it needs a way to forward it to the VM for the guest
kernel to recover the HW queue.

The way we define the structure can follow what we have for the
hwpt_alloc/invalidate uAPIs, i.e. a driver-specific data/event
type. Such an event could carry unrecoverable translation faults
too; SMMU, at least, reports DMA translation faults using an
eventQ in its own native format.
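
For illustration only, a rough sketch of what such a driver-specific
event could look like (all names below are hypothetical, merely
following the type-plus-driver-data pattern of hwpt_alloc/invalidate):

	enum iommu_viommu_event_type {
		IOMMU_VIOMMU_EVENT_TYPE_DEFAULT,
		IOMMU_VIOMMU_EVENT_TYPE_VCMDQ,
	};

	/* Per-driver event data delivered through the shared fault/event FD */
	struct iommu_viommu_event_vcmdq {
		__u32 vcmdq_id;		/* which HW queue raised the IRQ */
		__u32 error_code;	/* e.g. illegal-command error, for guest recovery */
	};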

> From a hardware architecture perspective, the interfaces for
> unrecoverable events don't always match those for page faults. For
> example, the VT-d architecture defines a PR queue for page faults, but
> uses a register set to report unrecoverable events. The 'reason',
> 'request id' and 'pasid' fields of the register set indicate what
> happened on the hardware. A new unrecoverable event will not be
> reported until the previous one has been fetched.

Understood. I don't think we can share the majority of the pieces
in fault.c. It's just that the "IOMMU_FAULT_QUEUE_ALLOC" ioctl
itself looks way too general to be limited to page-fault usage
only. So, I feel we could share it, for example:

	IOMMU_FAULT_QUEUE_ALLOC (type=hwpt_iopf)  -> fault_id=1
	IOMMU_HWPT_ALLOC        (fault_id=1)      -> hwpt_id=2
	IOMMU_FAULT_QUEUE_ALLOC (type=viommu_irq) -> fault_id=3
	IOMMU_VIOMMU_ALLOC      (fault_id=3)      -> viommu_id=4

The handler would then dispatch to different fops, as I drafted
in my previous mail.
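
To spell that out from the userspace side, a minimal sketch (assuming
the proposed "type" field; "iommufd" below stands for the iommufd file
descriptor):

	struct iommu_fault_alloc cmd = {
		.size = sizeof(cmd),
		.type = IOMMU_FAULT_TYPE_HWPT_IOPF,	/* proposed field */
	};

	/* Allocate an IOPF-type fault object: returns an object ID and an FD */
	ioctl(iommufd, IOMMU_FAULT_QUEUE_ALLOC, &cmd);

	/*
	 * Pass cmd.out_fault_id as the fault_id of IOMMU_HWPT_ALLOC, then
	 * read page faults from, and write responses to, cmd.out_fault_fd.
	 *
	 * A second call with .type = IOMMU_FAULT_TYPE_VIOMMU_IRQ would give
	 * an independent object/FD to pass to a future IOMMU_VIOMMU_ALLOC.
	 */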

> With the above being said, I have no strong opinion between these two
> choices. Jason and Kevin should have more insights.

Thanks. Jason is out of office this week, so hopefully Kevin
can shed some light. I personally feel that we don't need to
make major updates to this series until we add VIOMMU. Yet, it
would be convenient to add a "type" to the uAPI with this series.
Thank you
Nic