Message-ID: <Z5AUNVHzJPATVqAY@nvidia.com>
Date: Tue, 21 Jan 2025 13:40:05 -0800
From: Nicolin Chen <nicolinc@...dia.com>
To: Jason Gunthorpe <jgg@...dia.com>
CC: <kevin.tian@...el.com>, <corbet@....net>, <will@...nel.org>,
<joro@...tes.org>, <suravee.suthikulpanit@....com>, <robin.murphy@....com>,
<dwmw2@...radead.org>, <baolu.lu@...ux.intel.com>, <shuah@...nel.org>,
<linux-kernel@...r.kernel.org>, <iommu@...ts.linux.dev>,
<linux-arm-kernel@...ts.infradead.org>, <linux-kselftest@...r.kernel.org>,
<linux-doc@...r.kernel.org>, <eric.auger@...hat.com>,
<jean-philippe@...aro.org>, <mdf@...nel.org>, <mshavit@...gle.com>,
<shameerali.kolothum.thodi@...wei.com>, <smostafa@...gle.com>,
<ddutile@...hat.com>, <yi.l.liu@...el.com>, <patches@...ts.linux.dev>
Subject: Re: [PATCH v5 08/14] iommufd/viommu: Add iommufd_viommu_report_event
helper
On Tue, Jan 21, 2025 at 05:14:04PM -0400, Jason Gunthorpe wrote:
> On Tue, Jan 21, 2025 at 01:02:16PM -0800, Nicolin Chen wrote:
> > On Tue, Jan 21, 2025 at 04:09:24PM -0400, Jason Gunthorpe wrote:
> > > On Tue, Jan 21, 2025 at 11:55:16AM -0800, Nicolin Chen wrote:
> > > > > > > IOMMU_VEVENTQ_STATE_OVERFLOW with a 0 length event is seen if events
> > > > > > > have been lost and no subsequent events are present. It exists to
> > > > > > > ensure timely delivery of the overflow event to userspace. counter
> > > > > > > will be the sequence number of the next successful event.
> > > > > >
> > > > > > So userspace should first read the header to decide whether or not
> > > > > > to read a vEVENT. If the header indicates overflow, it should skip
> > > > > > the vEVENT struct and read the next header?
> > > > >
> > > > > Yes, but there won't be a next header. overflow would always be the
> > > > > last thing in a read() response. If there is another event then
> > > > > overflow is indicated by non-monotonic count.
> > > >
> > > > I am not 100% sure why "overflow would always be the last thing
> > > > in a read() response". I thought that kernel should immediately
> > > > report an overflow to user space when the vEVENTQ is overflowed.
> > >
> > > As below, if you observe overflow then it was at the end of the kernel
> > > queue and there are no further events after it. So it should always end
> > > up last.
> > >
> > > Perhaps we could enforce this directly in the kernel's read by making
> > > it the only, first and last, response to read.
> >
> > Hmm, since the overflow node is the last node in the list, doesn't
> > that already enforce that it is the only/first/last node to read?
> > (Assuming we choose to delete the overflow node from the list when a
> > new event can be inserted.)
>
> Since we don't hold the spinlock the whole time there is a race where
> we could pull the overflow off and then another entry could be pushed
> while we do the copy_to_user.
I see. I'll be careful around that. I can imagine that one tricky
part will be restoring the overflow node back to the list when a
copy_to_user fails...
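Roughly what I am picturing for the read path is the below. This is
just a sketch with made-up names (iommufd_veventq, iommufd_vevent,
veventq_copy_to_user, etc. -- none of them are from the actual patch),
assuming the overflow node is a static node kept at the tail of the
list:

static ssize_t iommufd_veventq_read_one(struct iommufd_veventq *veventq,
					char __user *buf, size_t len)
{
	struct iommufd_vevent *vevent;
	ssize_t rc;

	spin_lock(&veventq->lock);
	vevent = list_first_entry_or_null(&veventq->list,
					  struct iommufd_vevent, node);
	if (!vevent) {
		spin_unlock(&veventq->lock);
		return 0;
	}
	list_del(&vevent->node);
	spin_unlock(&veventq->lock);

	/* Runs without the lock held, since copy_to_user() may fault */
	rc = veventq_copy_to_user(buf, len, vevent);
	if (rc < 0) {
		/*
		 * The tricky part: put the node back. A regular event goes
		 * back to the head it was taken from; the overflow node has
		 * to go back to the tail, and by now a new event may already
		 * have been pushed, which is the race you mentioned.
		 */
		spin_lock(&veventq->lock);
		if (vevent == &veventq->overflow)
			list_add_tail(&vevent->node, &veventq->list);
		else
			list_add(&vevent->node, &veventq->list);
		spin_unlock(&veventq->lock);
		return rc;
	}
	/* a regular vevent would be freed here; the overflow node is not */
	return rc;
}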
> > > > Yet, thinking about this once again: user space actually has its
> > > > own queue. There's probably no point in letting it know about an
> > > > overflow immediately when the kernel vEVENTQ overflows; it only
> > > > matters once its own user queue overflows (after it has read the
> > > > entire vEVENTQ), at which point it can trigger a vHW event/irq to
> > > > the VM?
> > >
> > > The kernel has no idea what userspace is doing; the kernel's job
> > > should be to ensure timely delivery of all events, and if an event is
> > > lost it should ensure timely delivery of the lost-event notification.
> > > There is little else it can do.
> >
> > Yet, "timely" means still having an entire-queue-size-long delay
> > since the overflow node is at the end of the queue, right?
>
> Yes, but also in this case the vIOMMU isn't experiencing an overflow
> so it doesn't need to know about it.
>
> The main point here is if we somehow lose an event the vIOMMU driver
> may need to do something to recover from the lost event. It doesn't
> become relevant until the lost event is present in the virtual HW
> queue.
Ack.
> There is also the minor detail of what happens if the hypervisor HW
> queue overflows - I don't know the answer here. It is concerning from
> a security point of view since the VM can spam DMA errors at a high
> rate. :|
In my view, the hypervisor queue is the vHW queue for the VM, so
it should act like real HW, which means it is up to the guest kernel
driver to handle the high-rate DMA errors...
The entire thing is complicated since we have a double buffer: a
kernel queue and a hypervisor queue. I am not very sure whether both
queues should have the same depth, or whether the kernel one should
be slightly deeper so it can buffer for the hypervisor one. It would
be simpler if the kernel could add events directly to the hypervisor
queue and also set the overflow flag in the hypervisor itself...
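FWIW, on the VMM side the flow I am picturing is roughly the below
(userspace sketch; apart from IOMMU_VEVENTQ_STATE_OVERFLOW everything
here -- vevent_hdr, the vhw_queue helpers, VEVENT_DATA_LEN -- is made
up for illustration): drain the kernel vEVENTQ via read(), detect lost
events from a jump in the sequence counter, and mark the vHW queue as
overflowed either then or when the vHW queue itself runs out of space:

#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

struct vevent_hdr {
	uint32_t state;		/* e.g. IOMMU_VEVENTQ_STATE_OVERFLOW (proposed uAPI) */
	uint32_t counter;	/* sequence number */
};

#define VEVENT_DATA_LEN	16	/* made-up fixed per-event payload size */

/* Hypothetical helpers operating on the virtual HW event queue */
struct vhw_queue;
bool vhw_queue_has_space(struct vhw_queue *vq);
void vhw_queue_push(struct vhw_queue *vq, void *evt, size_t len);
void vhw_queue_set_overflow(struct vhw_queue *vq);

static void drain_veventq(int veventq_fd, struct vhw_queue *vq,
			  uint32_t *expected)
{
	char buf[4096];
	ssize_t len = read(veventq_fd, buf, sizeof(buf));
	char *p = buf;

	while (len > 0 && p + sizeof(struct vevent_hdr) <= buf + len) {
		struct vevent_hdr *hdr = (struct vevent_hdr *)p;

		/* A lost event shows up as a gap in the counter */
		if (hdr->counter != *expected)
			vhw_queue_set_overflow(vq);
		*expected = hdr->counter;

		if (hdr->state == IOMMU_VEVENTQ_STATE_OVERFLOW)
			break;	/* 0-length event, always the last one */

		if (vhw_queue_has_space(vq))
			vhw_queue_push(vq, p + sizeof(*hdr), VEVENT_DATA_LEN);
		else
			vhw_queue_set_overflow(vq); /* vHW queue is full too */

		(*expected)++;
		p += sizeof(*hdr) + VEVENT_DATA_LEN;
	}
}

If something like that works, I think the exact depths of the two
queues mostly affect how soon the loss is noticed rather than whether
it is noticed, since the counter gap survives the double buffering.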
Thanks
Nicolin