Message-ID: <BN9PR11MB52766B673C70439A78E16B518C969@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Thu, 28 Jul 2022 04:05:04 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>,
Alex Williamson <alex.williamson@...hat.com>
CC: Yishai Hadas <yishaih@...dia.com>,
"saeedm@...dia.com" <saeedm@...dia.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"kuba@...nel.org" <kuba@...nel.org>,
"Martins, Joao" <joao.m.martins@...cle.com>,
"leonro@...dia.com" <leonro@...dia.com>,
"maorg@...dia.com" <maorg@...dia.com>,
"cohuck@...hat.com" <cohuck@...hat.com>
Subject: RE: [PATCH V2 vfio 03/11] vfio: Introduce DMA logging uAPIs
> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Tuesday, July 26, 2022 11:05 PM
>
> On Tue, Jul 26, 2022 at 08:03:20AM -0600, Alex Williamson wrote:
>
> > I raised the same concern myself, the reason for having a limit is
> > clear, but focusing on a single use case and creating an arbitrary
> > "good enough" limit that isn't exposed to userspace makes this an
> > implementation detail that can subtly break userspace. For instance,
> > what if userspace comes to expect the limit is 1000 and we decide to be
> > even more strict? If only a few 10s of entries are used, why isn't 100
> > more than sufficient?
>
> So lets use the number of elements that will fit in PAGE_SIZE as the
> guideline. It means the kernel can memdup the userspace array into a
> single kernel page of memory to process it, which seems reasonably
> future proof in that we won't need to make it lower. Thus we can
> promise we won't make it smaller.
>
> However, remember, this isn't even the real device limit - this is
> just the limit that the core kernel code will accept to marshal the
> data to pass internally to the driver.
>
> I fully expect that the driver will still refuse ranges in certain
> configurations even if they can be marshaled.
>
> This is primarily why I don't think it makes sense to expose some
> internal limit that is not even the real "will the call succeed"
> parameters.
>
> The API is specifically designed as 'try and fail' to allow the
> drivers flexibility in how they map the requested ranges to their
> internal operations.
>
> > We change it, we break userspace. OTOH, if we simply make use of
> > that reserved field to expose the limit, now we have a contract with
> > userspace and we can change our implementation because that detail
> > of the implementation is visible to userspace. Thanks,
>
> I think this is not correct, just because we made it discoverable does
> not absolve the kernel of compatibility. If we change the limit, e.g. to
> 1, and a real userspace stops working then we still broke userspace.
IIUC Alex's suggestion doesn't conflict with the 'try and fail' model.
By using the reserved field of vfio_device_feature_dma_logging_control
to return the limit for the specified page_size from a given tracker,
the user can quickly retry and adapt to that limit where workable.
Otherwise, what would be an efficient retry policy for the user after
a failure? Say the user initially requests 100 ranges with 4K page size
but the tracker can only support 10 ranges. Without a hint returned
from the tracker, does the user just blindly try 100, 90, 80, ..., or
use a bisection algorithm?
>
> Complaining that userspace does not check the discoverable limit
> doesn't help matters - I seem to remember Linus has written about
> this recently, even.
>
> So, it is ultimately not different from 'try and fail', unless we
> implement some algorithm in qemu - an algorithm that would duplicate
> the one we already have in the kernel :\
>
> Jason