Message-ID: <BN9PR11MB52766B673C70439A78E16B518C969@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Thu, 28 Jul 2022 04:05:04 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>,
Alex Williamson <alex.williamson@...hat.com>
CC: Yishai Hadas <yishaih@...dia.com>,
"saeedm@...dia.com" <saeedm@...dia.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"kuba@...nel.org" <kuba@...nel.org>,
"Martins, Joao" <joao.m.martins@...cle.com>,
"leonro@...dia.com" <leonro@...dia.com>,
"maorg@...dia.com" <maorg@...dia.com>,
"cohuck@...hat.com" <cohuck@...hat.com>
Subject: RE: [PATCH V2 vfio 03/11] vfio: Introduce DMA logging uAPIs
> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Tuesday, July 26, 2022 11:05 PM
>
> On Tue, Jul 26, 2022 at 08:03:20AM -0600, Alex Williamson wrote:
>
> > I raised the same concern myself, the reason for having a limit is
> > clear, but focusing on a single use case and creating an arbitrary
> > "good enough" limit that isn't exposed to userspace makes this an
> > implementation detail that can subtly break userspace. For instance,
> > what if userspace comes to expect the limit is 1000 and we decide to be
> > even more strict? If only a few 10s of entries are used, why isn't 100
> > more than sufficient?
>
> So lets use the number of elements that will fit in PAGE_SIZE as the
> guideline. It means the kernel can memdup the userspace array into a
> single kernel page of memory to process it, which seems reasonably
> future proof in that we won't need to make it lower. Thus we can
> promise we won't make it smaller.
>
> However, remember, this isn't even the real device limit - this is
> just the limit that the core kernel code will accept to marshal the
> data to pass internally to the driver.
>
> I fully expect that the driver will still refuse ranges in certain
> configurations even if they can be marshaled.
>
> This is primarily why I don't think it makes sense to expose some
> internal limit that is not even the real "will the call succeed"
> parameters.
>
> The API is specifically designed as 'try and fail' to allow the
> drivers flexibility in how they map the requested ranges to their
> internal operations.
>
> > We change it, we break userspace. OTOH, if we simply make use of
> > that reserved field to expose the limit, now we have a contract with
> > userspace and we can change our implementation because that detail
> > of the implementation is visible to userspace. Thanks,
>
> I think this is not correct, just because we made it discoverable does
> not absolve the kernel of compatibility. If we change the limit, e.g. to
> 1, and a real userspace stops working then we still broke userspace.
IIUC Alex's suggestion doesn't conflict with the 'try and fail' model.
By using the reserved field of vfio_device_feature_dma_logging_control
to return the limit for the specified page_size from a given tracker,
the user can quickly retry and adapt to that limit where workable.
Otherwise, what would be an efficient retry policy for the user after
a failure? Say the user initially requests 100 ranges with 4K page size
but the tracker can only support 10 ranges. Without a hint returned
from the tracker, does the user just blindly try 100, 90, 80, ..., or
use a bisection algorithm?
>
> Complaining that userspace does not check the discoverable limit
> doesn't help matters - I seem to remember Linus has written about
> this recently, even.
>
> So, it is ultimately not different from 'try and fail', unless we
> implement some algorithm in qemu - an algorithm that would duplicate
> the one we already have in the kernel :\
>
> Jason