netdev - Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio

Message-ID: <20211025082857.4baa4794.alex.williamson@redhat.com>
Date:   Mon, 25 Oct 2021 08:28:57 -0600
From:   Alex Williamson <alex.williamson@...hat.com>
To:     Jason Gunthorpe <jgg@...dia.com>
Cc:     Cornelia Huck <cohuck@...hat.com>,
        Yishai Hadas <yishaih@...dia.com>, bhelgaas@...gle.com,
        saeedm@...dia.com, linux-pci@...r.kernel.org, kvm@...r.kernel.org,
        netdev@...r.kernel.org, kuba@...nel.org, leonro@...dia.com,
        kwankhede@...dia.com, mgurtovoy@...dia.com, maorg@...dia.com,
        "Dr. David Alan Gilbert" <dgilbert@...hat.com>
Subject: Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver
 for mlx5 devices

On Mon, 25 Oct 2021 09:29:38 -0300
Jason Gunthorpe <jgg@...dia.com> wrote:

> On Thu, Oct 21, 2021 at 03:47:29PM -0600, Alex Williamson wrote:
> > I recall that we previously suggested a very strict interpretation of
> > clearing the _RUNNING bit, but again I'm questioning if that's a real
> > requirement or simply a nice-to-have feature for some undefined
> > debugging capability.  In raising the p2p DMA issue, we can see that a
> > hard stop independent of other devices is not really practical but I
> > also don't see that introducing a new state bit solves this problem any
> > more elegantly than proposed here.  Thanks,  
> 
> I still disagree with this - the level of 'frozenness' of a device is
> something that belongs in the defined state exposed to userspace, not
> as a hidden internal state that userspace can't see.
> 
> It makes the state transitions asymmetric between suspend/resume as
> resume does have a defined uAPI state for each level of frozenness and
> suspend does not.
> 
> With the extra bit resume does:
>   
>   0000, 0100, 1000, 0001
> 
> And suspend does:
> 
>   0001, 1001, 0010, 0000
> 
> However, without the extra bit suspend is only
>   
>   001,  010, 000
> 
> With hidden state inside the 010

And what is the device supposed to do if it receives a DMA while in
this strictly defined stopped state?  If it generates an unsupported
request, that can trigger a fatal platform error.  If it silently drops
the DMA, then we have data loss.  We're defining a catch-22 scenario
for drivers versus placing the onus on the user to quiesce the set of
devices in order to consider the migration status as valid.  Thanks,

Alex
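
For readers following the bit patterns quoted above, here is a minimal
sketch that decodes each pattern into flag names. It assumes the three low
bits follow the v1 VFIO migration device_state layout (bit 0 _RUNNING,
bit 1 _SAVING, bit 2 _RESUMING, all zeros meaning stopped). The name EXTRA
for bit 3 is a placeholder invented for this sketch; the thread does not
name the bit in this message. The helper names decode() and
decode_sequence() are likewise hypothetical.

```python
# Decode the device_state bit patterns quoted in the message.
# Assumption: bits 0-2 follow the v1 VFIO migration uAPI layout.
RUNNING = 1 << 0
SAVING = 1 << 1
RESUMING = 1 << 2
EXTRA = 1 << 3  # hypothetical name for the proposed extra state bit

# Highest bit first, so names print in the same order as the bit string.
FLAGS = [
    (EXTRA, "EXTRA"),
    (RESUMING, "RESUMING"),
    (SAVING, "SAVING"),
    (RUNNING, "RUNNING"),
]


def decode(bits: str) -> str:
    """Return the flag names set in a binary string such as '1001'."""
    value = int(bits, 2)
    names = [name for flag, name in FLAGS if value & flag]
    return "|".join(names) if names else "STOP"


def decode_sequence(patterns):
    """Decode every pattern in a transition sequence."""
    return [decode(p) for p in patterns]


# Sequences exactly as quoted in the message.
RESUME_WITH_EXTRA = ["0000", "0100", "1000", "0001"]
SUSPEND_WITH_EXTRA = ["0001", "1001", "0010", "0000"]
SUSPEND_WITHOUT_EXTRA = ["001", "010", "000"]

if __name__ == "__main__":
    for label, seq in [
        ("resume, extra bit", RESUME_WITH_EXTRA),
        ("suspend, extra bit", SUSPEND_WITH_EXTRA),
        ("suspend, no extra bit", SUSPEND_WITHOUT_EXTRA),
    ]:
        print(f"{label}: " + " -> ".join(decode_sequence(seq)))
```

Decoded this way, the three-bit suspend sequence has no counterpart to the
intermediate step that the four-bit sequence exposes, which is the
asymmetry the quoted text describes. The sketch only labels the patterns;
it does not model what a device does with DMA in any of these states.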