Message-ID: <YJ6KHznWTKvKQZEl@kroah.com>
Date:   Fri, 14 May 2021 16:33:03 +0200
From:   Greg KH <gregkh@...uxfoundation.org>
To:     Dan Williams <dan.j.williams@...el.com>
Cc:     "Chen, Mike Ximing" <mike.ximing.chen@...el.com>,
        Netdev <netdev@...r.kernel.org>,
        David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Arnd Bergmann <arnd@...db.de>,
        Pierre-Louis Bossart <pierre-louis.bossart@...ux.intel.com>,
        "Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
        KVM list <kvm@...r.kernel.org>,
        "Raj, Ashok" <ashok.raj@...el.com>
Subject: Re: [PATCH v10 00/20] dlb: introduce DLB device driver

On Wed, May 12, 2021 at 12:07:31PM -0700, Dan Williams wrote:
> [ add kvm@...r.kernel.org for VFIO discussion ]
> 
> 
> On Tue, Mar 16, 2021 at 2:01 AM Greg KH <gregkh@...uxfoundation.org> wrote:
> [..]
> > > Ioctl interface
> > > The kernel driver provides an ioctl interface for user applications to set up and configure DLB domains, ports, queues, scheduling types, credits,
> > > sequence numbers, and links between ports and queues.  Applications also use the interface to start, stop, and query DLB operations.
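
A minimal sketch of what such an ioctl-based setup might look like from
userspace. The ioctl number, argument struct, and device node below are
invented for illustration and are not the driver's actual uapi:

/*
 * Illustrative only: the ioctl number, argument struct, and device node
 * are invented for this sketch and are not the driver's actual uapi.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>

struct dlb_create_sched_domain_args {	/* hypothetical layout */
	unsigned int num_ldb_queues;
	unsigned int num_ldb_ports;
	unsigned int num_credits;
	unsigned int domain_id;		/* returned by the driver */
};

#define DLB_IOC_CREATE_SCHED_DOMAIN \
	_IOWR('h', 1, struct dlb_create_sched_domain_args)

int main(void)
{
	struct dlb_create_sched_domain_args args = {
		.num_ldb_queues = 2,
		.num_ldb_ports  = 4,
		.num_credits    = 1024,
	};
	int fd = open("/dev/dlb0", O_RDWR);	/* device node assumed */

	if (fd < 0 || ioctl(fd, DLB_IOC_CREATE_SCHED_DOMAIN, &args) < 0) {
		perror("dlb setup");
		return 1;
	}
	printf("created scheduling domain %u\n", args.domain_id);
	return 0;
}
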
> >
> > What applications use any of this?  What userspace implementation today
> > interacts with this?  Where is that code located?
> >
> > Too many TLAs here; I have even less of an understanding of what this
> > driver is supposed to be doing, and of what this hardware is, than I
> > did before.
> >
> > And here I thought I understood hardware devices, and if I am confused,
> > I pity anyone else looking at this code...
> >
> > You all need to get some real documentation together to explain
> > everything here in terms that anyone can understand.  Without that, this
> > code is going nowhere.
> 
> Hi Greg,
> 
> So, for the last few weeks Mike and company have patiently waded
> through my questions and now I think we are at a point to work through
> the upstream driver architecture options and tradeoffs. You were not
> alone in struggling to understand what this device does because it is
> unlike any other accelerator Linux has ever considered. It shards /
> load balances a data stream for processing by CPU threads. This is
> typically a network appliance function / protocol, but could also be
> any other generic thread pool like the kernel's padata. It saves the
> CPU cycles spent load balancing work items and marshaling them through
> a thread pool pipeline. For example, in DPDK applications, DLB2 frees
> up entire cores that would otherwise be consumed with scheduling and
> work distribution. A separate proof-of-concept, using DLB2 to
> accelerate the kernel's "padata" thread pool for a crypto workload,
> demonstrated ~150% higher throughput with hardware employed to manage
> work distribution and result ordering. Yes, you need a sufficiently
> high-touch / high-throughput protocol before the software load
> balancing overhead of coordinating CPU threads starts to dominate
> performance, but there are some specific workloads willing to switch
> to this regime.
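
Concretely, the software pattern being offloaded looks roughly like the
dispatcher below: a simplified C sketch (not DPDK or padata code; the
queue layout and the least-loaded policy are arbitrary choices) in which
one core spends its cycles picking a worker queue for each item and
marshaling it over.

/*
 * Simplified picture of the software load balancing DLB2 offloads: one
 * dispatcher core picks a destination queue for every incoming item and
 * marshals it over.  Not DPDK or padata code; the queue layout and the
 * least-loaded policy are arbitrary choices for illustration.
 */
#include <stdatomic.h>

#define NWORKERS 4
#define QDEPTH   1024

struct work_queue {
	void *items[QDEPTH];
	atomic_uint head, tail;		/* single producer, single consumer */
};

static struct work_queue queues[NWORKERS];

static int enqueue(struct work_queue *q, void *item)
{
	unsigned int t = atomic_load(&q->tail);

	if (t - atomic_load(&q->head) == QDEPTH)
		return -1;		/* that worker is backed up */
	q->items[t % QDEPTH] = item;
	atomic_store(&q->tail, t + 1);
	return 0;
}

/* These are the cycles a hardware scheduler gives back: choosing a
 * destination and spinning when the chosen worker cannot keep up. */
static void dispatch(void *item)
{
	unsigned int best = 0, best_depth = (unsigned int)-1, i;

	for (i = 0; i < NWORKERS; i++) {
		unsigned int depth = atomic_load(&queues[i].tail) -
				     atomic_load(&queues[i].head);
		if (depth < best_depth) {
			best_depth = depth;
			best = i;
		}
	}
	while (enqueue(&queues[best], item))
		;			/* wait for the worker to drain */
}
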
> 
> The primary consumer to date has been as a backend for the event
> handling in the userspace networking stack, DPDK. DLB2 has an existing
> polled-mode-userspace driver for that use case. So I said, "great,
> just add more features to that userspace driver and you're done". In
> fact there was DLB1 hardware that also had a polled-mode-userspace
> driver. So, the next question is "what's changed in DLB2 such that a
> userspace driver is no longer suitable?". The new use case for DLB2 is
> new hardware support for a host driver to carve up device resources
> into smaller sets (vfio-mdevs) that can be assigned to guests (Intel
> calls this new hardware capability SIOV: Scalable IO Virtualization).
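
For context, the standard vfio-mdev provisioning flow is driven through
sysfs on the host. A rough sketch follows, with a placeholder PCI address
and an invented DLB type name; only the sysfs layout itself is the real
mdev ABI.

/*
 * Sketch of the generic vfio-mdev provisioning flow on the host.  The
 * PCI address and the "dlb2-half_resources" type string are placeholders;
 * only the mdev_supported_types/<type>/create sysfs layout is the real
 * mdev ABI.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *create =
		"/sys/bus/pci/devices/0000:6d:00.0/mdev_supported_types/"
		"dlb2-half_resources/create";
	const char *uuid = "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001";
	int fd = open(create, O_WRONLY);

	if (fd < 0 || write(fd, uuid, strlen(uuid)) < 0) {
		perror("mdev create");
		return 1;
	}
	close(fd);

	/*
	 * The instance now exists under /sys/bus/mdev/devices/<uuid> and can
	 * be handed to a guest, e.g. with QEMU's
	 *   -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/<uuid>
	 */
	return 0;
}
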
> 
> Hardware resource management is difficult to handle in userspace,
> especially when bare-metal hardware events need to coordinate with
> guest-VM device instances. This includes a mailbox interface for the
> guest VM to negotiate resources with the host driver. Another more
> practical roadblock for a "DLB2 in userspace" proposal is the fact
> that it implements what are, in effect, software-defined interrupts to
> go beyond the scalability limits of PCI MSI-x (Intel calls this
> Interrupt Message Store: IMS). So even if hardware resource management
> were awkwardly plumbed into a userspace daemon, there would still need
> to be kernel enabling for device-specific extensions to
> drivers/vfio/pci/vfio_pci_intrs.c for it to understand the IMS
> interrupts of DLB2 in addition to PCI MSI-x.
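
For reference, this is the existing userspace-visible MSI-x path a VFIO
userspace uses today, registering an eventfd via VFIO_DEVICE_SET_IRQS;
the point above is that IMS has no equivalent enabling yet, so
device-specific kernel support would be needed. device_fd here is
assumed to be an already-opened VFIO device fd.

/*
 * The existing userspace-visible MSI-X plumbing in VFIO: an eventfd is
 * registered with VFIO_DEVICE_SET_IRQS and vfio-pci signals it when the
 * vector fires.  Per the paragraph above, IMS interrupts would need
 * their own enabling alongside this path.
 */
#include <linux/vfio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>

int wire_msix_vector(int device_fd, unsigned int vector)
{
	size_t sz = sizeof(struct vfio_irq_set) + sizeof(int);
	struct vfio_irq_set *set = calloc(1, sz);
	int efd = eventfd(0, 0);

	if (!set || efd < 0)
		return -1;

	set->argsz = sz;
	set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	set->index = VFIO_PCI_MSIX_IRQ_INDEX;
	set->start = vector;
	set->count = 1;
	memcpy(set->data, &efd, sizeof(int));

	if (ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set) < 0) {
		free(set);
		return -1;
	}
	free(set);
	return efd;	/* read/poll this fd to observe the interrupt */
}
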
> 
> While that still might be solvable in userspace if you squint at it, I
> don't think Linux end users are served by pushing all of hardware
> resource management to userspace. VFIO is mostly built to pass entire
> PCI devices to guests, or in coordination with a kernel driver to
> describe a subset of the hardware to a virtual-device (vfio-mdev)
> interface. The rub here is that, to date, kernel drivers using VFIO to
> provision mdevs have some existing responsibilities to the core kernel,
> like a network driver or DMA offload driver. The DLB2 driver offers no
> such service to the kernel for its primary role of accelerating a
> userspace data-plane. I am assuming here that the padata
> proof-of-concept is interesting, but not a compelling reason to ship a
> driver compared to giving end users competent kernel-driven
> hardware-resource assignment for deploying DLB2 virtual instances into
> guest VMs.
> 
> My "just continue in userspace" suggestion has no answer for the IMS
> interrupt and reliable hardware resource management support
> requirements. If you're with me so far we can go deeper into the
> details, but in answer to your previous questions most of the TLAs
> were from the land of "SIOV" where the VFIO community should be
> brought in to review. The driver is mostly a configuration plane where
> the fast path data-plane is entirely in userspace. That configuration
> plane needs to manage hardware events and resourcing on behalf of
> guest VMs running on a partitioned subset of the device. There are
> worthwhile questions about whether some of the uapi can be refactored
> to common modules like uacce, but I think we need to get to a first
> order understanding on what DLB2 is and why the kernel has a role
> before diving into the uapi discussion.
> 
> Any clearer?

A bit, yes, thanks.

> So, in summary, drivers/misc/ appears to be the first stop in the
> review since a host driver needs to be established to start the VFIO
> enabling campaign. With my community hat on, I think requiring
> standalone host drivers is healthier for Linux than broaching the
> subject of VFIO-only drivers. Even if, as in this case, the initial
> host driver is mostly implementing a capability that could be achieved
> with a userspace driver.

Ok, then how about a much "smaller" kernel driver for all of this, and a
whole lot of documentation to describe what is going on and what all of
the TLAs are.

thanks,

greg k-h
