Message-ID: <CAPcyv4htddEBB9ePPSheH+rO+=VJULeHzx0gc384if7qXTUHHg@mail.gmail.com>
Date: Fri, 12 Mar 2021 17:39:22 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: "Chen, Mike Ximing" <mike.ximing.chen@...el.com>
Cc: Greg KH <gregkh@...uxfoundation.org>,
Netdev <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Arnd Bergmann <arnd@...db.de>,
Pierre-Louis Bossart <pierre-louis.bossart@...ux.intel.com>,
"Brandeburg, Jesse" <jesse.brandeburg@...el.com>
Subject: Re: [PATCH v10 00/20] dlb: introduce DLB device driver
On Fri, Mar 12, 2021 at 1:55 PM Chen, Mike Ximing
<mike.ximing.chen@...el.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Dan Williams <dan.j.williams@...el.com>
> > Sent: Friday, March 12, 2021 2:18 AM
> > To: Greg KH <gregkh@...uxfoundation.org>
> > Cc: Chen, Mike Ximing <mike.ximing.chen@...el.com>; Netdev <netdev@...r.kernel.org>; David Miller
> > <davem@...emloft.net>; Jakub Kicinski <kuba@...nel.org>; Arnd Bergmann <arnd@...db.de>; Pierre-
> > Louis Bossart <pierre-louis.bossart@...ux.intel.com>
> > Subject: Re: [PATCH v10 00/20] dlb: introduce DLB device driver
> >
> > On Wed, Mar 10, 2021 at 1:02 AM Greg KH <gregkh@...uxfoundation.org> wrote:
> > >
> > > On Wed, Feb 10, 2021 at 11:54:03AM -0600, Mike Ximing Chen wrote:
> > > > Intel DLB is an accelerator for the event-driven programming model of
> > > > DPDK's Event Device Library[2]. The library is used in packet processing
> > > > pipelines that arrange for multi-core scalability, dynamic load-balancing,
> > > > and a variety of packet distribution and synchronization schemes.
> > >
> > > The more that I look at this driver, the more I think this is a "run
> > > around" the networking stack. Why are you all adding kernel code to
> > > support DPDK which is an out-of-kernel networking stack? We can't
> > > support that at all.
> > >
> > > Why not just use the normal networking functionality instead of this
> > > custom char-device-node-monstrosity?
> >
> > Hey Greg,
> >
> > I've come to find out that this driver does not bypass kernel
> > networking, and the kernel functionality I thought it bypassed, IPC /
> > Scheduling, is not even in the picture in the non-accelerated case. So
> > given you and I are both confused by this submission that tells me
> > that the problem space needs to be clarified and assumptions need to
> > be enumerated.
> >
> > > What is missing from todays kernel networking code that requires this
> > > run-around?
> >
> > Yes, first and foremost Mike, what are the kernel infrastructure gaps
> > and pain points that led up to this proposal?
>
> Hi Greg/Dan,
>
> Sorry for the confusion. The cover letter and document did not articulate
> clearly the problem being solved by DLB. We will update the document in
> the next revision.
I'm not sure this answers Greg's question about what is missing from
today's kernel implementation.
> In a brief description, Intel DLB is an accelerator that replaces shared-memory
> queuing systems. Large modern server-class CPUs, with local caches
> for each core, tend to incur costly cache misses, cross-core snoops
> and contention. The impact becomes noticeable at high messaging rates
> (messages/sec), such as those seen in high-throughput packet processing and HPC
> applications. DLB is used in high-rate pipelines that require a variety of packet
> distribution and synchronization schemes. It can be leveraged to accelerate
> user space libraries, such as DPDK eventdev. It could show similar benefits in
> frameworks such as PADATA in the kernel, if the messaging rate is sufficiently
> high.
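
For concreteness, here is a minimal sketch of the kind of software
shared-memory queue described above (illustrative only, not DLB or DPDK
code; sw_ring and its helpers are made up for this example). Every core
funnels through the same lock and the same head/tail cache lines, which
is the cross-core traffic that grows with the message rate:

/* Illustrative only -- a lock-protected shared-memory ring of the kind
 * a hardware queue manager is said to replace.  Every producer and
 * consumer, on every core, takes the same lock and touches the same
 * head/tail cache lines, so at high message rates the cost is dominated
 * by cross-core cache-line bouncing rather than the work itself.
 */
#include <pthread.h>
#include <stdint.h>

#define RING_SIZE 1024

struct sw_ring {
        pthread_mutex_t lock;
        uint32_t head;              /* next slot to fill  */
        uint32_t tail;              /* next slot to drain */
        void *slots[RING_SIZE];
};

static struct sw_ring g_ring = { .lock = PTHREAD_MUTEX_INITIALIZER };

static int sw_ring_enqueue(struct sw_ring *r, void *msg)
{
        int ret = -1;

        pthread_mutex_lock(&r->lock);       /* contended across all cores */
        if (r->head - r->tail < RING_SIZE) {
                r->slots[r->head++ % RING_SIZE] = msg;
                ret = 0;
        }
        pthread_mutex_unlock(&r->lock);
        return ret;
}

static void *sw_ring_dequeue(struct sw_ring *r)
{
        void *msg = NULL;

        pthread_mutex_lock(&r->lock);
        if (r->tail != r->head)
                msg = r->slots[r->tail++ % RING_SIZE];
        pthread_mutex_unlock(&r->lock);
        return msg;
}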
Where is PADATA limited by distribution and synchronization overhead?
It's meant for parallelizable work that has minimal communication
between the work units; ordering is about its only synchronization
overhead, not messaging. It's used for ipsec crypto and page init.
Even potential future bulk work usages that might benefit from PADATA,
like md-raid, ksm, or kcopyd, do not have any messaging overhead.
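
For reference, this is roughly the padata usage model (a pcrypt-style
sketch; signatures are from memory and version-dependent, and
process()/complete_in_order() are placeholders for the caller's work).
The ->parallel() callbacks run independently on their CPUs, and the only
synchronization padata provides is that ->serial() runs back in
submission order:

/* Sketch of the padata usage model -- not code from any driver. */
#include <linux/padata.h>
#include <linux/kernel.h>

struct my_work {
        struct padata_priv padata;
        void *data;
};

static void my_parallel(struct padata_priv *padata)
{
        struct my_work *w = container_of(padata, struct my_work, padata);

        process(w->data);               /* independent work, no messaging */
        padata_do_serial(padata);       /* hand back for in-order completion */
}

static void my_serial(struct padata_priv *padata)
{
        struct my_work *w = container_of(padata, struct my_work, padata);

        complete_in_order(w->data);     /* runs in original submission order */
}

static int my_submit(struct padata_shell *ps, struct my_work *w)
{
        int cb_cpu = 0;                 /* CPU for ->serial(), simplified */

        w->padata.parallel = my_parallel;
        w->padata.serial   = my_serial;
        return padata_do_parallel(ps, &w->padata, &cb_cpu);
}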
> As can be seen in the following diagram, DLB operations come into the
> picture only after packets are received by the Rx core from the networking
> devices. WCs are the worker cores that process packets distributed by DLB.
> (In case the diagram gets mis-formatted, please see the attached file.)
>
>
>                           WC1           WC4
>  +-----+  +----+  +---+  /   \  +---+  /   \  +---+  +----+  +-----+
>  |NIC  |  |Rx  |  |DLB| /     \ |DLB| /     \ |DLB|  |Tx  |  |NIC  |
>  |Ports|--|Core|--|   |---WC2---|   |---WC5---|   |--|Core|--|Ports|
>  +-----+  +----+  +---+ \     / +---+ \     / +---+  +----+  +-----+
>                          \   /         \   /
>                           WC3           WC6
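
For readers unfamiliar with eventdev, each WC in the diagram runs a loop
along these lines (a sketch based on DPDK's rte_eventdev API; dev_id,
port_id, next_queue and process_packet() are placeholders set up by the
application, and details are simplified). The dequeue/enqueue calls are
where the queuing, load balancing and ordering work moves into DLB, or a
software eventdev in the unaccelerated case:

/* Sketch of an eventdev worker-core loop (WC1..WC6 above). */
#include <rte_eventdev.h>
#include <rte_mbuf.h>

extern void process_packet(struct rte_mbuf *m);  /* application-defined */

static int worker_loop(uint8_t dev_id, uint8_t port_id, uint8_t next_queue)
{
        struct rte_event ev;

        for (;;) {
                /* The event device picks the next event for this core --
                 * load balancing and ordering happen here, not in a
                 * shared-memory queue touched by every core. */
                if (!rte_event_dequeue_burst(dev_id, port_id, &ev, 1, 0))
                        continue;

                process_packet(ev.mbuf);

                /* Hand the event back so it can be scheduled to the next
                 * pipeline stage (WC4..WC6, then the Tx core). */
                ev.queue_id = next_queue;
                ev.op = RTE_EVENT_OP_FORWARD;
                rte_event_enqueue_burst(dev_id, port_id, &ev, 1);
        }
        return 0;
}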
>
> At its heart DLB consists of resources that can be assigned to
> VDEVs/applications in a flexible manner, such as ports, queues, credits to use
> queues, sequence numbers, etc.
All of those objects are managed in userspace today in the unaccelerated case?
> We support up to 16/32 VF/VDEVs (depending
> on the version) with SR-IOV and SIOV. The role of the kernel driver includes VDEV
> composition (vdcm module), function-level reset, live migration, error
> handling, power management, etc.
Need some more specificity here. What about those features requires
the kernel to get involved with a DLB2-specific ABI to manage ports,
queues, credits, sequence numbers, etc.?
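
As an aside for readers following the ABI discussion: below is a purely
illustrative sketch of the char-device-plus-ioctl shape being debated.
It is NOT the dlb driver's actual interface; all of the names
(/dev/example_accel0, EXAMPLE_IOC_CREATE_QUEUE, struct
example_create_queue) are made up to show what "an ABI to manage ports,
queues, credits, sequence numbers" looks like in practice:

/* Purely illustrative -- not the dlb driver's real ABI. */
#include <sys/ioctl.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>

struct example_create_queue {
        uint32_t num_credits;   /* credits the queue may consume */
        uint32_t num_seq_nums;  /* sequence numbers for ordering */
        uint32_t queue_id;      /* returned by the driver        */
};

#define EXAMPLE_IOC_CREATE_QUEUE _IOWR('x', 1, struct example_create_queue)

int main(void)
{
        struct example_create_queue q = { .num_credits = 64, .num_seq_nums = 32 };
        int fd = open("/dev/example_accel0", O_RDWR);   /* hypothetical node */

        if (fd < 0 || ioctl(fd, EXAMPLE_IOC_CREATE_QUEUE, &q) < 0) {
                perror("example accel");
                return 1;
        }
        printf("queue %u created\n", q.queue_id);
        return 0;
}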