Message-ID: <VI1PR0402MB387142F1D317717D7382DB3CE0B40@VI1PR0402MB3871.eurprd04.prod.outlook.com>
Date: Fri, 22 May 2020 13:58:41 +0000
From: Ioana Ciornei <ioana.ciornei@....com>
To: Jakub Kicinski <kuba@...nel.org>
CC: "davem@...emloft.net" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic
classes
> Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx traffic
> classes
>
> On Wed, 20 May 2020 20:24:43 +0000 Ioana Ciornei wrote:
> > > Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx
> > > traffic classes
> > >
> > > On Wed, 20 May 2020 15:10:42 +0000 Ioana Ciornei wrote:
> > > > DPAA2 has frame queues for each Rx traffic class, and the decision
> > > > of which queue to pull frames from is made by the HW based on the
> > > > queue's priority within a channel (there is one channel per CPU).
> > >
> > > IOW you're reading the descriptor from the device memory/iomem
> > > address and the HW will return the next descriptor based on
> > > configured priority?
> >
> > That's the general idea, but the decision is not made on a
> > frame-by-frame basis but rather per dequeue operation, which can
> > return at most 16 frame descriptors at a time.
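
To make the mechanics clearer, the dequeue path looks roughly like this
(a condensed, illustrative sketch based on the driver's pull_channel()
and consume_frames(); the helper name below is just for the sketch, and
error handling plus the actual frame consumption are elided):

	/* One volatile dequeue command pulls up to DPAA2_ETH_STORE_SIZE
	 * (16) frame descriptors, already scheduled by HW, into the
	 * channel's software store.
	 */
	static int pull_and_consume(struct dpaa2_eth_channel *ch)
	{
		struct dpaa2_dq *dq;
		int cleaned = 0, is_last;
		int err;

		/* Issue the dequeue command; retry while the QBMan
		 * portal is still busy with the previous command.
		 */
		do {
			err = dpaa2_io_service_pull_channel(ch->dpio,
							    ch->ch_id,
							    ch->store);
		} while (err == -EBUSY);

		/* Walk the FDs the HW scheduler chose for this dequeue */
		do {
			dq = dpaa2_io_store_next(ch->store, &is_last);
			if (!dq)
				continue; /* response not yet written back */
			/* hand dpaa2_dq_fd(dq) to the Rx processing path */
			cleaned++;
		} while (!is_last);

		return cleaned;
	}
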
>
> I see!
>
> > > Presumably strict priority?
> >
> > Only the two highest traffic classes are in strict priority, while the
> > other 6 TCs form two priority tiers - medium (4 TCs) and low (the last
> > 2 TCs).
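
So the grouping comes down to this (purely illustrative, mirroring the
description above and assuming TC 0 is the highest priority - not an
actual driver structure):

	/* Illustrative TC-to-tier grouping; not a driver structure.
	 * Assumes TC 0 is the highest-priority traffic class.
	 */
	enum rx_tc_tier { TIER_STRICT, TIER_MEDIUM, TIER_LOW };

	static const enum rx_tc_tier tc_tier[8] = {
		TIER_STRICT, TIER_STRICT,	/* TC 0-1: strict priority */
		TIER_MEDIUM, TIER_MEDIUM,	/* TC 2-5: medium tier */
		TIER_MEDIUM, TIER_MEDIUM,
		TIER_LOW, TIER_LOW,		/* TC 6-7: low tier */
	};
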
> >
> > > > If this should be modeled in software, then I assume there should
> > > > be a NAPI instance for each traffic class and the stack should
> > > > know in which order to call the poll() callbacks so that the
> > > > priority is respected.
> > >
> > > Right, something like that. But IMHO not needed if HW can serve the
> > > right descriptor upon poll.
> >
> > After thinking this through, I don't actually believe that multiple
> > NAPI instances would solve this in any of the possible circumstances:
> >
> > - If you have hardware prioritization with full scheduling on dequeue
> > then the job on the driver side is already done.
> > - If you only have hardware assist for prioritization (ie hardware
> > gives you multiple rings but doesn't tell you from which one to
> > dequeue) then you can still use a single NAPI instance just fine and
> > basically pick the highest priority non-empty ring on-the-fly.
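
For that second case, the poll() could look something like this (a
hypothetical sketch; all structure and helper names - prio_channel,
ring_clean(), ring_irq_enable() - are made up just to illustrate the
single-NAPI priority scan):

	/* Hypothetical poll() for the "HW assist only" case: a single
	 * NAPI instance scans the per-priority rings from highest to
	 * lowest until the budget is consumed.
	 */
	static int prio_napi_poll(struct napi_struct *napi, int budget)
	{
		struct prio_channel *ch = container_of(napi,
					struct prio_channel, napi);
		int work_done = 0;
		int prio;

		/* Highest priority ring first, then the lower ones */
		for (prio = 0; prio < ch->num_rings; prio++) {
			work_done += ring_clean(&ch->rings[prio],
						budget - work_done);
			if (work_done >= budget)
				return budget;
		}

		/* Ran out of work before the budget: re-enable IRQs */
		if (napi_complete_done(napi, work_done))
			ring_irq_enable(ch);

		return work_done;
	}
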
> >
> > What I am having trouble understanding is how a fully software
> > implementation of this possible new Rx qdisc should work. Somehow the
> > skb->priority should be taken into account while the skb is passing
> > through the stack (ie a higher priority skb should be able to overtake
> > a lower priority skb that was received earlier but whose priority
> > queue is congested).
>
> I'd think the SW implementation would come down to which ring to service
> first. If there are multiple rings on the host, NAPI can try to read from
> the highest priority ring first and then move on to the next priority.
> Not sure whether there would be a use case for multiple NAPIs for busy
> polling or not.
>
> I was hoping we could solve this with the new ring config API (which is
> coming any day now, ehh) - in which I hope user space will be able to
> assign rings to NAPI instances; all we would have needed on top of that
> is control over the querying order. But that doesn't really work for
> you, it seems, since the selection is offloaded to HW :S
>
Yes, I would only need the configuration of traffic classes and their
priorities, not the software prioritization.
I'll keep a close eye on the mailing list to see what the new ring config
API that you're referring to will add.
> > I don't have a very deep understanding of the stack but I am thinking
> > that the enqueue_to_backlog()/process_backlog() area could be a
> > candidate place for handling this prioritization. If we did it there,
> > I don't see why a qdisc would be necessary at all; everybody would
> > benefit from prioritization based on skb->priority.
>
> I think once the driver picks the frame up it should run with it to
> completion (+/- GRO). We have natural batching with NAPI processing.
> Every NAPI budget, high priority rings get a chance to preempt lower
> ones.