[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CO2PR11MB00884BBC91DE36D3FA78796C972D0@CO2PR11MB0088.namprd11.prod.outlook.com>
Date: Thu, 23 Jun 2016 21:06:20 +0000
From: Yuval Mintz <Yuval.Mintz@...gic.com>
To: Alexander Duyck <alexander.duyck@...il.com>
CC: Eric Dumazet <eric.dumazet@...il.com>,
Rick Jones <rick.jones2@....com>,
Manish Chopra <manish.chopra@...gic.com>,
David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Ariel Elior <Ariel.Elior@...gic.com>,
"Tom Herbert" <tom@...bertland.com>,
Hannes Frederic Sowa <hannes@...hat.com>
Subject: Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support
>>> Then again, if you're basically saying every HW-assisted offload on
>>> receive should be done under LRO flag, what would be the use case
>>> where a GRO-assisted offload would help?
>>> I.e., afaik LRO is superior to GRO in `brute force' -
>>> it creates better packed packets and utilizes memory better
>>> [with all the obvious cons such as inability for defragmentation].
>>> So if you'd have the choice of having an adpater perform 'classic'
>>> LRO aggregation or something that resembles a GRO packet,
>>> what would be the gain from doing the latter?
> LRO and GRO shouldn't really differ in packing or anything like that.
> The big difference between the two is that LRO is destructive while
> GRO is not. Specifically in the case of GRO you should be able to
> take the resultant frame, feed it through GSO, and get the original
> stream of frames back out. So you can pack the frames however you
> want the only key is that you must capture all the correct offsets and
> set the gso_size correct for the flow.
While the implementation might lack in things [such as issues with
future implementation], following your logic it is GRO - I.e., forwarding
scenarios work fine with HW assisted GRO.
>> Just to relate to bnx2x/qede differences in current implementation -
>> when this GRO hw-offload was added to bnx2x, it has already
>> supported classical LRO, and due to above statement whenever LRO
>> was set driver aggregated incoming traffic as classic LRO.
>> I agree that in hindsight the lack of distinction between sw/hw GRO
>> was hurting us.
> In the case of bnx2x it sounds like you have issues that are
> significantly hurting the performance versus classic software GRO. If
> that is the case you might want to simply flip the logic for the
> module parameter that Rick mentioned and just disable the hardware
> assisted GRO unless it is specifically requested.
A bit hard to flip; The module parameter also disables LRO support.
And given that module parameters is mostly a thing of the past, I
don't think we should strive fixing things through additional changes
in that area.
> > qede isn't implementing LRO, so we could easily mark this feature
> > under LRO there - but question is, given that the adapter can support
> > LRO, if we're going to suffer from all the shotrages that arise from
> > putting this feature under LRO, why should we bother?
> The idea is to address feature isolation. The fact is the hardware
> exists outside of kernel control. If you end up linking an internal
> kernel feature to your device like this you are essentially stripping
> the option of using the kernel feature.
> I would prefer to see us extend LRO to support "close enough GRO"
> instead of have us extend GRO to also include LRO.
Again - why? What's the benefit of HW doing LRO and trying to
control imitate GRO, if it's still carrying all the LRO baggage
[specifically, disabling it on forwarding] as opposed to simply
doing classic LRO?
> > You can argue that we might need a new feature bit for control
> > over such a feature; If we don't do that, is there any gain in all of this?
> I would argue that yes there are many cases where we will be able to
> show gain. The fact is there is a strong likelihood of the GRO on
> your parts having some differences either now, or at some point in the
> future as the code evolves. As I mentioned there was already some
> talk about possibly needing to push the UDP tunnel aggregation out of
> GRO and perhaps handling it sometime after IP look up had verified
> that the destination was in fact a local address in the namespace. In
> addition it makes the changes to include the tunnel encapsulation much
> more acceptable as LRO is already naturally dropped in the routing and
> bridging cases if I recall correctly.
I think it all boils down to the question of "do we actually want to have
HW-assisted GRO?". If we do [and not necessarily for the UDP-tunnel
scenario] then we need to have it distinct from LRO, otherwise there's
very little gain. If we believe tGRO should remain SW-only, then
I think the discussion is mott; We need to stop trying this, and offload
only LRO - in which case we can aggregate it in whichever 'destructive'
[correct] format we like, without trying to have it resemble GRO.
Powered by blists - more mailing lists