lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0Uc7cCu-5SwrB+4Mv6OM+V1jDwefOoG-Wr61C4F1g1BmFA@mail.gmail.com>
Date:	Thu, 23 Jun 2016 10:07:25 -0700
From:	Alexander Duyck <alexander.duyck@...il.com>
To:	Yuval Mintz <Yuval.Mintz@...gic.com>
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Rick Jones <rick.jones2@....com>,
	Manish Chopra <manish.chopra@...gic.com>,
	David Miller <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>,
	Ariel Elior <Ariel.Elior@...gic.com>,
	Tom Herbert <tom@...bertland.com>,
	Hannes Frederic Sowa <hannes@...hat.com>
Subject: Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support

On Wed, Jun 22, 2016 at 9:17 PM, Yuval Mintz <Yuval.Mintz@...gic.com> wrote:
>> Then again, if you're basically saying every HW-assisted offload on
>> receive should be done under LRO flag, what would be the use case
>> where a GRO-assisted offload would help?
>
>> I.e., afaik LRO is superior to GRO in `brute force' -
>> it creates better packed packets and utilizes memory better
>> [with all the obvious cons such as inability for defragmentation].
>> So if you'd have the choice of having an adpater perform 'classic'
>> LRO aggregation or something that resembles a GRO packet,
>> what would be the gain from doing the latter?

LRO and GRO shouldn't really differ in packing or anything like that.
The big difference between the two is that LRO is destructive while
GRO is not.  Specifically in the case of GRO you should be able to
take the resultant frame, feed it through GSO, and get the original
stream of frames back out.  So you can pack the frames however you
want the only key is that you must capture all the correct offsets and
set the gso_size correct for the flow.

> Just to relate to bnx2x/qede differences in current implementation -
> when this GRO hw-offload was added to bnx2x, it has already
> supported classical LRO, and due to above statement whenever LRO
> was set driver aggregated incoming traffic as classic LRO.
> I agree that in hindsight the lack of distinction between sw/hw GRO
> was hurting us.

In the case of bnx2x it sounds like you have issues that are
significantly hurting the performance versus classic software GRO.  If
that is the case you might want to simply flip the logic for the
module parameter that Rick mentioned and just disable the hardware
assisted GRO unless it is specifically requested.

> qede isn't implementing LRO, so we could easily mark this feature
> under LRO there - but question is, given that the adapter can support
> LRO, if we're going to suffer from all the shotrages that arise from
> putting this feature under LRO, why should we bother?

The idea is to address feature isolation.  The fact is the hardware
exists outside of kernel control.  If you end up linking an internal
kernel feature to your device like this you are essentially stripping
the option of using the kernel feature.

I would prefer to see us extend LRO to support "close enough GRO"
instead of have us extend GRO to also include LRO.  That way when we
encounter issues like the FW limitation that Rick encountered he can
just go in and disable the LRO and have true GRO kick in which would
be significantly better than having to poke around through
documentation to find a module parameter that can force the feature
off.  Really the fact that you have to use a module parameter is
frowned upon as well as most drivers aren't supposed to be using those
in the netdev tree.

> You can argue that we might need a new feature bit for control
> over such a feature; If we don't do that, is there any gain in all of this?

I would argue that yes there are many cases where we will be able to
show gain.  The fact is there is a strong likelihood of the GRO on
your parts having some differences either now, or at some point in the
future as the code evolves.  As I mentioned there was already some
talk about possibly needing to push the UDP tunnel aggregation out of
GRO and perhaps handling it sometime after IP look up had verified
that the destination was in fact a local address in the namespace.  In
addition it makes the changes to include the tunnel encapsulation much
more acceptable as LRO is already naturally dropped in the routing and
bridging cases if I recall correctly.

- Alex

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ