netdev - Re: [PATCH net-next v3 1/5] net: Introduce NETIF_F_GRO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0Uejgm4HSsStBSqLH5_1rVTsi5JxGve99Kdqmd=hqrC4mw@mail.gmail.com>
Date:   Sun, 10 Dec 2017 09:02:33 -0800
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Michael Chan <michael.chan@...adcom.com>
Cc:     David Miller <davem@...emloft.net>,
        Netdev <netdev@...r.kernel.org>,
        Andrew Gospodarek <andrew.gospodarek@...adcom.com>,
        Ariel Elior <Ariel.Elior@...ium.com>,
        everest-linux-l2@...ium.com
Subject: Re: [PATCH net-next v3 1/5] net: Introduce NETIF_F_GRO_HW.

On Sat, Dec 9, 2017 at 10:40 PM, Michael Chan <michael.chan@...adcom.com> wrote:
> On Sat, Dec 9, 2017 at 2:04 PM, Alexander Duyck
> <alexander.duyck@...il.com> wrote:
>> On Sat, Dec 9, 2017 at 1:31 PM, Michael Chan <michael.chan@...adcom.com> wrote:
>>> On Sat, Dec 9, 2017 at 10:50 AM, Alexander Duyck
>>> <alexander.duyck@...il.com> wrote:
>>>> So I would disagree with it being a subset of NETIF_F_GRO. If anything
>>>> it is an alternative to NETIF_F_GRO. It is performing GRO much earlier
>>>> at the device level in the case of hardware drivers. My concern is
>>>> this is probably going to end up applying to things other than just
>>>> hardware drivers though. For example what is to prevent this from
>>>> being applied to something like a virtio/tap interface? It seems like
>>>> this should be something that would be easy to implement in software.
>>>
>>> If you do it in software, it's called NETIF_F_GRO.  We already have
>>> it.  The whole point of the new flag is that if the device has
>>> software GRO enabled, and if the device supports GRO_HW, then we can
>>> do a subset of GRO in hardware (hopefully faster).
>>
>> I can see what you are getting at. But GRO_HW with GRO stacked on top
>> of it won't necessarily be the fastest form of GRO. If you have a
>> GRO_HW implementation that is complete enough people may want to
>> disable Software GRO in order to avoid the extra overhead involved
>> with using it.
>
> It is possible that if you have incoming packets 1, 2, 3, 4, 5 for a
> TCP connection, HW_GRO can aggregate packets 1, 2, 3, but cannot
> aggregate packets 4 and 5 due to hardware resource limitation.
> Software GRO aggregates 4 and 5.  So it works well together.

Right. But in the case where 1, 2, 3, 4, and 5 were not aggregated by
hardware GRO because the frames could not be aggregated and then GRO
burns cycles coming to the same conclusion you have waste. Same thing
goes for if hardware GRO aggregates 1 through 5 and then SW GRO tries
to see if it can do more.

They are both doing the same thing, but what I see it as is two
passes, not something where they are working together. The hardware
GRO can rely on software GRO for a second pass, but it doesn't need
to. The fact that it doesn't need to tells me that it isn't a hard
requirement to have GRO in order to make use of software GRO.

>>>> I'm going to back off on my requirement for you to handle propagation
>>>> since after spending a couple hours working on it I did find it was
>>>> more complex then I originally thought it would be. With that said
>>>> however I would want to see this feature implemented in such a way
>>>> that we can deal with propagating the bits in the future if we need to
>>>> and that is what I am basing my comments on.
>>>
>>> Nothing stops anyone from propagating the flag.  Just add
>>> NETIF_F_GRO_HW to NETIF_F_UPPER_DISABLES and it will be propagated
>>> just like LRO.
>>
>> Yes, but the problem then is it doesn't solve the secondary issue of
>> no way to propagate down the desire to disable GRO as well. That is
>> why I am thinking that the new bit could be used to indicate that we
>> want GRO to be supported either in the driver or below it instead of
>> only in "hardware". We are much better off with a generic solution and
>> that is why I think it might be better to use more of a pipeline or
>> staged type definition for this. Basically with GRO it occurs in the
>> GRO logic just after the driver hands off the packet, while this new
>> bit indicates that GRO happens somewhere before then. If we use that
>> definition for this then it becomes usable to deal with things like
>> the stacked devices problem where the stacked devices normally have
>> the GRO flag disabled since we don't want to run GRO multiple times,
>> but as a result the stacked devices have no way of saying they don't
>> want GRO. If we tweak the definition of this bit it solves that
>> problem since it would allow for us disabling GRO, GRO_HW, and LRO on
>> any devices below a given device.
>
> I just don't follow your logic.  First of all, GRO on an upper device
> doesn't mean that we are doing GRO on the upper device.  The bonding
> driver cannot do GRO because it doesn't call napi_gro_receive().  GRO
> always happens on the lower device.  Propagation of GRO can only mean
> that if GRO is set on the upper device, GRO is propagated and allowed
> on lower devices.  Nothing stops you from doing that if you want to do
> that.

If my understanding of things is correct it can mean doing GRO on an
upper device if that device does any sort of decapsulation as a result
of something like vxlan, geneve, or either ipsec or macsec encryption
occurring on top of it. It would be a side effect of the gro_cells
logic.

Admittedly I am more familiar with the segmentation side of things
then the reassembly. So my understanding of this could be incorrect.

>>>> I still disagree with this bit. I think GRO is a pure software
>>>> offload, whereas GRO_HW can represent either a software offload of
>>>> some sort occurring in or before the driver, or in the hardware.
>>>> Basically the difference between the two as I view it is where the GRO
>>>> is occurring. I would like to keep that distinction and make use of
>>>> it. As I mentioned before in the case of bonding we currently have no
>>>> way to disable GRO on the lower devices partially because GRO is a
>>>> pure software feature and always happens at each device along the way.
>>>> The nice thing about this new bit is the assumption is that it is
>>>> pushing GRO to the lowest possible level and not triggering any side
>>>> effects like GRO currently does. I hope to use that logic with stacked
>>>> devices so that we could clear the bit and have it disable GRO,
>>>> GRO_HW, and LRO on all devices below the device that cleared it.
>>>>
>>>> I think this linking of GRO and GRO_HW is something that would be
>>>> better served by moving it into the driver if you are wanting to
>>>> maintain the behavior of how this was previously linked to GRO.
>>>
>>> If you insist, I can move this to the driver's ndo_fix_features().
>>> But I feel it is much better to enforce this dependency system wide.
>>> Once again, GRO_HW is hardware accelerated GRO and should depend on
>>> it.
>>
>> The question I would have is why? Where is the dependency? I don't see
>> it. It is GRO in one spot and/or GRO in the other. The two don't
>> interract directly and I don't believe you can do software GRO on a
>> frame that has already been coalesced in hardware,
>
> Right.  But hardware can do a series of frames and software can do a
> different series of frames that have not been aggregated.

Right, but you have yet to define how the hardware offload would be
dependent on the software offload. I would say it makes more sense to
make LRO dependent on GRO_HW then it does to make GRO_HW dependent on
GRO. If I turn off GSO it doesn't turn off TSO and I would argue there
is a much stronger link there since GSO is the fallback for when TSO
fails, whereas for GRO you don't even necessarily need to have a
fallback.

There is a hierarchy to all of these features. GRO is the software
stack doing reversible aggregation, GRO_HW is allowing the hardware to
perform reversible aggregation, and LRO is allowing the hardware to
perform lossy/non-reversible aggregation. In my mind the
differentiation is the pure software solution is done outside of the
driver/hardware that you directly control. It basically just happens.
For the GRO_HW and LRO it requires the hardware/driver to participate
in it. In addition LRO might produce some frames that look identical
to GRO_HW, but it might also produce some frames that aren't
completely reversible depending on the implementation.

>>> This is a logical feature dependency that Yuval Mintz suggested.  For
>>> GRO_HW to work, hardware must verify the checksum of a packet before
>>> the packet can be merged.
>>>
>>> So if the user does not want to do RXCSUM on this device for whatever
>>> reason, it logically means that he also doesn't want to do GRO_HW with
>>> implied RXCSUM performed on each packet that is merged.
>>>
>>> So I agree with Yuval that this dependency makes sense.
>>
>> Okay then, so if we are going to go that route we may want to be
>> complete on this and just disable GRO_HW and LRO if RXCSUM is not
>> enabled. We might also want to add a comment indicating that we don't
>> support anything that might mangle a packet at the driver level if
>> RXCSUM is not enabled. Comments explaining all this would be a good
>> thing just to keep someone from grabbing GRO and lumping it in at some
>> point in the future.
>>
>> I'm still working on trying to propagate the Rx checksum properly
>> since it should probably follow the same UPPER_DISABLES behavior as
>> LRO, but I will probably only have a few hours over the next week to
>> really work on any code and there end up being a number of stacked
>> drivers that have to be updated. I would be good with just flipping
>> this logic for now and if RXCSUM is not set, and GRO_HW (just noticed
>> the typo in your message) is set, then print your message and clear
>> the bit. I can probably come back later and add LRO once I get the
>> propagation bits worked out.
>
> Just fix the netdev_dbg() typo, right?  I don't understand what you
> mean by flipping the logic.  It's the same whether you check RXCSUM
> first or GRO_HW first.

It just saves me work later when I get the propagation problems solved
and have to add LRO to the list of features dropped if RXCSUM is not
enabled.

> May be you meant put the RXCSUM check in the outer if statement so
> that someone could add more inner checks?  OK, I think that's what you
> meant.

Yes that is what I meant.

>>
>> As far as patch 2 in the set it would probably be better to either
>> drop it and just accept it as an outstanding issue, or you could take
>> on the propagation problems with GRO_HW and RXCSUM since we really
>> need to get those solved in order for this functionality to fully
>> work.
>
> We need patch #2 otherwise generic GRO won't work on these 3 drivers.
> I don't think I fully understand your concerns about propagation.  To
> me propagation is just a usage model where an upper device will
> control the common features of lower devices.  It is more convenient
> to have propagation, but requires upper devices to be aware of all
> features that propagate (GRO, RXCSUM).  Without propagation, it is
> still fine.

I'm not sure if it is. It depends on how much XDP depends on the
frames being non-linear. As-is I am pretty sure this doesn't work for
the stacked case anyway since GRO was still enabled for lower devices
anyway. So you might look at just modifying patch 2 to not worry about
the stacked devices case since I think that was already broken with
GRO anyway.

Feel free to try taking on the propagation setup if you want. I'm
stuck in a number of meetings over the next week so I probably won't
be able to have any patches to try to address the issue until a couple
weeks from now.

- Alex