netdev - Re: [PATCH net-next v3 1/5] net: Introduce NETIF_F_GRO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0Ue0hLC8PkJm2dpxvbS9hNsSEXxMFbyt0cu3LoZNYpi_ig@mail.gmail.com>
Date:   Sat, 9 Dec 2017 10:50:53 -0800
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Michael Chan <michael.chan@...adcom.com>
Cc:     David Miller <davem@...emloft.net>,
        Netdev <netdev@...r.kernel.org>,
        Andrew Gospodarek <andrew.gospodarek@...adcom.com>,
        Ariel Elior <Ariel.Elior@...ium.com>,
        everest-linux-l2@...ium.com
Subject: Re: [PATCH net-next v3 1/5] net: Introduce NETIF_F_GRO_HW.

On Fri, Dec 8, 2017 at 10:27 PM, Michael Chan <michael.chan@...adcom.com> wrote:
> Introduce NETIF_F_GRO_HW feature flag for NICs that support hardware
> GRO.  With this flag, we can now independently turn on or off hardware
> GRO when GRO is on.  Previously, drivers were using NETIF_F_GRO to
> control hardware GRO and so it cannot be independently turned on or
> off without affecting GRO.
>
> Hardware GRO (just like GRO) guarantees that packets can be re-segmented
> by TSO/GSO to reconstruct the original packet stream.  It is a subset of
> NETIF_F_GRO and depends on it, as well as NETIF_F_RXCSUM.

So I would disagree with it being a subset of NETIF_F_GRO. If anything
it is an alternative to NETIF_F_GRO. It is performing GRO much earlier
at the device level in the case of hardware drivers. My concern is
this is probably going to end up applying to things other than just
hardware drivers though. For example what is to prevent this from
being applied to something like a virtio/tap interface? It seems like
this should be something that would be easy to implement in software.
In addition as I said in my earlier comments I think we should
probably look at using this new feature bit to indicate that we allow
GRO to occur at or below this device as opposed to just above it as
currently occurs with conventional GRO.

> Since NETIF_F_GRO is not propagated between upper and lower devices,
> NETIF_F_GRO_HW should follow suit since it is a subset of GRO.  In other
> words, a lower device can independent have GRO/GRO_HW enabled or disabled
> and no feature propagation is required.  This will preserve the current
> GRO behavior.

I'm going to back off on my requirement for you to handle propagation
since after spending a couple hours working on it I did find it was
more complex then I originally thought it would be. With that said
however I would want to see this feature implemented in such a way
that we can deal with propagating the bits in the future if we need to
and that is what I am basing my comments on. My concern is when this
ends up breaking we need to have a way to fix it and I don't want that
fix to end up being having to disable GRO across the board.

> Cc: Ariel Elior <Ariel.Elior@...ium.com>
> Cc: everest-linux-l2@...ium.com
> Signed-off-by: Michael Chan <michael.chan@...adcom.com>



> ---
>  Documentation/networking/netdev-features.txt |  8 ++++++++
>  include/linux/netdev_features.h              |  3 +++
>  net/core/dev.c                               | 12 ++++++++++++
>  net/core/ethtool.c                           |  1 +
>  4 files changed, 24 insertions(+)
>
> diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt
> index 7413eb0..8f36527 100644
> --- a/Documentation/networking/netdev-features.txt
> +++ b/Documentation/networking/netdev-features.txt
> @@ -163,3 +163,11 @@ This requests that the NIC receive all possible frames, including errored
>  frames (such as bad FCS, etc).  This can be helpful when sniffing a link with
>  bad packets on it.  Some NICs may receive more packets if also put into normal
>  PROMISC mode.
> +
> +*  rx-gro-hw
> +
> +This requests that the NIC enables Hardware GRO (generic receive offload).
> +Hardware GRO is basically the exact reverse of TSO, and is generally
> +stricter than Hardware LRO.  A packet stream merged by Hardware GRO must
> +be re-segmentable by GSO or TSO back to the exact original packet stream.
> +Hardware GRO is dependent on GRO and RXCSUM.
> diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
> index b1b0ca7..db84c51 100644
> --- a/include/linux/netdev_features.h
> +++ b/include/linux/netdev_features.h
> @@ -78,6 +78,8 @@ enum {
>         NETIF_F_HW_ESP_TX_CSUM_BIT,     /* ESP with TX checksum offload */
>         NETIF_F_RX_UDP_TUNNEL_PORT_BIT, /* Offload of RX port for UDP tunnels */
>
> +       NETIF_F_GRO_HW_BIT,             /* Hardware Generic receive offload */
> +
>         /*
>          * Add your fresh new feature above and remember to update
>          * netdev_features_strings[] in net/core/ethtool.c and maybe
> @@ -97,6 +99,7 @@ enum {
>  #define NETIF_F_FRAGLIST       __NETIF_F(FRAGLIST)
>  #define NETIF_F_FSO            __NETIF_F(FSO)
>  #define NETIF_F_GRO            __NETIF_F(GRO)
> +#define NETIF_F_GRO_HW         __NETIF_F(GRO_HW)
>  #define NETIF_F_GSO            __NETIF_F(GSO)
>  #define NETIF_F_GSO_ROBUST     __NETIF_F(GSO_ROBUST)
>  #define NETIF_F_HIGHDMA                __NETIF_F(HIGHDMA)
> diff --git a/net/core/dev.c b/net/core/dev.c
> index e32cf5c..6ebd0e7 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -7424,6 +7424,18 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>                 features &= ~dev->gso_partial_features;
>         }
>
> +       if (features & NETIF_F_GRO_HW) {
> +               /* Hardware GRO depends on GRO and RXCSUM. */
> +               if (!(features & NETIF_F_GRO)) {
> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no GRO feature.\n");
> +                       features &= ~NETIF_F_GRO_HW;
> +               }

I still disagree with this bit. I think GRO is a pure software
offload, whereas GRO_HW can represent either a software offload of
some sort occurring in or before the driver, or in the hardware.
Basically the difference between the two as I view it is where the GRO
is occurring. I would like to keep that distinction and make use of
it. As I mentioned before in the case of bonding we currently have no
way to disable GRO on the lower devices partially because GRO is a
pure software feature and always happens at each device along the way.
The nice thing about this new bit is the assumption is that it is
pushing GRO to the lowest possible level and not triggering any side
effects like GRO currently does. I hope to use that logic with stacked
devices so that we could clear the bit and have it disable GRO,
GRO_HW, and LRO on all devices below the device that cleared it.

I think this linking of GRO and GRO_HW is something that would be
better served by moving it into the driver if you are wanting to
maintain the behavior of how this was previously linked to GRO. It
also makes it so that it is much easier to compare the performance for
GRO_HW against just a pure software GRO since you could then enable
them independently. Software GRO can come at a cost, and leaving it
enabled when you want to do it all in hardware is just adding a
penalty of sorts since I know for many of my routing tests I normally
disable GRO as it has a significant per-packet cost for small packet
workloads.

> +               if (!(features & NETIF_F_RXCSUM)) {
> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no RXCSUM feature.\n");
> +                       features &= ~NETIF_F_GRO_HW;
> +               }

So I was thinking about this. For LRO it makes sense to disable it in
the case of RXCSUM being disabled since most implementations leave the
Rx checksum mangled. However for GRO I am not sure it makes complete
sense. For software GRO we perform checksum validation in either
tcp4_gro_receive or tcp6_gro_receive. Why should the hardware
implementation behave differently? When a GRO frame is assembled the
checksum is converted to CHECKSUM_PARTIAL anyway even if Rx checksum
validation is disabled for the driver.

I think this may be a hardware/driver specific implementation detail
and may not be generic enough to belong here. Regular GRO works
without RXCSUM, so why should we make an exception for the hardware
based version? I know in the case of the Intel NICs we don't ever
actually disable the checksum validation, we just don't report the
result we were given from the hardware and hand all the frames up the
stack. If we were implementing something like this we could still
support GRO in the hardware without reporting Rx check-sums otherwise.

The alternative way to look at it is that we shouldn't support any
form of packet mangling at he driver level if RXCSUM is disabled. In
which case, I would say we should probably frame it that way and
disable both LRO and GRO_HW if RXCSUM is enabled because this is
another case where this looks more like LRO than GRO.

> +       }
> +
>         return features;
>  }
>
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index f8fcf45..50a7920 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -73,6 +73,7 @@ int ethtool_op_get_ts_info(struct net_device *dev, struct ethtool_ts_info *info)
>         [NETIF_F_LLTX_BIT] =             "tx-lockless",
>         [NETIF_F_NETNS_LOCAL_BIT] =      "netns-local",
>         [NETIF_F_GRO_BIT] =              "rx-gro",
> +       [NETIF_F_GRO_HW_BIT] =           "rx-gro-hw",
>         [NETIF_F_LRO_BIT] =              "rx-lro",
>
>         [NETIF_F_TSO_BIT] =              "tx-tcp-segmentation",
> --
> 1.8.3.1
>