[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <98efba2f85ac01ba988c62398d31bb77@codeaurora.org>
Date: Mon, 04 Apr 2016 18:38:37 -0600
From: subashab@...eaurora.org
To: Alexander Duyck <aduyck@...antis.com>
Cc: herbert@...dor.apana.org.au, tom@...bertland.com, jesse@...nel.org,
alexander.duyck@...il.com, edumazet@...gle.com,
netdev@...r.kernel.org, davem@...emloft.net,
netdev-owner@...r.kernel.org
Subject: Re: [net PATCH v2 2/2] ipv4/GRO: Make GRO conform to RFC 6864
On 2016-04-04 10:31, Alexander Duyck wrote:
> RFC 6864 states that the IPv4 ID field MUST NOT be used for purposes
> other
> than fragmentation and reassembly. Currently we are looking at this
> field
> as a way of identifying what frames can be aggregated and which cannot
> for
> GRO. While this is valid for frames that do not have DF set, it is
> invalid
> to do so if the bit is set.
>
> In addition we were generating IPv4 ID collisions when 2 or more flows
> were
> interleaved over the same tunnel. To prevent that we store the result
> of
> all IP ID checks via a "|=" instead of overwriting previous values.
>
> With this patch we support two different approaches for the IP ID
> field.
> The first is a non-incrementing IP ID with DF bit set. In such a case
> we
> simply won't write to the flush_id field in the GRO context block. The
> other option is the legacy option in which the IP ID must increment by
> 1
> for every packet we aggregate.
>
> In the case of the non-incrementing IP ID we will end up losing the
> data
> that the IP ID is fixed. However as per RFC 6864 we should be able to
> write any value into the IP ID when the DF bit is set so this should
> cause
> minimal harm.
>
> Signed-off-by: Alexander Duyck <aduyck@...antis.com>
> ---
>
> v2: Updated patch so that we now only support one of two options.
> Either
> the IP ID is fixed with DF bit set, or the IP ID is incrementing.
> That
> allows us to support the fixed ID case as occurs with IPv6 to IPv4
> header translation and what is likely already out there for some
> devices with tunnel headers.
>
> net/core/dev.c | 1 +
> net/ipv4/af_inet.c | 25 ++++++++++++++++++-------
> net/ipv6/ip6_offload.c | 3 ---
> 3 files changed, 19 insertions(+), 10 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 77a71cd68535..3429632398a4 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4352,6 +4352,7 @@ static void gro_list_prepare(struct napi_struct
> *napi, struct sk_buff *skb)
> unsigned long diffs;
>
> NAPI_GRO_CB(p)->flush = 0;
> + NAPI_GRO_CB(p)->flush_id = 0;
>
> if (hash != skb_get_hash_raw(p)) {
> NAPI_GRO_CB(p)->same_flow = 0;
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 9e481992dbae..33f6335448a2 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1324,6 +1324,7 @@ static struct sk_buff **inet_gro_receive(struct
> sk_buff **head,
>
> for (p = *head; p; p = p->next) {
> struct iphdr *iph2;
> + u16 flush_id;
>
> if (!NAPI_GRO_CB(p)->same_flow)
> continue;
> @@ -1347,14 +1348,24 @@ static struct sk_buff
> **inet_gro_receive(struct sk_buff **head,
> (iph->tos ^ iph2->tos) |
> ((iph->frag_off ^ iph2->frag_off) & htons(IP_DF));
>
> - /* Save the IP ID check to be included later when we get to
> - * the transport layer so only the inner most IP ID is checked.
> - * This is because some GSO/TSO implementations do not
> - * correctly increment the IP ID for the outer hdrs.
> - */
> - NAPI_GRO_CB(p)->flush_id =
> - ((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id);
> NAPI_GRO_CB(p)->flush |= flush;
> +
> + /* We must save the offset as it is possible to have multiple
> + * flows using the same protocol and address pairs so we
> + * need to wait until we can validate this is part of the
> + * same flow with a 5-tuple or better to avoid unnecessary
> + * collisions between flows. We can support one of two
> + * possible scenarios, either a fixed value with DF bit set
> + * or an incrementing value with DF either set or unset.
> + * In the case of a fixed value we will end up losing the
> + * data that the IP ID was a fixed value, however per RFC
> + * 6864 in such a case the actual value of the IP ID is
> + * meant to be ignored anyway.
> + */
> + flush_id = (u16)(id - ntohs(iph2->id));
> + if (flush_id || !(iph2->frag_off & htons(IP_DF)))
> + NAPI_GRO_CB(p)->flush_id |= flush_id ^
> + NAPI_GRO_CB(p)->count;
> }
>
> NAPI_GRO_CB(skb)->flush |= flush;
> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> index 82e9f3076028..9aa53f64dffd 100644
> --- a/net/ipv6/ip6_offload.c
> +++ b/net/ipv6/ip6_offload.c
> @@ -238,9 +238,6 @@ static struct sk_buff **ipv6_gro_receive(struct
> sk_buff **head,
> /* flush if Traffic Class fields are different */
> NAPI_GRO_CB(p)->flush |= !!(first_word & htonl(0x0FF00000));
> NAPI_GRO_CB(p)->flush |= flush;
> -
> - /* Clear flush_id, there's really no concept of ID in IPv6. */
> - NAPI_GRO_CB(p)->flush_id = 0;
> }
>
> NAPI_GRO_CB(skb)->flush |= flush;
I can see coalescing of packets which have IP ID=0 and DF=1 originating
from a IPv6 to IPv4 translator.
Tested-by: Subash Abhinov Kasiviswanathan <subashab@...eaurora.org>
Powered by blists - more mailing lists