linux-kernel - Re: [PATCH net] net: gro: do not keep too many GRO packets in napi->rx


Message-ID: <20210205130238.5741-1-alobakin@pm.me>
Date:   Fri, 05 Feb 2021 13:03:19 +0000
From:   Alexander Lobakin <alobakin@...me>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     Alexander Lobakin <alobakin@...me>,
        Saeed Mahameed <saeed@...nel.org>,
        Eric Dumazet <eric.dumazet@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        John Sperbeck <jsperbeck@...gle.com>,
        Jian Yang <jianyang@...gle.com>,
        Maxim Mikityanskiy <maximmi@...lanox.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Edward Cree <ecree@...arflare.com>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] net: gro: do not keep too many GRO packets in napi->rx_list

From: Eric Dumazet <edumazet@...gle.com>
Date: Thu, 4 Feb 2021 23:44:17 +0100

> On Thu, Feb 4, 2021 at 11:14 PM Saeed Mahameed <saeed@...nel.org> wrote:
> >
> > On Thu, 2021-02-04 at 13:31 -0800, Eric Dumazet wrote:
> > > From: Eric Dumazet <edumazet@...gle.com>
> > >
> > > Commit c80794323e82 ("net: Fix packet reordering caused by GRO and
> > > listified RX cooperation") had the unfortunate effect of adding
> > > latencies in common workloads.
> > >
> > > Before the patch, GRO packets were immediately passed to
> > > upper stacks.
> > >
> > > After the patch, we can accumulate quite a lot of GRO
> > > > packets (depending on NAPI budget).
> > >
> >
> > Why NAPI budget? Looking at the code, it seems to be more related to
> > MAX_GRO_SKBS * gro_normal_batch, since we are counting GRO SKBs as 1.
>
>
> Simply because we call gro_normal_list() from napi_poll(),
>
> So we flush the napi rx_list every 64 packets under stress (assuming the
> NIC driver uses NAPI_POLL_WEIGHT),
> or more often when napi_complete_done() is called because the budget was not exhausted.

Saeed,

Eric means that if we have e.g. 8 GRO packets with 8 segs each, then
rx_list will be flushed only after processing of 64 ingress frames.

> GRO always has been able to keep MAX_GRO_SKBS in its layer, but no recent patch
> has changed this part.
>
>
> >
> >
> > but maybe I am missing some information about the actual issue you are
> > hitting.
>
>
> Well, the issue is precisely described in the changelog.
>
> >
> >
> > > My fix is counting in napi->rx_count number of segments
> > > instead of number of logical packets.
> > >
> > > Fixes: c80794323e82 ("net: Fix packet reordering caused by GRO and
> > > listified RX cooperation")
> > > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > > Bisected-by: John Sperbeck <jsperbeck@...gle.com>
> > > Tested-by: Jian Yang <jianyang@...gle.com>
> > > Cc: Maxim Mikityanskiy <maximmi@...lanox.com>
> > > Cc: Alexander Lobakin <alobakin@...nk.ru>

It's strange that mailmap didn't pick up my active email at pm.me.

Anyway, this fix looks correct to me. It restores Edward's original
logic, but without the spurious out-of-order deliveries.
Moreover, the pre-patch behaviour can easily be achieved by increasing
net.core.gro_normal_batch if needed.

Thanks!

Reviewed-by: Alexander Lobakin <alobakin@...me>

> > > Cc: Saeed Mahameed <saeedm@...lanox.com>
> > > Cc: Edward Cree <ecree@...arflare.com>
> > > ---
> > >  net/core/dev.c | 11 ++++++-----
> > >  1 file changed, 6 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > index a979b86dbacda9dfe31dd8b269024f7f0f5a8ef1..449b45b843d40ece7dd1e2ed6a5996ee1db9f591 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -5735,10 +5735,11 @@ static void gro_normal_list(struct
> > > napi_struct *napi)
> > >  /* Queue one GRO_NORMAL SKB up for list processing. If batch size
> > > exceeded,
> > >   * pass the whole batch up to the stack.
> > >   */
> > > -static void gro_normal_one(struct napi_struct *napi, struct sk_buff
> > > *skb)
> > > +static void gro_normal_one(struct napi_struct *napi, struct sk_buff
> > > *skb, int segs)
> > >  {
> > >         list_add_tail(&skb->list, &napi->rx_list);
> > > -       if (++napi->rx_count >= gro_normal_batch)
> > > +       napi->rx_count += segs;
> > > +       if (napi->rx_count >= gro_normal_batch)
> > >                 gro_normal_list(napi);
> > >  }
> > >
> > > @@ -5777,7 +5778,7 @@ static int napi_gro_complete(struct napi_struct
> > > *napi, struct sk_buff *skb)
> > >         }
> > >
> > >  out:
> > > -       gro_normal_one(napi, skb);
> > > +       gro_normal_one(napi, skb, NAPI_GRO_CB(skb)->count);
> >
> > Seems correct to me,
> >
> > Reviewed-by: Saeed Mahameed <saeedm@...dia.com>

Al