netdev - Re: [PATCH net-next] sfc: reduce the number of requested xdp ev queues

Message-ID: <20201216094524.0c6e521c@carbon>
Date:   Wed, 16 Dec 2020 09:45:24 +0100
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Edward Cree <ecree.xilinx@...il.com>
Cc:     brouer@...hat.com, Ivan Babrou <ivan@...udflare.com>,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        bpf@...r.kernel.org, kernel-team@...udflare.com,
        Martin Habets <habetsm.xilinx@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>
Subject: Re: [PATCH net-next] sfc: reduce the number of requested xdp ev
 queues

On Tue, 15 Dec 2020 18:49:55 +0000
Edward Cree <ecree.xilinx@...il.com> wrote:

> On 15/12/2020 09:43, Jesper Dangaard Brouer wrote:
> > On Mon, 14 Dec 2020 17:29:06 -0800
> > Ivan Babrou <ivan@...udflare.com> wrote:
> >   
> >> Without this change the driver tries to allocate too many queues,
> >> breaching the number of available msi-x interrupts on machines
> >> with many logical cpus and default adapter settings:
> >>
> >> Insufficient resources for 12 XDP event queues (24 other channels, max 32)
> >>
> >> Which in turn triggers EINVAL on XDP processing:
> >>
> >> sfc 0000:86:00.0 ext0: XDP TX failed (-22)  
> > 
> > I have a similar QA report with XDP_REDIRECT:
> >   sfc 0000:05:00.0 ens1f0np0: XDP redirect failed (-22)
> > 
> > Here we are back to the issue we discussed with ixgbe, that NIC / msi-x
> > interrupts hardware resources are not enough on machines with many
> > logical cpus.
> > 
> > After this fix, what will happen if (cpu >= efx->xdp_tx_queue_count) ?  
>
> Same as happened before: the "failed -22".  But this fix will make that
>  less likely to happen, because it ties more TXQs to each EVQ, and it's
>  the EVQs that are in short supply.
>

So, what I hear is that this fix is just papering over the real issue.

I suggest that you/we detect the situation, and have a fallback code
path that takes a lock (once per bulk of 16 packets) to solve the issue.

If you care about maximum performance you can implement this by
changing the ndo_xdp_xmit pointer to the fallback function when needed,
to avoid having to check for the fallback mode in the fast-path.
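
To make the suggestion concrete, here is a rough userspace sketch of the
idea, not sfc driver code. All names (struct nic, struct txq, xmit_percpu,
xmit_shared, nic_setup) are hypothetical, and a pthread mutex stands in
for whatever lock the driver would use. The xmit pointer plays the role
of ndo_xdp_xmit: it is chosen once at setup, so the per-CPU fast path
never tests for fallback mode, and the shared path takes the lock once
per bulk rather than once per packet.

```c
#include <pthread.h>
#include <stddef.h>

#define XDP_BULK 16             /* bulk size mentioned in this thread */
#define ERR_INVAL (-22)         /* same value as -EINVAL in the logs above */

struct frame { int id; };

struct txq {
	pthread_mutex_t lock;           /* only taken on the shared path */
	unsigned long sent;             /* frames queued so far */
	unsigned long lock_acquisitions;
};

struct nic;
typedef int (*xmit_fn)(struct nic *nic, unsigned int cpu,
		       struct frame **frames, int n);

struct nic {
	struct txq *txqs;
	unsigned int txq_count;
	xmit_fn xmit;                   /* stands in for ndo_xdp_xmit */
};

/* Fast path: one TX queue per CPU, no locking, no fallback check. */
static int xmit_percpu(struct nic *nic, unsigned int cpu,
		       struct frame **frames, int n)
{
	(void)frames;
	if (cpu >= nic->txq_count)
		return ERR_INVAL;
	nic->txqs[cpu].sent += n;
	return n;
}

/* Fallback: CPUs share queues; the lock is taken once per bulk. */
static int xmit_shared(struct nic *nic, unsigned int cpu,
		       struct frame **frames, int n)
{
	struct txq *q;

	(void)frames;
	if (nic->txq_count == 0)
		return ERR_INVAL;
	q = &nic->txqs[cpu % nic->txq_count];

	pthread_mutex_lock(&q->lock);
	q->sent += n;                   /* whole bulk under a single lock */
	q->lock_acquisitions++;
	pthread_mutex_unlock(&q->lock);
	return n;
}

/* Detect the shortage once and select the matching transmit function. */
static void nic_setup(struct nic *nic, struct txq *txqs,
		      unsigned int txq_count, unsigned int ncpus)
{
	unsigned int i;

	for (i = 0; i < txq_count; i++) {
		pthread_mutex_init(&txqs[i].lock, NULL);
		txqs[i].sent = 0;
		txqs[i].lock_acquisitions = 0;
	}
	nic->txqs = txqs;
	nic->txq_count = txq_count;
	nic->xmit = (ncpus > txq_count) ? xmit_shared : xmit_percpu;
}
```

With 4 CPUs and 2 queues, nic_setup() picks xmit_shared, and a 16-frame
bulk from CPU 3 lands on queue 1 with a single lock acquisition. With
enough queues for every CPU the lock is never touched.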

>
> (Strictly speaking, I believe the limitation is a software one, that
>  comes from the driver's channel structures having been designed a
>  decade ago when 32 cpus ought to be enough for anybody... AFAIR the
>  hardware is capable of giving us something like 1024 evqs if we ask
>  for them, it just might not have that many msi-x vectors for us.)
> Anyway, the patch looks correct, so
> Acked-by: Edward Cree <ecree.xilinx@...il.com>

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer