netdev - Re: [PATCH v2 bpf-next 00/14] xsk: stop NAPI Rx processing on full XSK RQ

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Tue, 23 Aug 2022 22:18:06 -0700
From:   John Fastabend <john.fastabend@...il.com>
To:     Maxim Mikityanskiy <maximmi@...dia.com>,
        "maciej.fijalkowski@...el.com" <maciej.fijalkowski@...el.com>
Cc:     "magnus.karlsson@...el.com" <magnus.karlsson@...el.com>,
        "alexandr.lobakin@...el.com" <alexandr.lobakin@...el.com>,
        "bjorn@...nel.org" <bjorn@...nel.org>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        Saeed Mahameed <saeedm@...dia.com>,
        "ast@...nel.org" <ast@...nel.org>
Subject: Re: [PATCH v2 bpf-next 00/14] xsk: stop NAPI Rx processing on full
 XSK RQ

Maxim Mikityanskiy wrote:
> On Tue, 2022-08-23 at 13:24 +0200, Maciej Fijalkowski wrote:
> > On Tue, Aug 23, 2022 at 09:49:43AM +0000, Maxim Mikityanskiy wrote:
> > > Anyone from Intel? Maciej, Björn, Magnus?
> > 
> > Hey Maxim,
> > 
> > how about keeping it simple and going with option 1? This behavior was
> > even proposed in the v1 submission of the patch set we're talking about.
> 
> Yeah, I know it was the behavior in v1. It was me who suggested not
> dropping that packet, and I didn't realize back then that it had this
> undesired side effect - sorry for that!

Just want to reiterate what was said originally, you'll definately confuse
our XDP programs if they ever saw the same pkt twice. It would confuse
metrics and any "tap" and so on.

> 
> Option 1 sounds good to me as the first remedy, we can start with that.
> 
> However, it's not perfect: when NAPI and the application are pinned to
> the same core, if the fill ring is bigger than the RX ring (which makes
> sense in case of multiple sockets on the same UMEM), the driver will
> constantly get into this condition, drop one packet, yield to
> userspace, the application will of course clean up the RX ring, but
> then the process will repeat.

Maybe dumb question haven't followed the entire thread or at least
don't recall it. Could you yield when you hit a high water mark at
some point before pkt drop?

> 
> That means, we'll always have a small percentage of packets dropped,
> which may trigger the congestion control algorithms on the other side,
> slowing down the TX to unacceptable speeds (because packet drops won't
> disappear after slowing down just a little).
> 
> Given the above, we may need a more complex solution for the long term.
> What do you think?
> 
> Also, if the application uses poll(), this whole logic (either v1 or
> v2) seems not needed, because poll() returns to the application when
> something becomes available in the RX ring, but I guess the reason for
> adding it was that fantastic 78% performance improvement mentioned in
> the cover letter?
>