Message-ID: <a0f4d9d8-86da-41f1-848d-32e53c092b34@intel.com>
Date: Wed, 4 Dec 2024 17:42:50 +0100
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Jakub Kicinski <kuba@...nel.org>, Daniel Xu <dxu@...uu.xyz>
CC: Lorenzo Bianconi <lorenzo.bianconi@...hat.com>, Lorenzo Bianconi
<lorenzo@...nel.org>, "bpf@...r.kernel.org" <bpf@...r.kernel.org>, "Alexei
Starovoitov" <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
"Andrii Nakryiko" <andrii@...nel.org>, John Fastabend
<john.fastabend@...il.com>, Jesper Dangaard Brouer <hawk@...nel.org>, Martin
KaFai Lau <martin.lau@...ux.dev>, David Miller <davem@...emloft.net>, Eric
Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
<netdev@...r.kernel.org>
Subject: Re: [RFC/RFT v2 0/3] Introduce GRO support to cpumap codebase
From: Jakub Kicinski <kuba@...nel.org>
Date: Tue, 3 Dec 2024 16:51:57 -0800
> On Tue, 3 Dec 2024 12:01:16 +0100 Alexander Lobakin wrote:
>>>> @ Jakub,
>>>
>>> Context? What doesn't work and why?
>>
>> My tests show the same perf as on Lorenzo's series, but I test with UDP
>> trafficgen. Daniel tests TCP and the results are much worse than with
>> Lorenzo's implementation.
>> I suspect this is related to how NAPI performs flushes / decides
>> whether to repoll again or exit vs how the kthread does it (even though
>> I also try to flush only every 64 frames or when the ring is empty). Or
>> maybe to the fact that part of the kthread work happens in process
>> context outside any softirq, while with NAPI the whole loop runs inside
>> the RX softirq.
>>
>> Jesper said that he'd like to see cpumap still using own kthread, so
>> that its priority can be boosted separately from the backlog. That's why
>> we asked you whether it would be fine to have cpumap as threaded NAPI in
>> regards to all this :D
>
> Certainly not without a clear understanding what the problem with
> a kthread is.
Yes, sure thing.
The bad thing is that I can't reproduce Daniel's problem >_< Previously, I
was testing with the UDP trafficgen and got up to 80% improvement over
the baseline. Now I tested TCP and got up to 70% improvement, no
regressions whatsoever =\
I don't know where this regression on Daniel's setup comes from. Is it
a multi-thread or a single-thread test? Which app do you use: iperf,
netperf, neper, Microsoft's app (forgot the name)? Do you have multiple
NUMA nodes on your system? Are you sure you didn't cross a node when
redirecting with the GRO patches, and that no other NUMA mismatches
happened? What about other random stuff like the RSS hash key, which
affects flow steering?
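FWIW, in case it helps to compare what we're both running: the flush
policy I mentioned above (flush every 64 frames or when the ring is
empty) behaves roughly like the sketch below. It's heavily simplified
and not the actual patch; the napi instance / ring setup is assumed to
happen elsewhere, and I'm only using the generic napi_gro_receive() /
napi_gro_flush() / gro_normal_list() helpers, not the cpumap-specific
bits.

/*
 * Heavily simplified sketch of the flush policy discussed above, NOT the
 * actual cpumap patch: run GRO on the skbs pulled from the cpumap
 * ptr_ring and flush either after 64 frames or once the ring is empty.
 * The napi instance and the ring are assumed to be set up elsewhere.
 */
#include <linux/netdevice.h>
#include <linux/ptr_ring.h>
#include <net/gro.h>

#define CPUMAP_GRO_FLUSH_THRESH	64	/* illustrative batch size */

static void cpu_map_gro_flush_sketch(struct ptr_ring *ring,
				     struct napi_struct *napi)
{
	struct sk_buff *skb;
	u32 pending = 0;

	/* GRO must run with BHs disabled, even from the kthread. */
	local_bh_disable();

	while ((skb = ptr_ring_consume(ring))) {
		napi_gro_receive(napi, skb);

		/* Flush once enough frames have been batched... */
		if (++pending >= CPUMAP_GRO_FLUSH_THRESH) {
			napi_gro_flush(napi, false);
			gro_normal_list(napi);
			pending = 0;
		}
	}

	/*
	 * ...or when the ring runs dry, so nothing lingers in the GRO
	 * lists while the kthread goes back to sleep.
	 */
	if (pending) {
		napi_gro_flush(napi, false);
		gro_normal_list(napi);
	}

	local_bh_enable();
}
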
Thanks,
Olek